org.apache.lucene.analysis.hi.HindiNormalizer

public class HindiNormalizer extends Object

Normalizer for Hindi.

Normalizes text to reduce differences between spelling variations.

Implements the Hindi-specific algorithm described in "Word normalization in Indian languages" by Prasad Pingali and Vasudeva Varma (http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf),

with the following additions from "Hindi CLIR in Thirty Days" by Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel (http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454):

- Internal zero-width joiners (ZWJ) and zero-width non-joiners (ZWNJ) are removed.
- In addition to chandrabindu, NA + halant is normalized to anusvara.
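
The two additions above can be illustrated with a minimal, self-contained sketch. This is not Lucene's implementation: the class name `HindiAdditionsSketch` is hypothetical, the sketch covers only these two rules (none of the Pingali and Varma rules), and it assumes that every ZWJ or ZWNJ in the buffer is dropped regardless of position. It follows the same calling convention as `normalize(char[] s, int len)`, rewriting the buffer in place and returning the new length.

```java
public class HindiAdditionsSketch {
    // Code points assumed from the Unicode Devanagari and General Punctuation blocks.
    private static final char CHANDRABINDU = '\u0901';
    private static final char ANUSVARA = '\u0902';
    private static final char NA = '\u0928';
    private static final char HALANT = '\u094D';
    private static final char ZWNJ = '\u200C';
    private static final char ZWJ = '\u200D';

    /**
     * Illustrative only: applies the two listed additions in place.
     * Returns the number of valid chars left at the start of s.
     */
    public static int normalize(char[] s, int len) {
        int out = 0; // write position; never runs ahead of the read position i
        for (int i = 0; i < len; i++) {
            char c = s[i];
            if (c == ZWJ || c == ZWNJ) {
                continue; // drop joiners (assumption: at any position)
            }
            if (c == CHANDRABINDU) {
                s[out++] = ANUSVARA; // chandrabindu -> anusvara
            } else if (c == NA && i + 1 < len && s[i + 1] == HALANT) {
                s[out++] = ANUSVARA; // NA + halant -> anusvara
                i++;                 // skip the halant as well
            } else {
                s[out++] = c;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "hindi" spelled with NA + halant: HA, I-matra, NA, HALANT, DA, II-matra
        char[] buf = "\u0939\u093F\u0928\u094D\u0926\u0940".toCharArray();
        int newLen = normalize(buf, buf.length);
        // The normalized spelling uses anusvara in place of NA + halant.
        System.out.println(new String(buf, 0, newLen));
    }
}
```

Because the write position never passes the read position, the rewrite is safe to do in the caller's own buffer, which is why only a length needs to be returned.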

Constructor Summary

- HindiNormalizer()

Method Summary

- int normalize(char[] s, int len)
  
  Normalize an input buffer of Hindi text

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- HindiNormalizer
  
  public HindiNormalizer()
Method Details
- normalize
  
  public int normalize(char[] s, int len)
  
  Normalize an input buffer of Hindi text
  
  Parameters:
  
  s - input buffer
  
  len - length of input buffer
  
  Returns:
  
  length of input buffer after normalization
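
A short sketch of the calling convention may help here. It assumes, as the signature and return value suggest, that the buffer is rewritten in place and that only the first returned-length characters are meaningful afterwards. To stay runnable without Lucene on the classpath, the sketch uses a hypothetical `BufferNormalizer` interface with the same method shape and a toy stand-in (`dropZwnj`); neither is part of Lucene.

```java
public class BufferContractSketch {
    /** Hypothetical stand-in with the same shape as normalize(char[] s, int len). */
    public interface BufferNormalizer {
        int normalize(char[] s, int len);
    }

    /** Copies text into a buffer, normalizes it, and keeps only the returned length. */
    public static String apply(BufferNormalizer normalizer, String text) {
        char[] buf = text.toCharArray();
        int newLen = normalizer.normalize(buf, buf.length);
        // Chars at index >= newLen are leftovers and must be ignored.
        return new String(buf, 0, newLen);
    }

    /** Toy normalizer used only for this demo: removes zero-width non-joiners. */
    public static int dropZwnj(char[] s, int len) {
        int out = 0;
        for (int i = 0; i < len; i++) {
            if (s[i] != '\u200C') {
                s[out++] = s[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // With Lucene available, new HindiNormalizer()::normalize has a matching
        // shape and could be passed instead of the toy method below.
        String result = apply(BufferContractSketch::dropZwnj, "a\u200Cb");
        System.out.println(result.length());
    }
}
```

The point of the pattern is that the caller, not the normalizer, owns the buffer, so no new array is allocated per call.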
