public class HindiNormalizer extends Object
Normalizes text to remove some of the differences between spelling variants.
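As a rough illustration of what removing spelling differences means, the sketch below collapses a few Devanagari variants onto one form. It maps candrabindu to anusvara, and it maps a letter written either with a separate nukta or as a precomposed nukta letter to the plain base letter. The class name `HindiVariantSketch` and these specific rules are assumptions chosen for illustration. They are not taken from the cited papers, and they are not claimed to match the actual rule set of `HindiNormalizer`.

```java
// Hypothetical sketch: collapses a few Devanagari spelling variants onto one
// canonical form. The rules below are illustrative assumptions, not the rule
// list of the cited papers or of HindiNormalizer itself.
final class HindiVariantSketch {

    private HindiVariantSketch() {}

    /** Returns a copy of the input with a few variant spellings unified. */
    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u0901':            // candrabindu -> anusvara
                    out.append('\u0902');
                    break;
                case '\u093C':            // standalone nukta: drop it
                    break;
                case '\u0958':            // precomposed QA (KA + nukta) -> KA
                    out.append('\u0915');
                    break;
                case '\u095B':            // precomposed ZA (JA + nukta) -> JA
                    out.append('\u091C');
                    break;
                default:                  // everything else passes through
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two spellings of the same consonant: KA followed by a separate NUKTA,
        // and the precomposed QA code point.
        String decomposed = "\u0915\u093C";
        String precomposed = "\u0958";
        System.out.println(normalize(decomposed).equals(normalize(precomposed))); // true
    }
}
```

Both spellings reduce to the same string, so a search index built on the normalized form would treat them as one term.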
Implements the Hindi-language-specific algorithm specified in "Word normalization in Indian languages" by Prasad Pingali and Vasudeva Varma (http://web2py.iiit.ac.in/publications/default/download/inproceedings.pdf.3fe5b38c-02ee-41ce-9a8f-3e745670be32.pdf), with the following additions from "Hindi CLIR in Thirty Days" by Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel (http://maroo.cs.umass.edu/pub/web/getpdf.php?id=454):
Constructor and Description |
---|
HindiNormalizer() |
Modifier and Type | Method and Description |
---|---|
int | normalize(char[] s, int len): Normalize an input buffer of Hindi text. |
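The summary does not spell out the contract of `normalize(char[] s, int len)`. One plausible reading of the `int` return type, assumed here rather than stated on this page, is that the first `len` chars of `s` are rewritten in place and the new logical length is returned, because deleting characters shrinks the text. The hypothetical `InPlaceHindiSketch` below shows that pattern with two illustrative rules (deleting ZWJ/ZWNJ and mapping candrabindu to anusvara). It is a sketch of the calling convention, not Lucene's implementation.

```java
// Hypothetical sketch of an in-place normalize(char[] s, int len) contract:
// the buffer is edited in place and the new logical length is returned.
// The two rules are illustrative assumptions, not Lucene's actual rule set.
final class InPlaceHindiSketch {

    private InPlaceHindiSketch() {}

    static int normalize(char[] s, int len) {
        for (int i = 0; i < len; i++) {
            switch (s[i]) {
                case '\u200C': // zero-width non-joiner: delete
                case '\u200D': // zero-width joiner: delete
                    len = delete(s, i, len);
                    i--; // re-examine the char that shifted into position i
                    break;
                case '\u0901': // candrabindu -> anusvara
                    s[i] = '\u0902';
                    break;
                default:
                    break;
            }
        }
        return len;
    }

    /** Removes s[pos] by shifting the tail left by one; returns the new length. */
    private static int delete(char[] s, int pos, int len) {
        if (pos < len - 1) {
            System.arraycopy(s, pos + 1, s, pos, len - pos - 1);
        }
        return len - 1;
    }

    public static void main(String[] args) {
        // KA, ZERO WIDTH JOINER, KA, CANDRABINDU
        char[] buf = "\u0915\u200D\u0915\u0901".toCharArray();
        int n = normalize(buf, buf.length);
        String result = new String(buf, 0, n);
        System.out.println(n); // 3
        System.out.println(result.equals("\u0915\u0915\u0902")); // true
    }
}
```

Under this assumed contract, a caller reads the result as `new String(buf, 0, n)` and ignores whatever remains in the array past index `n`. Editing the caller's buffer avoids allocating a new array for every token.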
Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.