Class HindiNormalizer

  • public class HindiNormalizer
    extends Object
    Normalizer for Hindi.

    Normalizes text to remove some spelling variations.

    Implements the Hindi-specific algorithm described in "Word normalization in Indian languages" by Prasad Pingali and Vasudeva Varma, with the following additions from "Hindi CLIR in Thirty Days" by Leah S. Larkey, Margaret E. Connell, and Nasreen AbdulJaleel:

    • Internal zero-width joiners (U+200D) and zero-width non-joiners (U+200C) are removed
    • In addition to chandrabindu, NA + halant is normalized to anusvara
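
    The two additions listed above can be illustrated with a small standalone sketch. It is not the library's implementation: the class name HindiAdditionsSketch and the method applyAdditions are hypothetical, it covers only these two rules (not the rest of the Pingali and Varma algorithm), and it assumes that every zero-width joiner or non-joiner in the buffer is dropped regardless of position.

```java
final class HindiAdditionsSketch {
    private static final char ZWNJ = '\u200C';          // zero-width non-joiner
    private static final char ZWJ = '\u200D';           // zero-width joiner
    private static final char CHANDRABINDU = '\u0901';  // Devanagari sign candrabindu
    private static final char ANUSVARA = '\u0902';      // Devanagari sign anusvara
    private static final char NA = '\u0928';            // Devanagari letter NA
    private static final char HALANT = '\u094D';        // Devanagari sign virama (halant)

    /**
     * Hypothetical helper: applies only the two listed additions, in place,
     * and returns the new logical length of the buffer.
     */
    static int applyAdditions(char[] s, int len) {
        int out = 0; // write position; never runs ahead of the read position
        for (int in = 0; in < len; in++) {
            char c = s[in];
            if (c == ZWJ || c == ZWNJ) {
                continue; // assumption: drop every joiner / non-joiner
            }
            if (c == CHANDRABINDU) {
                s[out++] = ANUSVARA;
            } else if (c == NA && in + 1 < len && s[in + 1] == HALANT) {
                s[out++] = ANUSVARA;
                in++; // also consume the halant
            } else {
                s[out++] = c;
            }
        }
        return out;
    }

    /** Convenience wrapper that works on a String instead of a buffer. */
    static String applyAdditions(String text) {
        char[] buf = text.toCharArray();
        int n = applyAdditions(buf, buf.length);
        return new String(buf, 0, n);
    }

    public static void main(String[] args) {
        // HA, vowel sign I, NA, halant, DA, vowel sign II
        String input = "\u0939\u093F\u0928\u094D\u0926\u0940";
        System.out.println(applyAdditions(input));
    }
}
```

    Under these assumptions, a word spelled with NA + halant and the same word spelled with anusvara map to one form, which is the kind of spelling variation the class description refers to.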
    • Constructor Detail

      • HindiNormalizer

        public HindiNormalizer()
    • Method Detail

      • normalize

        public int normalize(char[] s,
                             int len)
        Normalizes an input buffer of Hindi text.
        Parameters:
        s - input buffer
        len - length of input buffer
        Returns:
        length of input buffer after normalization
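
        Because the method takes a buffer plus a length and returns the length after normalization, a caller would typically read back only that many characters. The sketch below shows this calling pattern in isolation. The BufferNormalizer interface, the normalizeToString helper, and the dropZwj stand-in are all hypothetical and exist only so the example runs without the library; dropZwj is not the Hindi algorithm.

```java
final class InPlaceBufferSketch {
    /** Hypothetical interface mirroring the documented signature. */
    interface BufferNormalizer {
        int normalize(char[] s, int len);
    }

    /** Copies text into a buffer, normalizes it, and reads back the result. */
    static String normalizeToString(String text, BufferNormalizer normalizer) {
        char[] buf = text.toCharArray();
        int newLen = normalizer.normalize(buf, buf.length);
        // Read only the first newLen characters, as given by the return value.
        return new String(buf, 0, newLen);
    }

    /** Stand-in normalizer for the demo: removes U+200D in place. */
    static int dropZwj(char[] s, int len) {
        int out = 0;
        for (int i = 0; i < len; i++) {
            if (s[i] != '\u200D') {
                s[out++] = s[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(normalizeToString("a\u200Db", InPlaceBufferSketch::dropZwj));
    }
}
```

        With the library on the classpath, a method reference such as new HindiNormalizer()::normalize has the same shape (char[], int) -> int and could be passed in place of the stand-in.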