Class ArabicStemmer


  • public class ArabicStemmer
    extends Object
    Stemmer for Arabic.

    Stemming is done in place for efficiency, operating directly on a term buffer.

    Stemming is defined as:

    • Removal of attached definite article, conjunction, and prepositions.
    • Stemming of common suffixes.
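
The two steps above can be illustrated with a minimal, self-contained sketch of in-place affix stripping on a char[] buffer. This is not the library's implementation: the class name InPlaceAffixSketch, the two-entry affix tables, and the MIN_REMAINDER guard are all assumptions chosen for illustration, and the real stemmer's affix lists and length checks may differ.

```java
/** Hypothetical sketch of in-place affix stripping; not the library's implementation. */
final class InPlaceAffixSketch {
    // Illustrative affixes only (assumed; the real tables may differ).
    static final char[][] PREFIXES = {
        {'\u0648', '\u0627', '\u0644'}, // waw + alef + lam: conjunction plus definite article
        {'\u0627', '\u0644'}            // alef + lam: definite article
    };
    static final char[][] SUFFIXES = {
        {'\u0627', '\u062A'},           // alef + teh
        {'\u0629'}                      // teh marbuta
    };
    // Assumed guard: keep at least this many characters after stripping.
    static final int MIN_REMAINDER = 2;

    /** Removes the first matching prefix by shifting the remaining characters left. */
    static int stripPrefix(char[] s, int len) {
        for (char[] p : PREFIXES) {
            if (len >= p.length + MIN_REMAINDER && regionEquals(s, 0, p)) {
                System.arraycopy(s, p.length, s, 0, len - p.length);
                return len - p.length;
            }
        }
        return len;
    }

    /** Removes the first matching suffix by shrinking the logical length. */
    static int stripSuffix(char[] s, int len) {
        for (char[] x : SUFFIXES) {
            if (len >= x.length + MIN_REMAINDER && regionEquals(s, len - x.length, x)) {
                return len - x.length;
            }
        }
        return len;
    }

    /** Returns true if the buffer contains the affix starting at the given offset. */
    static boolean regionEquals(char[] s, int offset, char[] affix) {
        for (int i = 0; i < affix.length; i++) {
            if (s[offset + i] != affix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Strips a prefix first, then a suffix, and returns the new logical length. */
    static int stem(char[] s, int len) {
        len = stripPrefix(s, len);
        return stripSuffix(s, len);
    }
}
```

No new array is allocated in this sketch: a prefix is dropped by shifting the tail of the buffer left, and a suffix is dropped simply by returning a smaller length.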
    • Constructor Detail

      • ArabicStemmer

        public ArabicStemmer()
    • Method Detail

      • stem

        public int stem(char[] s,
                        int len)
        Stem an input buffer of Arabic text.
        Parameters:
        s - input buffer
        len - length of input buffer
        Returns:
        new length of input buffer after stemming
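
Because the buffer is modified in place and only the returned length is meaningful, a caller has to ignore any characters at or beyond that length. The sketch below shows one way to do this. BufferStemmer and StemCallerSketch are hypothetical names introduced here so the example compiles without the library; BufferStemmer only mirrors the shape of stem(char[], int).

```java
/** Hypothetical stand-in with the same shape as stem(char[], int). */
interface BufferStemmer {
    int stem(char[] s, int len);
}

final class StemCallerSketch {
    /** Stems a copy of the word and returns only the valid region of the buffer. */
    static String stemToString(BufferStemmer stemmer, String word) {
        char[] buf = word.toCharArray();
        int newLen = stemmer.stem(buf, buf.length);
        // Characters at index >= newLen are leftovers and must be ignored.
        return new String(buf, 0, newLen);
    }
}
```

With the library on the classpath, a method reference such as new ArabicStemmer()::stem should be usable as the BufferStemmer argument, since the signatures match.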
      • stemPrefix

        public int stemPrefix(char[] s,
                              int len)
        Stem a prefix off an Arabic word.
        Parameters:
        s - input buffer
        len - length of input buffer
        Returns:
        new length of input buffer after stemming
      • stemSuffix

        public int stemSuffix(char[] s,
                              int len)
        Stem suffix(es) off an Arabic word.
        Parameters:
        s - input buffer
        len - length of input buffer
        Returns:
        new length of input buffer after stemming