Class ArabicStemmer

java.lang.Object
org.apache.lucene.analysis.ar.ArabicStemmer

public class ArabicStemmer extends Object
Stemmer for Arabic.

Stemming is done in place for efficiency, operating directly on a term buffer.

Stemming is defined as:

  • Removal of attached definite article, conjunction, and prepositions.
  • Stemming of common suffixes.
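The two steps above can be sketched as in-place operations on a char[] buffer. This is a minimal, self-contained illustration and not the library's implementation: the class name InPlaceStemSketch, the helper methods, the single prefix and single suffix, and the minimum-length guard are all assumptions made for the example.

```java
public class InPlaceStemSketch {
    // Illustrative subset only; these are not the library's prefix/suffix tables.
    static final char[] PREFIX_AL = {'\u0627', '\u0644'}; // definite article (alef + lam)
    static final char[] SUFFIX_AT = {'\u0627', '\u062A'}; // a common suffix (alef + teh)

    /** Removes PREFIX_AL from the front of s[0..len) if present; returns the new length. */
    static int stripPrefix(char[] s, int len) {
        int p = PREFIX_AL.length;
        if (len < p + 2) return len;           // assumed guard: keep at least 2 chars
        for (int i = 0; i < p; i++) {
            if (s[i] != PREFIX_AL[i]) return len;
        }
        System.arraycopy(s, p, s, 0, len - p); // shift the remaining chars left, in place
        return len - p;
    }

    /** Removes SUFFIX_AT from the end of s[0..len) if present; returns the new length. */
    static int stripSuffix(char[] s, int len) {
        int n = SUFFIX_AT.length;
        if (len < n + 2) return len;           // assumed guard: keep at least 2 chars
        for (int i = 0; i < n; i++) {
            if (s[len - n + i] != SUFFIX_AT[i]) return len;
        }
        return len - n;                        // truncation: only the logical length changes
    }

    /** Prefix removal followed by suffix removal, both on the same buffer. */
    static int stem(char[] s, int len) {
        len = stripPrefix(s, len);
        return stripSuffix(s, len);
    }

    public static void main(String[] args) {
        // alef, lam, kaf, teh, alef, beh: a word carrying the definite article
        char[] buf = "\u0627\u0644\u0643\u062A\u0627\u0628".toCharArray();
        int newLen = stem(buf, buf.length);
        System.out.println(new String(buf, 0, newLen)); // the 4 chars left after the article is removed
    }
}
```

No new array is allocated: removing a prefix shifts characters left, and removing a suffix only shortens the logical length.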
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final char
     
    static final char
     
    static final char
     
    static final char
     
    static final char
     
    static final char
     
    static final char
     
    static final char[][]
     
    static final char[][]
     
    static final char
     
    static final char
     
    static final char
     
    static final char
     
  • Constructor Summary

    Constructors
    Constructor
    Description
     
  • Method Summary

    Modifier and Type
    Method
    Description
    int
    stem(char[] s, int len)
    Stem an input buffer of Arabic text.
    int
    stemPrefix(char[] s, int len)
    Stem a prefix off an Arabic word.
    int
    stemSuffix(char[] s, int len)
    Stem suffix(es) off an Arabic word.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • ArabicStemmer

      public ArabicStemmer()
  • Method Details

    • stem

      public int stem(char[] s, int len)
      Stem an input buffer of Arabic text.
      Parameters:
      s - input buffer
      len - length of input buffer
      Returns:
      new length of input buffer after stemming
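Because the method returns a length instead of a new array, a caller should read only the first returned-length characters of the buffer; anything past that point is stale. The sketch below shows that calling pattern. BufferStemmer is a hypothetical stand-in interface so the example compiles without Lucene on the classpath, and the toy lambda is not a real stemmer. A method with the signature stem(char[] s, int len) returning int fits the same shape.

```java
public class StemCallSketch {
    /** Hypothetical stand-in with the same shape as stem(char[] s, int len). */
    interface BufferStemmer {
        int stem(char[] s, int len);
    }

    /** Applies the stemmer in place and returns only the valid part of the buffer. */
    static String stemToString(BufferStemmer stemmer, String word) {
        char[] buf = word.toCharArray();
        int newLen = stemmer.stem(buf, buf.length);
        return new String(buf, 0, newLen); // ignore stale chars past newLen
    }

    public static void main(String[] args) {
        // Toy stemmer for demonstration only: drops the last char of words longer than 3.
        BufferStemmer dropLast = (s, len) -> len > 3 ? len - 1 : len;
        System.out.println(stemToString(dropLast, "stems")); // prints "stem"
    }
}
```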
    • stemPrefix

      public int stemPrefix(char[] s, int len)
      Stem a prefix off an Arabic word.
      Parameters:
      s - input buffer
      len - length of input buffer
      Returns:
      new length of input buffer after stemming
    • stemSuffix

      public int stemSuffix(char[] s, int len)
      Stem suffix(es) off an Arabic word.
      Parameters:
      s - input buffer
      len - length of input buffer
      Returns:
      new length of input buffer after stemming