Package org.apache.lucene.analysis.ar
Class ArabicStemmer
java.lang.Object
org.apache.lucene.analysis.ar.ArabicStemmer
Stemmer for Arabic.
Stemming is done in-place for efficiency, operating on a term buffer.
Stemming is defined as:
- Removal of attached definite article, conjunction, and prepositions.
- Stemming of common suffixes.
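The two steps above can be sketched as a small, self-contained Java class. This is an illustrative sketch only, not the Lucene implementation: the class name AffixStemmerSketch, the two short affix tables, and the MIN_STEM guard are assumptions made for the example. The real tables are the prefixes and suffixes fields documented below, and their contents are not listed on this page.

```java
final class AffixStemmerSketch {
    // Illustrative affix tables (assumption); the real ones live in ArabicStemmer.
    static final char[][] PREFIXES = {
        "\u0648\u0627\u0644".toCharArray(), // waw + alef + lam
        "\u0627\u0644".toCharArray(),       // alef + lam (definite article)
    };
    static final char[][] SUFFIXES = {
        "\u0627\u062A".toCharArray(), // alef + teh
        "\u0629".toCharArray(),       // teh marbuta
    };
    // Assumed guard: never strip an affix if fewer than this many characters would remain.
    static final int MIN_STEM = 2;

    /** Strips one prefix, then any matching suffixes; returns the new logical length. */
    public static int stem(char[] s, int len) {
        len = stemPrefix(s, len);
        return stemSuffix(s, len);
    }

    /** Removes the first matching prefix by shifting the rest of the buffer left. */
    public static int stemPrefix(char[] s, int len) {
        for (char[] p : PREFIXES) {
            if (len >= p.length + MIN_STEM && matches(s, 0, p)) {
                System.arraycopy(s, p.length, s, 0, len - p.length);
                return len - p.length;
            }
        }
        return len;
    }

    /** Removes matching suffixes; only the logical length shrinks, nothing is moved. */
    public static int stemSuffix(char[] s, int len) {
        for (char[] x : SUFFIXES) {
            if (len >= x.length + MIN_STEM && matches(s, len - x.length, x)) {
                len -= x.length;
            }
        }
        return len;
    }

    // True if the affix occurs in s starting at index from (caller guarantees bounds).
    private static boolean matches(char[] s, int from, char[] affix) {
        for (int i = 0; i < affix.length; i++) {
            if (s[from + i] != affix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // alef, lam, meem, kaf, teh, beh, teh marbuta
        char[] buf = "\u0627\u0644\u0645\u0643\u062A\u0628\u0629".toCharArray();
        int newLen = stem(buf, buf.length);
        // Only the first newLen characters of buf are meaningful after stemming.
        System.out.println(new String(buf, 0, newLen));
    }
}
```

No new array is allocated at any point, which is the efficiency argument made above: prefix removal shifts characters inside the existing buffer, and suffix removal only reduces the length the caller is told to read.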
Field Summary
Fields
Modifier and Type          Field
static final char          ALEF
static final char          BEH
static final char          FEH
static final char          HEH
static final char          KAF
static final char          LAM
static final char          NOON
static final char[][]      prefixes
static final char[][]      suffixes
static final char          TEH
static final char          TEH_MARBUTA
static final char          WAW
static final char          YEH
Constructor Summary
Constructors
Constructor
ArabicStemmer()
Method Summary
Modifier and Type   Method                           Description
int                 stem(char[] s, int len)          Stem an input buffer of Arabic text.
int                 stemPrefix(char[] s, int len)    Stem a prefix off an Arabic word.
int                 stemSuffix(char[] s, int len)    Stem suffix(es) off an Arabic word.
Field Details
ALEF
public static final char ALEF
BEH
public static final char BEH
TEH_MARBUTA
public static final char TEH_MARBUTA
TEH
public static final char TEH
FEH
public static final char FEH
KAF
public static final char KAF
LAM
public static final char LAM
NOON
public static final char NOON
HEH
public static final char HEH
WAW
public static final char WAW
YEH
public static final char YEH
prefixes
public static final char[][] prefixes
suffixes
public static final char[][] suffixes
Constructor Details
ArabicStemmer
public ArabicStemmer()
Method Details
stem
public int stem(char[] s, int len)
Stem an input buffer of Arabic text.
Parameters:
s - input buffer
len - length of input buffer
Returns:
length of input buffer after stemming
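Because the buffer is modified in place and only a length is returned, the caller has to treat the first returned-length characters as the result and ignore anything after them. A minimal caller-side sketch follows. The InPlaceStemmer interface and the stemToString helper are hypothetical names introduced for this example, and the toy stemmer in main is a stand-in, not Arabic stemming.

```java
final class StemResultDemo {
    /** Hypothetical stand-in with the same shape as stem(char[] s, int len). */
    interface InPlaceStemmer {
        int stem(char[] s, int len);
    }

    /** Stems a copy of the term in place and returns only the valid part of the buffer. */
    public static String stemToString(InPlaceStemmer stemmer, String term) {
        char[] buf = term.toCharArray();
        int newLen = stemmer.stem(buf, buf.length);
        // Characters at index >= newLen are leftovers and must not be read.
        return new String(buf, 0, newLen);
    }

    public static void main(String[] args) {
        // Toy stemmer for demonstration only: drops the final character.
        InPlaceStemmer dropLast = (s, len) -> len > 1 ? len - 1 : len;
        System.out.println(stemToString(dropLast, "terms")); // prints "term"
    }
}
```

With Lucene on the classpath, a method reference such as new ArabicStemmer()::stem should fit this interface, since the documented signature is int stem(char[] s, int len).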
stemPrefix
public int stemPrefix(char[] s, int len)
Stem a prefix off an Arabic word.
Parameters:
s - input buffer
len - length of input buffer
Returns:
new length of input buffer after stemming
stemSuffix
public int stemSuffix(char[] s, int len)
Stem suffix(es) off an Arabic word.
Parameters:
s - input buffer
len - length of input buffer
Returns:
new length of input buffer after stemming