Package org.apache.lucene.analysis.ar
Class ArabicNormalizer
java.lang.Object
org.apache.lucene.analysis.ar.ArabicNormalizer
Normalizer for Arabic.
Normalization is done in place for efficiency, operating on a term buffer.
Normalization is defined as:
- Normalization of hamza with alef seat to a bare alef.
- Normalization of teh marbuta to heh.
- Normalization of dotless yeh (alef maksura) to yeh.
- Removal of Arabic diacritics (the harakat).
- Removal of tatweel (stretching character).
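The rules above can be sketched in plain Java, independent of Lucene. This is a minimal illustration, not the library's source: the class name `ArabicNormalizeSketch` and the `normalize(String)` convenience overload are hypothetical, the code points are assumed from the Unicode Arabic block, and it is assumed that the three alef variants named in the field list are the ones folded to bare alef and that the harakat occupy the contiguous range FATHATAN through SUKUN.

```java
final class ArabicNormalizeSketch {
    // Assumed code points (Unicode Arabic block); names mirror the documented fields.
    static final char ALEF = '\u0627';
    static final char ALEF_MADDA = '\u0622';
    static final char ALEF_HAMZA_ABOVE = '\u0623';
    static final char ALEF_HAMZA_BELOW = '\u0625';
    static final char YEH = '\u064A';
    static final char DOTLESS_YEH = '\u0649';
    static final char TEH_MARBUTA = '\u0629';
    static final char HEH = '\u0647';
    static final char TATWEEL = '\u0640';
    static final char FATHATAN = '\u064B'; // assumed first harakat code point
    static final char SUKUN = '\u0652';    // assumed last harakat code point

    /** Normalizes the first len chars of s in place and returns the new length. */
    static int normalize(char[] s, int len) {
        int i = 0;
        while (i < len) {
            char c = s[i];
            if (c == ALEF_MADDA || c == ALEF_HAMZA_ABOVE || c == ALEF_HAMZA_BELOW) {
                s[i++] = ALEF;                 // alef variants -> bare alef
            } else if (c == DOTLESS_YEH) {
                s[i++] = YEH;                  // alef maksura -> yeh
            } else if (c == TEH_MARBUTA) {
                s[i++] = HEH;                  // teh marbuta -> heh
            } else if (c == TATWEEL || (c >= FATHATAN && c <= SUKUN)) {
                // Delete the char by shifting the tail left; do not advance i,
                // because a new char now sits at position i.
                System.arraycopy(s, i + 1, s, i, len - i - 1);
                len--;
            } else {
                i++;                           // everything else is left untouched
            }
        }
        return len;
    }

    /** Hypothetical convenience wrapper for experimenting with whole strings. */
    static String normalize(String text) {
        char[] buf = text.toCharArray();
        int newLen = normalize(buf, buf.length);
        return new String(buf, 0, newLen);
    }
}
```

Mapping rules keep the length unchanged, while the two removal rules shrink it, which is why the method reports a new length instead of allocating a new buffer.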
-
Field Summary
Modifier and Type    Field               Description
static final char    ALEF
static final char    ALEF_MADDA
static final char    ALEF_HAMZA_ABOVE
static final char    ALEF_HAMZA_BELOW
static final char    YEH
static final char    DOTLESS_YEH
static final char    TEH_MARBUTA
static final char    HEH
static final char    TATWEEL
static final char    FATHATAN
static final char    DAMMATAN
static final char    KASRATAN
static final char    FATHA
static final char    DAMMA
static final char    KASRA
static final char    SHADDA
static final char    SUKUN
-
Constructor Summary
Constructor          Description
ArabicNormalizer()
-
Method Summary
Modifier and Type    Method                          Description
int                  normalize(char[] s, int len)    Normalize an input buffer of Arabic text
-
Field Details
-
ALEF
public static final char ALEF
-
ALEF_MADDA
public static final char ALEF_MADDA
-
ALEF_HAMZA_ABOVE
public static final char ALEF_HAMZA_ABOVE
-
ALEF_HAMZA_BELOW
public static final char ALEF_HAMZA_BELOW
-
YEH
public static final char YEH
-
DOTLESS_YEH
public static final char DOTLESS_YEH
-
TEH_MARBUTA
public static final char TEH_MARBUTA
-
HEH
public static final char HEH
-
TATWEEL
public static final char TATWEEL
-
FATHATAN
public static final char FATHATAN
-
DAMMATAN
public static final char DAMMATAN
-
KASRATAN
public static final char KASRATAN
-
FATHA
public static final char FATHA
-
DAMMA
public static final char DAMMA
-
KASRA
public static final char KASRA
-
SHADDA
public static final char SHADDA
-
SUKUN
public static final char SUKUN
-
-
Constructor Details
-
ArabicNormalizer
public ArabicNormalizer()
-
-
Method Details
-
normalize
public int normalize(char[] s, int len)
Normalize an input buffer of Arabic text.
Parameters:
s - input buffer
len - length of input buffer
Returns:
length of input buffer after normalization
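Because the buffer is modified in place and only the returned length is meaningful afterwards, a caller typically reads back just the first `newLen` chars. The sketch below illustrates that calling pattern under stated assumptions: `InPlaceBufferUsage`, `BufferNormalizer`, `apply`, and the toy `dropUnderscores` normalizer are all hypothetical names introduced for illustration and are not part of Lucene.

```java
final class InPlaceBufferUsage {
    /** Hypothetical stand-in for any method shaped like normalize(char[] s, int len). */
    interface BufferNormalizer {
        int normalize(char[] s, int len);
    }

    /** Copies text into a scratch buffer, normalizes it in place, and reads the result back. */
    static String apply(BufferNormalizer normalizer, String text) {
        char[] buf = text.toCharArray();
        int newLen = normalizer.normalize(buf, buf.length);
        // Only buf[0..newLen) is valid now; anything past newLen is stale data.
        return new String(buf, 0, newLen);
    }

    /** Toy normalizer that drops every '_' in place; it only demonstrates the contract. */
    static int dropUnderscores(char[] s, int len) {
        int out = 0;
        for (int in = 0; in < len; in++) {
            if (s[in] != '_') {
                s[out++] = s[in]; // compact kept chars toward the front
            }
        }
        return out;
    }
}
```

With Lucene on the classpath, a method reference such as `new ArabicNormalizer()::normalize` should fit the same interface shape, since the documented signature is `int normalize(char[] s, int len)`.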
-