org.apache.lucene.analysis.ar
Class ArabicLetterTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.util.CharTokenizer
org.apache.lucene.analysis.core.LetterTokenizer
org.apache.lucene.analysis.ar.ArabicLetterTokenizer
- All Implemented Interfaces:
- Closeable
Deprecated. (3.1) Use StandardTokenizer instead.
@Deprecated
public class ArabicLetterTokenizer
- extends LetterTokenizer
Tokenizer that breaks text into runs of letters and diacritics.
The problem with the standard LetterTokenizer is that it breaks tokens at diacritics, which are not in the Letter category.
Similar handling is necessary for Indic scripts, Hebrew, Thaana, etc.
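The difference described above can be sketched with the JDK alone. The following is a minimal illustration, not the Lucene implementation: the class `LetterRunSketch` and its helpers `runs`, `lettersOnly`, and `lettersAndMarks` are hypothetical names, and the predicate is assumed from the documented rule (Letter category or NonspacingMark category).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-alone sketch; it does not use any Lucene classes.
final class LetterRunSketch {

    // Splits text into maximal runs of code points accepted by the predicate.
    static List<String> runs(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A rejected code point closes the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Letter-only rule: a diacritic ends the current token.
    static List<String> lettersOnly(String text) {
        return runs(text, Character::isLetter);
    }

    // Assumed rule for this class: letters plus nonspacing marks.
    static List<String> lettersAndMarks(String text) {
        return runs(text, cp -> Character.isLetter(cp)
                || Character.getType(cp) == Character.NON_SPACING_MARK);
    }

    public static void main(String[] args) {
        // KAF, TEH, BEH, each followed by FATHA (U+064E), a nonspacing mark.
        String word = "\u0643\u064E\u062A\u064E\u0628\u064E";
        System.out.println(lettersOnly(word).size());     // 3 single-letter fragments
        System.out.println(lettersAndMarks(word).size()); // 1 token, the whole word
    }
}
```

With the letter-only rule each FATHA acts as a separator, so the vocalized word falls apart into three fragments. Accepting nonspacing marks keeps it as one token.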
You must specify the required Version compatibility when creating ArabicLetterTokenizer.
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
Method Summary
protected boolean isTokenChar(int c)
Deprecated. Allows for Letter category or NonspacingMark category
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
ArabicLetterTokenizer
public ArabicLetterTokenizer(Version matchVersion,
Reader in)
- Deprecated.
- Construct a new ArabicLetterTokenizer.
- Parameters:
matchVersion - Lucene version to match. See above.
in - the input to split up into tokens
ArabicLetterTokenizer
public ArabicLetterTokenizer(Version matchVersion,
AttributeSource source,
Reader in)
- Deprecated.
- Construct a new ArabicLetterTokenizer using a given AttributeSource.
- Parameters:
matchVersion - Lucene version to match. See above.
source - the attribute source to use for this Tokenizer
in - the input to split up into tokens
ArabicLetterTokenizer
public ArabicLetterTokenizer(Version matchVersion,
AttributeSource.AttributeFactory factory,
Reader in)
- Deprecated.
- Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.
- Parameters:
matchVersion - Lucene version to match. See above.
factory - the attribute factory to use for this Tokenizer
in - the input to split up into tokens
isTokenChar
protected boolean isTokenChar(int c)
- Deprecated.
- Allows for Letter category or NonspacingMark category
- Overrides:
isTokenChar in class LetterTokenizer
- See Also:
LetterTokenizer.isTokenChar(int)
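The override relationship can be sketched without Lucene on the classpath. This is an illustration under stated assumptions: `LetterRule`, `LetterOrMarkRule`, and `OverrideSketch` are hypothetical stand-ins for the real classes, and the method bodies are inferred from the documented behaviour rather than copied from the Lucene source.

```java
// Hypothetical stand-in for the parent rule: accepts letters only.
class LetterRule {
    protected boolean isTokenChar(int c) {
        return Character.isLetter(c);
    }
}

// Hypothetical stand-in for the overriding rule: keeps the parent's
// decision and additionally accepts the NonspacingMark category.
class LetterOrMarkRule extends LetterRule {
    @Override
    protected boolean isTokenChar(int c) {
        return super.isTokenChar(c)
                || Character.getType(c) == Character.NON_SPACING_MARK;
    }
}

final class OverrideSketch {
    // Small helper so callers do not depend on protected access.
    static boolean accepts(LetterRule rule, int c) {
        return rule.isTokenChar(c);
    }

    public static void main(String[] args) {
        // BEH (letter), FATHA and SHADDA (nonspacing marks), digit 1, space.
        int[] samples = {0x0628, 0x064E, 0x0651, 0x0031, 0x0020};
        LetterRule parent = new LetterRule();
        LetterRule child = new LetterOrMarkRule();
        for (int cp : samples) {
            System.out.printf("U+%04X parent=%b child=%b%n",
                    cp, accepts(parent, cp), accepts(child, cp));
        }
    }
}
```

In this sketch the two rules disagree only on the nonspacing marks: both accept BEH and both reject the digit and the space.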
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.