Deprecated. Use StandardTokenizer instead.
@Deprecated public class ArabicLetterTokenizer extends LetterTokenizer
The standard LetterTokenizer treats only letters as token characters, so it breaks tokens at diacritics, which are nonspacing marks rather than letters. Similar handling is necessary for Indic scripts, Hebrew, Thaana, etc.
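The diacritics problem described above can be illustrated with the JDK alone. The following is a minimal sketch, not Lucene's code: `DiacriticSplitDemo`, `split`, `LETTERS_ONLY`, and `LETTERS_OR_MARKS` are hypothetical names chosen for this example, and the sample word is an assumption picked because every letter carries a fatha (U+064E).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical stand-in for a character tokenizer; uses only java.lang.Character.
class DiacriticSplitDemo {
    // Accepts letters only, so a diacritic ends the current token.
    static final IntPredicate LETTERS_ONLY = Character::isLetter;

    // Accepts letters and nonspacing marks, so diacritics stay inside the token.
    static final IntPredicate LETTERS_OR_MARKS =
        c -> Character.isLetter(c) || Character.getType(c) == Character.NON_SPACING_MARK;

    // Splits text into maximal runs of code points accepted by the predicate.
    static List<String> split(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar.test(c)) {
                current.appendCodePoint(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // kaf + fatha, teh + fatha, beh + fatha
        String word = "\u0643\u064E\u062A\u064E\u0628\u064E";
        System.out.println(split(word, LETTERS_ONLY).size());     // 3
        System.out.println(split(word, LETTERS_OR_MARKS).size()); // 1
    }
}
```

With the letters-only rule the word falls apart into three single-letter tokens; once nonspacing marks are accepted it stays whole.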
You must specify the required Version compatibility when creating ArabicLetterTokenizer:
CharTokenizer uses an int-based API to normalize and detect token characters. See isTokenChar(int) and CharTokenizer.normalize(int) for details.
Nested classes inherited from class AttributeSource: AttributeSource.AttributeFactory, AttributeSource.State
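One reason an int-based (code point) API matters is that a character outside the Basic Multilingual Plane occupies two `char` values, and neither surrogate half is a letter on its own. The sketch below uses only the JDK; `CodePointDemo` and its methods are hypothetical names, and the Deseret letter U+10400 is an assumed sample chosen purely because it is a supplementary-plane letter.

```java
// Hypothetical demo comparing char-based and code-point-based letter checks.
class CodePointDemo {
    // Counts letters by inspecting one UTF-16 code unit at a time.
    static int lettersByChar(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Counts letters by inspecting whole code points.
    static int lettersByCodePoint(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);
            i += Character.charCount(c);
            if (Character.isLetter(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', U+10400 encoded as a surrogate pair, 'b'
        String text = "a\uD801\uDC00b";
        System.out.println(lettersByChar(text));      // 2
        System.out.println(lettersByCodePoint(text)); // 3
    }
}
```

The char-based count misses the supplementary letter because each surrogate is classified separately; the code point loop sees it as a single letter.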
Constructor and Description |
---|
ArabicLetterTokenizer(AttributeSource.AttributeFactory factory, Reader in) Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource.AttributeFactory, Reader) instead. This will be removed in Lucene 4.0. |
ArabicLetterTokenizer(AttributeSource source, Reader in) Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource, Reader) instead. This will be removed in Lucene 4.0. |
ArabicLetterTokenizer(Reader in) Deprecated. Use ArabicLetterTokenizer(Version, Reader) instead. This will be removed in Lucene 4.0. |
ArabicLetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in) Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory. |
ArabicLetterTokenizer(Version matchVersion, AttributeSource source, Reader in) Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource. |
ArabicLetterTokenizer(Version matchVersion, Reader in) Deprecated. Construct a new ArabicLetterTokenizer. |
Modifier and Type | Method and Description |
---|---|
protected boolean | isTokenChar(int c) Deprecated. Allows for Letter category or NonspacingMark category |
Methods inherited from class CharTokenizer: end, incrementToken, isTokenChar, normalize, normalize, reset
Methods inherited from class Tokenizer: close, correctOffset
Methods inherited from class TokenStream: reset
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public ArabicLetterTokenizer(Version matchVersion, Reader in)
Construct a new ArabicLetterTokenizer.
Parameters:
matchVersion - Lucene version to match. See above
in - the input to split up into tokens

public ArabicLetterTokenizer(Version matchVersion, AttributeSource source, Reader in)
Construct a new ArabicLetterTokenizer using a given AttributeSource.
Parameters:
matchVersion - Lucene version to match. See above
source - the attribute source to use for this Tokenizer
in - the input to split up into tokens

public ArabicLetterTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.
Parameters:
matchVersion - Lucene version to match. See above
factory - the attribute factory to use for this Tokenizer
in - the input to split up into tokens

@Deprecated public ArabicLetterTokenizer(Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, Reader) instead. This will be removed in Lucene 4.0.

@Deprecated public ArabicLetterTokenizer(AttributeSource source, Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource, Reader) instead. This will be removed in Lucene 4.0.
Construct a new ArabicLetterTokenizer using a given AttributeSource.

@Deprecated public ArabicLetterTokenizer(AttributeSource.AttributeFactory factory, Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource.AttributeFactory, Reader) instead. This will be removed in Lucene 4.0.
Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.

protected boolean isTokenChar(int c)
Allows for Letter category or NonspacingMark category.
Overrides: isTokenChar in class LetterTokenizer
See Also: LetterTokenizer.isTokenChar(int)