public final class LowerCaseTokenizer extends LetterTokenizer
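Note: this does a decent job for most European languages, but a terrible job for some Asian languages, in which words are not separated by spaces. Because tokens are split only at non-letter characters, a run of such text with no spaces or punctuation can come out as one long token.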
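Here is a rough Java sketch of the note above. It is not Lucene code. It approximates letter-based tokenizing with a regular expression: split at runs of non-letters (\P{L}+), then lower-case each piece with Locale.ROOT. The class NoSpaceDemo and its method letterTokens are hypothetical names. The regex split is a swapped-in technique, not how LowerCaseTokenizer is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class NoSpaceDemo {

    // Rough stand-in for letter-based tokenizing (not Lucene code): split at
    // runs of non-letter characters, then lower-case each remaining piece.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("\\P{L}+")) {
            if (!piece.isEmpty()) { // a leading separator yields an empty first piece
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Spaces and punctuation separate English words.
        System.out.println(letterTokens("Hello, World! It's 2010")); // [hello, world, it, s]

        // "Tokyo Tower" written in Japanese has no spaces. Every character is
        // a letter, so the whole phrase comes back as a single token.
        String tokyoTower = "\u6771\u4EAC\u30BF\u30EF\u30FC";
        System.out.println(letterTokens(tokyoTower).size()); // 1
    }
}
```

The Japanese phrase contains no separators, so it comes back as a single token. This is the weakness the note describes.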
You must specify the required Version compatibility when creating LowerCaseTokenizer:
- CharTokenizer uses an int-based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.

Nested classes/interfaces inherited from class AttributeSource: AttributeSource.AttributeFactory, AttributeSource.State
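The class note ties the required Version to CharTokenizer's int-based API. The sketch below illustrates why that choice can change results. It lower-cases a string either one UTF-16 char at a time or one int code point at a time. VersionGateSketch and its boolean useCodePoints flag are hypothetical stand-ins for a compatibility switch such as matchVersion. The sketch assumes, without reproducing Lucene's actual code, that a pre-int-API mode sees each char on its own.

```java
public class VersionGateSketch {

    // Hypothetical: lower-cases text per UTF-16 char (useCodePoints == false)
    // or per Unicode code point (useCodePoints == true). Not Lucene's API.
    static String lowerCase(String text, boolean useCodePoints) {
        StringBuilder out = new StringBuilder(text.length());
        if (useCodePoints) {
            // A supplementary character arrives as one int and can be mapped.
            text.codePoints().forEach(cp -> out.appendCodePoint(Character.toLowerCase(cp)));
        } else {
            // Each surrogate half is passed alone and comes back unchanged.
            for (int i = 0; i < text.length(); i++) {
                out.append(Character.toLowerCase(text.charAt(i)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I lies outside the BMP.
        String deseret = new String(Character.toChars(0x10400));
        System.out.println(Integer.toHexString(lowerCase(deseret, false).codePointAt(0))); // 10400
        System.out.println(Integer.toHexString(lowerCase(deseret, true).codePointAt(0)));  // 10428
    }
}
```

On the char-at-a-time path, the two surrogate halves of U+10400 have no case mapping of their own, so the capital letter survives unchanged. The code-point path maps it to U+10428.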
Constructor | Description |
---|---|
LowerCaseTokenizer(AttributeSource.AttributeFactory factory, Reader in) | Deprecated. Use LowerCaseTokenizer(Version, AttributeSource.AttributeFactory, Reader) instead. This will be removed in Lucene 4.0. |
LowerCaseTokenizer(AttributeSource source, Reader in) | Deprecated. Use LowerCaseTokenizer(Version, AttributeSource, Reader) instead. This will be removed in Lucene 4.0. |
LowerCaseTokenizer(Reader in) | Deprecated. Use LowerCaseTokenizer(Version, Reader) instead. This will be removed in Lucene 4.0. |
LowerCaseTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in) | Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory. |
LowerCaseTokenizer(Version matchVersion, AttributeSource source, Reader in) | Construct a new LowerCaseTokenizer using a given AttributeSource. |
LowerCaseTokenizer(Version matchVersion, Reader in) | Construct a new LowerCaseTokenizer. |
Modifier and Type | Method and Description |
---|---|
protected int | normalize(int c) Converts a code point to lower case using Character.toLowerCase(int). |
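Methods inherited from class LetterTokenizer: isTokenChar
Methods inherited from class CharTokenizer: end, incrementToken, isTokenChar, normalize, reset
Methods inherited from class Tokenizer: close, correctOffset
Methods inherited from class TokenStream: reset
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString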
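LowerCaseTokenizer overrides only normalize(int). It inherits isTokenChar(int) from LetterTokenizer and incrementToken() from CharTokenizer. The sketch below shows that hook pattern under simplifying assumptions: the input is a String rather than a Reader, and tokens are returned as a List instead of being written into attributes. The classes TokenizerHooksDemo, CodePointTokenizer and LowerCaseLetters are hypothetical. This is not Lucene's code.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerHooksDemo {

    // Hypothetical base class echoing the CharTokenizer hooks; not Lucene's API.
    abstract static class CodePointTokenizer {

        // Decides whether code point c belongs inside a token.
        protected abstract boolean isTokenChar(int c);

        // Maps each token code point before it is stored; identity by default.
        protected int normalize(int c) {
            return c;
        }

        // Walks the input one code point at a time and collects tokens.
        List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            int i = 0;
            while (i < text.length()) {
                // codePointAt joins a surrogate pair into a single int.
                int c = text.codePointAt(i);
                i += Character.charCount(c);
                if (isTokenChar(c)) {
                    current.appendCodePoint(normalize(c));
                } else if (current.length() > 0) {
                    // A non-token character closes the current token.
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
            return tokens;
        }
    }

    // Letters are token characters and are lower-cased on the way in.
    static class LowerCaseLetters extends CodePointTokenizer {
        @Override
        protected boolean isTokenChar(int c) {
            return Character.isLetter(c);
        }

        @Override
        protected int normalize(int c) {
            return Character.toLowerCase(c);
        }
    }

    public static void main(String[] args) {
        System.out.println(new LowerCaseLetters().tokenize("Foo-Bar42baz")); // [foo, bar, baz]
    }
}
```

The splitting loop lives in the base class. A letter tokenizer therefore only supplies a predicate, and a lower-casing variant only adds a normalize override.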
public LowerCaseTokenizer(Version matchVersion, Reader in)
Construct a new LowerCaseTokenizer.
Parameters:
matchVersion - Lucene version to match; see above.
in - the input to split up into tokens.

public LowerCaseTokenizer(Version matchVersion, AttributeSource source, Reader in)
Construct a new LowerCaseTokenizer using a given AttributeSource.

public LowerCaseTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)
Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.

@Deprecated public LowerCaseTokenizer(Reader in)
Deprecated. Use LowerCaseTokenizer(Version, Reader) instead. This will be removed in Lucene 4.0.

@Deprecated public LowerCaseTokenizer(AttributeSource source, Reader in)
Deprecated. Use LowerCaseTokenizer(Version, AttributeSource, Reader) instead. This will be removed in Lucene 4.0.
Construct a new LowerCaseTokenizer using a given AttributeSource.

@Deprecated public LowerCaseTokenizer(AttributeSource.AttributeFactory factory, Reader in)
Deprecated. Use LowerCaseTokenizer(Version, AttributeSource.AttributeFactory, Reader) instead. This will be removed in Lucene 4.0.
Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.

protected int normalize(int c)
Converts a code point to lower case using Character.toLowerCase(int).
Overrides:
normalize in class CharTokenizer
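The normalize(int) description says each code point is passed through Character.toLowerCase(int). The stand-alone sketch below shows one consequence of mapping code points one at a time: context-dependent rules such as the Greek final sigma cannot apply, so a word-final capital sigma always becomes σ. NormalizeSketch is a hypothetical class, not Lucene code.

```java
public class NormalizeSketch {

    // Same mapping the normalize(int) description names: one code point in,
    // Character.toLowerCase(int) of that code point out. Hypothetical helper.
    static int normalize(int c) {
        return Character.toLowerCase(c);
    }

    // Applies normalize to every code point of the input independently.
    static String normalizeAll(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> out.appendCodePoint(normalize(cp)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizeAll("HELLO")); // hello

        // Greek capitals OMICRON, DELTA, OMICRON, SIGMA.
        String word = normalizeAll("\u039F\u0394\u039F\u03A3");
        // The last code point is 3c3 (small sigma), not 3c2 (final sigma),
        // because no neighbouring characters are consulted.
        System.out.println(Integer.toHexString(word.codePointBefore(word.length()))); // 3c3
    }
}
```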