public final class LowerCaseTokenizer extends LetterTokenizer
Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
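The note above can be illustrated with a minimal, JDK-only sketch of the "split at non-letters, then lower-case" behaviour. This is not the Lucene implementation: the class `LetterLowerCaseSketch` and its `tokenize` method are hypothetical names introduced here, and the sketch ignores details of the real tokenizer such as offsets, attributes, and any maximum token length.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterLowerCaseSketch {

    /** Splits text into maximal runs of letters and lower-cases each code point. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                // Letters extend the current token, lower-cased on the way in.
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // Any non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // European-style text: spaces, digits and punctuation act as separators.
        System.out.println(tokenize("Hello, World! It's 2017.")); // [hello, world, it, s]
        // Four Chinese ideographs written as Unicode escapes: every one is a letter
        // and nothing separates them, so the whole run comes back as a single token.
        System.out.println(tokenize("\u6211\u7231\u5317\u4eac").size()); // 1
    }
}
```

The second call shows why letter-based splitting works poorly for languages written without spaces: the whole phrase is returned as one token rather than as individual words.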
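Nested classes/interfaces inherited from class AttributeSource: AttributeSource.State
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY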
Constructor and Description |
---|
LowerCaseTokenizer(): Construct a new LowerCaseTokenizer. |
LowerCaseTokenizer(AttributeFactory factory): Construct a new LowerCaseTokenizer using a given AttributeFactory. |
Modifier and Type | Method and Description |
---|---|
protected int | normalize(int c): Converts a character (given as a code point) to lower case using Character.toLowerCase(int). |
Methods inherited from class LetterTokenizer: isTokenChar
Methods inherited from class CharTokenizer: end, fromSeparatorCharPredicate, fromSeparatorCharPredicate, fromSeparatorCharPredicate, fromSeparatorCharPredicate, fromTokenCharPredicate, fromTokenCharPredicate, fromTokenCharPredicate, fromTokenCharPredicate, incrementToken, reset
Methods inherited from class Tokenizer: close, correctOffset, setReader
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public LowerCaseTokenizer()
Construct a new LowerCaseTokenizer.
public LowerCaseTokenizer(AttributeFactory factory)
Construct a new LowerCaseTokenizer using a given AttributeFactory.
Parameters: factory - the attribute factory to use for this Tokenizer
protected int normalize(int c)
Converts a character (given as a code point) to lower case using Character.toLowerCase(int).
Overrides: normalize in class CharTokenizer
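As a rough illustration of what normalizing with Character.toLowerCase(int) means, the JDK-only sketch below lower-cases one code point at a time. The class `NormalizeSketch` and the helper `normalizeAll` are hypothetical names used only for this example; the `normalize` method here merely mirrors the documented behaviour and is not the Lucene source.

```java
public class NormalizeSketch {

    /** Mirrors the documented behaviour: lower-case a single code point. */
    static int normalize(int c) {
        return Character.toLowerCase(c);
    }

    /** Hypothetical helper: applies normalize to every code point of a string. */
    static String normalizeAll(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().map(NormalizeSketch::normalize).forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalizeAll("LUCENE")); // lucene
        // U+10400 lies outside the Basic Multilingual Plane, so it only fits in an int.
        System.out.println(Integer.toHexString(normalize(0x10400))); // 10428
    }
}
```

Taking an `int` rather than a `char` lets the method handle supplementary characters, which occupy two `char` values in a Java string but a single code point.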
Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.