public class LetterTokenizer extends CharTokenizer
Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
AttributeSource.State
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor and Description |
---|
LetterTokenizer()
Construct a new LetterTokenizer.
|
LetterTokenizer(AttributeFactory factory)
Construct a new LetterTokenizer using a given
AttributeFactory . |
Modifier and Type | Method and Description |
---|---|
protected boolean |
isTokenChar(int c)
Collects only characters which satisfy
Character.isLetter(int) . |
end, fromSeparatorCharPredicate, fromSeparatorCharPredicate, fromSeparatorCharPredicate, fromSeparatorCharPredicate, fromTokenCharPredicate, fromTokenCharPredicate, fromTokenCharPredicate, fromTokenCharPredicate, incrementToken, normalize, reset
close, correctOffset, setReader
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public LetterTokenizer()
public LetterTokenizer(AttributeFactory factory)
AttributeFactory
.factory
- the attribute factory to use for this Tokenizer
protected boolean isTokenChar(int c)
Character.isLetter(int)
.isTokenChar
in class CharTokenizer
Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.