public final class WhitespaceTokenizer extends CharTokenizer
A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int). Note: That definition explicitly excludes the non-breaking space. Adjacent sequences of non-whitespace characters form tokens.
See Also:
UnicodeWhitespaceTokenizer
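The note about the non-breaking space can be checked directly against the JDK. The following is a minimal plain-Java sketch that does not use Lucene; the class name WhitespaceCheck and the helper isSeparator are hypothetical names chosen for illustration only.

```java
// Plain-Java illustration of the whitespace definition referenced above.
// WhitespaceCheck and isSeparator are hypothetical names, not Lucene API.
final class WhitespaceCheck {

    /** Returns true when the code point is whitespace per Character.isWhitespace(int). */
    static boolean isSeparator(int codePoint) {
        return Character.isWhitespace(codePoint);
    }

    public static void main(String[] args) {
        // Space, tab, newline, then the three non-breaking space characters.
        int[] samples = {' ', '\t', '\n', 0x00A0, 0x2007, 0x202F};
        for (int cp : samples) {
            System.out.printf("U+%04X whitespace=%b%n", cp, isSeparator(cp));
        }
    }
}
```

Space, tab, and newline are reported as whitespace, while U+00A0, U+2007, and U+202F are not, so text joined by a non-breaking space would stay inside a single token under this definition.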
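Nested classes/interfaces inherited from class AttributeSource: AttributeSource.State
Fields inherited from class CharTokenizer: DEFAULT_MAX_WORD_LEN
Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY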
Constructor and Description |
---|
WhitespaceTokenizer() Construct a new WhitespaceTokenizer. |
WhitespaceTokenizer(AttributeFactory factory) Construct a new WhitespaceTokenizer using a given AttributeFactory. |
WhitespaceTokenizer(AttributeFactory factory, int maxTokenLen) Construct a new WhitespaceTokenizer using a given AttributeFactory. |
WhitespaceTokenizer(int maxTokenLen) Construct a new WhitespaceTokenizer using a given max token length. |
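The constructors that take maxTokenLen accept only values that, per the parameter description on this page, are greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024), and throw IllegalArgumentException otherwise. A minimal sketch of such a range check follows. The class MaxTokenLenCheck and the helper requireValid are hypothetical, and treating the upper bound as exclusive is an assumption that follows the wording literally; the library's own boundary handling may differ.

```java
// Hypothetical range check modelled on the documented maxTokenLen constraint.
// This is not Lucene code; names and the exclusive upper bound are assumptions.
final class MaxTokenLenCheck {

    // Value as stated in the parameter description: 1024*1024.
    static final int MAX_TOKEN_LENGTH_LIMIT = 1024 * 1024;

    /** Returns maxTokenLen unchanged when it is in range, otherwise throws. */
    static int requireValid(int maxTokenLen) {
        // Assumption: "less than" is read as a strict upper bound.
        if (maxTokenLen <= 0 || maxTokenLen >= MAX_TOKEN_LENGTH_LIMIT) {
            throw new IllegalArgumentException(
                "maxTokenLen must be greater than 0 and less than "
                    + MAX_TOKEN_LENGTH_LIMIT + ", got: " + maxTokenLen);
        }
        return maxTokenLen;
    }

    public static void main(String[] args) {
        System.out.println(requireValid(255));
        try {
            requireValid(0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating in one helper keeps the two maxTokenLen constructors consistent, since both would report an invalid value in the same way.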
Modifier and Type | Method and Description |
---|---|
protected boolean | isTokenChar(int c) Collects only characters which do not satisfy Character.isWhitespace(int). |
Methods inherited from class CharTokenizer: end, fromSeparatorCharPredicate, fromSeparatorCharPredicate, fromTokenCharPredicate, fromTokenCharPredicate, incrementToken, reset
Methods inherited from class Tokenizer: close, correctOffset, setReader
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public WhitespaceTokenizer()
public WhitespaceTokenizer(AttributeFactory factory)
Construct a new WhitespaceTokenizer using a given AttributeFactory.
Parameters:
factory - the attribute factory to use for this Tokenizer
public WhitespaceTokenizer(int maxTokenLen)
Construct a new WhitespaceTokenizer using a given max token length.
Parameters:
maxTokenLen - maximum token length the tokenizer will emit. Must be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024).
Throws:
IllegalArgumentException - if maxTokenLen is invalid.
public WhitespaceTokenizer(AttributeFactory factory, int maxTokenLen)
Construct a new WhitespaceTokenizer using a given AttributeFactory.
Parameters:
factory - the attribute factory to use for this Tokenizer
maxTokenLen - maximum token length the tokenizer will emit. Must be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024).
Throws:
IllegalArgumentException - if maxTokenLen is invalid.
protected boolean isTokenChar(int c)
Collects only characters which do not satisfy Character.isWhitespace(int).
Specified by:
isTokenChar in class CharTokenizer
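To make the role of isTokenChar concrete, the sketch below groups maximal runs of token characters into tokens using only the JDK. It is an illustration of the documented rule, not the Lucene implementation: the class WhitespaceSplitSketch and its tokenize method are hypothetical, and it ignores attributes, offsets, and maxTokenLen entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of whitespace tokenization; not the Lucene implementation.
// WhitespaceSplitSketch and tokenize are hypothetical names for illustration.
final class WhitespaceSplitSketch {

    /** Mirrors the documented rule: a token character is any non-whitespace code point. */
    static boolean isTokenChar(int c) {
        return !Character.isWhitespace(c);
    }

    /** Collects each maximal run of token characters as one token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // Iterate by code point so supplementary characters stay intact.
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(c)) {
                current.appendCodePoint(c);
            } else if (current.length() > 0) {
                // Whitespace after a run of token characters ends the token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the trailing token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("quick  brown\tfox\u00A0jumps"));
    }
}
```

For the sample input, the sketch yields three tokens: "quick", "brown", and a single token containing "fox" and "jumps" joined by the non-breaking space, because U+00A0 is a token character under this predicate.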
Copyright © 2000-2020 Apache Software Foundation. All Rights Reserved.