public final class ICUTokenizer extends Tokenizer
Words are broken across script boundaries, then segmented according to
the BreakIterator and typing provided by the ICUTokenizerConfig.
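
The two-stage idea described above (split at script boundaries first, then run a word BreakIterator over each run) can be illustrated with a minimal JDK-only sketch. This is not the tokenizer's implementation: it substitutes `java.text.BreakIterator` and `Character.UnicodeScript` for the ICU machinery, and the class and method names (`ScriptAwareSegmenter`, `splitByScript`, `words`) are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: JDK classes stand in for ICU, names are hypothetical.
public class ScriptAwareSegmenter {

    /** Splits text into runs of a single script; script-neutral characters stay in the current run. */
    public static List<String> splitByScript(String text) {
        List<String> runs = new ArrayList<>();
        int runStart = 0;
        Character.UnicodeScript runScript = null;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // Spaces, punctuation, combining marks and unassigned code points do not start a new run.
            boolean neutral = script == Character.UnicodeScript.COMMON
                    || script == Character.UnicodeScript.INHERITED
                    || script == Character.UnicodeScript.UNKNOWN;
            if (!neutral) {
                if (runScript != null && script != runScript) {
                    runs.add(text.substring(runStart, i));
                    runStart = i;
                }
                runScript = script;
            }
            i += Character.charCount(cp);
        }
        if (runStart < text.length()) {
            runs.add(text.substring(runStart));
        }
        return runs;
    }

    /** Runs a word BreakIterator over each script run and keeps pieces containing a letter or digit. */
    public static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        for (String run : splitByScript(text)) {
            it.setText(run);
            int start = it.first();
            for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
                String piece = run.substring(start, end);
                if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                    out.add(piece);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Latin letters followed directly by Cyrillic letters, with no space in between.
        System.out.println(words("abc\u0430\u0431\u0432"));
    }
}
```

In this sketch the script split is what separates the Latin letters from the Cyrillic ones, since no space or punctuation sits between them. The real tokenizer additionally assigns a token type per script through its ICUTokenizerConfig, which the sketch does not attempt.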

See Also: ICUTokenizerConfig

Inherited nested classes: AttributeSource.State

Inherited fields: DEFAULT_TOKEN_ATTRIBUTE_FACTORY

| Constructor and Description |
|---|
| ICUTokenizer(): Construct a new ICUTokenizer that breaks text into words from the given Reader. |
| ICUTokenizer(AttributeFactory factory, ICUTokenizerConfig config): Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration. |
| ICUTokenizer(ICUTokenizerConfig config): Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration. |

| Modifier and Type | Method and Description |
|---|---|
| void | end() |
| boolean | incrementToken() |
| void | reset() |
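
The methods listed above are normally called in a fixed order by whatever consumes the tokenizer. The following is a minimal sketch of that order, assuming Lucene core and the ICU analysis module are on the classpath in a version matching this page (no-argument constructor, input supplied through the inherited `setReader`). The class name `IcuTokenizerDemo` and its `tokenize` helper are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class; requires the Lucene core and ICU analysis jars.
public class IcuTokenizerDemo {

    /** Collects the term text of every token produced for the given input. */
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        // try-with-resources calls close() once consumption is finished.
        try (ICUTokenizer tokenizer = new ICUTokenizer()) {
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.setReader(new StringReader(text)); // supply the input
            tokenizer.reset();                           // must precede incrementToken()
            while (tokenizer.incrementToken()) {         // false once the input is exhausted
                terms.add(term.toString());
            }
            tokenizer.end();                             // records end-of-stream state
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello world"));
    }
}
```

The term text is read through a `CharTermAttribute` obtained once before the loop, because the tokenizer reuses the same attribute instance for every token.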

Methods inherited from class Tokenizer: close, correctOffset, setReader

Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

public ICUTokenizer()
The default script-specific handling is used.
The default attribute factory is used.
See Also: DefaultICUTokenizerConfig

public ICUTokenizer(ICUTokenizerConfig config)
The default attribute factory is used.
Parameters:
config - Tailored BreakIterator configuration

public ICUTokenizer(AttributeFactory factory, ICUTokenizerConfig config)
Parameters:
factory - AttributeFactory to use
config - Tailored BreakIterator configuration

public boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException

public void reset() throws IOException
Overrides: reset in class Tokenizer
Throws: IOException

public void end() throws IOException
Overrides: end in class TokenStream
Throws: IOException

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.