public final class ICUTokenizer extends Tokenizer
Words are broken across script boundaries, then segmented according to
the BreakIterator and typing provided by the ICUTokenizerConfig.
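
The two-stage process described above (split at script boundaries, then segment each run into words) can be sketched with JDK classes alone. This is an illustration, not ICUTokenizer's implementation: java.text.BreakIterator stands in for the ICU BreakIterator, Character.UnicodeScript stands in for ICU's script detection, and the class and method names are hypothetical. Keeping COMMON and INHERITED characters in the current run, and dropping segments without a letter or digit, are also assumptions made for this sketch.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch class; it does not use or reproduce Lucene code.
final class ScriptAwareSegmentationSketch {

    /** Stage 1: split text into runs whose letters share one Unicode script. */
    static List<String> splitByScript(String text) {
        List<String> runs = new ArrayList<>();
        int runStart = 0;
        Character.UnicodeScript runScript = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // Assumption: COMMON (spaces, digits, punctuation) and INHERITED
            // (combining marks) never start a new run.
            boolean neutral = script == Character.UnicodeScript.COMMON
                    || script == Character.UnicodeScript.INHERITED;
            if (!neutral) {
                if (runScript != null && script != runScript) {
                    runs.add(text.substring(runStart, i));
                    runStart = i;
                }
                runScript = script;
            }
            i += Character.charCount(cp);
        }
        if (runStart < text.length()) {
            runs.add(text.substring(runStart));
        }
        return runs;
    }

    /** Stage 2: break one run into words with the JDK word BreakIterator. */
    static List<String> words(String run) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(run);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = run.substring(start, end);
            // Keep only segments that contain at least one letter or digit.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    /** Runs both stages in order and collects the resulting tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String run : splitByScript(text)) {
            tokens.addAll(words(run));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Latin "abc" immediately followed by three Cyrillic letters.
        System.out.println(tokenize("abc\u0430\u0431\u0432 Hello world"));
    }
}
```

In ICUTokenizer itself, the per-script BreakIterator and the token types come from the ICUTokenizerConfig passed to the constructor; the sketch hard-codes a single word iterator for every script.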

See Also: ICUTokenizerConfig

Nested classes inherited from class AttributeSource: AttributeSource.AttributeFactory, AttributeSource.State

| Constructor | Description |
|---|---|
| ICUTokenizer(AttributeSource.AttributeFactory factory, Reader input, ICUTokenizerConfig config) | Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration. |
| ICUTokenizer(Reader input) | Construct a new ICUTokenizer that breaks text into words from the given Reader. |
| ICUTokenizer(Reader input, ICUTokenizerConfig config) | Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration. |

| Modifier and Type | Method and Description |
|---|---|
| void | end() |
| boolean | incrementToken() |
| void | reset() |

Methods inherited from class Tokenizer: close, correctOffset, setReader

Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

public ICUTokenizer(Reader input)
The default script-specific handling is used.
The default attribute factory is used.
Parameters:
input - Reader containing text to tokenize.
See Also:
DefaultICUTokenizerConfig

public ICUTokenizer(Reader input, ICUTokenizerConfig config)
The default attribute factory is used.
Parameters:
input - Reader containing text to tokenize.
config - Tailored BreakIterator configuration

public ICUTokenizer(AttributeSource.AttributeFactory factory, Reader input, ICUTokenizerConfig config)
Parameters:
factory - AttributeFactory to use
input - Reader containing text to tokenize.
config - Tailored BreakIterator configuration

public boolean incrementToken() throws IOException
Specified by:
incrementToken in class TokenStream
Throws:
IOException

public void reset() throws IOException
Overrides:
reset in class Tokenizer
Throws:
IOException

public void end() throws IOException
Overrides:
end in class TokenStream
Throws:
IOException

Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.