public final class ICUTokenizer extends Tokenizer
Words are broken across script boundaries, then segmented according to
the BreakIterator and typing provided by the ICUTokenizerConfig.
See Also: ICUTokenizerConfig
Nested classes/interfaces inherited from class AttributeSource: AttributeSource.AttributeFactory, AttributeSource.State
Constructor and Description |
---|
`ICUTokenizer(AttributeSource.AttributeFactory factory, Reader input, ICUTokenizerConfig config)` Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration. |
`ICUTokenizer(Reader input)` Construct a new ICUTokenizer that breaks text into words from the given Reader. |
`ICUTokenizer(Reader input, ICUTokenizerConfig config)` Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration. |
Modifier and Type | Method and Description |
---|---|
void | end() |
boolean | incrementToken() |
void | reset() |
Methods inherited from class Tokenizer: close, correctOffset, setReader
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
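public ICUTokenizer(Reader input)
Construct a new ICUTokenizer that breaks text into words from the given Reader.
The default script-specific handling is used.
The default attribute factory is used.
Parameters:
input - Reader containing text to tokenize.
See Also:
DefaultICUTokenizerConfig
public ICUTokenizer(Reader input, ICUTokenizerConfig config)
Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration.
The default attribute factory is used.
Parameters:
input - Reader containing text to tokenize.
config - Tailored BreakIterator configuration.
public ICUTokenizer(AttributeSource.AttributeFactory factory, Reader input, ICUTokenizerConfig config)
Construct a new ICUTokenizer that breaks text into words from the given Reader, using a tailored BreakIterator configuration.
Parameters:
factory - AttributeFactory to use.
input - Reader containing text to tokenize.
config - Tailored BreakIterator configuration.
public boolean incrementToken() throws IOException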
Specified by: incrementToken in class TokenStream
Throws: IOException
public void reset() throws IOException
Overrides: reset in class Tokenizer
Throws: IOException
public void end() throws IOException
Overrides: end in class TokenStream
Throws: IOException
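A consumer calls the three methods above in a fixed order: reset() before the first incrementToken(), then incrementToken() until it returns false, then end() (followed by the inherited close()). The sketch below mirrors that call order with a hypothetical stand-in class, `SketchTokenizer`, built on the JDK's `java.text.BreakIterator` so that it runs without the Lucene or ICU jars. With the real ICUTokenizer, the loop would read each term from an attribute such as CharTermAttribute rather than from a plain field, and the methods would declare IOException.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: SketchTokenizer is a hypothetical stand-in, not a Lucene class.
class ConsumeSketch {

    static final class SketchTokenizer {
        private final String text;
        private final BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        private int start;
        private boolean ready;
        String term;          // stands in for a term attribute
        int finalOffset = -1; // set by end()

        SketchTokenizer(String text) {
            this.text = text;
        }

        /** Prepares the stream; this stand-in requires it before incrementToken(). */
        void reset() {
            words.setText(text);
            start = words.first();
            ready = true;
        }

        /** Advances to the next token; returns false when the input is exhausted. */
        boolean incrementToken() {
            if (!ready) {
                throw new IllegalStateException("reset() must be called first");
            }
            for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
                String piece = text.substring(start, end);
                if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                    term = piece;
                    start = end;
                    return true;
                }
            }
            return false;
        }

        /** Called once after the last token; here it only records the final offset. */
        void end() {
            finalOffset = text.length();
        }
    }

    /** The consumer loop: reset, incrementToken until false, then end. */
    static List<String> consume(SketchTokenizer ts) {
        List<String> terms = new ArrayList<>();
        ts.reset();
        while (ts.incrementToken()) {
            terms.add(ts.term);
        }
        ts.end();
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(consume(new SketchTokenizer("one two, three")));
    }
}
```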
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.