@Deprecated public final class CJKTokenizer extends Tokenizer
For CJK text, the tokens returned are every pair of adjacent characters, with overlap (overlapping bigrams).
Example: "java C1C2C3C4" is segmented into "java", "C1C2", "C2C3", "C3C4".
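The segmentation rule above can be illustrated with a minimal, dependency-free sketch in plain Java. It is not CJKTokenizer's implementation, and the names `BigramSketch` and `segment` are hypothetical. Assumptions: CJK characters are detected by Unicode script (Han, Hiragana, Katakana, Hangul) rather than the Unicode-block logic the real class may use; a run of letters, digits, '+', '#' and '_' becomes one lowercased token; full-width forms U+FF01 to U+FF5E are folded to ASCII; and an isolated CJK character is emitted as a single-character token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of overlapping-bigram segmentation; not the Lucene code.
final class BigramSketch {

    // Assumption: CJK is identified by Unicode script, not by Unicode block.
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
                || s == Character.UnicodeScript.HIRAGANA
                || s == Character.UnicodeScript.KATAKANA
                || s == Character.UnicodeScript.HANGUL;
    }

    // Latin "word" characters: letters, digits, and '+', '#', '_'.
    static boolean isWordChar(int cp) {
        return Character.isLetterOrDigit(cp) || cp == '+' || cp == '#' || cp == '_';
    }

    // Fold full-width ASCII variants (U+FF01..U+FF5E) to U+0021..U+007E.
    static int foldWidth(int cp) {
        return (cp >= 0xFF01 && cp <= 0xFF5E) ? cp - 0xFEE0 : cp;
    }

    public static List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().map(BigramSketch::foldWidth).toArray();
        int i = 0;
        while (i < cps.length) {
            int cp = cps[i];
            if (isCjk(cp)) {
                int start = i;
                while (i < cps.length && isCjk(cps[i])) i++;
                if (i - start == 1) {
                    // Assumption: a lone CJK character is emitted as a unigram.
                    tokens.add(new String(cps, start, 1));
                } else {
                    // Overlapping bigrams: C1C2, C2C3, C3C4, ...
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (isWordChar(cp)) {
                // A Latin run ends at the first non-word or CJK character.
                int start = i;
                while (i < cps.length && isWordChar(cps[i]) && !isCjk(cps[i])) i++;
                tokens.add(new String(cps, start, i - start).toLowerCase(Locale.ROOT));
            } else {
                i++; // whitespace and punctuation only separate tokens
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(segment("java \u4e2d\u6587\u5206\u8bcd"));
    }
}
```

For the input "java 中文分词" this sketch returns "java", "中文", "文分", "分词", the same shape as the C1C2C3C4 example; full-width "ＪＡＶＡ" is folded and lowercased to "java".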
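Additionally, the following is applied to Latin text (such as English):
- Text is converted to lowercase.
- Numeric digits, '+', '#', and '_' are tokenized as letters.
- Full-width forms are converted to half-width forms.

Inherited nested class: AttributeSource.State
Inherited fields: DEFAULT_TOKEN_ATTRIBUTE_FACTORY, DEFAULT_ATTRIBUTE_FACTORY

| Constructor and Description |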
|---|
| CJKTokenizer(AttributeFactory factory, Reader in) Deprecated. |
| CJKTokenizer(Reader in) Deprecated. Construct a token stream processing the given input. |
| Modifier and Type | Method and Description |
|---|---|
| void | end() Deprecated. |
| boolean | incrementToken() Deprecated. Returns true for the next token in the stream, or false at EOS. |
| void | reset() Deprecated. |
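These methods follow the Lucene TokenStream consumer workflow: call reset(), then incrementToken() until it returns false, then end() (for end-of-stream work such as setting the final offset), then close(). Because the real classes need Lucene on the classpath, the sketch below uses a hypothetical stand-in, `MiniBigramStream`, that only mimics this call order on a plain string; its `term()` accessor plays the role of a term attribute. It is not the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mimicking the reset -> incrementToken* -> end -> close
// lifecycle. It is NOT org.apache.lucene.analysis.TokenStream.
final class MiniBigramStream implements AutoCloseable {
    private final String text;
    private int pos = -1;          // -1 means reset() has not been called yet
    private String term;           // current token (like a term attribute)
    private int finalOffset = -1;  // recorded by end()

    MiniBigramStream(String text) { this.text = text; }

    void reset() { pos = 0; term = null; }

    // Returns true for the next token, or false at end of stream.
    boolean incrementToken() {
        if (pos < 0) throw new IllegalStateException("reset() must be called first");
        if (text.length() == 1 && pos == 0) { term = text; pos = 1; return true; }
        if (pos + 2 > text.length()) { term = null; return false; }
        term = text.substring(pos, pos + 2);  // overlapping bigram
        pos++;
        return true;
    }

    String term() { return term; }

    // End-of-stream bookkeeping, e.g. the final offset.
    void end() { finalOffset = text.length(); }

    int finalOffset() { return finalOffset; }

    @Override
    public void close() { pos = -1; }

    // Typical consumer loop.
    static List<String> consume(String text) {
        List<String> out = new ArrayList<>();
        try (MiniBigramStream ts = new MiniBigramStream(text)) {
            ts.reset();                    // 1. reset before the first token
            while (ts.incrementToken()) {  // 2. pull tokens until false (EOS)
                out.add(ts.term());
            }
            ts.end();                      // 3. end-of-stream operations
        }                                  // 4. close() via try-with-resources
        return out;
    }
}
```

With the real tokenizer the loop has the same shape; skipping reset() is a contract violation, which this stand-in reports with an IllegalStateException.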
Methods inherited from class Tokenizer: close, correctOffset, setReader
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

public CJKTokenizer(Reader in)
Construct a token stream processing the given input.
Parameters: in - I/O reader

public CJKTokenizer(AttributeFactory factory, Reader in)
public boolean incrementToken()
throws IOException
Returns true for the next token in the stream, or false at EOS.
Specified by: incrementToken in class TokenStream
Throws: IOException - if a read error occurs

public final void end()
throws IOException
Overrides: end in class TokenStream
Throws: IOException

public void reset()
throws IOException
Overrides: reset in class Tokenizer
Throws: IOException

Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.