@Deprecated public final class CJKTokenizer extends Tokenizer
The tokens returned are overlapping pairs of adjacent characters (bigrams).
Example: "java C1C2C3C4" will be segmented to: "java" "C1C2" "C2C3" "C3C4".
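The overlapping-pair segmentation in the example can be sketched in plain Java. This is a minimal illustration, not the Lucene implementation: the class `BigramSketch`, its methods, the choice of scripts counted as CJK, and the handling of a lone CJK character are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative stand-in only; this is not the Lucene implementation. */
final class BigramSketch {

    /** Assumption of this sketch: Han, Hiragana, Katakana, and Hangul count as CJK. */
    static boolean isCjk(int cp) {
        Character.UnicodeScript s = Character.UnicodeScript.of(cp);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA
            || s == Character.UnicodeScript.HANGUL;
    }

    /** Splits text into whole non-CJK words and overlapping CJK bigrams. */
    static List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isCjk(cps[i])) {
                // Collect a run of CJK code points, then emit each adjacent pair.
                int start = i;
                while (i < cps.length && isCjk(cps[i])) {
                    i++;
                }
                if (i - start == 1) {
                    // Sketch choice: a lone CJK character becomes its own token.
                    tokens.add(new String(cps, start, 1));
                } else {
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                // A run of non-CJK letters or digits is kept as a single token.
                int start = i;
                while (i < cps.length && !isCjk(cps[i]) && Character.isLetterOrDigit(cps[i])) {
                    i++;
                }
                tokens.add(new String(cps, start, i - start));
            } else {
                i++; // whitespace and punctuation separate tokens
            }
        }
        return tokens;
    }
}
```

With four CJK characters standing in for C1 to C4, `BigramSketch.segment("java 一二三四")` yields `java`, `一二`, `二三`, `三四`, matching the pattern in the example.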
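Additionally, the following is applied to Latin text (such as English):
- Text is converted to lowercase.
- Numeric digits, '+', '#', and '_' are tokenized as letters.
- Full-width forms are converted to half-width forms.
Nested classes inherited from class AttributeSource: AttributeSource.State
Fields inherited: DEFAULT_TOKEN_ATTRIBUTE_FACTORY, DEFAULT_ATTRIBUTE_FACTORY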
Constructor and Description |
---|
CJKTokenizer(AttributeFactory factory, Reader in) Deprecated. |
CJKTokenizer(Reader in) Deprecated. Construct a token stream processing the given input. |
Modifier and Type | Method and Description |
---|---|
void | end() Deprecated. |
boolean | incrementToken() Deprecated. Returns true for the next token in the stream, or false at EOS. |
void | reset() Deprecated. |
Methods inherited from class Tokenizer: close, correctOffset, setReader
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public CJKTokenizer(Reader in)
Parameters: in - I/O reader
public CJKTokenizer(AttributeFactory factory, Reader in)
public boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException - thrown when a read error occurs
public final void end() throws IOException
Overrides: end in class TokenStream
Throws: IOException
public void reset() throws IOException
Overrides: reset in class Tokenizer
Throws: IOException
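The three methods above are normally called in a fixed order by a consumer: reset() once, incrementToken() repeatedly until it returns false, then end(). The sketch below mimics only that call sequence using a hypothetical stand-in class, `SketchTokenStream`, which is not a Lucene class. Its `term()` accessor, its varargs constructor, and the exception thrown when reset() is skipped are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in that mirrors only the method names documented above; not a Lucene class. */
final class SketchTokenStream {
    private final List<String> source;
    private int next = -1;   // -1 means reset() has not been called yet
    private String term;     // current token, valid after incrementToken() returns true

    SketchTokenStream(String... tokens) {
        this.source = Arrays.asList(tokens);
    }

    /** Prepares the stream for consumption from the first token. */
    void reset() {
        next = 0;
        term = null;
    }

    /** Advances to the next token; returns false once the stream is exhausted. */
    boolean incrementToken() {
        if (next < 0) {
            throw new IllegalStateException("call reset() before incrementToken()");
        }
        if (next >= source.size()) {
            return false;
        }
        term = source.get(next++);
        return true;
    }

    /** Signals that the consumer has finished reading tokens. */
    void end() {
        term = null;
    }

    /** Sketch-only accessor for the current token text. */
    String term() {
        return term;
    }

    /** The consumer loop: reset, read until false, then end. */
    static List<String> drain(SketchTokenStream stream) {
        List<String> out = new ArrayList<>();
        stream.reset();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        stream.end();
        return out;
    }
}
```

With the real class, token text is typically read through an attribute such as CharTermAttribute obtained via addAttribute (listed among the inherited methods), and close() is called once the stream is no longer needed.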
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.