@Deprecated public final class CJKTokenizer extends org.apache.lucene.analysis.Tokenizer
The tokens returned are overlapping pairs of adjacent characters (bigrams).
Example: "java C1C2C3C4" is segmented into "java", "C1C2", "C2C3", "C3C4", where C1 to C4 each stand for a single CJK character.
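The overlapping segmentation in the example can be sketched in plain Java. This is only an illustration of the bigram idea, not Lucene's implementation: the class `BigramSketch`, the method `segment`, and both character-classification helpers are hypothetical names, and the sketch assumes that any non-ASCII letter counts as a CJK character, that a lone CJK character is emitted as a single-character token, and that all input is in the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {

    // Hypothetical helper, not part of Lucene: emits each run of ASCII
    // letters/digits as one token, and each run of non-ASCII letters as
    // overlapping two-character tokens.
    static List<String> segment(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isAsciiWordChar(c)) {
                int start = i;
                while (i < text.length() && isAsciiWordChar(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i));
            } else if (isCjkLike(c)) {
                int start = i;
                while (i < text.length() && isCjkLike(text.charAt(i))) {
                    i++;
                }
                if (i - start == 1) {
                    // Assumption: a lone character becomes its own token.
                    tokens.add(text.substring(start, i));
                } else {
                    // Slide a two-character window one position at a time.
                    for (int j = start; j + 2 <= i; j++) {
                        tokens.add(text.substring(j, j + 2));
                    }
                }
            } else {
                i++; // whitespace and punctuation only separate tokens
            }
        }
        return tokens;
    }

    static boolean isAsciiWordChar(char c) {
        return c < 128 && Character.isLetterOrDigit(c);
    }

    // Assumption for this sketch: every non-ASCII letter is treated as CJK.
    static boolean isCjkLike(char c) {
        return c >= 128 && Character.isLetter(c);
    }

    public static void main(String[] args) {
        // Four Han characters written as Unicode escapes stand in for C1..C4.
        System.out.println(segment("java \u4e00\u4e8c\u4e09\u56db"));
    }
}
```

A run of n CJK characters yields n - 1 bigrams in this sketch, so the four-character run above produces three tokens after "java".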
Additionally, the following is applied to Latin text (such as English):

Constructor and Description |
---|
CJKTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in) Deprecated. |
CJKTokenizer(org.apache.lucene.util.AttributeSource source, Reader in) Deprecated. |
CJKTokenizer(Reader in) Deprecated. Construct a token stream processing the given input. |
Modifier and Type | Method and Description |
---|---|
void | end() Deprecated. |
boolean | incrementToken() Deprecated. Returns true for the next token in the stream, or false at EOS. |
void | reset() Deprecated. |
void | reset(Reader reader) Deprecated. |
Methods inherited from class org.apache.lucene.util.AttributeSource:
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public CJKTokenizer(Reader in)
Parameters: in - I/O reader
public CJKTokenizer(org.apache.lucene.util.AttributeSource source, Reader in)
public CJKTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
public boolean incrementToken() throws IOException
Specified by: incrementToken in class org.apache.lucene.analysis.TokenStream
Throws: IOException - when a read error occurs
public final void end()
Overrides: end in class org.apache.lucene.analysis.TokenStream
public void reset() throws IOException
Overrides: reset in class org.apache.lucene.analysis.TokenStream
Throws: IOException
public void reset(Reader reader) throws IOException
Overrides: reset in class org.apache.lucene.analysis.Tokenizer
Throws: IOException
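
The incrementToken() contract described above (true for each token, false at end of stream) implies a pull-style consumer loop bracketed by reset() and end(). The following is a minimal sketch of that loop shape only. So that it runs without Lucene on the classpath, it uses a hypothetical stand-in class, `FakeTokenStream`, whose `term()` accessor is an assumption of this sketch and is not a Lucene method. Real Lucene token streams expose the current token through attributes instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TokenLoopSketch {

    // Hypothetical stand-in for a token stream; NOT a Lucene class.
    // It mirrors only the reset / incrementToken / end method names.
    static class FakeTokenStream {
        private final List<String> source;
        private Iterator<String> it;
        private String current;

        FakeTokenStream(List<String> tokens) {
            this.source = tokens;
            this.it = tokens.iterator();
        }

        // Rewinds the stream to its first token.
        void reset() {
            it = source.iterator();
            current = null;
        }

        // Advances to the next token; returns false at end of stream.
        boolean incrementToken() {
            if (!it.hasNext()) {
                return false;
            }
            current = it.next();
            return true;
        }

        // Called once after the last token has been consumed.
        void end() {
            current = null;
        }

        // Assumed accessor for the current token text (sketch only).
        String term() {
            return current;
        }
    }

    // Consumer loop: reset, pull tokens until false, then end.
    static List<String> drain(FakeTokenStream ts) {
        List<String> out = new ArrayList<>();
        ts.reset();
        while (ts.incrementToken()) {
            out.add(ts.term());
        }
        ts.end();
        return out;
    }

    public static void main(String[] args) {
        FakeTokenStream ts = new FakeTokenStream(List.of("java", "C1C2", "C2C3", "C3C4"));
        System.out.println(drain(ts));
    }
}
```

Because `drain` calls reset() first, the same stand-in stream can be consumed more than once in this sketch.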