org.apache.lucene.analysis.cjk
Class CJKTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.apache.lucene.analysis.cjk.CJKTokenizer
All Implemented Interfaces:
Closeable

Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead.

@Deprecated
public final class CJKTokenizer
extends Tokenizer

CJKTokenizer is designed for Chinese, Japanese, and Korean languages.

The tokenizer returns overlapping bigrams: every pair of adjacent CJK characters becomes one token, and consecutive tokens share a character.

Example: "java C1C2C3C4" is segmented into: "java" "C1C2" "C2C3" "C3C4".
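
The overlapping-bigram rule can be sketched in plain Java. This is an illustrative sketch only, not the Lucene implementation: the class name CjkBigramSketch, the script-based isCjk test, and the choice to emit a lone CJK character as a single token are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of overlapping CJK bigrams; hypothetical, not Lucene code. */
final class CjkBigramSketch {

    /** Rough CJK test by Unicode script (an assumption made for this sketch). */
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    /** Splits text into whole non-CJK words and overlapping CJK bigrams. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isCjk(cp)) {
                // Collect the whole run of consecutive CJK code points.
                List<String> run = new ArrayList<>();
                while (i < text.length() && isCjk(text.codePointAt(i))) {
                    int c = text.codePointAt(i);
                    run.add(new String(Character.toChars(c)));
                    i += Character.charCount(c);
                }
                if (run.size() == 1) {
                    // Sketch choice: a lone CJK character is emitted as is.
                    tokens.add(run.get(0));
                } else {
                    // Emit every adjacent pair, so neighbours overlap by one character.
                    for (int k = 0; k + 1 < run.size(); k++) {
                        tokens.add(run.get(k) + run.get(k + 1));
                    }
                }
            } else if (Character.isLetterOrDigit(cp)) {
                // A run of non-CJK letters or digits becomes one token.
                int start = i;
                while (i < text.length()) {
                    int c = text.codePointAt(i);
                    if (isCjk(c) || !Character.isLetterOrDigit(c)) {
                        break;
                    }
                    i += Character.charCount(c);
                }
                tokens.add(text.substring(start, i));
            } else {
                // Skip whitespace and punctuation.
                i += Character.charCount(cp);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four Han characters written as Unicode escapes, standing in for C1..C4.
        System.out.println(tokenize("java \u65e5\u672c\u8a9e\u6587"));
    }
}
```

Calling tokenize on "java" followed by four CJK characters yields "java" and then three two-character tokens, the same pattern as the example above.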

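Additionally, the following is applied to Latin text (such as English): text is converted to lowercase; numeric digits, '+', '#', and '_' are tokenized as letters; and full-width forms are converted to half-width forms.

For more information on Asian language (Chinese, Japanese, and Korean) text segmentation, search the web for Chinese word segmentation.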
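
The Latin-text handling can be illustrated with a small plain-Java sketch. It assumes the normalization amounts to lowercasing plus folding full-width ASCII variants (U+FF01 through U+FF5E) to their half-width counterparts, which is what the recommended replacements CJKWidthFilter and LowerCaseFilter suggest. The class name LatinNormalizeSketch is hypothetical, and the code is not taken from Lucene.

```java
/** Illustrative sketch of Latin-text normalization; hypothetical, not Lucene code. */
final class LatinNormalizeSketch {

    /** Folds full-width ASCII variants to half-width, then lowercases. */
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Full-width forms 0xFF01..0xFF5E sit at a fixed offset (0xFEE0)
            // from the ASCII characters 0x21..0x7E.
            if (c >= 0xFF01 && c <= 0xFF5E) {
                c = (char) (c - 0xFEE0);
            }
            sb.append(Character.toLowerCase(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Full-width "JavA" written as Unicode escapes, followed by plain ASCII.
        System.out.println(normalize("\uFF2A\uFF41\uFF56\uFF21 Lucene"));
    }
}
```

Folding before lowercasing keeps the sketch to a single pass, since the folded character is already plain ASCII when Character.toLowerCase sees it.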


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
CJKTokenizer(AttributeSource.AttributeFactory factory, Reader in)
          Deprecated.  
CJKTokenizer(Reader in)
          Deprecated. Construct a token stream processing the given input.
 
Method Summary
 void end()
          Deprecated.  
 boolean incrementToken()
          Deprecated. Returns true for the next token in the stream, or false at EOS.
 void reset()
          Deprecated.  
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset, setReader
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

CJKTokenizer

public CJKTokenizer(Reader in)
Deprecated. 
Construct a token stream processing the given input.

Parameters:
in - I/O reader

CJKTokenizer

public CJKTokenizer(AttributeSource.AttributeFactory factory,
                    Reader in)
Deprecated. 

Method Detail

incrementToken

public boolean incrementToken()
                       throws IOException
Deprecated. 
Returns true for the next token in the stream, or false at end of stream (EOS). See http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.UnicodeBlock.html for details.

Specified by:
incrementToken in class TokenStream
Returns:
false for end of stream, true otherwise
Throws:
IOException - if a read error occurs on the underlying input

end

public final void end()
               throws IOException
Deprecated. 
Overrides:
end in class TokenStream
Throws:
IOException

reset

public void reset()
           throws IOException
Deprecated. 
Overrides:
reset in class Tokenizer
Throws:
IOException


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.