ChineseTokenizer (Lucene 4.10.2 API)

java.lang.Object
- org.apache.lucene.util.AttributeSource
  - org.apache.lucene.analysis.TokenStream
    - org.apache.lucene.analysis.Tokenizer
      - org.apache.lucene.analysis.cn.ChineseTokenizer

All Implemented Interfaces:

Closeable, AutoCloseable

Deprecated.
(3.1) Use StandardTokenizer instead, which has the same functionality. This tokenizer will be removed in Lucene 5.0.
```
@Deprecated
public final class ChineseTokenizer
extends Tokenizer
```
Tokenizes Chinese text into individual Chinese characters.
ChineseTokenizer and CJKTokenizer differ in how they split text into tokens: ChineseTokenizer emits one token per character, while CJKTokenizer emits overlapping two-character tokens.

For example, if the Chinese text "C1C2C3C4" is to be indexed:
- The tokens returned from ChineseTokenizer are C1, C2, C3, C4.
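- The tokens returned from CJKTokenizer are C1C2, C2C3, C3C4.

Therefore the index created by CJKTokenizer is much larger, because far more distinct two-character terms than distinct single characters can occur in a body of text.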
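The two token shapes in the example can be sketched in plain Java. This is only an illustration of the "C1C2C3C4" example, not Lucene's implementation: the class `TokenShapeDemo` and its methods are hypothetical names, and the sketch ignores details such as the handling of Latin letters, digits, and punctuation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone demo; it does not use or reproduce Lucene's classes. */
final class TokenShapeDemo {

    /** One token per Unicode code point: the token shape described for ChineseTokenizer. */
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            tokens.add(new String(Character.toChars(codePoint)));
            i += Character.charCount(codePoint);
        }
        return tokens;
    }

    /** Overlapping pairs of adjacent characters: the token shape described for CJKTokenizer. */
    static List<String> bigrams(String text) {
        List<String> chars = unigrams(text);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four Chinese characters (written as Unicode escapes) standing in for C1C2C3C4.
        String text = "\u4e2d\u6587\u5206\u8bcd";
        System.out.println(unigrams(text)); // four single-character tokens
        System.out.println(bigrams(text));  // three overlapping two-character tokens
    }
}
```

For a single text the bigram list is not longer than the unigram list; the larger index comes from the much larger set of distinct terms across many documents.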

The drawback of CJKTokenizer is that a search for C1, C1C2, C1C3, C4C2, C1C2C3 ... works against an index built with ChineseTokenizer but not against one built with CJKTokenizer.
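The search behavior can be illustrated with a small term-matching sketch. It assumes a deliberately simplified matching rule (a query matches when every one of its terms occurs in the index, ignoring positions), which is not how every Lucene query type behaves. The class `TermMatchDemo` and the method `allTermsIndexed` are hypothetical, and the term lists are written out by hand from the example above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-alone demo of term matching; it does not call Lucene. */
final class TermMatchDemo {

    /** True if every query term occurs among the indexed terms (positions are ignored). */
    static boolean allTermsIndexed(List<String> indexedTerms, List<String> queryTerms) {
        Set<String> index = new HashSet<>(indexedTerms);
        return index.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // Terms indexed for the text C1C2C3C4 under each scheme.
        List<String> unigramIndex = Arrays.asList("C1", "C2", "C3", "C4");
        List<String> bigramIndex = Arrays.asList("C1C2", "C2C3", "C3C4");

        // Query "C1C3": unigram analysis yields C1 and C3, bigram analysis yields C1C3.
        System.out.println(allTermsIndexed(unigramIndex, Arrays.asList("C1", "C3"))); // true
        System.out.println(allTermsIndexed(bigramIndex, Arrays.asList("C1C3")));      // false

        // Query "C4C2": unigram analysis yields C4 and C2, bigram analysis yields C4C2.
        System.out.println(allTermsIndexed(unigramIndex, Arrays.asList("C4", "C2"))); // true
        System.out.println(allTermsIndexed(bigramIndex, Arrays.asList("C4C2")));      // false
    }
}
```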

- Nested Class Summary
  - Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
    AttributeSource.State
- Field Summary
  - Fields inherited from class org.apache.lucene.analysis.Tokenizer
    input
  - Fields inherited from class org.apache.lucene.analysis.TokenStream
    DEFAULT_TOKEN_ATTRIBUTE_FACTORY
  - Fields inherited from class org.apache.lucene.util.AttributeSource
    DEFAULT_ATTRIBUTE_FACTORY
- Constructor Summary
  
  Constructors
  Constructor and Description
  
  ChineseTokenizer(AttributeFactory factory, Reader in)
  Deprecated.
  
  ChineseTokenizer(Reader in)
  Deprecated.
- Method Summary
  
  Methods
  Modifier and Type Method and Description
  
  void end()
  Deprecated.
  
  boolean incrementToken()
  Deprecated.
  
  void reset()
  Deprecated.
  - Methods inherited from class org.apache.lucene.analysis.Tokenizer
    close, correctOffset, setReader
  - Methods inherited from class org.apache.lucene.util.AttributeSource
    addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
  - Methods inherited from class java.lang.Object
    clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Detail

ChineseTokenizer
```
public ChineseTokenizer(Reader in)
```
Deprecated.

ChineseTokenizer
```
public ChineseTokenizer(AttributeFactory factory,
                Reader in)
```
Deprecated.

Method Detail
- incrementToken
```
public boolean incrementToken()
                       throws IOException
```
  Deprecated.
  
  Specified by:
  
  incrementToken in class TokenStream
  
  Throws:
  
  IOException
- end
```
public final void end()
               throws IOException
```
  Deprecated.
  
  Overrides:
  
  end in class TokenStream
  
  Throws:
  
  IOException
- reset
```
public void reset()
           throws IOException
```
  Deprecated.
  
  Overrides:
  
  reset in class Tokenizer
  
  Throws:
  
  IOException
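The methods above follow the usual TokenStream consumer workflow: call reset(), call incrementToken() until it returns false, then call end() and close(). A minimal sketch of that workflow follows, assuming the Lucene 4.10.2 core and analyzers-common jars are on the classpath. The class name `ChineseTokenizerUsage` and the helper `tokenize` are hypothetical, and reading terms through `CharTermAttribute` is one common choice rather than something this page prescribes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.cn.ChineseTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/** Hypothetical usage sketch; requires Lucene 4.10.2 on the classpath. */
final class ChineseTokenizerUsage {

    /** Collects the term text of every token produced for the given input. */
    @SuppressWarnings("deprecation") // ChineseTokenizer is deprecated in favor of StandardTokenizer
    static List<String> tokenize(String text) throws IOException {
        List<String> terms = new ArrayList<>();
        // Tokenizer implements Closeable, so try-with-resources calls close() for us.
        try (ChineseTokenizer tokenizer = new ChineseTokenizer(new StringReader(text))) {
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();                  // must be called before the first incrementToken()
            while (tokenizer.incrementToken()) {
                terms.add(term.toString());     // copy the term; the attribute is reused per token
            }
            tokenizer.end();                    // signals the end of the stream
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Four Chinese characters written as Unicode escapes.
        System.out.println(tokenize("\u4e2d\u6587\u5206\u8bcd"));
    }
}
```

For Chinese input this is expected to yield one term per character. New code should prefer StandardTokenizer, as the deprecation note recommends.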

Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.