Deprecated. Use StandardTokenizer instead, which has the same functionality. This filter will be removed in Lucene 5.0.

@Deprecated
public final class ChineseTokenizer
extends Tokenizer

Tokenize Chinese text into individual Chinese characters.
The difference between ChineseTokenizer and CJKTokenizer is that they have different token parsing logic.

For example, if the Chinese text "C1C2C3C4" is to be indexed:

- The tokens returned from ChineseTokenizer are C1, C2, C3, C4.
- The tokens returned from CJKTokenizer are C1C2, C2C3, C3C4.

Therefore the index created by CJKTokenizer is much larger.

The problem is that when searching for C1, C1C2, C1C3, C4C2, C1C2C3 ... the ChineseTokenizer works, but the CJKTokenizer will not work.
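The two parsing strategies above can be sketched without the Lucene classes themselves. The helper methods below are hypothetical stand-ins, not the real tokenizers; they only reproduce the described logic of one token per character versus overlapping bigrams:

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerComparison {
    // One token per character, as ChineseTokenizer does for CJK input.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(String.valueOf(text.charAt(i)));
        }
        return tokens;
    }

    // Overlapping two-character tokens, as CJKTokenizer does for CJK input.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "一二三四"; // stands in for C1C2C3C4
        System.out.println(unigrams(text)); // [一, 二, 三, 四]
        System.out.println(bigrams(text));  // [一二, 二三, 三四]
    }
}
```

For a 4-character run, the unigram scheme emits 4 tokens while the bigram scheme emits 3 longer overlapping tokens, which is why the CJKTokenizer index grows larger.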
Nested classes/interfaces inherited from class AttributeSource: AttributeSource.AttributeFactory, AttributeSource.State

| Constructor and Description |
|---|
| ChineseTokenizer(AttributeSource.AttributeFactory factory, Reader in) Deprecated. |
| ChineseTokenizer(AttributeSource source, Reader in) Deprecated. |
| ChineseTokenizer(Reader in) Deprecated. |
| Modifier and Type | Method and Description |
|---|---|
| void | end() Deprecated. |
| boolean | incrementToken() Deprecated. |
| void | reset() Deprecated. |
Methods inherited from class Tokenizer: close, correctOffset, setReader

Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState

public ChineseTokenizer(Reader in)
public ChineseTokenizer(AttributeSource source, Reader in)
public ChineseTokenizer(AttributeSource.AttributeFactory factory, Reader in)
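A minimal consumption loop for this deprecated class, assuming the Lucene 3.x/4.x analysis API with lucene-core and the Chinese analyzer module on the classpath; the class name ChineseTokenizerDemo is illustrative:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.cn.ChineseTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ChineseTokenizerDemo {
    // Collect the terms emitted for the given text, one per Chinese character.
    static List<String> tokens(String text) throws IOException {
        ChineseTokenizer tokenizer = new ChineseTokenizer(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        List<String> result = new ArrayList<>();
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            result.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("一二三四"));
    }
}
```

The reset/incrementToken/end/close sequence follows the standard TokenStream consumer workflow; StandardTokenizer can be dropped in the same way after migration.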
public boolean incrementToken() throws IOException

Specified by: incrementToken in class TokenStream
Throws: IOException

public final void end()

Overrides: end in class TokenStream

public void reset() throws IOException

Overrides: reset in class TokenStream
Throws: IOException

Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.