ChineseTokenizer (Lucene 3.5.0 API)


org.apache.lucene.analysis.cn
Class ChineseTokenizer

java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.Tokenizer
              org.apache.lucene.analysis.cn.ChineseTokenizer

All Implemented Interfaces: Closeable

Deprecated. Use StandardTokenizer instead, which has the same functionality. This tokenizer will be removed in Lucene 5.0.

@Deprecated public final class ChineseTokenizer
extends org.apache.lucene.analysis.Tokenizer

Tokenizes Chinese text into individual Chinese characters.

ChineseTokenizer and CJKTokenizer differ in how they split text into tokens.

For example, if the Chinese text "C1C2C3C4" is to be indexed:

The tokens returned from ChineseTokenizer are C1, C2, C3, C4.
The tokens returned from the CJKTokenizer are C1C2, C2C3, C3C4.
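The two segmentation strategies can be modeled in a few lines of plain Java. This is a minimal sketch, not Lucene code: the class `SegmentationSketch` and its methods are hypothetical, the input is a list of placeholder strings standing for C1..C4, and treating a lone character as a single bigram-side token is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two segmentation strategies; not Lucene's implementation.
public class SegmentationSketch {
    // Unigram segmentation: one token per character (the ChineseTokenizer behavior described above).
    static List<String> unigrams(List<String> chars) {
        return new ArrayList<>(chars);
    }

    // Overlapping bigram segmentation (the CJKTokenizer behavior described above).
    // Assumption: an input of a single character is emitted as that one character.
    static List<String> bigrams(List<String> chars) {
        List<String> tokens = new ArrayList<>();
        if (chars.size() == 1) {
            tokens.add(chars.get(0));
            return tokens;
        }
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> text = List.of("C1", "C2", "C3", "C4");
        System.out.println(unigrams(text)); // [C1, C2, C3, C4]
        System.out.println(bigrams(text));  // [C1C2, C2C3, C3C4]
    }
}
```

A run of n characters yields n unigram tokens but n - 1 bigram tokens, so the token counts are similar; the strategies differ in which terms end up in the index.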

Because there are far more distinct two-character combinations than distinct characters, the index created by CJKTokenizer is much larger.
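One way to see the size difference is to count distinct terms, which is what the term dictionary stores. The following is a hedged sketch on a hypothetical seven-character text over a three-character alphabet; `VocabularySketch` is an illustrative name, and real index size also depends on postings and other structures this ignores.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of vocabulary growth; not Lucene code.
public class VocabularySketch {
    // Distinct single-character terms (unigram indexing).
    static int distinctUnigrams(List<String> chars) {
        return new HashSet<>(chars).size();
    }

    // Distinct overlapping two-character terms (bigram indexing).
    static int distinctBigrams(List<String> chars) {
        Set<String> terms = new HashSet<>();
        for (int i = 0; i + 1 < chars.size(); i++) {
            terms.add(chars.get(i) + chars.get(i + 1));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        // Hypothetical text using only three distinct characters.
        List<String> text = List.of("C1", "C2", "C3", "C1", "C3", "C2", "C1");
        System.out.println("distinct unigrams: " + distinctUnigrams(text)); // 3
        System.out.println("distinct bigrams:  " + distinctBigrams(text));  // 6
    }
}
```

With k distinct characters the unigram vocabulary is at most k, while the bigram vocabulary can approach k * k.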

In addition, when searching for C1, C1C2, C1C3, C4C2, C1C2C3 ..., ChineseTokenizer finds a match, but CJKTokenizer does not.
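The matching claim can be illustrated for single-character queries and for queries that are not contiguous in the indexed text (C1, C1C3, C4C2) with a deliberately simplified model. This is a sketch under stated assumptions, not how Lucene scores or parses queries: the query is analyzed the same way as the document, and a match means every query token is an indexed term (boolean AND, positions ignored). `QueryMatchSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical term-matching model; not Lucene's query machinery.
public class QueryMatchSketch {
    // Unigram segmentation, as described for ChineseTokenizer.
    static List<String> unigrams(List<String> chars) {
        return new ArrayList<>(chars);
    }

    // Overlapping bigram segmentation, as described for CJKTokenizer.
    // Assumption: a single-character input is emitted as one token.
    static List<String> bigrams(List<String> chars) {
        List<String> tokens = new ArrayList<>();
        if (chars.size() == 1) {
            tokens.add(chars.get(0));
            return tokens;
        }
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    // Simplified matching: every query token must be an indexed term
    // (boolean AND over terms; positions and scoring are ignored).
    static boolean matches(List<String> doc, List<String> query,
                           Function<List<String>, List<String>> analyzer) {
        Set<String> indexTerms = new HashSet<>(analyzer.apply(doc));
        return indexTerms.containsAll(analyzer.apply(query));
    }

    public static void main(String[] args) {
        List<String> doc = List.of("C1", "C2", "C3", "C4");
        List<List<String>> queries = List.of(
                List.of("C1"), List.of("C1", "C3"), List.of("C4", "C2"));
        for (List<String> q : queries) {
            System.out.println(String.join("", q)
                    + "  unigram index: " + matches(doc, q, QueryMatchSketch::unigrams)
                    + ", bigram index: " + matches(doc, q, QueryMatchSketch::bigrams));
        }
    }
}
```

Under this model all three queries match the unigram index and none match the bigram index, because "C1", "C1C3" and "C4C2" never occur among the indexed bigrams C1C2, C2C3, C3C4. Since positions are ignored, the unigram side also matches C4C2 even though those characters are not adjacent in that order.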

Version: 1.0

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
`org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State`

Field Summary

Fields inherited from class org.apache.lucene.analysis.Tokenizer
`input`

Constructor Summary
`ChineseTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)` Deprecated.
`ChineseTokenizer(org.apache.lucene.util.AttributeSource source, Reader in)` Deprecated.
`ChineseTokenizer(Reader in)` Deprecated.

Method Summary
`void`	`end()` Deprecated.
`boolean`	`incrementToken()` Deprecated.
`void`	`reset()` Deprecated.
`void`	`reset(Reader input)` Deprecated.
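The methods above follow the usual TokenStream consumer sequence: reset(), repeated incrementToken() until it returns false, end(), then close() (inherited from Tokenizer). The sketch below mimics that sequence with a hypothetical stdlib-only class, `UnigramTokenizerSketch`; it is not Lucene's source. The real class reports token text and offsets through attributes added with addAttribute (such as CharTermAttribute) rather than getters, and the grouping of Latin letters and digits into lowercased words here is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the reset/incrementToken/end/close call
// sequence of a Lucene TokenStream. It is NOT Lucene's ChineseTokenizer.
public class UnigramTokenizerSketch {
    private Reader input;
    private int offset;        // characters consumed so far
    private int pending = -1;  // one pushed-back character, or -1
    private String term;
    private int startOffset, endOffset, finalOffset;

    public UnigramTokenizerSketch(Reader input) { this.input = input; }

    // Like reset(Reader): switch to new input, then clear state.
    public void reset(Reader newInput) { this.input = newInput; reset(); }

    // Clear per-stream state before consuming tokens.
    public void reset() { offset = 0; pending = -1; term = null; }

    private int read() throws IOException {
        int c = pending != -1 ? pending : input.read();
        pending = -1;
        if (c != -1) offset++;
        return c;
    }

    private void unread(int c) { pending = c; offset--; }

    // Advance to the next token; returns false when input is exhausted.
    // Assumption: BMP text only, so each char is one character.
    public boolean incrementToken() throws IOException {
        StringBuilder word = new StringBuilder();
        int start = -1;
        while (true) {
            int c = read();
            if (c == -1) break;
            char ch = (char) c;
            if (Character.getType(ch) == Character.OTHER_LETTER) {
                if (word.length() > 0) { unread(c); break; } // emit pending word first
                word.append(ch);                              // ideograph is its own token
                start = offset - 1;
                break;
            } else if (Character.isLetterOrDigit(ch)) {
                if (word.length() == 0) start = offset - 1;
                word.append(Character.toLowerCase(ch));
            } else if (word.length() > 0) {
                break;                                        // other characters end a word
            }
        }
        if (word.length() == 0) return false;
        term = word.toString();
        startOffset = start;
        endOffset = start + word.length();
        return true;
    }

    // Record the offset just past the last consumed character.
    public void end() { finalOffset = offset; }
    public void close() throws IOException { input.close(); }

    public String term() { return term; }
    public int startOffset() { return startOffset; }
    public int endOffset() { return endOffset; }
    public int finalOffset() { return finalOffset; }

    // Run the full consumer workflow and collect the term texts.
    static List<String> tokenize(String text) {
        UnigramTokenizerSketch ts = new UnigramTokenizerSketch(new StringReader(text));
        List<String> terms = new ArrayList<>();
        try {
            ts.reset();
            while (ts.incrementToken()) terms.add(ts.term());
            ts.end();
            ts.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        System.out.println(tokenize("\u4E2D\u6587 Lucene3").size()); // 3
    }
}
```

Keeping a single pushed-back character lets the sketch stop a Latin word exactly where an ideograph begins without losing that ideograph or miscounting offsets.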

Methods inherited from class org.apache.lucene.analysis.Tokenizer
`close, correctOffset`

Methods inherited from class org.apache.lucene.util.AttributeSource
`addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Constructor Detail