Class Summary
Class | Description
ChineseAnalyzer | Deprecated. Use StandardAnalyzer instead, which has the same functionality.
ChineseFilter | Deprecated. Use StopFilter instead, which has the same functionality.
ChineseTokenizer | Deprecated. Use StandardTokenizer instead, which has the same functionality.
Package org.apache.lucene.analysis.cn Description
Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.
- StandardAnalyzer: Index unigrams (individual Chinese characters) as tokens.
- CJKAnalyzer (in the analyzers/cjk package): Index bigrams (overlapping groups of two adjacent Chinese characters) as tokens.
- SmartChineseAnalyzer (in the analyzers/smartcn package): Index words (attempt to segment Chinese text into words) as tokens.
Example phrase: "我是中国人"
- StandardAnalyzer: 我-是-中-国-人
- CJKAnalyzer: 我是-是中-中国-国人
- SmartChineseAnalyzer: 我-是-中国-人
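The unigram and bigram outputs above can be reproduced with a small sketch in plain Java. This is an illustration only and is not how the Lucene analyzers are implemented: the class NGramSketch and its ngrams method are hypothetical names, and the sketch assumes the input contains only Chinese characters (no punctuation, whitespace, or mixed scripts, which the real analyzers handle separately). Word segmentation as done by SmartChineseAnalyzer is not covered here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
class NGramSketch {

    // Returns every overlapping group of n adjacent code points in text.
    // n = 1 gives unigrams, n = 2 gives bigrams.
    static List<String> ngrams(String text, int n) {
        // Work on code points so characters outside the BMP stay intact.
        int[] codePoints = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + n <= codePoints.length; i++) {
            tokens.add(new String(codePoints, i, n));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "我是中国人";
        // Unigrams, matching the StandardAnalyzer line above.
        System.out.println(String.join("-", ngrams(phrase, 1)));
        // Bigrams, matching the CJKAnalyzer line above.
        System.out.println(String.join("-", ngrams(phrase, 2)));
    }
}
```

Bigram indexing produces about as many tokens as unigram indexing (one fewer per run of characters), but each token is more selective, which is the trade-off between the first two analyzers.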