Package org.apache.lucene.analysis.cjk

Analyzer for Chinese, Japanese, and Korean, which indexes bigrams.


Package org.apache.lucene.analysis.cjk Description

The CJKAnalyzer in this package generates bigram terms: overlapping pairs of adjacent Han, Hiragana, Katakana, or Hangul characters.
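The bigram scheme described above can be sketched in plain Java without Lucene. This is a minimal illustration, not Lucene's implementation. It assumes that the script test via `Character.UnicodeScript` is a fair stand-in for "Han, Hiragana, Katakana, or Hangul". The class and method names are hypothetical, and so are two behaviors: a lone CJK character is emitted as a single-character term, and non-CJK characters are simply ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigramSketch {

    // True for code points in the four scripts the package description names.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    // Emits overlapping two-character terms for each run of adjacent CJK characters.
    // Assumption of this sketch: a run of length one is emitted as a single-character term,
    // and non-CJK characters are skipped rather than tokenized.
    static List<String> bigrams(String text) {
        List<String> terms = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (!isCjk(cps[i])) {
                i++;
                continue;
            }
            int start = i;
            while (i < cps.length && isCjk(cps[i])) {
                i++;
            }
            int end = i; // exclusive end of the current CJK run
            if (end - start == 1) {
                terms.add(new String(cps, start, 1));
            } else {
                for (int j = start; j + 1 < end; j++) {
                    terms.add(new String(cps, j, 2));
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("我是中国人")); // [我是, 是中, 中国, 国人]
    }
}
```

Working on code points rather than `char` values keeps characters outside the Basic Multilingual Plane intact. A run of n CJK characters (n ≥ 2) yields n − 1 overlapping terms.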

Three analyzers are provided for Chinese, each of which segments Chinese text differently: into single characters, into overlapping bigrams, or into words.

Example phrase: "我是中国人" ("I am Chinese"); hyphens separate the terms each analyzer produces:
  1. ChineseAnalyzer: 我-是-中-国-人
  2. CJKAnalyzer: 我是-是中-中国-国人
  3. SmartChineseAnalyzer: 我-是-中国-人
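The three outputs listed above can be reproduced with a small, self-contained Java sketch of the corresponding segmentation strategies. It does not call any Lucene analyzer. The word-level strategy uses forward maximum matching against a tiny hypothetical dictionary. That is a deliberately simple substitute technique, not SmartChineseAnalyzer's algorithm. The class name, method names, and dictionary contents are all illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChineseSegmentationSketch {

    // Strategy 1: one term per character (compare the ChineseAnalyzer row).
    static List<String> unigrams(String text) {
        List<String> terms = new ArrayList<>();
        text.codePoints().forEach(cp -> terms.add(new String(Character.toChars(cp))));
        return terms;
    }

    // Strategy 2: overlapping pairs of adjacent characters (compare the CJKAnalyzer row).
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> terms = new ArrayList<>();
        for (int i = 0; i + 1 < cps.length; i++) {
            terms.add(new String(cps, i, 2));
        }
        return terms;
    }

    // Strategy 3: forward maximum matching against a dictionary.
    // A stand-in for word segmentation; NOT SmartChineseAnalyzer's algorithm.
    static List<String> maxMatch(String text, Set<String> dictionary, int maxWordLen) {
        int[] cps = text.codePoints().toArray();
        List<String> terms = new ArrayList<>();
        int i = 0;
        while (i < cps.length) {
            int len = Math.min(maxWordLen, cps.length - i);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (len > 1 && !dictionary.contains(new String(cps, i, len))) {
                len--;
            }
            terms.add(new String(cps, i, len));
            i += len;
        }
        return terms;
    }

    public static void main(String[] args) {
        String phrase = "我是中国人";
        Set<String> toyDictionary = new HashSet<>(Arrays.asList("中国")); // hypothetical dictionary
        System.out.println(String.join("-", unigrams(phrase)));                  // 我-是-中-国-人
        System.out.println(String.join("-", bigrams(phrase)));                   // 我是-是中-中国-国人
        System.out.println(String.join("-", maxMatch(phrase, toyDictionary, 4))); // 我-是-中国-人
    }
}
```

The word-level result depends entirely on the dictionary. If the toy dictionary also contained 中国人, greedy longest matching would emit 我-是-中国人 instead. The unigram and bigram strategies, by contrast, need no dictionary at all.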

Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.