Package org.apache.lucene.analysis.cjk

Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).


Class Summary
CJKAnalyzer: An Analyzer that tokenizes text with CJKTokenizer and filters the resulting tokens with StopFilter.
CJKTokenizer: A tokenizer designed for Chinese, Japanese, and Korean text.
 

Package org.apache.lucene.analysis.cjk Description


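Three analyzers are provided for Chinese, each of which tokenizes Chinese text differently: ChineseAnalyzer indexes unigrams (individual Chinese characters), CJKAnalyzer (in this package) indexes bigrams (overlapping pairs of adjacent characters), and SmartChineseAnalyzer attempts to segment the text into words. The example below shows the tokens each one produces for the same phrase.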

Example phrase: "我是中国人"
  1. ChineseAnalyzer: 我-是-中-国-人
  2. CJKAnalyzer: 我是-是中-中国-国人
  3. SmartChineseAnalyzer: 我-是-中国-人
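
The unigram and bigram outputs in the example above can be reproduced with a few lines of plain Java. The following is only an illustrative sketch of the overlapping-bigram idea, not the actual CJKTokenizer implementation: the class BigramSketch and its methods are hypothetical names introduced here, the code does not use the Lucene API, and it assumes the input consists solely of CJK characters (no Latin text, digits, or punctuation). Emitting a lone character as its own token is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only; not part of Lucene.
public class BigramSketch {

    // Splits the text into single-character tokens (unigrams).
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        int[] codePoints = text.codePoints().toArray();
        for (int i = 0; i < codePoints.length; i++) {
            tokens.add(new String(codePoints, i, 1));
        }
        return tokens;
    }

    // Splits the text into overlapping pairs of adjacent characters (bigrams).
    // Assumption of this sketch: a single isolated character is emitted as-is.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        int[] codePoints = text.codePoints().toArray();
        if (codePoints.length == 1) {
            tokens.add(new String(codePoints, 0, 1));
            return tokens;
        }
        // Each window starts one character after the previous one,
        // so consecutive tokens share a character.
        for (int i = 0; i + 1 < codePoints.length; i++) {
            tokens.add(new String(codePoints, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "我是中国人";
        System.out.println(String.join("-", unigrams(phrase))); // 我-是-中-国-人
        System.out.println(String.join("-", bigrams(phrase)));  // 我是-是中-中国-国人
    }
}
```

Because every character except the first and last appears in two tokens, a bigram index can match any two-character word without a dictionary, at the cost of a larger index than the unigram approach.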



Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.