Analyzer for Simplified Chinese, which indexes words.


Class Summary
AnalyzerProfile       Manages analysis data configuration for SmartChineseAnalyzer.
CharType              Internal SmartChineseAnalyzer character type constants.
SentenceTokenizer     Tokenizes input text into sentences.
SmartChineseAnalyzer  SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
Utility               SmartChineseAnalyzer utility constants and methods.
WordTokenFilter       A TokenFilter that breaks sentences into words.
WordType              Internal SmartChineseAnalyzer token type constants.

Package Description

WARNING: The analyzers/smartcn package is experimental. The APIs and file formats introduced here might change in the future, in which case the current ones will no longer be supported.
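Three analyzers are provided for Chinese, each of which tokenizes Chinese text differently. For the example phrase "我是中国人", they produce the following tokens (shown separated by "-"):
  1. ChineseAnalyzer (unigrams, one token per character): 我-是-中-国-人
  2. CJKAnalyzer (bigrams, overlapping pairs of adjacent characters): 我是-是中-中国-国人
  3. SmartChineseAnalyzer (words): 我-是-中国-人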
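
The difference between the three token streams can be reproduced with a small standalone sketch. It is illustrative only: it does not call any Lucene API, and the class TokenizationSketch, its methods, and the one-entry toy dictionary are hypothetical names introduced here. In particular, the greedy dictionary lookup in words() is a simple stand-in that happens to yield the same output for this phrase; it is not a description of how SmartChineseAnalyzer segments text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: not Lucene code, and no Lucene API is used.
class TokenizationSketch {

    // Split text into code points so that each character is one unit.
    static List<String> chars(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(new String(Character.toChars(cp))));
        return out;
    }

    // Unigrams: every character becomes its own token.
    static List<String> unigrams(String text) {
        return chars(text);
    }

    // Bigrams: every overlapping pair of adjacent characters becomes a token.
    static List<String> bigrams(String text) {
        List<String> cs = chars(text);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < cs.size(); i++) {
            out.add(cs.get(i) + cs.get(i + 1));
        }
        return out;
    }

    // Words: greedy longest match against a toy dictionary (an assumption
    // made for this sketch, not the real analyzer's segmentation method).
    static List<String> words(String text, Set<String> dictionary, int maxWordLength) {
        List<String> cs = chars(text);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < cs.size()) {
            int matched = 1; // fall back to a single character
            for (int len = Math.min(maxWordLength, cs.size() - i); len > 1; len--) {
                String candidate = String.join("", cs.subList(i, i + len));
                if (dictionary.contains(candidate)) {
                    matched = len;
                    break;
                }
            }
            out.add(String.join("", cs.subList(i, i + matched)));
            i += matched;
        }
        return out;
    }

    public static void main(String[] args) {
        String phrase = "我是中国人";
        Set<String> toyDictionary = new HashSet<>(Arrays.asList("中国"));
        System.out.println(String.join("-", unigrams(phrase)));
        System.out.println(String.join("-", bigrams(phrase)));
        System.out.println(String.join("-", words(phrase, toyDictionary, 2)));
    }
}
```

Running main prints one line per strategy, in the same order as the list above. Only the word-based strategy keeps "中国" as a single token without also emitting the cross-word pairs "是中" and "国人".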

Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.