org.apache.lucene.analysis.cn.smart (Lucene 4.2.1 API)

Package org.apache.lucene.analysis.cn.smart

Class Summary
AnalyzerProfile	Manages analysis data configuration for SmartChineseAnalyzer
CharType	Internal SmartChineseAnalyzer character type constants.
SentenceTokenizer	Tokenizes input text into sentences.
SmartChineseAnalyzer	SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
SmartChineseSentenceTokenizerFactory	Factory for the SmartChineseAnalyzer `SentenceTokenizer`
SmartChineseWordTokenFilterFactory	Factory for the SmartChineseAnalyzer `WordTokenFilter`
Utility	SmartChineseAnalyzer utility constants and methods
WordTokenFilter	A `TokenFilter` that breaks sentences into words.
WordType	Internal SmartChineseAnalyzer token type constants

Package org.apache.lucene.analysis.cn.smart Description

Analyzer for Simplified Chinese, which indexes words.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.

StandardAnalyzer: Indexes unigrams (individual Chinese characters) as tokens.
CJKAnalyzer (in the analyzers/cjk package): Indexes bigrams (overlapping groups of two adjacent Chinese characters) as tokens.
SmartChineseAnalyzer (in this package): Indexes words (attempts to segment Chinese text into words) as tokens.
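
To make the first two strategies concrete, the following is a minimal plain-Java sketch of unigram and bigram tokenization. It does not use or reproduce the Lucene classes: the class name `ChineseNGramSketch` and its methods are hypothetical and exist only for illustration, and the real analyzers also handle non-Chinese text, offsets and token types. Word segmentation as done by SmartChineseAnalyzer relies on dictionary data and is not sketched here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical helper, not part of Lucene.
public class ChineseNGramSketch {

    // Unigram strategy: every character (code point) becomes one token.
    public static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            tokens.add(new String(Character.toChars(codePoint)));
            i += Character.charCount(codePoint);
        }
        return tokens;
    }

    // Bigram strategy: overlapping pairs of adjacent characters become tokens.
    public static List<String> bigrams(String text) {
        List<String> chars = unigrams(text);
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String phrase = "我是中国人";
        System.out.println(unigrams(phrase)); // [我, 是, 中, 国, 人]
        System.out.println(bigrams(phrase));  // [我是, 是中, 中国, 国人]
    }
}
```

Note that the bigram output contains pairs such as 是中 that are not words; a word-based analyzer aims to avoid indexing such pairs.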

Example phrase: "我是中国人"