org.apache.lucene.analysis.cn.smart (Lucene 9.9.2 smartcn API)

Analyzer for Simplified Chinese, which indexes words.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.

StandardAnalyzer: Indexes unigrams (individual Chinese characters) as tokens.
CJKAnalyzer (in the org.apache.lucene.analysis.cjk package): Indexes bigrams (overlapping pairs of adjacent Chinese characters) as tokens.
SmartChineseAnalyzer (in this package): Indexes words (attempts to segment Chinese text into words) as tokens.

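Example phrase: "我是中国人" ("I am Chinese")

SmartChineseAnalyzer: 我－是－中国－人
CJKAnalyzer: 我是－是中－中国－国人
StandardAnalyzer: 我－是－中－国－人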
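
The unigram and bigram schemes described above can be illustrated without Lucene at all. The following is a minimal plain-Java sketch, not the code StandardAnalyzer or CJKAnalyzer actually runs; the class name `NGramSketch` and the helper `ngrams` are hypothetical names chosen for this illustration. It only shows how the example phrase splits into single characters and into overlapping pairs.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

  // Returns every run of n consecutive code points in text, sliding one
  // code point at a time (n = 1 gives unigrams, n = 2 overlapping bigrams).
  static List<String> ngrams(String text, int n) {
    int[] codePoints = text.codePoints().toArray();
    List<String> out = new ArrayList<>();
    for (int i = 0; i + n <= codePoints.length; i++) {
      out.add(new String(codePoints, i, n));
    }
    return out;
  }

  public static void main(String[] args) {
    String phrase = "我是中国人";
    // Unigrams: one token per character.
    System.out.println(String.join("－", ngrams(phrase, 1)));
    // Bigrams: overlapping pairs of adjacent characters.
    System.out.println(String.join("－", ngrams(phrase, 2)));
  }
}
```

Word segmentation, the third scheme, cannot be reproduced with a sliding window like this, since deciding where one word ends and the next begins needs a dictionary or a statistical model. That is the part SmartChineseAnalyzer supplies.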

Class Summary
Class	Description
AnalyzerProfile	Manages analysis data configuration for SmartChineseAnalyzer.
CharType	Internal SmartChineseAnalyzer character type constants.
HMMChineseTokenizer	Tokenizer for Chinese or mixed Chinese-English text.
HMMChineseTokenizerFactory	Factory for `HMMChineseTokenizer`.
SmartChineseAnalyzer	SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.
Utility	SmartChineseAnalyzer utility constants and methods.
WordType	Internal SmartChineseAnalyzer token type constants.
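
As a rough sketch of how SmartChineseAnalyzer might be used to inspect the tokens it produces, the example below runs a string through the analyzer and collects the emitted terms. It assumes the lucene-core and lucene-analysis-smartcn 9.x artifacts are on the classpath; the class name `SmartcnDemo`, the helper `tokens`, and the field name `"body"` are hypothetical and chosen only for this illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class SmartcnDemo {

  // Runs text through the analyzer and returns the term text of each token.
  static List<String> tokens(Analyzer analyzer, String text) throws IOException {
    List<String> out = new ArrayList<>();
    // "body" is an arbitrary field name; this analyzer does not vary by field.
    try (TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                      // required before the first incrementToken()
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();                        // finalizes end-of-stream state
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    // The no-argument constructor uses the analyzer's default stop-word set.
    try (Analyzer analyzer = new SmartChineseAnalyzer()) {
      System.out.println(tokens(analyzer, "我是中国人"));
    }
  }
}
```

The same `tokens` helper accepts any `Analyzer`, so passing a StandardAnalyzer or CJKAnalyzer instance instead is one way to compare the three segmentation strategies on the same input.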
