org.apache.lucene.analysis.cn.smart (Lucene 9.1.0 smartcn API)

package org.apache.lucene.analysis.cn.smart

Analyzer for Simplified Chinese, which indexes words.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.

StandardAnalyzer: Index unigrams (individual Chinese characters) as tokens.
CJKAnalyzer (in the analyzers/cjk package): Index bigrams (overlapping groups of two adjacent Chinese characters) as tokens.
SmartChineseAnalyzer (in this package): Index words (attempt to segment Chinese text into words) as tokens.

Example phrase: "我是中国人"
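
To make the difference between the unigram and bigram strategies concrete, the following is a minimal standalone sketch in plain Java. It is not how Lucene implements StandardAnalyzer or CJKAnalyzer, and it does not call the Lucene API. The class name NGramSketch and the method ngrams are hypothetical names chosen for illustration. It only shows what "individual characters" and "overlapping groups of two adjacent characters" mean for the example phrase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
public class NGramSketch {

    // Returns the overlapping n-grams of the text, counted in Unicode code points.
    // n = 1 yields unigrams, n = 2 yields overlapping bigrams.
    static List<String> ngrams(String text, int n) {
        int[] codePoints = text.codePoints().toArray();
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= codePoints.length; i++) {
            out.add(new String(codePoints, i, n));
        }
        return out;
    }

    public static void main(String[] args) {
        String phrase = "我是中国人";
        // Unigrams: one token per character.
        System.out.println(String.join(" ", ngrams(phrase, 1))); // 我 是 中 国 人
        // Bigrams: each pair of adjacent characters, overlapping by one.
        System.out.println(String.join(" ", ngrams(phrase, 2))); // 我是 是中 中国 国人
    }
}
```

Word segmentation, the strategy used by SmartChineseAnalyzer, is not shown here because it cannot be derived from character positions alone; it depends on language data that this sketch does not have.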
