Class | Description |
---|---|
CJKAnalyzer | An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter, and filters stopwords with StopFilter. |
CJKBigramFilter | Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer. |
CJKTokenizer | Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead. |
CJKWidthFilter | A TokenFilter that normalizes CJK width differences: it folds fullwidth ASCII variants into the equivalent basic Latin, and folds halfwidth Katakana variants into the equivalent kana. NOTE: this filter can be viewed as a (practical) subset of NFKC/NFKD Unicode normalization. |
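
The width folding and bigram formation described in the table can be illustrated without Lucene on the classpath. The following is a minimal plain-Java sketch, not the Lucene implementation. The class `CjkSketch` and its methods are hypothetical names introduced for illustration. `java.text.Normalizer` with NFKC stands in for the halfwidth Katakana folding, so it normalizes more than the real filter would. The bigram method assumes its whole input is a single run of CJK characters, and it emits a lone character as a unigram.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class CjkSketch {

    /**
     * Folds fullwidth ASCII variants (U+FF01..U+FF5E) into basic Latin.
     * The two blocks are laid out in the same order, so a fixed offset
     * of 0xFEE0 maps one onto the other.
     */
    static String foldFullwidthAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - 0xFEE0));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Stand-in for halfwidth Katakana folding: full NFKC normalization.
     * Assumption: this is a superset of the width folding described above.
     */
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    /**
     * Forms overlapping bigrams from a run of adjacent CJK characters.
     * Works on code points so supplementary characters are not split.
     */
    static List<String> bigrams(String cjkRun) {
        int[] cps = cjkRun.codePoints().toArray();
        List<String> out = new ArrayList<>();
        if (cps.length == 1) {
            out.add(cjkRun); // no neighbor to pair with: emit as a unigram
            return out;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(foldFullwidthAscii("\uFF21\uFF22\uFF23")); // ABC
        System.out.println(nfkc("\uFF76"));                           // halfwidth KA folded to U+30AB
        System.out.println(bigrams("\u65E5\u672C\u8A9E"));            // two overlapping bigrams
    }
}
```

In an actual Lucene analysis chain these steps are carried out by CJKWidthFilter and CJKBigramFilter on the token stream. The sketch only mirrors the character-level effect on plain strings.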
Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.