Class Summary | |
---|---|
CJKAnalyzer | An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter, and filters stopwords with StopFilter. |
CJKBigramFilter | Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer. |
CJKBigramFilterFactory | Factory for CJKBigramFilter. |
CJKTokenizer | Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead. |
CJKTokenizerFactory | Deprecated. Use CJKBigramFilterFactory instead. |
CJKWidthFilter | A TokenFilter that normalizes CJK width differences: folds fullwidth ASCII variants into the equivalent Basic Latin, and folds halfwidth Katakana variants into the equivalent kana. |
CJKWidthFilterFactory | Factory for CJKWidthFilter. |
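
The two width foldings listed for CJKWidthFilter can be illustrated with the JDK alone. The sketch below is not the filter's implementation: the class and method names are hypothetical, the fullwidth ASCII folding relies on the fixed offset between U+FF01..U+FF5E and U+0021..U+007E, and the halfwidth Katakana folding is approximated with Unicode NFKC normalization, which is assumed here as a stand-in and changes more characters than the two foldings named above.

```java
import java.text.Normalizer;

// Hypothetical helper class; it does not use or reproduce any Lucene code.
class WidthFoldingSketch {

    // Folds fullwidth ASCII variants (U+FF01..U+FF5E) into Basic Latin
    // (U+0021..U+007E) by subtracting the fixed offset 0xFEE0.
    static String foldFullwidthAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0xFF01 && c <= 0xFF5E) {
                sb.append((char) (c - 0xFEE0));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Assumption: NFKC is used as an approximation. It maps halfwidth Katakana
    // to the regular kana, but it also rewrites other compatibility characters.
    static String foldWithNfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        System.out.println(foldFullwidthAscii("Ｌｕｃｅｎｅ")); // prints Lucene
        System.out.println(foldWithNfkc("ｶﾀｶﾅ"));             // prints カタカナ
    }
}
```

Folding widths before forming bigrams means that fullwidth and halfwidth spellings of the same text produce the same terms.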
Analyzer for Chinese, Japanese, and Korean, which indexes bigrams. This analyzer generates bigram terms, which are overlapping groups of two adjacent Han, Hiragana, Katakana, or Hangul characters.
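
To make "overlapping groups of two adjacent characters" concrete, here is a minimal sketch in plain Java. It is an illustration under stated assumptions, not the code used by CJKBigramFilter: the class and method names are hypothetical, script detection uses the JDK's `Character.UnicodeScript`, non-CJK characters are simply skipped, and a CJK character with no CJK neighbor is emitted on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; it does not use or reproduce any Lucene code.
class CjkBigramSketch {

    // True for the four scripts named in the description above.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    // Emits overlapping two-character terms for each run of adjacent CJK characters.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (!isCjk(cps[i])) {
                i++; // assumption of this sketch: non-CJK text is ignored
                continue;
            }
            int start = i;
            while (i < cps.length && isCjk(cps[i])) {
                i++;
            }
            if (i - start == 1) {
                // assumption of this sketch: a lone CJK character is kept as-is
                out.add(new String(cps, start, 1));
            } else {
                for (int j = start; j + 1 < i; j++) {
                    out.add(new String(cps, j, 2));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("日本語")); // prints [日本, 本語]
    }
}
```

Because the pairs overlap, every adjacent two-character sequence in a run is indexed, so a query for either pair can match without a word segmenter.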
Three analyzers are provided for Chinese, each of which treats Chinese text in a different way.