public final class CJKAnalyzer extends StopwordAnalyzerBase
Analyzer that tokenizes text with StandardTokenizer,
normalizes content with CJKWidthFilter, folds case with
LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter,
and filters stopwords with StopFilter.

Nested classes inherited from class Analyzer: Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

| Modifier and Type | Field and Description |
|---|---|
| static String | DEFAULT_STOPWORD_FILE File containing default CJK stopwords. |
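
The class description names a chain of width normalization, case folding, and CJK bigram formation. The following is a minimal plain-Java sketch of that idea, not Lucene's implementation: the class `CjkBigramSketch` and its methods are hypothetical names, NFKC normalization stands in for CJKWidthFilter (it folds more than character width), and a simple letter-or-digit split stands in for StandardTokenizer.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; it does not use or reproduce Lucene code.
final class CjkBigramSketch {

    /** Approximates width folding and case folding with NFKC plus lower-casing. */
    static String normalize(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
    }

    /** True for Han, Hiragana, Katakana and Hangul code points. */
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    /** Overlapping bigrams for a run of CJK code points; a lone character is kept as is. */
    static List<String> bigrams(String cjkRun) {
        int[] cps = cjkRun.codePoints().toArray();
        List<String> out = new ArrayList<>();
        if (cps.length == 1) {
            out.add(new String(cps, 0, 1));
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2));
        }
        return out;
    }

    /** Emits the pending run: bigrams for a CJK run, the whole word otherwise. */
    private static void flush(StringBuilder run, boolean runIsCjk, List<String> out) {
        if (run.length() == 0) {
            return;
        }
        if (runIsCjk) {
            out.addAll(bigrams(run.toString()));
        } else {
            out.add(run.toString());
        }
        run.setLength(0);
    }

    /** Splits normalized text into non-CJK words and CJK bigrams. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        boolean runIsCjk = false;
        for (int cp : normalize(text).codePoints().toArray()) {
            boolean wordChar = Character.isLetterOrDigit(cp);
            boolean cjk = wordChar && isCjk(cp);
            // Close the current run at a separator or at a CJK / non-CJK boundary.
            if (!wordChar || cjk != runIsCjk) {
                flush(run, runIsCjk, out);
            }
            if (wordChar) {
                run.appendCodePoint(cp);
                runIsCjk = cjk;
            }
        }
        flush(run, runIsCjk, out);
        return out;
    }
}
```

Under these assumptions, `CjkBigramSketch.tokens("Ｌｕｃｅｎｅ 日本語検索")` yields `lucene`, `日本`, `本語`, `語検`, `検索`: the full-width Latin word is folded to ASCII lower case, and the five-character CJK run becomes four overlapping bigrams. Characters whose Unicode script is Common, such as the prolonged sound mark, are not treated as CJK by this sketch.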

Fields inherited from class StopwordAnalyzerBase: stopwords

Fields inherited from class Analyzer: GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

| Constructor and Description |
|---|
| CJKAnalyzer() Builds an analyzer which removes words in getDefaultStopSet(). |
| CJKAnalyzer(CharArraySet stopwords) Builds an analyzer with the given stop words. |

| Modifier and Type | Method and Description |
|---|---|
| protected Analyzer.TokenStreamComponents | createComponents(String fieldName) |
| static CharArraySet | getDefaultStopSet() Returns an unmodifiable instance of the default stop-words set. |

Methods inherited from class StopwordAnalyzerBase: getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class Analyzer: close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, getVersion, initReader, setVersion, tokenStream, tokenStream

public static final String DEFAULT_STOPWORD_FILE
File containing default CJK stopwords. Currently it contains some common English words that are not usually useful for searching, plus some double-byte punctuation marks.
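
The class description lists stop-word filtering as the last step of the chain, so it would see tokens after case folding and bigram formation. A minimal plain-Java sketch of that step follows; `StopFilterSketch` and its three-word stop list are hypothetical and are not the contents of DEFAULT_STOPWORD_FILE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; it does not use or reproduce Lucene code.
final class StopFilterSketch {

    // Assumed sample stop list, chosen for the example only.
    static final Set<String> SAMPLE_STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "of", "and"));

    /** Returns the tokens that are not in the stop set, preserving order. */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

Because matching here is an exact string comparison on already lower-cased tokens, the entries of a stop set in this sketch need to be lower case as well.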
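
public CJKAnalyzer()
Builds an analyzer which removes words in getDefaultStopSet().

public CJKAnalyzer(CharArraySet stopwords)
Builds an analyzer with the given stop words.
Parameters: stopwords - a stopword set

public static CharArraySet getDefaultStopSet()
Returns an unmodifiable instance of the default stop-words set.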

protected Analyzer.TokenStreamComponents createComponents(String fieldName)
Specified by: createComponents in class Analyzer

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.