Class HMMChineseTokenizerFactory
java.lang.Object
org.apache.lucene.analysis.AbstractAnalysisFactory
org.apache.lucene.analysis.TokenizerFactory
org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
Factory for HMMChineseTokenizer.
Note: this class currently emits tokens for punctuation. To drop them, either add a
WordDelimiterFilter after the tokenizer (with concatenate off), or use the SmartChinese
stoplist with a StopFilterFactory via:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
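The stoplist option in the note above can be sketched with Lucene's CustomAnalyzer builder. This is a minimal illustration, not the only way to wire the chain: it assumes the analysis-common and smartcn jars are both on the classpath so that the stoplist resource resolves, and the class name HmmChineseStopExample, the field name "body", and the sample sentence are made up for the example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; not part of Lucene.
public class HmmChineseStopExample {

  // Builds tokenizer "hmmChinese" followed by a stop filter that loads
  // the SmartChinese stoplist from the classpath.
  static Analyzer buildAnalyzer() throws IOException {
    return CustomAnalyzer.builder()
        .withTokenizer("hmmChinese")
        .addTokenFilter("stop",
            "words", "org/apache/lucene/analysis/cn/smart/stopwords.txt")
        .build();
  }

  // Runs the analyzer over the text and collects the surviving terms.
  static List<String> tokenize(String text) throws IOException {
    List<String> terms = new ArrayList<>();
    try (Analyzer analyzer = buildAnalyzer();
         TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // Arbitrary sample sentence ending in a full stop.
    System.out.println(tokenize("我购买了道具和服装。"));
  }
}
```

Dropping the addTokenFilter call and rerunning shows the punctuation token that the stoplist would otherwise remove.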
- Since:
- 4.10.0
- WARNING: This API is experimental and might change in incompatible ways in the next release.
- SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).
- "hmmChinese"
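As a small sketch of the SPI lookup described above, the factory can be resolved by name through the inherited forName method. The example assumes the smartcn jar is on the classpath so that the service is registered. The class name SpiLookupExample is hypothetical. A mutable map is passed because analysis factories consume entries from the argument map they are given.

```java
import java.util.HashMap;
import org.apache.lucene.analysis.TokenizerFactory;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;

// Hypothetical example class; not part of Lucene.
public class SpiLookupExample {

  // Looks up a tokenizer factory by SPI name with no extra arguments.
  static TokenizerFactory lookup(String spiName) {
    return TokenizerFactory.forName(spiName, new HashMap<>());
  }

  public static void main(String[] args) {
    // The NAME constant holds the SPI name "hmmChinese".
    TokenizerFactory exact = lookup(HMMChineseTokenizerFactory.NAME);
    // The lookup is case-insensitive, so an all-lowercase name also resolves.
    TokenizerFactory lower = lookup("hmmchinese");
    System.out.println(exact.getClass().getName());
    System.out.println(lower.getClass().getName());
  }
}
```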
Field Summary
Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructor - Description
HMMChineseTokenizerFactory() - Default ctor for compatibility with SPI
HMMChineseTokenizerFactory(Map<String,String> args) - Creates a new HMMChineseTokenizerFactory
Method Summary
Methods inherited from class org.apache.lucene.analysis.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Details
NAME
SPI name
- See Also:
Constructor Details
HMMChineseTokenizerFactory
Creates a new HMMChineseTokenizerFactory
HMMChineseTokenizerFactory
public HMMChineseTokenizerFactory()
Default ctor for compatibility with SPI
Method Details
create
- Specified by: create in class TokenizerFactory
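A minimal sketch of calling create directly, assuming the smartcn jar is on the classpath. It uses the inherited no-argument create() overload and the map-based constructor with an empty, mutable argument map. The class name HmmChineseCreateExample and the sample sentence are made up for the example. No stop filter is applied here, so the output includes whatever punctuation tokens the tokenizer emits.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; not part of Lucene.
public class HmmChineseCreateExample {

  // Creates a tokenizer from the factory and collects its raw terms.
  static List<String> rawTokens(String text) throws IOException {
    HMMChineseTokenizerFactory factory =
        new HMMChineseTokenizerFactory(new HashMap<>());
    List<String> terms = new ArrayList<>();
    try (Tokenizer tokenizer = factory.create()) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        terms.add(term.toString());
      }
      tokenizer.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // Arbitrary sample sentence ending in a full stop.
    System.out.println(rawTokens("我购买了道具和服装。"));
  }
}
```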