Class HMMChineseTokenizerFactory
- java.lang.Object
  - org.apache.lucene.analysis.util.AbstractAnalysisFactory
    - org.apache.lucene.analysis.util.TokenizerFactory
      - org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory
public final class HMMChineseTokenizerFactory extends TokenizerFactory
Factory for HMMChineseTokenizer.
Note: this class currently emits tokens for punctuation. To remove them, either add a WordDelimiterFilter after the tokenizer (with concatenate off), or use the SmartChinese stoplist with a StopFilterFactory via:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
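The stoplist option in the note above could be wired up programmatically roughly as follows. This is a minimal sketch, not part of this class's documentation: it assumes the Lucene 8.x analyzers-common and analyzers-smartcn jars are on the classpath, and it uses CustomAnalyzer from analyzers-common to assemble the chain. The class name SmartChineseChainExample, the field name "body", and the sample text are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example class; not part of Lucene.
class SmartChineseChainExample {

    // Builds a chain of HMMChineseTokenizer followed by a StopFilter that
    // loads the SmartChinese stoplist from the classpath.
    static Analyzer buildAnalyzer() throws IOException {
        return CustomAnalyzer.builder()
                .withTokenizer(HMMChineseTokenizerFactory.class)
                .addTokenFilter(StopFilterFactory.class,
                        "words", "org/apache/lucene/analysis/cn/smart/stopwords.txt")
                .build();
    }

    // Runs the chain over the given text and collects the emitted terms.
    static List<String> tokenize(String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (Analyzer analyzer = buildAnalyzer();
             TokenStream stream = analyzer.tokenStream("body", text)) { // "body" is an arbitrary field name
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Sample sentence ending in punctuation, which the stoplist is meant to drop.
        for (String term : tokenize("我是中国人。")) {
            System.out.println(term);
        }
    }
}
```

Passing the factory classes rather than their SPI names keeps the sketch independent of name lookup; the name-based overloads of the builder should work equally well with "hmmChinese".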
- Since: 4.10.0
- WARNING: This API is experimental and might change in incompatible ways in the next release.
- SPI Name (note: this is case-insensitive; e.g., if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service): "hmmChinese"
Field Summary
Fields
Modifier and Type    Field    Description
static String        NAME     SPI name
Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructors
Constructor                                            Description
HMMChineseTokenizerFactory(Map<String,String> args)    Creates a new HMMChineseTokenizerFactory
Method Summary
Modifier and Type    Method
Tokenizer            create(AttributeFactory factory)
Methods inherited from class org.apache.lucene.analysis.util.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Detail
NAME
public static final String NAME
SPI name
- See Also: Constant Field Values
Method Detail
create
public Tokenizer create(AttributeFactory factory)
- Specified by: create in class TokenizerFactory