public final class HMMChineseTokenizerFactory extends TokenizerFactory
Factory for HMMChineseTokenizer.
Note: this class currently emits tokens for punctuation. To drop them, either add a WordDelimiterFilter after the tokenizer (with concatenation turned off), or use the SmartChinese stoplist with a StopFilterFactory, configured with:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
Fields inherited from class AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor and Description |
---|
HMMChineseTokenizerFactory(Map<String,String> args): Creates a new HMMChineseTokenizerFactory |
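A minimal sketch of calling the constructor directly, assuming Lucene 5.x. In that version the factory removes the parameters it recognizes (such as `luceneMatchVersion`) from the map it receives and rejects any leftovers with an IllegalArgumentException, so the sketch passes a mutable copy. The helper name `newFactory` and the option `noSuchOption` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;
import org.apache.lucene.util.Version;

// Hypothetical helper showing how the Map<String,String> constructor is used.
public class FactoryConstructionExample {

    // The factory consumes recognized parameters by removing them from the map
    // it is given, so hand it a mutable copy rather than the caller's map.
    static HMMChineseTokenizerFactory newFactory(Map<String, String> params) {
        return new HMMChineseTokenizerFactory(new HashMap<>(params));
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        // Pin luceneMatchVersion to whatever Lucene version is on the classpath.
        params.put("luceneMatchVersion", Version.LATEST.toString());
        HMMChineseTokenizerFactory factory = newFactory(params);
        System.out.println("matchVersion=" + factory.getLuceneMatchVersion()
                + " originalArgs=" + factory.getOriginalArgs());

        // Any parameter the factory does not understand is rejected.
        Map<String, String> bad = new HashMap<>();
        bad.put("noSuchOption", "true");
        try {
            newFactory(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

getOriginalArgs() still reports the full parameter map after construction, which is useful for logging what a factory was configured with.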
Modifier and Type | Method and Description |
---|---|
Tokenizer | create(AttributeFactory factory) |
Methods inherited from class TokenizerFactory: availableTokenizers, create, forName, lookupClass, reloadTokenizers
Methods inherited from class AbstractAnalysisFactory: get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitFileNames
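The inherited static methods let callers find this factory by name instead of by class. A minimal sketch, assuming Lucene 5.x with the smartcn jar on the classpath and assuming the SPI short name is "hmmchinese" (derived from the class name by dropping the "TokenizerFactory" suffix and lowercasing). The helper `byName` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.util.TokenizerFactory;

// Hypothetical helper: look a tokenizer factory up by its SPI short name
// rather than referencing HMMChineseTokenizerFactory directly.
public class SpiLookupExample {

    static TokenizerFactory byName(String spiName) {
        // forName instantiates the registered factory with these (mutable) args.
        Map<String, String> args = new HashMap<>();
        return TokenizerFactory.forName(spiName, args);
    }

    public static void main(String[] args) {
        // All tokenizer factories registered through META-INF/services.
        System.out.println(TokenizerFactory.availableTokenizers());
        // Class lookup without instantiation.
        System.out.println(TokenizerFactory.lookupClass("hmmchinese").getName());
        // Instantiation by name.
        System.out.println(byName("hmmchinese").getClass().getSimpleName());
    }
}
```

Lookup by name is what configuration-driven code (for example, CustomAnalyzer) relies on, so a misspelled name fails at lookup time rather than at compile time.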
public Tokenizer create(AttributeFactory factory)
Specified by: create in class TokenizerFactory
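A minimal sketch of calling create(AttributeFactory) and consuming the resulting tokenizer by hand, assuming Lucene 5.x, where TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY is the standard attribute factory. The class and method names are hypothetical. Because no stop filter is applied, punctuation tokens remain in the output, as the note on this class describes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical example: obtain an HMMChineseTokenizer from the factory and
// consume it following the TokenStream contract (setReader, reset, loop, end, close).
public class CreateTokenizerExample {

    static List<String> tokenize(String text) throws IOException {
        HMMChineseTokenizerFactory factory =
                new HMMChineseTokenizerFactory(new HashMap<String, String>());
        List<String> out = new ArrayList<>();
        // Pass Lucene's default token attribute factory explicitly.
        try (Tokenizer tokenizer = factory.create(TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY)) {
            tokenizer.setReader(new StringReader(text));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();                  // required before the first incrementToken()
            while (tokenizer.incrementToken()) {
                out.add(term.toString());       // no stop filter, so punctuation tokens are kept
            }
            tokenizer.end();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize("我购买了道具和服装。"));
    }
}
```

Passing a custom AttributeFactory here is mainly useful when the attribute implementations need to be shared or replaced; for ordinary use the inherited no-argument create() is simpler.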
Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.