public final class HMMChineseTokenizerFactory extends TokenizerFactory
Factory for HMMChineseTokenizer.
Note: this class currently emits tokens for punctuation. Either add a
WordDelimiterFilter after it to remove them (with concatenate off), or use the
SmartChinese stoplist with a StopFilterFactory via:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
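The effect of the stoplist approach can be illustrated with a minimal plain-Java sketch. It uses no Lucene classes: `PunctuationStopSketch`, `STOP_SET`, and `dropStopTokens` are hypothetical names, the stop set is a hand-picked handful of punctuation marks rather than the actual contents of stopwords.txt, and the sample tokens are assumed tokenizer output, not a recorded result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only: this does not call Lucene and is not StopFilter itself.
class PunctuationStopSketch {

    // Hypothetical stop set: a few punctuation marks chosen for the example,
    // NOT the real contents of the SmartChinese stopwords.txt.
    static final Set<String> STOP_SET = new HashSet<>(Arrays.asList(
        "\uFF0C",  // U+FF0C fullwidth comma
        "\u3002",  // U+3002 ideographic full stop
        "\uFF01",  // U+FF01 fullwidth exclamation mark
        "\uFF1F",  // U+FF1F fullwidth question mark
        ",", ".", "!", "?"));

    // Keep only tokens that are not in the stop set, preserving their order.
    static List<String> dropStopTokens(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!STOP_SET.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed token stream for illustration; real segmentation may differ.
        List<String> tokens = Arrays.asList(
            "\u4E2D\u56FD",  // two-character word meaning "China"
            "\u4EBA",        // one-character word meaning "person"
            "\u3002");       // trailing ideographic full stop emitted as a token
        System.out.println(dropStopTokens(tokens));
    }
}
```

In a real analysis chain the same role is played by a StopFilterFactory configured with the `words` resource shown above; the sketch only shows why punctuation tokens disappear once they are listed as stop words.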
Fields inherited from class AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor and Description |
---|
HMMChineseTokenizerFactory(Map<String,String> args): Creates a new HMMChineseTokenizerFactory. |
Modifier and Type | Method and Description |
---|---|
Tokenizer | create(AttributeFactory factory, Reader reader) |
Methods inherited from class TokenizerFactory: availableTokenizers, create, forName, lookupClass, reloadTokenizers
Methods inherited from class AbstractAnalysisFactory: assureMatchVersion, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitFileNames
public Tokenizer create(AttributeFactory factory, Reader reader)
Specified by: create in class TokenizerFactory
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.