org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory

public final class HMMChineseTokenizerFactory extends TokenizerFactory

Factory for HMMChineseTokenizer

Note: this class currently emits tokens for punctuation. To drop them, either add a WordDelimiterFilter after the tokenizer (with concatenate off), or apply the SmartChinese stoplist through a StopFilterFactory with: words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
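The stoplist option from the note above can be sketched with CustomAnalyzer from the analysis-common module. This is a minimal illustration, not a recommended configuration: it assumes the smartcn and analysis-common jars are on the classpath, and the class name HmmChineseStopExample, the method name tokenize, and the field name "body" are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class HmmChineseStopExample {

  /** Tokenizes text with hmmChinese, then drops tokens listed in the SmartChinese stoplist. */
  public static List<String> tokenize(String text) throws IOException {
    try (Analyzer analyzer = CustomAnalyzer.builder()
        // Tokenizer selected by its SPI name.
        .withTokenizer("hmmChinese")
        // "stop" is the SPI name of StopFilterFactory; the words file is loaded from the classpath.
        .addTokenFilter("stop", "words", "org/apache/lucene/analysis/cn/smart/stopwords.txt")
        .build()) {
      List<String> terms = new ArrayList<>();
      // "body" is an arbitrary field name used only for this illustration.
      try (TokenStream stream = analyzer.tokenStream("body", text)) {
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        while (stream.incrementToken()) {
          terms.add(term.toString());
        }
        stream.end();
      }
      return terms;
    }
  }

  public static void main(String[] args) throws IOException {
    // Punctuation tokens should no longer appear in the printed list.
    System.out.println(tokenize("我购买了道具和服装。"));
  }
}
```

The stop filter runs after the tokenizer, so it sees punctuation as ordinary tokens and removes those that match an entry in the stoplist.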

Since: 4.10.0
WARNING: This API is experimental and might change in incompatible ways in the next release.
SPI Name (case-insensitive: for example, a name registered as 'htmlStrip' can also be looked up as 'htmlstrip'): "hmmChinese"
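The case-insensitive lookup can be checked with the inherited static method TokenizerFactory.lookupClass. The sketch below assumes the smartcn jar is on the classpath so that the factory is registered through SPI; the class name HmmChineseSpiLookup and the helper resolvesTo are hypothetical.

```java
import org.apache.lucene.analysis.TokenizerFactory;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;

public class HmmChineseSpiLookup {

  /** Returns true if the given SPI name resolves to HMMChineseTokenizerFactory. */
  public static boolean resolvesTo(String spiName) {
    // lookupClass throws IllegalArgumentException when no factory is registered under the name.
    return TokenizerFactory.lookupClass(spiName) == HMMChineseTokenizerFactory.class;
  }

  public static void main(String[] args) {
    // The same factory is looked up under differently cased spellings of its name.
    for (String name : new String[] {HMMChineseTokenizerFactory.NAME, "hmmchinese", "HMMCHINESE"}) {
      System.out.println(name + " -> " + resolvesTo(name));
    }
  }
}
```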

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final String | NAME | SPI name |

Fields inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| HMMChineseTokenizerFactory() | Default ctor for compatibility with SPI |
| HMMChineseTokenizerFactory(Map<String,String> args) | Creates a new HMMChineseTokenizerFactory |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| Tokenizer | create(AttributeFactory factory) | |

Methods inherited from class org.apache.lucene.analysis.TokenizerFactory
availableTokenizers, create, findSPIName, forName, lookupClass, reloadTokenizers

Methods inherited from class org.apache.lucene.analysis.AbstractAnalysisFactory
defaultCtorException, get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- NAME
  
  public static final String NAME
  
  SPI name
  See Also:
  
  Constant Field Values
Constructor Details
- HMMChineseTokenizerFactory
  
  public HMMChineseTokenizerFactory(Map<String,String> args)
  
  Creates a new HMMChineseTokenizerFactory
- HMMChineseTokenizerFactory
  
  public HMMChineseTokenizerFactory()
  
  Default ctor for compatibility with SPI
Method Details
- create
  
  public Tokenizer create(AttributeFactory factory)
  
  Specified by:
  
  create in class TokenizerFactory
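A direct call to create(AttributeFactory) might look like the following sketch. It builds the factory with an empty argument map, passes Lucene's default attribute factory, and collects the raw terms, which include punctuation tokens because no filter is applied. The class name HmmChineseCreateExample and the method name rawTokens are hypothetical, and the smartcn jar is assumed to be on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.cn.smart.HMMChineseTokenizerFactory;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class HmmChineseCreateExample {

  /** Returns the unfiltered terms produced by a tokenizer obtained from the factory. */
  public static List<String> rawTokens(String text) throws IOException {
    // The map must be mutable: the factory consumes the arguments it recognizes.
    HMMChineseTokenizerFactory factory = new HMMChineseTokenizerFactory(new HashMap<>());
    List<String> terms = new ArrayList<>();
    try (Tokenizer tokenizer = factory.create(TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY)) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        terms.add(term.toString());
      }
      tokenizer.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // With no filter in place, the trailing full stop is expected to surface as a punctuation token.
    System.out.println(rawTokens("我购买了道具和服装。"));
  }
}
```

Using the tokenizer on its own like this is mainly useful for inspecting what the segmenter emits; in an analysis chain it would normally be followed by one of the filters mentioned in the class note.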
