Class HMMChineseTokenizerFactory


public final class HMMChineseTokenizerFactory extends TokenizerFactory
Factory for HMMChineseTokenizer

Note: this class currently emits tokens for punctuation. To remove them, either add a WordDelimiterFilter after the tokenizer (with concatenation turned off), or use the SmartChinese stoplist with a StopFilterFactory via: words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
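
The stoplist option can be sketched as a Solr-style analyzer definition. This is a minimal illustration, not a recommended configuration: the field type name text_zh_hmm is hypothetical, and the solr.* class shorthand assumes the factory is used from a Solr schema with the smartcn analysis module on the classpath.

```xml
<!-- Hypothetical field type name; choose whatever fits your schema. -->
<fieldType name="text_zh_hmm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Tokenizer produced by this factory; it also emits punctuation tokens. -->
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <!-- Removes the punctuation tokens using the SmartChinese stoplist. -->
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
  </analyzer>
</fieldType>
```

The stop filter is placed directly after the tokenizer so that later filters in the chain never see the punctuation tokens.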

Since:
4.10.0
WARNING: This API is experimental and might change in incompatible ways in the next release.
SPI Name (case-insensitive: if the name is 'htmlStrip', 'htmlstrip' can be used when looking up the service).
"hmmChinese"
  • Field Details

  • Constructor Details

    • HMMChineseTokenizerFactory

      public HMMChineseTokenizerFactory(Map<String,String> args)
      Creates a new HMMChineseTokenizerFactory
    • HMMChineseTokenizerFactory

      public HMMChineseTokenizerFactory()
      Default constructor for compatibility with SPI
  • Method Details