Class HMMChineseTokenizerFactory


  • public final class HMMChineseTokenizerFactory
    extends TokenizerFactory
    Factory for HMMChineseTokenizer

    Note: this class currently emits tokens for punctuation. To remove them, either add a WordDelimiterFilter after the tokenizer (with concatenation off), or use a StopFilterFactory with the SmartChinese stoplist via: words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
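
    The stoplist option in the note above could be wired up roughly as follows in a Solr schema. This is a minimal sketch, not a recommended configuration: the field type name text_zh is hypothetical, and it assumes the SmartChinese analysis jar is on the classpath so that both the factory and the stopwords resource resolve.

    ```xml
    <!-- "text_zh" is a hypothetical field type name chosen for illustration. -->
    <fieldType name="text_zh" class="solr.TextField">
      <analyzer>
        <!-- Segments Chinese text; also emits punctuation as tokens. -->
        <tokenizer class="solr.HMMChineseTokenizerFactory"/>
        <!-- Drops the punctuation tokens using the SmartChinese stoplist. -->
        <filter class="solr.StopFilterFactory"
                words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
      </analyzer>
    </fieldType>
    ```

    The stop filter comes directly after the tokenizer, so punctuation is removed before any later filters see the token stream.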

    Since:
    4.10.0
    WARNING: This API is experimental and might change in incompatible ways in the next release.
    SPI Name (case-insensitive: for example, a service named 'htmlStrip' can also be looked up as 'htmlstrip'):
    "hmmChinese"