Class HMMChineseTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class HMMChineseTokenizer
    extends SegmentingTokenizerBase
    Tokenizer for Chinese or mixed Chinese-English text.

    The tokenizer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, and then each sentence is segmented into words.
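    The two-stage process described above (sentence splitting, then best-scoring word segmentation) can be illustrated with a small, self-contained sketch. This is not Lucene's implementation and does not use its bundled dictionaries. The class name ToySegmenter, the unigram log-probability dictionary, the set of sentence-ending characters, the 4-character maximum word length, and the penalty for unknown characters are all hypothetical choices made for illustration. The segmenter picks the word sequence with the highest total log-probability using a Viterbi-style dynamic program over split points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical toy segmenter (NOT Lucene's HMMChineseTokenizer internals).
 * Step 1 splits text into sentences; step 2 segments each sentence by
 * maximizing the summed log-probability of words from a toy dictionary.
 */
public class ToySegmenter {
    // Hypothetical score for a single character missing from the dictionary.
    private static final double UNKNOWN_CHAR_LOG_PROB = -20.0;
    // Hypothetical upper bound on dictionary word length, in chars.
    private static final int MAX_WORD_LEN = 4;

    private final Map<String, Double> logProb;

    public ToySegmenter(Map<String, Double> logProb) {
        this.logProb = logProb;
    }

    /** Step 1: cut after each (assumed) sentence-ending punctuation mark. */
    public static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            current.append(c);
            if ("。！？!?".indexOf(c) >= 0) {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            sentences.add(current.toString());
        }
        return sentences;
    }

    /** Step 2: best path over all split points, then backtrack to recover words. */
    public List<String> segment(String sentence) {
        int n = sentence.length();
        double[] best = new double[n + 1]; // best[i] = best score for prefix of length i
        int[] back = new int[n + 1];       // back[i] = start index of the last word in that prefix
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - MAX_WORD_LEN); start < end; start++) {
                Double lp = logProb.get(sentence.substring(start, end));
                if (lp == null) {
                    if (end - start != 1) {
                        continue; // unknown multi-char strings are never words
                    }
                    lp = UNKNOWN_CHAR_LOG_PROB; // a lone unknown char is always allowed
                }
                double score = best[start] + lp;
                if (score > best[end]) {
                    best[end] = score;
                    back[end] = start;
                }
            }
        }
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = back[end]) {
            words.add(0, sentence.substring(back[end], end));
        }
        return words;
    }
}
```

    With a toy dictionary in which 我 and 是 score -2.0, 中国人 scores -4.0, 中国 -3.0 and 人 -5.0, the sentence 我是中国人 segments as 我 / 是 / 中国人 (total -8.0) rather than 我 / 是 / 中国 / 人 (total -12.0). Working in log space turns the product of word probabilities into a sum, which is why the dynamic program can simply add scores.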