Class HMMChineseTokenizer

All Implemented Interfaces:
Closeable, AutoCloseable

public class HMMChineseTokenizer extends SegmentingTokenizerBase
Tokenizer for Chinese or mixed Chinese-English text.

The tokenizer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, then each sentence is segmented into words.
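The two-stage process described above (split into sentences, then pick the most probable word sequence for each sentence) can be illustrated with a small, self-contained sketch. This is not the HMMChineseTokenizer implementation. In Lucene, the sentence step is inherited from SegmentingTokenizerBase, and the word step relies on the smartcn module's own dictionaries and model. Here a simpler technique is swapped in: unigram maximum-likelihood segmentation solved by dynamic programming. The class `ToySegmenter`, its dictionary, and all probabilities are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Toy illustration only: NOT the Lucene HMMChineseTokenizer algorithm.
// The dictionary and probabilities below are hypothetical.
final class ToySegmenter {
    // Hypothetical unigram probabilities for a tiny dictionary.
    static final Map<String, Double> WORD_PROB = Map.of(
        "我", 0.10, "是", 0.10, "中国", 0.05, "中国人", 0.02,
        "人", 0.05, "中", 0.01, "国", 0.01, "好", 0.05);

    // Out-of-dictionary text is allowed only as single characters, with a tiny probability.
    static final double UNKNOWN_PROB = 1e-6;
    static final int MAX_WORD_LEN = 4;

    /** Stage 1: split on Chinese/ASCII sentence-ending punctuation (punctuation is dropped). */
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i); // assumes BMP characters, true for common Hanzi
            if ("。！？!?".indexOf(c) >= 0) {
                if (current.length() > 0) sentences.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) sentences.add(current.toString());
        return sentences;
    }

    /** Stage 2: segmentation of one sentence that maximizes the sum of word log-probabilities. */
    static List<String> segment(String sentence) {
        int n = sentence.length();
        double[] best = new double[n + 1]; // best[i]: best log-prob of the first i chars
        int[] back = new int[n + 1];       // back[i]: start index of the last word in that prefix
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - MAX_WORD_LEN); start < end; start++) {
                Double p = WORD_PROB.get(sentence.substring(start, end));
                if (p == null) {
                    if (end - start != 1) continue; // unknown spans only as single chars
                    p = UNKNOWN_PROB;
                }
                double score = best[start] + Math.log(p);
                if (score > best[end]) {
                    best[end] = score;
                    back[end] = start;
                }
            }
        }
        // Walk the back-pointers from the end to recover the best word sequence.
        List<String> words = new ArrayList<>();
        for (int end = n; end > 0; end = back[end]) {
            words.add(0, sentence.substring(back[end], end));
        }
        return words;
    }

    public static void main(String[] args) {
        for (String s : splitSentences("我是中国人。你好！")) {
            System.out.println(segment(s));
        }
    }
}
```

With this toy dictionary, "我是中国人" is segmented as [我, 是, 中国人], because log(0.1) + log(0.1) + log(0.02) is greater than the score of any split of 中国人 into shorter pieces. A real segmenter also has to handle unknown words, mixed Latin text, and context between words, which this sketch ignores.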