public class HMMChineseTokenizer extends SegmentingTokenizerBase
The analyzer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, then each sentence is segmented into words.
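The two-stage flow described above (sentences first, then words within each sentence) can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene's implementation: the class `TwoStageSegmenter`, its punctuation-based sentence splitter, and its toy unigram probability table are all hypothetical stand-ins, whereas the real tokenizer relies on a hidden Markov model, as its name indicates. Chinese characters are written as Unicode escapes so the source compiles under any file encoding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoStageSegmenter {
    // Toy unigram log-probabilities (hypothetical values, not Lucene's model).
    private static final Map<String, Double> LOG_PROB = new HashMap<>();
    static {
        LOG_PROB.put("\u6211", Math.log(0.05));        // "I"
        LOG_PROB.put("\u559C\u6B22", Math.log(0.02));  // "like"
        LOG_PROB.put("\u5317\u4EAC", Math.log(0.02));  // "Beijing"
        LOG_PROB.put("\u5929\u6C14", Math.log(0.02));  // "weather"
        LOG_PROB.put("\u5F88", Math.log(0.03));        // "very"
        LOG_PROB.put("\u597D", Math.log(0.03));        // "good"
    }
    // Score given to a single character that is missing from the table.
    private static final double UNKNOWN = Math.log(1e-6);
    private static final int MAX_WORD_LEN = 4;

    /** Stage 1: split on sentence-final punctuation, returning [start, end) offsets. */
    static List<int[]> sentences(String text) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Ideographic full stop, fullwidth exclamation mark, fullwidth question mark.
            if (c == '\u3002' || c == '\uFF01' || c == '\uFF1F') {
                if (i > start) spans.add(new int[] {start, i});
                start = i + 1;
            }
        }
        if (start < text.length()) spans.add(new int[] {start, text.length()});
        return spans;
    }

    /** Stage 2: highest-scoring segmentation of one sentence via dynamic programming. */
    static List<String> words(String s) {
        int n = s.length();
        double[] best = new double[n + 1]; // best[i] = best score for s[0, i)
        int[] back = new int[n + 1];       // back[i] = start of the last word ending at i
        for (int end = 1; end <= n; end++) {
            best[end] = Double.NEGATIVE_INFINITY;
            for (int begin = Math.max(0, end - MAX_WORD_LEN); begin < end; begin++) {
                String cand = s.substring(begin, end);
                Double lp = LOG_PROB.get(cand);
                if (lp == null) {
                    if (cand.length() > 1) continue; // unknown multi-character strings are not words
                    lp = UNKNOWN;                    // unknown single characters stay possible
                }
                double score = best[begin] + lp;
                if (score > best[end]) {
                    best[end] = score;
                    back[end] = begin;
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (int end = n; end > 0; end = back[end]) {
            out.add(0, s.substring(back[end], end));
        }
        return out;
    }

    /** Runs both stages and returns the words of all sentences in order. */
    static List<String> segment(String text) {
        List<String> out = new ArrayList<>();
        for (int[] span : sentences(text)) {
            out.addAll(words(text.substring(span[0], span[1])));
        }
        return out;
    }
}
```

With this toy table, `TwoStageSegmenter.segment` applied to 我喜欢北京。天气很好！ returns the six words 我, 喜欢, 北京, 天气, 很, 好. Segmenting sentence by sentence keeps each dynamic-programming pass short and prevents a candidate word from spanning a sentence boundary.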
Nested classes inherited: AttributeSource.State
Fields inherited: buffer, BUFFERMAX, offset, DEFAULT_TOKEN_ATTRIBUTE_FACTORY

| Constructor and Description |
|---|
| HMMChineseTokenizer(): Creates a new HMMChineseTokenizer |
| HMMChineseTokenizer(AttributeFactory factory): Creates a new HMMChineseTokenizer, supplying the AttributeFactory |


| Modifier and Type | Method and Description |
|---|---|
| protected boolean | incrementWord() |
| void | reset() |
| protected void | setNextSentence(int sentenceStart, int sentenceEnd) |

Methods inherited from SegmentingTokenizerBase: end, incrementToken, isSafeEnd
Methods inherited from Tokenizer: close, correctOffset, setReader
Methods inherited from AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

public HMMChineseTokenizer()

public HMMChineseTokenizer(AttributeFactory factory)

protected void setNextSentence(int sentenceStart, int sentenceEnd)
Specified by: setNextSentence in class SegmentingTokenizerBase

protected boolean incrementWord()
Specified by: incrementWord in class SegmentingTokenizerBase

public void reset() throws IOException
Overrides: reset in class SegmentingTokenizerBase
Throws: IOException

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.