public class ThaiTokenizer extends SegmentingTokenizerBase
Tokenizer that uses a BreakIterator to tokenize Thai text.
WARNING: this tokenizer may not be supported by all JREs. It is known to work with Sun/Oracle and Harmony JREs. If your application needs to be fully portable, consider using ICUTokenizer instead, which uses an ICU Thai BreakIterator that will always be available.
Nested classes inherited from AttributeSource: AttributeSource.State

| Modifier and Type | Field and Description |
|---|---|
| static boolean | DBBI_AVAILABLE: True if the JRE supports a working dictionary-based BreakIterator for Thai. |
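
The following is a minimal sketch of how a check like DBBI_AVAILABLE could be written with only `java.text.BreakIterator`. It is an illustration, not necessarily how ThaiTokenizer computes the flag. The class name `ThaiBreakIteratorProbe` and the method `thaiDictionaryBreaksAvailable` are hypothetical. The sketch assumes that a dictionary-based Thai iterator reports a word boundary between the two words of a short Thai phrase, whereas a JRE without Thai dictionary support does not.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class ThaiBreakIteratorProbe {

    // Two Thai words written without a space: "language" (4 chars) + "Thai" (3 chars).
    // Unicode escapes keep the source file independent of the compiler's encoding.
    private static final String SAMPLE =
            "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

    /**
     * Hypothetical helper: returns true if this JRE's word BreakIterator for
     * the Thai locale reports a boundary between the two words in SAMPLE.
     */
    static boolean thaiDictionaryBreaksAvailable() {
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        words.setText(SAMPLE);
        // Offset 4 is where the first word ends and the second begins.
        return words.isBoundary(4);
    }

    public static void main(String[] args) {
        // The result depends on the JRE, so no particular value is assumed here.
        System.out.println("Thai dictionary breaks available: "
                + thaiDictionaryBreaksAvailable());
    }
}
```

An application that must run on arbitrary JREs could consult a flag of this kind at startup and fall back to ICUTokenizer when it is false.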
Fields inherited from SegmentingTokenizerBase: buffer, BUFFERMAX, offset

Fields inherited from TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY

| Constructor and Description |
|---|
| ThaiTokenizer(): Creates a new ThaiTokenizer. |
| ThaiTokenizer(AttributeFactory factory): Creates a new ThaiTokenizer, supplying the AttributeFactory. |
| Modifier and Type | Method and Description |
|---|---|
| protected boolean | incrementWord(): Returns true if another word is available. |
| protected void | setNextSentence(int sentenceStart, int sentenceEnd): Provides the next input sentence for analysis. |
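
The contract of these two methods can be illustrated with a small standalone sketch built on `java.text.BreakIterator`. It does not use Lucene and is not the ThaiTokenizer implementation. The class `SentenceWordWalker`, its `word()` accessor, and the static `words` helper are hypothetical names. The sketch assumes that a "word" is any segment that starts with a letter or digit, so punctuation and whitespace segments are skipped.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceWordWalker {
    private final String text;
    private final BreakIterator wordBreaker;
    private int sentenceStart;   // offset of the current sentence within text
    private String word;         // last word found by incrementWord()

    SentenceWordWalker(String text, Locale locale) {
        this.text = text;
        this.wordBreaker = BreakIterator.getWordInstance(locale);
    }

    /** Provides the next input sentence, as offsets into the full text. */
    void setNextSentence(int sentenceStart, int sentenceEnd) {
        this.sentenceStart = sentenceStart;
        // setText also repositions the iterator at the first boundary.
        wordBreaker.setText(text.substring(sentenceStart, sentenceEnd));
    }

    /** Returns true if another word is available; word() then yields it. */
    boolean incrementWord() {
        int start = wordBreaker.current();
        int end = wordBreaker.next();
        // Skip segments that do not begin with a letter or digit.
        while (end != BreakIterator.DONE
                && !Character.isLetterOrDigit(text.codePointAt(sentenceStart + start))) {
            start = end;
            end = wordBreaker.next();
        }
        if (end == BreakIterator.DONE) {
            return false;
        }
        word = text.substring(sentenceStart + start, sentenceStart + end);
        return true;
    }

    String word() {
        return word;
    }

    /** Convenience helper: treats the whole text as a single sentence. */
    static List<String> words(String text, Locale locale) {
        SentenceWordWalker walker = new SentenceWordWalker(text, locale);
        walker.setNextSentence(0, text.length());
        List<String> result = new ArrayList<>();
        while (walker.incrementWord()) {
            result.add(walker.word());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, big world.", Locale.ENGLISH));
        // prints [Hello, big, world]
    }
}
```

Passing `Locale.forLanguageTag("th")` and Thai text exercises the same loop, but the segmentation then depends on whether the JRE ships a dictionary-based Thai BreakIterator, as the warning above notes.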
Methods inherited from SegmentingTokenizerBase: end, incrementToken, isSafeEnd, reset

Methods inherited from Tokenizer: close, correctOffset, setReader

Methods inherited from AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

public static final boolean DBBI_AVAILABLE
public ThaiTokenizer()
public ThaiTokenizer(AttributeFactory factory)
protected void setNextSentence(int sentenceStart,
int sentenceEnd)
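Description copied from class: SegmentingTokenizerBase
Provides the next input sentence for analysis.
Specified by: setNextSentence in class SegmentingTokenizerBase

protected boolean incrementWord()
Description copied from class: SegmentingTokenizerBase
Returns true if another word is available.
Specified by: incrementWord in class SegmentingTokenizerBase

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.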