public class ThaiTokenizer extends SegmentingTokenizerBase
Tokenizer that uses BreakIterator to tokenize Thai text.
WARNING: this tokenizer may not be supported by all JREs. It is known to work with Sun/Oracle and Harmony JREs. If your application needs to be fully portable, consider using ICUTokenizer instead, which uses an ICU Thai BreakIterator that will always be available.
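The portability concern above can be probed directly with the JDK. The following is a minimal sketch, assuming that a "working" Thai break iterator is one that places a word boundary inside a known two-word Thai string; the class name ThaiBreakCheck and both method names are hypothetical and are not part of this API. It uses only java.text.BreakIterator, so it runs without Lucene on the classpath.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: probes the JRE's Thai word segmentation.
class ThaiBreakCheck {

    // Assumption: dictionary-based support is present if the JRE sees a
    // boundary between "ภาษา" (4 chars) and "ไทย" in the string "ภาษาไทย".
    static boolean thaiDictionaryAvailable() {
        BreakIterator bi = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        bi.setText("ภาษาไทย");
        return bi.isBoundary(4);
    }

    // Splits text at word boundaries and keeps only segments that contain
    // at least one letter or digit (drops whitespace and punctuation).
    static List<String> words(String text, Locale locale) {
        BreakIterator bi = BreakIterator.getWordInstance(locale);
        bi.setText(text);
        List<String> out = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Thai dictionary break iterator: " + thaiDictionaryAvailable());
        System.out.println(words("ภาษาไทย", Locale.forLanguageTag("th")));
    }
}
```

An application could run a check of this kind at startup and fall back to ICUTokenizer when it fails. The result depends on the JRE in use, so no output is shown here.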
AttributeSource.State
Modifier and Type | Field and Description |
---|---|
static boolean | DBBI_AVAILABLE: True if the JRE supports a working dictionary-based breakiterator for Thai. |
buffer, BUFFERMAX, offset
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
DEFAULT_ATTRIBUTE_FACTORY
Constructor and Description |
---|
ThaiTokenizer(AttributeFactory factory, Reader reader): Creates a new ThaiTokenizer, supplying the AttributeFactory |
ThaiTokenizer(Reader reader): Creates a new ThaiTokenizer |
Modifier and Type | Method and Description |
---|---|
protected boolean | incrementWord(): Returns true if another word is available |
protected void | setNextSentence(int sentenceStart, int sentenceEnd): Provides the next input sentence for analysis |
end, incrementToken, isSafeEnd, reset
close, correctOffset, setReader
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
public static final boolean DBBI_AVAILABLE
public ThaiTokenizer(Reader reader)
public ThaiTokenizer(AttributeFactory factory, Reader reader)
protected void setNextSentence(int sentenceStart, int sentenceEnd)
Specified by: setNextSentence in class SegmentingTokenizerBase
protected boolean incrementWord()
Specified by: incrementWord in class SegmentingTokenizerBase
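The two protected methods suggest a two-level scheme: the base class hands over one sentence at a time, and the subclass then reports words within it. The sketch below is a standalone analogue built only on java.text.BreakIterator. It is not the Lucene implementation: the class TwoLevelSegmenter, its String-based setNextSentence signature, and the tokenize driver are assumptions made for illustration, as is the rule of skipping segments that contain no letter or digit.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical analogue of the sentence-then-word pattern; not Lucene code.
class TwoLevelSegmenter {
    private final BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
    private String sentence = "";
    private String currentWord;

    // Analogue of setNextSentence: point the word iterator at one sentence.
    void setNextSentence(String text, int sentenceStart, int sentenceEnd) {
        sentence = text.substring(sentenceStart, sentenceEnd);
        words.setText(sentence); // resets the scan position to the first boundary
    }

    // Analogue of incrementWord: advance to the next segment that contains a
    // letter or digit; return false once the sentence is exhausted.
    boolean incrementWord() {
        int start = words.current();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String candidate = sentence.substring(start, end);
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                currentWord = candidate;
                return true;
            }
        }
        return false;
    }

    // Driver playing the role of the base class: iterate sentences, then words.
    static List<String> tokenize(String text) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);
        TwoLevelSegmenter segmenter = new TwoLevelSegmenter();
        List<String> out = new ArrayList<>();
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE; start = end, end = sentences.next()) {
            segmenter.setNextSentence(text, start, end);
            while (segmenter.incrementWord()) {
                out.add(segmenter.currentWord);
            }
        }
        return out;
    }
}
```

Calling TwoLevelSegmenter.tokenize("Hello world. Goodbye now.") yields the four words Hello, world, Goodbye, now. Splitting by sentence first keeps each word-level pass short, which is presumably why the base class is organized this way.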
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.