public final class ThaiWordFilter extends TokenFilter
A TokenFilter that uses a BreakIterator to break each
Thai Token into separate Tokens, one for each Thai word.
Please note: as of matchVersion 3.1, this filter no longer lowercases non-Thai text.
ThaiAnalyzer inserts a LowerCaseFilter before this filter,
so the behaviour of the Analyzer does not change. As of version 3.1, the filter also handles
position increments correctly.
WARNING: this filter may not be supported by all JREs. It is known to work with Sun/Oracle and Harmony JREs. If your application needs to be fully portable, consider using ICUTokenizer instead, which uses an ICU Thai BreakIterator that will always be available.
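The word-breaking idea described above can be illustrated with the JDK alone. The following is a minimal sketch, not the filter's actual implementation: the class name `ThaiBreakSketch` and the helper `splitWords` are hypothetical, and the result depends on whether the running JRE ships a dictionary-based Thai BreakIterator. On a JRE without one, the whole Thai run may come back as a single segment.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ThaiBreakSketch {

    /**
     * Hypothetical helper: splits text at the word boundaries reported by
     * the JRE's word BreakIterator for the Thai locale, dropping segments
     * that contain only whitespace.
     */
    static List<String> splitWords(String text) {
        BreakIterator breaker = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        breaker.setText(text);

        List<String> words = new ArrayList<>();
        int start = breaker.first();
        for (int end = breaker.next(); end != BreakIterator.DONE; start = end, end = breaker.next()) {
            String piece = text.substring(start, end);
            if (!piece.trim().isEmpty()) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "Thai language" written in Thai script, as Unicode escapes
        // so the source compiles regardless of file encoding.
        String thai = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";
        System.out.println(splitWords(thai));
    }
}
```

A token filter built on this idea would emit one token per returned segment and adjust offsets and position increments accordingly; those details are left out of the sketch.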
Nested classes inherited from class AttributeSource: AttributeSource.AttributeFactory, AttributeSource.State

| Modifier and Type | Field and Description |
|---|---|
| static boolean | DBBI_AVAILABLE: True if the JRE supports a working dictionary-based BreakIterator for Thai. |
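
An application that wants to make a similar check without depending on this class could probe the JRE directly. The sketch below is an assumption about how such a probe might look, not a statement of how DBBI_AVAILABLE is computed: the class name `ThaiBreakProbe`, the method name, the probe string, and the offset are all chosen for illustration.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class ThaiBreakProbe {

    /**
     * Hypothetical probe: returns true if the JRE's Thai word BreakIterator
     * reports a boundary between the two words of a short Thai phrase.
     */
    static boolean hasDictionaryBasedThaiBreaks() {
        BreakIterator breaker = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        // "Thai language" in Thai script: a 4-character word followed by a
        // 3-character word, with no space between them.
        breaker.setText("\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22");
        // Without a dictionary, no boundary is expected at offset 4.
        return breaker.isBoundary(4);
    }

    public static void main(String[] args) {
        System.out.println("Thai word breaking available: " + hasDictionaryBasedThaiBreaks());
    }
}
```

If the probe returns false, falling back to ICUTokenizer, as the warning above suggests, is one option.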
Fields inherited from class TokenFilter: input

| Constructor and Description |
|---|
| ThaiWordFilter(Version matchVersion, TokenStream input): Creates a new ThaiWordFilter with the specified match version. |

| Modifier and Type | Method and Description |
|---|---|
| boolean | incrementToken() |
| void | reset() |

Methods inherited from class TokenFilter: close, end
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState

public static final boolean DBBI_AVAILABLE

public ThaiWordFilter(Version matchVersion, TokenStream input)

public boolean incrementToken()
throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException

public void reset()
throws IOException
Overrides: reset in class TokenFilter
Throws: IOException

Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.