org.apache.lucene.analysis.th
Class ThaiWordFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.th.ThaiWordFilter
- All Implemented Interfaces:
- Closeable
public final class ThaiWordFilter
- extends org.apache.lucene.analysis.TokenFilter
A TokenFilter that uses BreakIterator to break each Thai Token
into separate Tokens, one for each Thai word.
Please note: As of matchVersion 3.1, this filter no longer lowercases non-Thai text.
ThaiAnalyzer inserts a LowerCaseFilter before this filter,
so the behaviour of the Analyzer does not change. As of version 3.1, the filter also handles
position increments correctly.
WARNING: this filter may not be supported by all JREs.
It is known to work with Sun/Oracle and Harmony JREs.
If your application needs to be fully portable, consider using ICUTokenizer instead,
which uses an ICU Thai BreakIterator that will always be available.
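The word breaking described above can be illustrated with the JDK alone. The following is a minimal sketch, not this filter's implementation: it assumes the Thai word instance of java.text.BreakIterator is the relevant iterator, and the class name ThaiBreakSketch and method segment are hypothetical names chosen for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
final class ThaiBreakSketch {

    /** Splits text at the word boundaries reported by the JRE's Thai BreakIterator. */
    static List<String> segment(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        words.setText(text);
        List<String> pieces = new ArrayList<>();
        int start = words.first();
        // Walk consecutive boundaries and cut the text between each pair.
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        // The pieces printed here depend on the JRE: without dictionary-based
        // Thai support the whole string may come back as a single piece.
        System.out.println(segment("ภาษาไทย"));
    }
}
```

Whatever boundaries the JRE reports, the returned pieces always concatenate back to the input, so the sketch never drops or reorders text.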
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
Field Summary
static boolean DBBI_AVAILABLE
True if the JRE supports a working dictionary-based BreakIterator for Thai.
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Constructor Summary
ThaiWordFilter(org.apache.lucene.analysis.TokenStream input)
Deprecated. Use the constructor that takes matchVersion instead.
ThaiWordFilter(org.apache.lucene.util.Version matchVersion, org.apache.lucene.analysis.TokenStream input)
Creates a new ThaiWordFilter with the specified match version.
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
DBBI_AVAILABLE
public static final boolean DBBI_AVAILABLE
- True if the JRE supports a working dictionary-based BreakIterator for Thai.
If this is false, the filter will not work at all.
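One way an application could probe for such support on its own is sketched below. This is an assumption about how a probe might look, not a statement of how DBBI_AVAILABLE is computed: the class name ThaiBreakProbe and the method hasThaiDictionaryBreaks are hypothetical, and the probe relies on the expectation that a dictionary-based iterator reports a boundary between the two words of the sample string.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical probe for illustration only; not part of Lucene.
final class ThaiBreakProbe {

    /**
     * Returns true if the JRE's Thai word BreakIterator reports a boundary
     * inside a two-word sample that contains no spaces.
     */
    static boolean hasThaiDictionaryBreaks() {
        BreakIterator words = BreakIterator.getWordInstance(Locale.forLanguageTag("th"));
        // The sample is "ภาษา" + "ไทย"; the first word is four chars long,
        // so a dictionary-based iterator is expected to break at offset 4.
        words.setText("ภาษาไทย");
        return words.isBoundary(4);
    }

    public static void main(String[] args) {
        // The result depends on the JRE and on which locale data it ships.
        System.out.println("Thai dictionary breaks: " + hasThaiDictionaryBreaks());
    }
}
```

An application that needs Thai segmentation could run a check of this kind at startup and fall back to another tokenizer when it returns false.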
ThaiWordFilter
@Deprecated
public ThaiWordFilter(org.apache.lucene.analysis.TokenStream input)
- Deprecated. Use the constructor that takes matchVersion instead.
- Creates a new ThaiWordFilter that also lowercases non-Thai text.
ThaiWordFilter
public ThaiWordFilter(org.apache.lucene.util.Version matchVersion,
org.apache.lucene.analysis.TokenStream input)
- Creates a new ThaiWordFilter with the specified match version.
incrementToken
public boolean incrementToken()
throws IOException
- Specified by:
incrementToken
in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
reset
public void reset()
throws IOException
- Overrides:
reset
in class org.apache.lucene.analysis.TokenFilter
- Throws:
IOException
Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.