org.apache.lucene.analysis.in
Class IndicTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.CharTokenizer
org.apache.lucene.analysis.in.IndicTokenizer
- All Implemented Interfaces:
- Closeable
public final class IndicTokenizer
extends CharTokenizer
Simple Tokenizer for text in Indian languages.
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
Method Summary
protected boolean isTokenChar(int c)
Returns true iff a codepoint should be included in a token.
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Constructor Detail
IndicTokenizer
public IndicTokenizer(Version matchVersion,
AttributeSource.AttributeFactory factory,
Reader input)
IndicTokenizer
public IndicTokenizer(Version matchVersion,
AttributeSource source,
Reader input)
IndicTokenizer
public IndicTokenizer(Version matchVersion,
Reader input)
Method Detail
isTokenChar
protected boolean isTokenChar(int c)
- Description copied from class: CharTokenizer
- Returns true iff a codepoint should be included in a token. This tokenizer
generates as tokens adjacent sequences of codepoints which satisfy this
predicate. Codepoints for which this is false are used to define token
boundaries and are not included in tokens.
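The contract above (maximal runs of accepted codepoints become tokens; rejected codepoints are boundaries and are dropped) can be sketched without Lucene as a plain Java loop. This is a minimal, dependency-free sketch: the class TokenCharSketch and its methods are hypothetical, and the predicate (letters plus nonspacing marks, spacing combining marks and format characters, so that Devanagari vowel signs, the virama and zero-width joiners stay inside a word) is an assumption about what an Indic-aware isTokenChar needs, not a copy of IndicTokenizer's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free illustration of the CharTokenizer contract.
final class TokenCharSketch {

    // Assumed predicate (not Lucene's source): letters, plus the mark and
    // format categories that Indic scripts use inside words, e.g. vowel
    // signs (Mn/Mc), the virama (Mn) and ZWJ/ZWNJ (Cf).
    static boolean isTokenChar(int c) {
        if (Character.isLetter(c)) {
            return true;
        }
        int type = Character.getType(c);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.FORMAT;
    }

    // Emits maximal runs of accepted codepoints as tokens; every rejected
    // codepoint ends the current token and is itself discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);       // decode a full codepoint
            if (isTokenChar(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {  // boundary: flush the token
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);       // 1 or 2 UTF-16 units
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Devanagari "namaste, duniya!" written with escapes so the source stays ASCII.
        String text = "\u0928\u092E\u0938\u094D\u0924\u0947, \u0926\u0941\u0928\u093F\u092F\u093E!";
        List<String> tokens = tokenize(text);
        System.out.println(tokens.size()); // 2
    }
}
```

With a letters-only predicate (Character.isLetter alone), the first word would instead be split at its virama and vowel sign, because those codepoints are marks rather than letters.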
As of Lucene 3.1, the char-based API (CharTokenizer.isTokenChar(char) and CharTokenizer.normalize(char)) has been deprecated in favor of a Unicode 4.0-compatible int-based API that operates on codepoints instead of UTF-16 code units. Subclasses of CharTokenizer must not override the char-based methods if a Version >= 3.1 is passed to the constructor.
NOTE: This method will be marked abstract in Lucene 4.0.
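The difference between UTF-16 code units and codepoints can be shown with the JDK alone. The sketch below (class and method names are hypothetical) classifies a supplementary character, U+10400 DESERET CAPITAL LETTER LONG I, which Java stores as a surrogate pair. Neither surrogate is a letter on its own, so a char-by-char check finds no letters, while a codepoint-based check finds one.

```java
// Hypothetical demo of why an int (codepoint) API is needed instead of char.
final class CodepointVsCharSketch {

    // Old-style check: classifies each UTF-16 code unit independently,
    // so the two halves of a surrogate pair are tested separately.
    static int countLettersByCodeUnit(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    // Codepoint-aware check: decodes surrogate pairs before classifying.
    static int countLettersByCodePoint(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                n++;
            }
            i += Character.charCount(cp);
        }
        return n;
    }

    public static void main(String[] args) {
        // U+10400 is outside the BMP, so it needs two chars in a Java String.
        String s = new String(Character.toChars(0x10400));
        System.out.println(s.length());                  // 2
        System.out.println(countLettersByCodeUnit(s));   // 0
        System.out.println(countLettersByCodePoint(s));  // 1
    }
}
```

Most modern Indic scripts (Devanagari, Bengali, Tamil and others) are encoded in the Basic Multilingual Plane, so this pitfall matters mainly for other characters, but the int-based isTokenChar handles both cases the same way.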
- Overrides:
isTokenChar in class CharTokenizer
Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.