IndicTokenizer (Lucene 3.5.0 API)


org.apache.lucene.analysis.in
Class IndicTokenizer

java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.Tokenizer
              org.apache.lucene.analysis.CharTokenizer
                  org.apache.lucene.analysis.in.IndicTokenizer

All Implemented Interfaces: Closeable

public final class IndicTokenizer
extends CharTokenizer

Simple tokenizer for text in Indian languages.
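This page does not state which code points IndicTokenizer accepts as token characters. As a hedged illustration only, the sketch below assumes a predicate that accepts letters plus combining marks and format characters (such as ZERO WIDTH JOINER). It prints how each code point of the Hindi word "हिन्दी" is classified. Vowel signs such as U+093F and the virama U+094D are marks rather than letters, so a letters-only predicate would break this word apart. `IndicCharClassDemo` and `isIndicTokenChar` are hypothetical names, not Lucene API.

```java
/**
 * Hypothetical sketch: classifies each code point of a Devanagari word to show
 * why an Indic tokenizer cannot rely on Character.isLetter alone.
 * The predicate is an illustrative assumption, not IndicTokenizer's source.
 */
public class IndicCharClassDemo {

    // Assumed predicate: letters, combining marks and format characters
    // (e.g. ZERO WIDTH JOINER, U+200D) all count as part of a token.
    static boolean isIndicTokenChar(int c) {
        int type = Character.getType(c);
        return Character.isLetter(c)
                || type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.FORMAT;
    }

    public static void main(String[] args) {
        // "Hindi" in Devanagari: HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II
        String word = "\u0939\u093F\u0928\u094D\u0926\u0940";
        word.codePoints().forEach(cp -> System.out.printf(
                "U+%04X isLetter=%b tokenChar=%b%n",
                cp, Character.isLetter(cp), isIndicTokenChar(cp)));
    }
}
```

Under this assumed predicate every code point of the word is a token character, while `Character.isLetter` rejects the three marks.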

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
`AttributeSource.AttributeFactory, AttributeSource.State`

Field Summary

Fields inherited from class org.apache.lucene.analysis.Tokenizer
`input`

Constructor Summary
`IndicTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader input)`
`IndicTokenizer(Version matchVersion, AttributeSource source, Reader input)`
`IndicTokenizer(Version matchVersion, Reader input)`

Method Summary
`protected boolean`	`isTokenChar(int c)` Returns true iff a codepoint should be included in a token.

Methods inherited from class org.apache.lucene.analysis.CharTokenizer
`end, incrementToken, isTokenChar, normalize, normalize, reset`

Methods inherited from class org.apache.lucene.analysis.Tokenizer
`close, correctOffset`

Methods inherited from class org.apache.lucene.analysis.TokenStream
`reset`

Methods inherited from class org.apache.lucene.util.AttributeSource
`addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Constructor Detail

IndicTokenizer

public IndicTokenizer(Version matchVersion,
                      AttributeSource.AttributeFactory factory,
                      Reader input)

IndicTokenizer

public IndicTokenizer(Version matchVersion,
                      AttributeSource source,
                      Reader input)

IndicTokenizer

public IndicTokenizer(Version matchVersion,
                      Reader input)

Method Detail

isTokenChar

protected boolean isTokenChar(int c)

Description copied from class: CharTokenizer

Returns true iff a codepoint should be included in a token. The tokenizer emits each maximal run of adjacent codepoints that satisfy this predicate as one token. Codepoints for which this method returns false mark token boundaries and are not included in any token.
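To make this contract concrete, here is a minimal stdlib-only sketch of the splitting behavior: it walks a string by code point and emits each maximal run accepted by a predicate. It is not Lucene's CharTokenizer implementation; it ignores offsets, `normalize(int)`, Reader buffering and any token-length limit. `RunTokenizerSketch` and `tokenize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Sketch of the run-splitting rule: adjacent code points accepted by the
 * predicate form one token; rejected code points are boundaries and dropped.
 */
public class RunTokenizerSketch {

    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);      // extend the current run
            } else if (current.length() > 0) {
                tokens.add(current.toString());   // boundary: close the run
                current.setLength(0);             // the boundary char is not kept
            }
            i += Character.charCount(cp);         // advance 1 or 2 UTF-16 units
        }
        if (current.length() > 0) {
            tokens.add(current.toString());       // flush the final run
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Predicate chosen for illustration: letters only.
        System.out.println(tokenize("Lucene, 3.5 rocks!", Character::isLetter));
        // prints [Lucene, rocks]
    }
}
```

Taking an `IntPredicate` mirrors the int-based `isTokenChar(int)` signature, so a supplementary character is tested as one whole code point rather than as two surrogates.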

As of Lucene 3.1, the char-based API (CharTokenizer.isTokenChar(char) and CharTokenizer.normalize(char)) has been deprecated in favor of a Unicode 4.0-compatible int-based API that operates on codepoints instead of UTF-16 code units. Subclasses of CharTokenizer must not override the char-based methods if a Version of 3.1 or later is passed to the constructor.
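The sketch below shows why the int-based API matters for characters outside the Basic Multilingual Plane. It uses only `java.lang.Character` and `String`; U+10400 (DESERET CAPITAL LETTER LONG I) is chosen purely as an example of a supplementary letter, and `CodePointVsCharDemo` is a hypothetical name.

```java
/**
 * A supplementary character is stored as two UTF-16 chars (a surrogate pair).
 * Neither surrogate is a letter on its own; the combined code point is.
 */
public class CodePointVsCharDemo {

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I, outside the BMP
        String s = new String(Character.toChars(0x10400));

        System.out.println("UTF-16 length: " + s.length());                      // 2
        System.out.println("code points:   " + s.codePointCount(0, s.length())); // 1

        // char-based check, one UTF-16 code unit at a time
        for (int i = 0; i < s.length(); i++) {
            char unit = s.charAt(i);
            System.out.printf("char 0x%04X isLetter(char)=%b%n",
                    (int) unit, Character.isLetter(unit));
        }

        // int-based check on the whole code point
        int cp = s.codePointAt(0);
        System.out.printf("U+%X isLetter(int)=%b%n", cp, Character.isLetter(cp));
    }
}
```

A char-based predicate would therefore treat this letter as two boundary characters, while the int-based predicate keeps it inside a token.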

NOTE: This method will be marked abstract in Lucene 4.0.

Overrides: isTokenChar in class CharTokenizer
