LowerCaseTokenizer (Lucene 3.5.0 API)

org.apache.lucene.analysis
Class LowerCaseTokenizer

java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.Tokenizer
              org.apache.lucene.analysis.CharTokenizer
                  org.apache.lucene.analysis.LetterTokenizer
                      org.apache.lucene.analysis.LowerCaseTokenizer

All Implemented Interfaces: Closeable

public final class LowerCaseTokenizer
extends LetterTokenizer

LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. While it is functionally equivalent to the combination of LetterTokenizer and LowerCaseFilter, there is a performance advantage to doing the two tasks at once, hence this (redundant) implementation.
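
The single-pass idea can be illustrated without Lucene. The following is a minimal sketch in plain Java, not the Lucene implementation: the class `LowerCaseLetterSplitter` and its `tokenize` method are hypothetical names, and the sketch assumes that "letter" means `Character.isLetter(int)` and that lower-casing means `Character.toLowerCase(int)`. It omits what a real tokenizer also tracks, such as offsets, buffering, and attribute handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Lucene API.
final class LowerCaseLetterSplitter {
    private LowerCaseLetterSplitter() {}

    /** Splits text at non-letters and lower-cases each letter in the same pass. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // read a full code point
            i += Character.charCount(cp);          // advance by 1 or 2 chars
            if (Character.isLetter(cp)) {
                // Normalize while collecting, so no second pass is needed.
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());        // flush the trailing token
        }
        return tokens;
    }
}
```

With this sketch, `LowerCaseLetterSplitter.tokenize("Hello, World! It's 2011")` yields the tokens `hello`, `world`, `it`, `s`: the apostrophe splits `It's` in two, and the digits produce no token at all.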

Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
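
The limitation noted above follows from splitting only at non-letters. As a rough sketch in plain Java (the class `LetterRunCounter` is a hypothetical name, and `Character.isLetter(int)` is assumed as the letter test), counting the runs of consecutive letters shows that unspaced Chinese text collapses into a single run:

```java
// Hypothetical helper for illustration only; not part of the Lucene API.
final class LetterRunCounter {
    private LetterRunCounter() {}

    /** Counts maximal runs of consecutive letters, i.e. the tokens a letter-based splitter would emit. */
    static int countLetterRuns(String text) {
        int runs = 0;
        boolean inRun = false;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean letter = Character.isLetter(cp);
            if (letter && !inRun) {
                runs++;                 // a new run starts here
            }
            inRun = letter;
        }
        return runs;
    }
}
```

For `"I love Beijing"` the count is 3, one per word. For the four ideographs `"\u6211\u7231\u5317\u4EAC"` the count is 1, because every ideograph is a letter and nothing separates them.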

You must specify the required Version compatibility when creating LowerCaseTokenizer:

As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
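
The reason an `int` based API matters is that characters outside the Basic Multilingual Plane occupy two `char` values in a Java string. The sketch below uses only `java.lang.Character`; the class and method names are hypothetical and merely mirror the shape of `isTokenChar(int)` and `normalize(int)`, on the assumption that they delegate to `Character.isLetter(int)` and `Character.toLowerCase(int)`.

```java
// Hypothetical helper for illustration only; not part of the Lucene API.
final class CodePointNormalization {
    private CodePointNormalization() {}

    /** Code point based letter test: sees the whole character. */
    static boolean isTokenCodePoint(int c) {
        return Character.isLetter(c);
    }

    /** Code point based lower-casing: also maps supplementary characters. */
    static int normalizeCodePoint(int c) {
        return Character.toLowerCase(c);
    }

    /** Char based letter test: sees only one UTF-16 unit, possibly half a surrogate pair. */
    static boolean isTokenCharUnit(char c) {
        return Character.isLetter(c);
    }
}
```

For example, U+10400 (DESERET CAPITAL LETTER LONG I) is a letter as a code point and lower-cases to U+10428, but its leading surrogate, examined as a lone `char`, is not a letter, so a `char` based tokenizer would split or drop it.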

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
`AttributeSource.AttributeFactory, AttributeSource.State`

Field Summary

Fields inherited from class org.apache.lucene.analysis.Tokenizer
`input`

Constructor Summary
`LowerCaseTokenizer(AttributeSource.AttributeFactory factory, Reader in)` Deprecated. Use `LowerCaseTokenizer(Version, AttributeSource.AttributeFactory, Reader)` instead. This will be removed in Lucene 4.0.
`LowerCaseTokenizer(AttributeSource source, Reader in)` Deprecated. Use `LowerCaseTokenizer(Version, AttributeSource, Reader)` instead. This will be removed in Lucene 4.0.
`LowerCaseTokenizer(Reader in)` Deprecated. Use `LowerCaseTokenizer(Version, Reader)` instead. This will be removed in Lucene 4.0.
`LowerCaseTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)` Construct a new LowerCaseTokenizer using a given `AttributeSource.AttributeFactory`.
`LowerCaseTokenizer(Version matchVersion, AttributeSource source, Reader in)` Construct a new LowerCaseTokenizer using a given `AttributeSource`.
`LowerCaseTokenizer(Version matchVersion, Reader in)` Construct a new LowerCaseTokenizer.

Method Summary
`protected int`	`normalize(int c)` Converts a character (code point) to lower case using `Character.toLowerCase(int)`.

Methods inherited from class org.apache.lucene.analysis.LetterTokenizer
`isTokenChar`

Methods inherited from class org.apache.lucene.analysis.CharTokenizer
`end, incrementToken, isTokenChar, normalize, reset`

Methods inherited from class org.apache.lucene.analysis.Tokenizer
`close, correctOffset`

Methods inherited from class org.apache.lucene.analysis.TokenStream
`reset`

Methods inherited from class org.apache.lucene.util.AttributeSource
`addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Constructor Detail