Class LowerCaseTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.apache.lucene.analysis.CharTokenizer
                  extended by org.apache.lucene.analysis.LetterTokenizer
                      extended by org.apache.lucene.analysis.LowerCaseTokenizer

public final class LowerCaseTokenizer
extends LetterTokenizer

LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. It is functionally equivalent to LetterTokenizer followed by LowerCaseFilter, but doing both tasks in a single pass is faster, hence this (redundant) implementation.
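To make the equivalence concrete, here is a stand-alone sketch that uses only the JDK, not Lucene. It approximates LetterTokenizer with Character.isLetter and LowerCaseFilter with Character.toLowerCase. The class and method names (LowerCaseSketch, twoPass, onePass) are hypothetical, and details such as offsets and token length limits are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// JDK-only sketch, not Lucene code: compares "split, then lowercase" with
// doing both in a single pass over the characters.
class LowerCaseSketch {

    // Splits text into maximal runs of letters (roughly LetterTokenizer).
    static List<String> splitAtNonLetters(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Two passes: split first, then walk every token again to lowercase it
    // (roughly LetterTokenizer followed by LowerCaseFilter).
    static List<String> twoPass(String text) {
        List<String> lowered = new ArrayList<>();
        for (String token : splitAtNonLetters(text)) {
            StringBuilder sb = new StringBuilder(token.length());
            for (int i = 0; i < token.length(); i++) {
                sb.append(Character.toLowerCase(token.charAt(i)));
            }
            lowered.add(sb.toString());
        }
        return lowered;
    }

    // One pass: each letter is lowercased as it is appended, so the token
    // text is never scanned a second time.
    static List<String> onePass(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Quick-Brown FOX, 2nd edition";
        System.out.println(onePass(text));                       // [the, quick, brown, fox, nd, edition]
        System.out.println(onePass(text).equals(twoPass(text))); // true
    }
}
```

Both methods produce the same tokens; the one-pass version simply avoids revisiting each token's characters. Note that the "2" in "2nd" is dropped because it is not a letter.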

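Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, such as Chinese and Japanese, where words are not separated by spaces: because text is split only at non-letters, an unbroken run of letters becomes a single token.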
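The following JDK-only sketch (hypothetical class LetterSplitSketch, not Lucene code) applies the same rule, splitting at non-letters and lowercasing, to show why unsegmented text is a problem. Japanese kanji and hiragana all count as letters for Character.isLetter, so a sentence written without spaces comes out as one token.

```java
import java.util.ArrayList;
import java.util.List;

// JDK-only sketch, not Lucene code: emits maximal runs of letters, lowercased.
class LetterSplitSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any non-letter (space, digit, punctuation) ends the token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Spaces separate French words, and accented letters are kept.
        System.out.println(tokenize("Le Château Noir"));
        // No spaces: every character is a letter, so the whole sentence
        // ("studying Japanese in Tokyo") becomes a single token.
        System.out.println(tokenize("東京で日本語を勉強する").size());
    }
}
```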

Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
Field Summary
Fields inherited from class org.apache.lucene.analysis.Tokenizer
Constructor Summary
LowerCaseTokenizer(AttributeSource.AttributeFactory factory, Reader in)
          Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.
LowerCaseTokenizer(AttributeSource source, Reader in)
          Construct a new LowerCaseTokenizer using a given AttributeSource.
LowerCaseTokenizer(Reader in)
          Construct a new LowerCaseTokenizer.
Method Summary
protected  char normalize(char c)
          Converts char to lower case using Character.toLowerCase(char).
Methods inherited from class org.apache.lucene.analysis.LetterTokenizer
Methods inherited from class org.apache.lucene.analysis.CharTokenizer
end, incrementToken, next, next, reset
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset
Methods inherited from class org.apache.lucene.analysis.TokenStream
getOnlyUseNewAPI, reset, setOnlyUseNewAPI
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Detail


public LowerCaseTokenizer(Reader in)
Construct a new LowerCaseTokenizer.


public LowerCaseTokenizer(AttributeSource source,
                          Reader in)
Construct a new LowerCaseTokenizer using a given AttributeSource.


public LowerCaseTokenizer(AttributeSource.AttributeFactory factory,
                          Reader in)
Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.

Method Detail


protected char normalize(char c)
Converts char to lower case using Character.toLowerCase(char).

Overrides:
normalize in class CharTokenizer
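In CharTokenizer, normalize is one of two per-character hooks; the other is isTokenChar. The simplified sketch below uses hypothetical stand-in classes (NormalizeSketch and its nested types, not the Lucene types) to illustrate the design: the letter-based subclass decides which characters belong in a token, and the lowercasing subclass changes only normalize.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins (NOT the Lucene classes) that mimic how a
// CharTokenizer-style base class drives two per-character hooks.
class NormalizeSketch {

    abstract static class CharTokenizerSketch {
        // Hook 1: does c belong inside a token?
        protected abstract boolean isTokenChar(char c);

        // Hook 2: how is a token character stored? Default: unchanged.
        protected char normalize(char c) {
            return c;
        }

        // Collects maximal runs of token characters, passing each through normalize.
        final List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (isTokenChar(c)) {
                    current.append(normalize(c));
                } else if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
            return tokens;
        }
    }

    // Plays the role of LetterTokenizer: defines only the token-character rule.
    static class LetterTokenizerSketch extends CharTokenizerSketch {
        @Override
        protected boolean isTokenChar(char c) {
            return Character.isLetter(c);
        }
    }

    // Plays the role of LowerCaseTokenizer: inherits the letter rule and
    // overrides only normalize.
    static final class LowerCaseTokenizerSketch extends LetterTokenizerSketch {
        @Override
        protected char normalize(char c) {
            return Character.toLowerCase(c);
        }
    }

    public static void main(String[] args) {
        String text = "Apache Lucene 2.9";
        System.out.println(new LetterTokenizerSketch().tokenize(text));    // [Apache, Lucene]
        System.out.println(new LowerCaseTokenizerSketch().tokenize(text)); // [apache, lucene]
    }
}
```

Because LowerCaseTokenizer is declared final, this override ends the chain; a different per-character normalization would need its own CharTokenizer subclass.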

Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.