LowerCaseTokenizer (Lucene 4.3.0 API)

java.lang.Object
- org.apache.lucene.util.AttributeSource
  - org.apache.lucene.analysis.TokenStream
    - org.apache.lucene.analysis.Tokenizer
      - org.apache.lucene.analysis.util.CharTokenizer
        - org.apache.lucene.analysis.core.LetterTokenizer
          - org.apache.lucene.analysis.core.LowerCaseTokenizer

All Implemented Interfaces:

Closeable
```
public final class LowerCaseTokenizer
extends LetterTokenizer
```
LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. While it is functionally equivalent to the combination of LetterTokenizer and LowerCaseFilter, there is a performance advantage to doing the two tasks at once, hence this (redundant) implementation.
Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
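
The behavior described above can be illustrated without Lucene on the classpath. The following is a minimal sketch in plain Java, not the Lucene implementation: the class name `LowerCaseTokenizerSketch` and its `tokenize` method are hypothetical, and details of the real tokenizer such as offsets, attributes, and any token length limit are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not part of the Lucene API.
final class LowerCaseTokenizerSketch {

    // Splits text at non-letters and lower-cases each letter in the same pass.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // read a full code point, not a char
            i += Character.charCount(cp);      // advance by one or two chars
            if (Character.isLetter(cp)) {
                // Letter: normalize to lower case and extend the current token.
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // Non-letter: close the token that was being built, if any.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());    // flush the trailing token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, World! It's 2013"));
    }
}
```

For the input `Hello, World! It's 2013` this sketch yields the tokens `hello`, `world`, `it`, and `s`: the apostrophe and the digits are non-letters, so they act as separators and never appear in a token.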

You must specify the required Version compatibility when creating LowerCaseTokenizer:
- As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.
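
To see why an int based API matters, consider characters outside the Basic Multilingual Plane, which occupy two `char` values in a Java string. The sketch below uses only `java.lang.Character`; the class `CodePointNormalizeSketch` is hypothetical, and its two methods merely echo the names `isTokenChar(int)` and `normalize(int)` from this page under the assumption that they delegate to `Character.isLetter(int)` and `Character.toLowerCase(int)`.

```java
// Hypothetical illustration only; this is not part of the Lucene API.
final class CodePointNormalizeSketch {

    // Assumed letter test on a full code point.
    static boolean isTokenChar(int c) {
        return Character.isLetter(c);
    }

    // Lower-cases a full code point, as the normalize(int) description states.
    static int normalize(int c) {
        return Character.toLowerCase(c);
    }

    public static void main(String[] args) {
        int deseretCapital = 0x10400; // DESERET CAPITAL LETTER LONG I (supplementary)
        char[] units = Character.toChars(deseretCapital); // two surrogate chars

        // Seen as a code point, the character is a letter with a lower-case form.
        System.out.println(isTokenChar(deseretCapital));
        System.out.println(Integer.toHexString(normalize(deseretCapital)));

        // Seen char by char, each surrogate half is not a letter on its own.
        System.out.println(Character.isLetter(units[0]));
        System.out.println(Character.isLetter(units[1]));
    }
}
```

A `char` based test would treat each surrogate half as a non-letter and split the text there, whereas the code point based calls recognize the character as a letter and map U+10400 to its lower-case form U+10428.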

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.AttributeFactory, AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.Tokenizer
  input

Constructor Summary

Constructors
Constructor and Description
`LowerCaseTokenizer(Version matchVersion, AttributeSource.AttributeFactory factory, Reader in)` Construct a new LowerCaseTokenizer using a given `AttributeSource.AttributeFactory`.
`LowerCaseTokenizer(Version matchVersion, Reader in)` Construct a new LowerCaseTokenizer.

Method Summary

Methods
Modifier and Type Method and Description

protected int normalize(int c)
Converts the character to lower case using Character.toLowerCase(int).
- Methods inherited from class org.apache.lucene.analysis.core.LetterTokenizer
  isTokenChar
- Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer
  end, incrementToken, reset
- Methods inherited from class org.apache.lucene.analysis.Tokenizer
  close, correctOffset, setReader
- Methods inherited from class org.apache.lucene.util.AttributeSource
  addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
- Methods inherited from class java.lang.Object
  clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - LowerCaseTokenizer
```
public LowerCaseTokenizer(Version matchVersion,
                  Reader in)
```
    Construct a new LowerCaseTokenizer.
    
    Parameters:
    matchVersion - Lucene version to match; see above
    in - the input to split up into tokens
  - LowerCaseTokenizer
```
public LowerCaseTokenizer(Version matchVersion,
                  AttributeSource.AttributeFactory factory,
                  Reader in)
```
    Construct a new LowerCaseTokenizer using a given AttributeSource.AttributeFactory.
    
    Parameters:
    matchVersion - Lucene version to match; see above
    factory - the attribute factory to use for this Tokenizer
    in - the input to split up into tokens
- Method Detail
  - normalize
```
protected int normalize(int c)
```
    Converts the character to lower case using Character.toLowerCase(int).
    
    Overrides:
    
    normalize in class CharTokenizer

Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.