LowerCaseTokenizer (Lucene 7.3.0 API)

java.lang.Object
- org.apache.lucene.util.AttributeSource
  - org.apache.lucene.analysis.TokenStream
    - org.apache.lucene.analysis.Tokenizer
      - org.apache.lucene.analysis.util.CharTokenizer
        - org.apache.lucene.analysis.core.LetterTokenizer
          - org.apache.lucene.analysis.core.LowerCaseTokenizer

All Implemented Interfaces:

Closeable, AutoCloseable
```
public final class LowerCaseTokenizer
extends LetterTokenizer
```
LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together. It divides text at non-letters and converts the resulting tokens to lower case. Although it is functionally equivalent to a LetterTokenizer followed by a LowerCaseFilter, doing both tasks in a single pass is faster, hence this (redundant) implementation.
Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces: any run of letters with no intervening non-letter character becomes a single token.
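The following plain-Java sketch illustrates the behavior described above without depending on Lucene. The class `LetterLowerCaseSketch` and its methods are hypothetical: they approximate letter splitting with `Character.isLetter(int)` and lower-casing with `Character.toLowerCase(int)`, which is an assumption about the character tests involved, not a copy of Lucene's code. `onePass` lower-cases each code point as it is appended, while `twoPass` first splits and then lower-cases each finished token, mimicking a LetterTokenizer followed by a LowerCaseFilter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free approximation of LowerCaseTokenizer's behavior.
final class LetterLowerCaseSketch {

    // Splits text at non-letters; optionally lower-cases while appending.
    private static List<String> split(String text, boolean lowerCaseInline) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                // Letter: extend the current token (lower-cased now, if requested).
                current.appendCodePoint(lowerCaseInline ? Character.toLowerCase(cp) : cp);
            } else if (current.length() > 0) {
                // Non-letter: close the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // One pass: split and lower-case at the same time.
    static List<String> onePass(String text) {
        return split(text, true);
    }

    // Two passes: split first, then lower-case each finished token.
    static List<String> twoPass(String text) {
        List<String> lowered = new ArrayList<>();
        for (String token : split(text, false)) {
            StringBuilder sb = new StringBuilder();
            token.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
            lowered.add(sb.toString());
        }
        return lowered;
    }

    public static void main(String[] args) {
        System.out.println(onePass("Hello, World! It's 2018")); // [hello, world, it, s]
        System.out.println(onePass("MiXeD-CaSe").equals(twoPass("MiXeD-CaSe"))); // true
        // Eight Japanese letters with no separator, then "abc": two tokens.
        String japanese = "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8 abc";
        System.out.println(onePass(japanese).size()); // 2
    }
}
```

Both methods return the same tokens; the single-pass version avoids a second walk over each token, which is the kind of saving the class description refers to. The last call shows the caveat in the note: eight Japanese letters with no separating non-letter character come out as one token, and digits such as "2018" are dropped entirely because they are not letters.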

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.util.CharTokenizer
  DEFAULT_MAX_WORD_LEN
- Fields inherited from class org.apache.lucene.analysis.Tokenizer
  input
- Fields inherited from class org.apache.lucene.analysis.TokenStream
  DEFAULT_TOKEN_ATTRIBUTE_FACTORY

Constructor Summary

Constructors
Constructor and Description
`LowerCaseTokenizer()` Construct a new LowerCaseTokenizer.
`LowerCaseTokenizer(AttributeFactory factory)` Construct a new LowerCaseTokenizer using a given `AttributeFactory`.
`LowerCaseTokenizer(AttributeFactory factory, int maxTokenLen)` Construct a new LowerCaseTokenizer using a given `AttributeFactory` and maximum token length.

Method Summary

Modifier and Type Method and Description

`protected int normalize(int c)` Converts the code point `c` to lower case using `Character.toLowerCase(int)`.
- Methods inherited from class org.apache.lucene.analysis.core.LetterTokenizer
  isTokenChar
- Methods inherited from class org.apache.lucene.analysis.util.CharTokenizer
  end, fromSeparatorCharPredicate, fromSeparatorCharPredicate, fromSeparatorCharPredicate, fromSeparatorCharPredicate, fromTokenCharPredicate, fromTokenCharPredicate, fromTokenCharPredicate, fromTokenCharPredicate, incrementToken, reset
- Methods inherited from class org.apache.lucene.analysis.Tokenizer
  close, correctOffset, setReader
- Methods inherited from class org.apache.lucene.util.AttributeSource
  addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
- Methods inherited from class java.lang.Object
  clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - LowerCaseTokenizer
```
public LowerCaseTokenizer()
```
    Construct a new LowerCaseTokenizer.
  - LowerCaseTokenizer
```
public LowerCaseTokenizer(AttributeFactory factory)
```
    Construct a new LowerCaseTokenizer using a given AttributeFactory.
    
    Parameters:
    
    factory - the attribute factory to use for this Tokenizer
  - LowerCaseTokenizer
```
public LowerCaseTokenizer(AttributeFactory factory,
                          int maxTokenLen)
```
    Construct a new LowerCaseTokenizer using a given AttributeFactory and maximum token length.
    
    Parameters:
    
    factory - the attribute factory to use for this Tokenizer
    
    maxTokenLen - maximum token length the tokenizer will emit. Must be greater than 0 and less than MAX_TOKEN_LENGTH_LIMIT (1024*1024)
    
    Throws:
    
    IllegalArgumentException - if maxTokenLen is outside the permitted range described above.
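A minimal plain-Java sketch of how a `maxTokenLen` limit could behave. `MaxTokenLenSketch` is hypothetical. The range check follows the documented constraint (greater than 0 and less than 1024*1024); the splitting rule, where a letter run longer than `maxTokenLen` is emitted as consecutive pieces rather than truncated, is an assumption about CharTokenizer's behavior and should be checked against the Lucene source before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a maxTokenLen cap on lower-cased letter tokens.
// Assumption: a letter run longer than maxTokenLen is emitted in consecutive
// pieces of at most maxTokenLen chars (not a copy of Lucene's CharTokenizer).
final class MaxTokenLenSketch {
    // Documented upper bound for maxTokenLen.
    static final int MAX_TOKEN_LENGTH_LIMIT = 1024 * 1024;

    static List<String> tokenize(String text, int maxTokenLen) {
        // Documented constraint: greater than 0 and less than the limit.
        if (maxTokenLen <= 0 || maxTokenLen >= MAX_TOKEN_LENGTH_LIMIT) {
            throw new IllegalArgumentException("maxTokenLen out of range: " + maxTokenLen);
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
                if (current.length() >= maxTokenLen) {
                    // Limit reached: emit the piece and start a new token.
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else if (current.length() > 0) {
                // Non-letter: close the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("ABCDEFG hi", 3)); // [abc, def, g, hi]
        try {
            tokenize("anything", 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Splitting instead of truncating keeps every letter in the output, at the cost of producing token fragments that may not match a user's query terms.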
- Method Detail
  - normalize
```
protected int normalize(int c)
```
    Converts the code point c to lower case using Character.toLowerCase(int).
    
    Overrides:
    
    normalize in class CharTokenizer
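As the Overrides entry suggests, `normalize` is a hook in a template-method design: the CharTokenizer base class runs the scanning loop, `isTokenChar` decides where tokens break, and `normalize` transforms each accepted code point. The miniature below is a hypothetical stand-in (`MiniCharTokenizer`, `MiniLetterTokenizer` and `MiniLowerCaseTokenizer` are not Lucene classes) sketching how overriding only `normalize` turns a letter tokenizer into a lower-casing one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the CharTokenizer design (not Lucene's classes).
abstract class MiniCharTokenizer {
    // Decides whether a code point belongs to a token.
    protected abstract boolean isTokenChar(int c);

    // Default: keep the code point unchanged.
    protected int normalize(int c) {
        return c;
    }

    // Scanning loop owned by the base class; subclasses only supply hooks.
    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(c)) {
                current.appendCodePoint(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}

// Splits at non-letters, like LetterTokenizer.
class MiniLetterTokenizer extends MiniCharTokenizer {
    @Override
    protected boolean isTokenChar(int c) {
        return Character.isLetter(c);
    }
}

// Adds lower-casing by overriding only normalize, as LowerCaseTokenizer does.
final class MiniLowerCaseTokenizer extends MiniLetterTokenizer {
    @Override
    protected int normalize(int c) {
        return Character.toLowerCase(c);
    }
}

final class NormalizeDemo {
    public static void main(String[] args) {
        System.out.println(new MiniLetterTokenizer().tokenize("Lucene IN Action")); // [Lucene, IN, Action]
        System.out.println(new MiniLowerCaseTokenizer().tokenize("Lucene IN Action")); // [lucene, in, action]
        // Works on code points: U+03A3 GREEK CAPITAL SIGMA maps to U+03C3.
        System.out.println(Character.toLowerCase(0x03A3) == 0x03C3); // true
    }
}
```

Because `isTokenChar` sees the original code point and `normalize` runs only on accepted ones, the token boundaries are identical with or without lower-casing.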

Copyright © 2000-2018 Apache Software Foundation. All Rights Reserved.