ICUNormalizer2Filter (Lucene 8.3.0 API)


java.lang.Object
- org.apache.lucene.util.AttributeSource
  - org.apache.lucene.analysis.TokenStream
    - org.apache.lucene.analysis.TokenFilter
      - org.apache.lucene.analysis.icu.ICUNormalizer2Filter

All Implemented Interfaces:

Closeable, AutoCloseable

Direct Known Subclasses:

ICUFoldingFilter
```
public class ICUNormalizer2Filter
extends TokenFilter
```
Normalize token text with ICU's Normalizer2.
With this filter, you can normalize text in the following ways:
- NFKC Normalization, Case Folding, and removing Ignorables (the default)
- Using a standard Normalization mode (NFC, NFD, NFKC, NFKD)
- Based on rules from a custom normalization mapping.
If you use the defaults, this filter is a simple way to standardize Unicode text in a language-independent way for search:
- Its case folding can serve as a replacement for LowerCaseFilter. For example, it handles cases such as the Greek sigma, so that "Μάϊος" and "ΜΆΪΟΣ" match correctly.
- The normalization standardizes different forms of the same character in Unicode. For example, CJK full-width numbers are standardized to their ASCII forms.
- Ignorables such as Zero-Width Joiner and Variation Selectors are removed. These are typically modifier characters that affect display.
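
The three effects listed above can be illustrated with a rough, JDK-only approximation. This is a sketch, not the filter's implementation: the filter uses ICU's NFKC_Casefold data, whereas the hypothetical `approximate` helper below chains `java.text.Normalizer`, plain lowercasing with an explicit sigma mapping, and removal of only the two kinds of ignorables named above. Real case folding and the full set of default ignorables cover more than this.

```java
import java.text.Normalizer;
import java.util.Locale;

// JDK-only approximation of the default behavior, for illustration only.
// The class and method names are hypothetical and are not part of Lucene or ICU.
final class DefaultNormalizationSketch {

    static String approximate(String text) {
        // 1. Compatibility normalization: full-width forms become ASCII equivalents.
        String nfkc = Normalizer.normalize(text, Normalizer.Form.NFKC);
        // 2. Rough stand-in for case folding: lowercase, then map final sigma to sigma.
        String folded = nfkc.toLowerCase(Locale.ROOT).replace('\u03C2', '\u03C3');
        // 3. Remove a small subset of ignorables: Zero-Width Joiner and
        //    Variation Selectors 1 to 16 (the real filter removes many more).
        return folded.replaceAll("[\\u200D\\uFE00-\\uFE0F]", "");
    }

    public static void main(String[] args) {
        // The two Greek spellings from the text above, written as escapes.
        String mixedCase = approximate("\u039C\u03AC\u03CA\u03BF\u03C2");
        String upperCase = approximate("\u039C\u0386\u03AA\u039F\u03A3");
        System.out.println(mixedCase.equals(upperCase));

        // Full-width digits one, two, three.
        System.out.println(approximate("\uFF11\uFF12\uFF13"));

        // A Zero-Width Joiner between two letters is dropped.
        System.out.println(approximate("a\u200Db"));
    }
}
```

Mapping final sigma to sigma by hand stands in for what case folding does automatically; lowercasing alone would keep the two sigma forms distinct whenever they appear in different positions.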
See Also:

Normalizer2, FilteredNormalizer2

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.TokenFilter
  input
- Fields inherited from class org.apache.lucene.analysis.TokenStream
  DEFAULT_TOKEN_ATTRIBUTE_FACTORY

Constructor Summary

Constructors
Constructor and Description
`ICUNormalizer2Filter(TokenStream input)` Create a new ICUNormalizer2Filter that combines NFKC normalization, Case Folding, and removal of Default Ignorables (NFKC_Casefold).
`ICUNormalizer2Filter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)` Create a new ICUNormalizer2Filter with the specified Normalizer2.

Method Summary

Modifier and Type Method and Description

`boolean` `incrementToken()`
- Methods inherited from class org.apache.lucene.analysis.TokenFilter
  close, end, reset
- Methods inherited from class org.apache.lucene.util.AttributeSource
  addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
- Methods inherited from class java.lang.Object
  clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - ICUNormalizer2Filter
```
public ICUNormalizer2Filter(TokenStream input)
```
    Create a new ICUNormalizer2Filter that combines NFKC normalization, Case Folding, and removal of Default Ignorables (NFKC_Casefold).
  - ICUNormalizer2Filter
```
public ICUNormalizer2Filter(TokenStream input,
                            com.ibm.icu.text.Normalizer2 normalizer)
```
    Create a new ICUNormalizer2Filter with the specified Normalizer2.
    
    Parameters:
    
    input - stream
    
    normalizer - normalizer to use
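
A minimal usage sketch of the two constructors follows. It assumes the `lucene-analyzers-common`, `lucene-analyzers-icu`, and ICU4J jars are on the classpath. The `WhitespaceTokenizer` source, the `terms` helper, and the class name are illustrative choices that this page does not prescribe. The helper follows the usual TokenStream consumer workflow (`reset`, `incrementToken` until it returns false, `end`, `close`).

```java
import com.ibm.icu.text.Normalizer2;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUNormalizer2Filter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Illustrative only: the class and helper names are hypothetical.
final class NormalizerFilterSketch {

    // Splits text on whitespace, applies the filter, and collects the terms.
    // Passing null selects the single-argument constructor (NFKC_Casefold).
    static List<String> terms(String text, Normalizer2 normalizer) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));
        List<String> out = new ArrayList<>();
        try (TokenStream stream = normalizer == null
                ? new ICUNormalizer2Filter(tokenizer)
                : new ICUNormalizer2Filter(tokenizer, normalizer)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();                    // required before the first incrementToken()
            while (stream.incrementToken()) {  // false once the input is exhausted
                out.add(term.toString());
            }
            stream.end();                      // finalize end-of-stream state
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Default constructor: full-width letters and digits, case folded.
        System.out.println(terms("\uFF21\uFF22\uFF23 \uFF11\uFF12\uFF13", null));
        // Explicit normalizer: plain NFC composes e + combining acute, keeps case.
        System.out.println(terms("Cafe\u0301", Normalizer2.getNFCInstance()));
    }
}
```

Passing an explicit `Normalizer2` is the route to take when case should be preserved, since the default NFKC_Casefold mode always folds case.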
- Method Detail
  - incrementToken
```
public final boolean incrementToken()
                             throws IOException
```
    Specified by:
    
    incrementToken in class TokenStream
    
    Throws:
    
    IOException


Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.