org.apache.lucene.analysis.icu.ICUNormalizer2Filter

All Implemented Interfaces: Closeable, AutoCloseable, Unwrappable<TokenStream>

public class ICUNormalizer2Filter extends TokenFilter

Normalizes token text with ICU's Normalizer2.

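With this filter, you can normalize text either with the default behavior (NFKC normalization, case folding, and removal of default ignorables) or with any Normalizer2 instance that you supply.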

If you use the defaults, this filter is a simple way to standardize Unicode text in a language-independent way for search:

The case folding that it does can be seen as a replacement for LowerCaseFilter: For example, it handles cases such as the Greek sigma, so that "Μάϊος" and "ΜΆΪΟΣ" will match correctly.
The normalization standardizes different forms of the same character in Unicode. For example, CJK full-width numbers are converted to their ASCII forms.
Ignorables such as Zero-Width Joiner and Variation Selectors are removed. These are typically modifier characters that affect display.
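
The default behavior described above can be sketched as follows. This is a minimal illustration, not part of the class documentation: the class name DefaultNormalizationSketch and the helper normalize are hypothetical, and WhitespaceTokenizer (from the Lucene common analysis module) is an assumed choice of tokenizer used only to feed tokens into the filter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUNormalizer2Filter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class DefaultNormalizationSketch {

    // Hypothetical helper: splits on whitespace, then applies the filter
    // with its default settings (NFKC_Casefold) and collects the terms.
    static List<String> normalize(String text) throws IOException {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));
        List<String> terms = new ArrayList<>();
        // Closing the filter also closes the wrapped tokenizer.
        try (TokenStream stream = new ICUNormalizer2Filter(tokenizer)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Case folding: both spellings should produce the same term.
        System.out.println(normalize("Μάϊος").equals(normalize("ΜΆΪΟΣ")));
        // Full-width digits are mapped to their ASCII forms.
        System.out.println(normalize("１２３"));
        // The Zero-Width Joiner (U+200D) between 'a' and 'b' is dropped.
        System.out.println(normalize("a\u200Db"));
    }
}
```

Comparing the normalized forms of two inputs, rather than hard-coding an expected string, keeps the sketch focused on the property that matters for search: equivalent spellings end up as the same indexed term.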

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
Field Summary

Fields inherited from class org.apache.lucene.analysis.TokenFilter
input

Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary

Constructors

Constructor

Description

ICUNormalizer2Filter(TokenStream input)

Creates a new ICUNormalizer2Filter that combines NFKC normalization, case folding, and removal of default ignorables (NFKC_Casefold).

ICUNormalizer2Filter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)

Creates a new ICUNormalizer2Filter that uses the specified Normalizer2.
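
As an illustration of the two-argument constructor, the sketch below passes ICU's standard NFC instance instead of relying on the default. The class name CustomNormalizerSketch and the helper normalize are hypothetical, and WhitespaceTokenizer is an assumed tokenizer choice. NFC is used here only as one example of a Normalizer2 that could be supplied.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import com.ibm.icu.text.Normalizer2;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUNormalizer2Filter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CustomNormalizerSketch {

    // Hypothetical helper: splits on whitespace, then normalizes each
    // token with the Normalizer2 supplied by the caller.
    static List<String> normalize(String text, Normalizer2 normalizer) throws IOException {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));
        List<String> terms = new ArrayList<>();
        try (TokenStream stream = new ICUNormalizer2Filter(tokenizer, normalizer)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        Normalizer2 nfc = Normalizer2.getNFCInstance();
        // NFC composes 'e' + combining acute accent (U+0301) into U+00E9,
        // but performs no case folding, so the capital 'C' is kept.
        System.out.println(normalize("Cafe\u0301", nfc));
        // NFC applies no compatibility mappings, so full-width digits stay as-is.
        System.out.println(normalize("１２３", nfc));
    }
}
```

Supplying a plain normalization form such as NFC may be preferable when case distinctions or compatibility characters need to survive analysis; the default NFKC_Casefold is the more aggressive choice aimed at search matching.
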
Method Summary

Modifier and Type

Method

Description

final boolean

incrementToken()

Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset, unwrap

Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
