org.apache.lucene.analysis.icu.ICUFoldingFilter

All Implemented Interfaces: Closeable, AutoCloseable, Unwrappable<TokenStream>

public final class ICUFoldingFilter extends ICUNormalizer2Filter

A TokenFilter that applies search term folding to Unicode text, using the foldings defined in UTR#30 Character Foldings.

This filter applies the following foldings from the report to Unicode text:

Accent removal
Case folding
Canonical duplicates folding
Dashes folding
Diacritic removal (including stroke, hook, descender)
Greek letterforms folding
Han Radical folding
Hebrew Alternates folding
Jamo folding
Letterforms folding
Math symbol folding
Multigraph Expansions: All
Native digit folding
No-break folding
Overline folding
Positional forms folding
Small forms folding
Space folding
Spacing Accents folding
Subscript folding
Superscript folding
Suzhou Numeral folding
Symbol folding
Underline folding
Vertical forms folding
Width folding

Additionally, Default Ignorables are removed, and text is normalized to NFKC. All foldings, case folding, and normalization mappings are applied recursively to ensure a fully folded and normalized result.
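
The combined effect of these foldings can be illustrated by calling the public NORMALIZER field directly on a few strings. This is a minimal sketch rather than part of the original documentation: the class name FoldingNormalizerDemo and the sample strings are illustrative assumptions, and it assumes the lucene-analysis-icu and icu4j jars are on the classpath.

```java
import com.ibm.icu.text.Normalizer2;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;

// Hypothetical demo class; only ICUFoldingFilter.NORMALIZER comes from the documented API.
public class FoldingNormalizerDemo {

  /** Folds a whole string with the filter's shared UTR#30 normalizer. */
  public static String fold(String text) {
    Normalizer2 normalizer = ICUFoldingFilter.NORMALIZER;
    return normalizer.normalize(text);
  }

  public static void main(String[] args) {
    // Accent removal plus case folding: "Resume" with acute accents on both e's.
    System.out.println(fold("R\u00e9sum\u00e9")); // resume
    // Case folding expands the German sharp s.
    System.out.println(fold("Ru\u00df")); // russ
    // Native digit folding: Bengali digits seven, zero, six.
    System.out.println(fold("\u09ed\u09e6\u09ec")); // 706
  }
}
```

In an analysis chain the same normalizer is applied per token by the filter, so calling it on whole strings like this is only a convenient way to inspect the mappings.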

A normalizer with additional settings, such as a filter that lists characters not to be normalized, can be passed to the two-argument constructor.
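
One way to build such a normalizer is to wrap NORMALIZER in ICU's FilteredNormalizer2, which only normalizes characters contained in a given UnicodeSet. The sketch below is an assumption-laden illustration, not the documented usage: the class name ProtectedFoldingDemo, the method name, and the choice of protected characters (the Swedish letters a-ring, a-umlaut, and o-umlaut in both cases) are hypothetical, and StandardTokenizer is used only as a convenient token source.

```java
import com.ibm.icu.text.FilteredNormalizer2;
import com.ibm.icu.text.Normalizer2;
import com.ibm.icu.text.UnicodeSet;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class showing the two-argument constructor.
public class ProtectedFoldingDemo {

  /** Tokenizes the text and folds every character except the protected Swedish letters. */
  public static List<String> foldExceptSwedish(String text) throws IOException {
    // The set lists the characters that MAY be folded: everything except the six letters.
    UnicodeSet foldable = new UnicodeSet("[^\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6]");
    foldable.freeze();
    Normalizer2 normalizer = new FilteredNormalizer2(ICUFoldingFilter.NORMALIZER, foldable);

    Tokenizer tokenizer = new StandardTokenizer();
    tokenizer.setReader(new StringReader(text));

    List<String> terms = new ArrayList<>();
    try (TokenStream stream = new ICUFoldingFilter(tokenizer, normalizer)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end();
    }
    return terms;
  }
}
```

Because characters outside the set are copied through untouched, a protected letter keeps its case as well as its diacritic, while the remaining characters are folded as usual.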

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State

Field Summary

Fields

- static final com.ibm.icu.text.Normalizer2 NORMALIZER
  
  A normalizer that applies search term folding to Unicode text, using the foldings defined in UTR#30 Character Foldings.

Fields inherited from class org.apache.lucene.analysis.TokenFilter
input

Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY

Constructor Summary

Constructors

- ICUFoldingFilter(TokenStream input)
  
  Create a new ICUFoldingFilter on the specified input.
- ICUFoldingFilter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)
  
  Create a new ICUFoldingFilter on the specified input with the specified normalizer.

Method Summary

Methods inherited from class org.apache.lucene.analysis.icu.ICUNormalizer2Filter
incrementToken

Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end, reset, unwrap

Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Field Details
- NORMALIZER
  
  public static final com.ibm.icu.text.Normalizer2 NORMALIZER
  
  A normalizer that applies search term folding to Unicode text, using the foldings defined in UTR#30 Character Foldings.

Constructor Details
- ICUFoldingFilter
  
  public ICUFoldingFilter(TokenStream input)
  
  Create a new ICUFoldingFilter on the specified input.
- ICUFoldingFilter
  
  public ICUFoldingFilter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)
  
  Create a new ICUFoldingFilter on the specified input with the specified normalizer.
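
The single-argument constructor is typically used inside an Analyzer so that the same folding is applied at index time and at query time. The following is a sketch under stated assumptions: FoldingAnalyzer, its analyze helper, and the field name "body" are hypothetical names chosen for illustration, and StandardTokenizer is just one possible token source.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical analyzer that tokenizes, then folds each token with the default normalizer.
public class FoldingAnalyzer extends Analyzer {

  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new StandardTokenizer();
    TokenStream folded = new ICUFoldingFilter(source);
    return new TokenStreamComponents(source, folded);
  }

  /** Convenience helper (not part of Lucene) that collects the folded terms for a string. */
  public static List<String> analyze(String text) throws IOException {
    List<String> terms = new ArrayList<>();
    try (Analyzer analyzer = new FoldingAnalyzer();
        TokenStream stream = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end();
    }
    return terms;
  }
}
```

Passing the same analyzer to both the index writer configuration and the query parser keeps indexed terms and query terms in the same folded form.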
