Class ICUFoldingFilter

All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>

public final class ICUFoldingFilter extends ICUNormalizer2Filter
A TokenFilter that applies search term folding to Unicode text, using the foldings defined in UTR#30 Character Foldings.

This filter applies the following foldings from the report to Unicode text:

  • Accent removal
  • Case folding
  • Canonical duplicates folding
  • Dashes folding
  • Diacritic removal (including stroke, hook, descender)
  • Greek letterforms folding
  • Han Radical folding
  • Hebrew Alternates folding
  • Jamo folding
  • Letterforms folding
  • Math symbol folding
  • Multigraph Expansions: All
  • Native digit folding
  • No-break folding
  • Overline folding
  • Positional forms folding
  • Small forms folding
  • Space folding
  • Spacing Accents folding
  • Subscript folding
  • Superscript folding
  • Suzhou Numeral folding
  • Symbol folding
  • Underline folding
  • Vertical forms folding
  • Width folding

Additionally, Default Ignorables are removed, and text is normalized to NFKC. All foldings, case folding, and normalization mappings are applied recursively to ensure a fully folded and normalized result.
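To give a feel for what "applied recursively" means, here is a rough sketch that uses only the JDK's java.text.Normalizer. It is not the UTR#30 data that this filter ships with: it covers only accent removal, compatibility mappings, and simple lowercasing (not full Unicode case folding), and the class name FoldingSketch and its methods are hypothetical. It repeats a single folding pass until the output stops changing.

```java
import java.text.Normalizer;
import java.util.Locale;

public class FoldingSketch {

  /** One approximate pass: decompose, drop combining marks, lowercase, recompose to NFKC. */
  static String foldOnce(String s) {
    // NFKD splits accented letters into base + combining mark and expands
    // compatibility characters such as ligatures and fullwidth forms.
    String decomposed = Normalizer.normalize(s, Normalizer.Form.NFKD);
    // Remove all combining marks (a crude stand-in for accent/diacritic removal).
    String noMarks = decomposed.replaceAll("\\p{M}+", "");
    // Simple lowercasing; real case folding differs for some characters.
    String lower = noMarks.toLowerCase(Locale.ROOT);
    return Normalizer.normalize(lower, Normalizer.Form.NFKC);
  }

  /** Repeats foldOnce until a fixed point is reached. */
  static String fold(String s) {
    String previous;
    String current = s;
    do {
      previous = current;
      current = foldOnce(previous);
    } while (!current.equals(previous));
    return current;
  }
}
```

Calling FoldingSketch.fold on an accented word, a ligature, or a fullwidth letter returns plain lowercase ASCII letters. The real filter handles many more cases, such as dashes, native digits, and Han radicals.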

A normalizer with additional settings, such as a filter that lists characters not to be normalized, can be passed to the constructor.
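One way to build such a normalizer is to wrap NORMALIZER in ICU4J's FilteredNormalizer2 together with a UnicodeSet of the characters that may be folded. The sketch below assumes Lucene's analysis-common and analysis-icu modules plus ICU4J are on the classpath. The class name SwedishFoldingExample, the method fold, the use of WhitespaceTokenizer, and the choice of Swedish letters to protect are illustrative assumptions, not part of this class.

```java
import com.ibm.icu.text.FilteredNormalizer2;
import com.ibm.icu.text.Normalizer2;
import com.ibm.icu.text.UnicodeSet;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUFoldingFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class SwedishFoldingExample {

  /** Splits text on whitespace and folds each token, leaving Swedish letters untouched. */
  static List<String> fold(String text) throws IOException {
    // Everything EXCEPT these six letters may be folded
    // (U+00E5 a-ring, U+00E4 a-umlaut, U+00F6 o-umlaut and their capitals).
    UnicodeSet foldable =
        new UnicodeSet("[^\\u00E5\\u00E4\\u00F6\\u00C5\\u00C4\\u00D6]");
    foldable.freeze();
    Normalizer2 normalizer =
        new FilteredNormalizer2(ICUFoldingFilter.NORMALIZER, foldable);

    Tokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader(text));

    List<String> terms = new ArrayList<>();
    // Closing the filter also closes the wrapped tokenizer.
    try (TokenStream stream = new ICUFoldingFilter(tokenizer, normalizer)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end();
    }
    return terms;
  }
}
```

With this setup, characters outside the protected set are folded as usual, while the six listed letters pass through unchanged. The single-argument constructor can be used instead when no such exceptions are needed.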

  • Field Details

    • NORMALIZER

      public static final com.ibm.icu.text.Normalizer2 NORMALIZER
      A normalizer that applies search term folding to Unicode text, using the foldings from UTR#30 Character Foldings.
  • Constructor Details

    • ICUFoldingFilter

      public ICUFoldingFilter(TokenStream input)
      Creates a new ICUFoldingFilter on the specified input.
    • ICUFoldingFilter

      public ICUFoldingFilter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer)
      Creates a new ICUFoldingFilter on the specified input with the specified normalizer.