Class ICUFoldingFilter

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class ICUFoldingFilter
    extends ICUNormalizer2Filter
    A TokenFilter that applies search term folding to Unicode text, using the foldings defined in UTR#30 Character Foldings.

    This filter applies the following foldings from the report to Unicode text:

    • Accent removal
    • Case folding
    • Canonical duplicates folding
    • Dashes folding
    • Diacritic removal (including stroke, hook, descender)
    • Greek letterforms folding
    • Han Radical folding
    • Hebrew Alternates folding
    • Jamo folding
    • Letterforms folding
    • Math symbol folding
    • Multigraph Expansions: All
    • Native digit folding
    • No-break folding
    • Overline folding
    • Positional forms folding
    • Small forms folding
    • Space folding
    • Spacing Accents folding
    • Subscript folding
    • Superscript folding
    • Suzhou Numeral folding
    • Symbol folding
    • Underline folding
    • Vertical forms folding
    • Width folding
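    Several of these foldings overlap with plain Unicode normalization. As a rough illustration only, the JDK-only sketch below approximates width, superscript, ligature and accent folding with `java.text.Normalizer`, and case folding with lower-casing. It does not use ICU4J or Lucene, the class and method names are hypothetical, and it is not equivalent to ICUFoldingFilter. For example, stroke letters such as ø or ł have no Unicode decomposition and survive unchanged, and lower-casing leaves ß as is, whereas full Unicode case folding maps it to "ss".

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical JDK-only approximation of a few UTR#30 foldings; NOT ICUFoldingFilter.
class FoldingSketch {

    static String approximateFold(String text) {
        // NFKC handles compatibility foldings such as width (U+FF23 -> "C"),
        // superscripts (U+00B2 -> "2") and ligatures (U+FB01 -> "fi").
        String s = Normalizer.normalize(text, Normalizer.Form.NFKC);
        // Decompose and drop combining marks to approximate accent removal.
        // Stroke letters such as U+00F8 have no decomposition and are kept.
        s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        // Lower-casing only approximates case folding (U+00DF stays unchanged).
        s = s.toLowerCase(Locale.ROOT);
        // Recompose anything that is still decomposed.
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Fullwidth C, "af", precomposed e-acute, space, fi ligature, "le x", superscript two
        System.out.println(approximateFold("\uFF23af\u00E9 \uFB01le x\u00B2")); // prints "cafe file x2"
    }
}
```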

    Additionally, Default Ignorables are removed, and text is normalized to NFKC. All foldings, case folding, and normalization mappings are applied recursively to ensure a fully folded and normalized result.
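    The removal of ignorables and the recursive application can be pictured as a loop that repeats a folding pass until the text stops changing. This is a hedged, JDK-only illustration: `foldOnce` is a hypothetical, simplified pass, and Unicode general category Cf (format characters such as U+00AD SOFT HYPHEN and U+200B ZERO WIDTH SPACE) stands in for the Default_Ignorable_Code_Point property, which is a different set.

```java
import java.text.Normalizer;
import java.util.Locale;

class FixedPointFoldSketch {

    // Approximation: general category Cf (e.g. U+00AD soft hyphen, U+200B zero
    // width space) stands in for the Default_Ignorable_Code_Point property.
    static String removeIgnorables(String s) {
        return s.replaceAll("\\p{Cf}", "");
    }

    // One simplified folding pass (hypothetical): drop ignorables, NFKC, lower-case.
    static String foldOnce(String s) {
        String nfkc = Normalizer.normalize(removeIgnorables(s), Normalizer.Form.NFKC);
        return nfkc.toLowerCase(Locale.ROOT);
    }

    // Repeat the pass until its output equals its input, i.e. a fixed point.
    static String foldToFixedPoint(String s) {
        String previous;
        String current = s;
        do {
            previous = current;
            current = foldOnce(previous);
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        // Fullwidth H, "ello", soft hyphen, "World", zero width space
        System.out.println(foldToFixedPoint("\uFF28ello\u00ADWorld\u200B")); // prints "helloworld"
    }
}
```

    Iterating to a fixed point means that folding an already folded term is a no-op, which is what lets index-time and query-time text meet at the same key.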

    A normalizer with additional settings, such as a filter listing characters that should not be normalized, can be passed to the constructor.
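    With ICU4J, such a normalizer is typically built with `FilteredNormalizer2`, which normalizes only the characters in a given `UnicodeSet` and copies all others through unchanged; a set such as `[^åäöÅÄÖ]` would leave those letters unfolded. The JDK-only sketch below mimics that behavior without ICU. `foldExcept` and its simplified `fold` are hypothetical, and the sketch assumes precomposed (NFC) input so that each protected letter is a single code point.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Set;

class FilteredFoldSketch {

    // Simplified fold for the characters we may touch (hypothetical):
    // NFKD, drop combining marks, lower-case, recompose.
    static String fold(String s) {
        String d = Normalizer.normalize(s, Normalizer.Form.NFKD).replaceAll("\\p{M}+", "");
        return Normalizer.normalize(d.toLowerCase(Locale.ROOT), Normalizer.Form.NFC);
    }

    // Folds runs of code points NOT in protectedChars and copies protected code
    // points through unchanged. Note the inversion: ICU's FilteredNormalizer2
    // takes the set of characters that ARE normalized.
    static String foldExcept(String text, Set<Integer> protectedChars) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (protectedChars.contains(cp)) {
                out.append(fold(run.toString())); // flush the pending foldable run
                run.setLength(0);
                out.appendCodePoint(cp);           // keep the protected letter as is
            } else {
                run.appendCodePoint(cp);
            }
        });
        out.append(fold(run.toString()));
        return out.toString();
    }

    public static void main(String[] args) {
        // Protect a-ring, a-umlaut, o-umlaut in lower and upper case.
        Set<Integer> keep = Set.of(0xE5, 0xE4, 0xF6, 0xC5, 0xC4, 0xD6);
        System.out.println(foldExcept("\u00C5sa \u00D6berg Caf\u00E9", keep));
    }
}
```

    As with a filtered ICU normalizer, the protected letters are also exempt from case folding, so the capital A-ring and O-umlaut stay uppercase while "Café" becomes "cafe".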

    • Field Detail

      • NORMALIZER

        public static final com.ibm.icu.text.Normalizer2 NORMALIZER
        A normalizer that applies search term folding to Unicode text, using the foldings from UTR#30 Character Foldings.
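        Because NORMALIZER is a public static instance, the same folding can be applied outside a token stream, for example to query strings, with a call such as `ICUFoldingFilter.NORMALIZER.normalize(text)` (Normalizer2 also provides `isNormalized`). The JDK-only sketch below shows the idea with a hypothetical `fold` stand-in built on `java.text.Normalizer`: one shared, stateless folding function applied to both indexed and query text, which skips the NFKC rewrite when the input already passes `isNormalized`.

```java
import java.text.Normalizer;
import java.util.Locale;

class SharedFolderSketch {

    // Hypothetical stand-in for a shared, stateless folding normalizer.
    static String fold(String text) {
        String nfkc = Normalizer.isNormalized(text, Normalizer.Form.NFKC)
                ? text // already NFKC: reuse the input as is
                : Normalizer.normalize(text, Normalizer.Form.NFKC);
        return nfkc.toLowerCase(Locale.ROOT);
    }

    // Indexed text and query text match when they fold to the same key.
    static boolean sameKey(String indexed, String query) {
        return fold(indexed).equals(fold(query));
    }

    public static void main(String[] args) {
        // Fullwidth "Search" versus ASCII "search"
        System.out.println(sameKey("\uFF33\uFF45\uFF41\uFF52\uFF43\uFF48", "search")); // prints "true"
    }
}
```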
    • Constructor Detail

      • ICUFoldingFilter

        public ICUFoldingFilter(TokenStream input)
        Create a new ICUFoldingFilter on the specified input.
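        This constructor follows the usual TokenFilter decorator pattern: the filter wraps an upstream stream and rewrites each token as it passes through. In Lucene it is typically called inside an `Analyzer`'s `createComponents` method, wrapping a tokenizer. The sketch below reproduces only the pattern in plain Java; the `Tokens` interface and `FoldingTokens` class are hypothetical and much simpler than Lucene's attribute-based `TokenStream`.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

class TokenFilterSketch {

    // Hypothetical minimal token stream: next token, or null when exhausted.
    interface Tokens {
        String next();
    }

    // Decorator in the style of a TokenFilter: wraps an input and folds each token.
    static final class FoldingTokens implements Tokens {
        private final Tokens input;

        FoldingTokens(Tokens input) {
            this.input = input;
        }

        @Override
        public String next() {
            String token = input.next();
            if (token == null) {
                return null; // propagate end of stream
            }
            return Normalizer.normalize(token, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
        }
    }

    // Expose a fixed list of tokens as a Tokens source.
    static Tokens of(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Drain a stream into a list.
    static List<String> drain(Tokens tokens) {
        List<String> out = new ArrayList<>();
        for (String t = tokens.next(); t != null; t = tokens.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // Fullwidth "Lucene" followed by "ICU"
        Tokens folded = new FoldingTokens(of(List.of("\uFF2C\uFF55\uFF43\uFF45\uFF4E\uFF45", "ICU")));
        System.out.println(drain(folded)); // prints [lucene, icu]
    }
}
```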
      • ICUFoldingFilter

        public ICUFoldingFilter(TokenStream input,
                                com.ibm.icu.text.Normalizer2 normalizer)
        Create a new ICUFoldingFilter on the specified input with the specified normalizer.
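        The two-argument constructor lets the caller supply a different Normalizer2, for instance one from `Normalizer2.getNFKCCasefoldInstance()` or a `FilteredNormalizer2`. The hypothetical sketch below shows the same constructor-injection idea with a `UnaryOperator<String>` in place of Normalizer2: the no-argument constructor delegates to a default fold, and a caller-supplied function replaces it entirely.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.function.UnaryOperator;

class InjectedNormalizerSketch {

    // Hypothetical default fold used when no normalizer is supplied.
    static final UnaryOperator<String> DEFAULT_FOLD =
            s -> Normalizer.normalize(s, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);

    // Hypothetical filter that takes its normalizer through the constructor.
    static final class Filter {
        private final UnaryOperator<String> normalizer;

        Filter() {
            this(DEFAULT_FOLD); // mirror of the one-argument form: use the default
        }

        Filter(UnaryOperator<String> normalizer) {
            this.normalizer = normalizer; // caller-chosen normalization
        }

        String apply(String token) {
            return normalizer.apply(token);
        }
    }

    public static void main(String[] args) {
        Filter byDefault = new Filter();
        Filter nfkcOnly = new Filter(s -> Normalizer.normalize(s, Normalizer.Form.NFKC));
        String token = "\uFB01LE"; // fi ligature followed by "LE"
        System.out.println(byDefault.apply(token)); // prints "file"
        System.out.println(nfkcOnly.apply(token));  // prints "fiLE"
    }
}
```

        Here the NFKC-only function expands the ligature but preserves case, which shows how the injected normalizer fully determines what the filter folds.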