Class ICUTransformFilter

All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>

public final class ICUTransformFilter extends TokenFilter
A TokenFilter that transforms text with ICU.

ICU provides text-transformation functionality via its Transliteration API. Script conversion is the most common use, but a Transliterator can perform a more general class of tasks: the API specifies only that a segment of the input text is replaced by new text. The particulars of the conversion are determined entirely by subclasses of Transliterator.

Some transformations that are useful for search are built in:

  • Conversion from Traditional to Simplified Chinese characters
  • Conversion from Hiragana to Katakana
  • Conversion from Fullwidth to Halfwidth forms
  • Script conversions, for example Serbian Cyrillic to Latin
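
To make two of the listed transformations concrete, the following plain-JDK sketch mimics them at the code point level. It is not ICU's implementation and covers far fewer cases than the real Hiragana-Katakana and Fullwidth-Halfwidth transliterators. The class name SegmentReplaceSketch and its methods are hypothetical, chosen only for illustration.

```java
import java.util.function.IntUnaryOperator;

// Illustration only: a simplified stand-in for what a transliterator does,
// namely replacing pieces of the input text with new text.
class SegmentReplaceSketch {

    // Hiragana U+3041..U+3096 sits exactly 0x60 below the matching Katakana letters.
    public static int hiraganaToKatakana(int codePoint) {
        return (codePoint >= 0x3041 && codePoint <= 0x3096) ? codePoint + 0x60 : codePoint;
    }

    // Fullwidth ASCII variants U+FF01..U+FF5E sit 0xFEE0 above plain ASCII.
    // Assumption: only the ASCII variants are handled here, not Katakana or Hangul forms.
    public static int fullwidthToHalfwidth(int codePoint) {
        return (codePoint >= 0xFF01 && codePoint <= 0xFF5E) ? codePoint - 0xFEE0 : codePoint;
    }

    // Applies a per-code-point mapping to the whole string; unmapped code points pass through.
    public static String transform(String text, IntUnaryOperator mapping) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().map(mapping).forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transform("ひらがな", SegmentReplaceSketch::hiraganaToKatakana));
        System.out.println(transform("ＡＢＣ１２３", SegmentReplaceSketch::fullwidthToHalfwidth));
    }
}
```

For search, the point of such a mapping is that variant spellings of the same term collapse to a single indexed form, so a query written in one form can match text written in the other.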

Example usage:

stream = new ICUTransformFilter(stream, Transliterator.getInstance("Traditional-Simplified"));

For more details, see the ICU User Guide.
  • Constructor Details

    • ICUTransformFilter

      public ICUTransformFilter(TokenStream input, com.ibm.icu.text.Transliterator transform)
      Create a new ICUTransformFilter that transforms text on the given stream.
      Parameters:
      input - TokenStream to filter.
      transform - Transliterator to transform the text.
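
The constructor is typically called while building an analysis chain. The following is a minimal sketch of one way to do that, assuming a recent Lucene release with the ICU analysis module, the common analysis module, and ICU4J on the classpath. The class TransformFilterDemo, the method transformTokens, the field name "body", and the choice of WhitespaceTokenizer are all illustrative assumptions, not part of this class's API.

```java
import com.ibm.icu.text.Transliterator;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.icu.ICUTransformFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper: tokenizes on whitespace, then applies an ICU transform to each token.
class TransformFilterDemo {

    public static List<String> transformTokens(String text, String transformId) {
        Transliterator transform = Transliterator.getInstance(transformId);
        List<String> terms = new ArrayList<>();

        try (Analyzer analyzer = new Analyzer() {
                @Override
                protected TokenStreamComponents createComponents(String fieldName) {
                    Tokenizer tokenizer = new WhitespaceTokenizer();
                    // Wrap the tokenizer so every emitted term passes through the transform.
                    TokenStream stream = new ICUTransformFilter(tokenizer, transform);
                    return new TokenStreamComponents(tokenizer, stream);
                }
            };
            TokenStream ts = analyzer.tokenStream("body", text)) {

            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before the first incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());  // term text after transformation
            }
            ts.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(transformTokens("ひらがな テスト", "Hiragana-Katakana"));
    }
}
```

The sketch creates the Transliterator once and hands it to each filter instance rather than looking it up per token stream, since resolving a transform ID is comparatively expensive.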
  • Method Details