Class JapaneseKatakanaStemFilter

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class JapaneseKatakanaStemFilter
    extends TokenFilter
    A TokenFilter that normalizes common katakana spelling variations ending in a long sound character (U+30FC) by removing that character. Only katakana words of at least a minimum length are stemmed (the default minimum is four characters).
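
    The stemming rule described above can be sketched in plain Java, without any Lucene dependency. This is an illustrative approximation, not the filter's actual source: the class name KatakanaStemSketch and its methods are hypothetical, and the sketch assumes the length check is inclusive (a term whose length equals the minimum is stemmed) and that "katakana" means every character lies in the Unicode Katakana block.

```java
public final class KatakanaStemSketch {
    // U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK (the "long sound character").
    private static final char PROLONGED_SOUND_MARK = '\u30FC';

    // Default minimum length stated in the class description.
    static final int DEFAULT_MINIMUM_LENGTH = 4;

    // Assumption: a term counts as katakana only if every char is in the
    // Unicode Katakana block (U+30A0..U+30FF), i.e. full-width katakana.
    static boolean isKatakana(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (Character.UnicodeBlock.of(term.charAt(i)) != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return true;
    }

    // Drops one trailing U+30FC from katakana terms that meet the minimum length.
    // Assumption: terms whose length equals minimumLength are stemmed.
    static String stem(String term, int minimumLength) {
        if (term.isEmpty() || term.length() < minimumLength || !isKatakana(term)) {
            return term; // too short, or not purely katakana: leave unchanged
        }
        if (term.charAt(term.length() - 1) == PROLONGED_SOUND_MARK) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    public static void main(String[] args) {
        // Four characters (sa, long mark, ba, long mark): trailing mark is removed.
        System.out.println(stem("\u30B5\u30FC\u30D0\u30FC", DEFAULT_MINIMUM_LENGTH));
        // Three characters (ko, pi, long mark): below the minimum, left unchanged.
        System.out.println(stem("\u30B3\u30D4\u30FC", DEFAULT_MINIMUM_LENGTH));
    }
}
```

    Under these assumptions and the default minimum of four, サーバー is reduced to サーバ, while the three-character コピー is left as is. Only a single trailing long sound character is removed; long sound characters inside the word are kept.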

    Note that only full-width katakana characters are supported. Please use a CJKWidthFilter to convert half-width katakana to full-width before using this filter.
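
    To illustrate why width conversion has to come first, the sketch below shows that half-width katakana characters fall outside the Unicode Katakana block, so a block-based katakana check would not recognize them. It uses java.text.Normalizer with NFKC as a stand-in for the conversion step. This is not how CJKWidthFilter is implemented, and NFKC also changes characters unrelated to width; in a Lucene analysis chain, CJKWidthFilter remains the component to use. The class and method names are hypothetical.

```java
import java.text.Normalizer;

public final class HalfWidthKatakanaSketch {
    // True only if every char is in the Unicode Katakana block (U+30A0..U+30FF).
    // Half-width katakana live in "Halfwidth and Fullwidth Forms" (U+FF00..U+FFEF).
    static boolean isFullWidthKatakana(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (Character.UnicodeBlock.of(term.charAt(i)) != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return !term.isEmpty();
    }

    // Stand-in for width folding: NFKC maps half-width katakana to full-width
    // and composes a base letter with a following half-width voiced sound mark.
    static String toFullWidth(String term) {
        return Normalizer.normalize(term, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Half-width spelling: sa, long mark, ha, voiced sound mark, long mark.
        String halfWidth = "\uFF7B\uFF70\uFF8A\uFF9E\uFF70";
        System.out.println(isFullWidthKatakana(halfWidth));              // false
        System.out.println(isFullWidthKatakana(toFullWidth(halfWidth))); // true
    }
}
```

    After conversion, the half-width ｻｰﾊﾞｰ becomes the full-width サーバー, whose trailing long sound character is U+30FC and can therefore be stemmed.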

    In order to prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom TokenFilter that sets the KeywordAttribute before this TokenStream.