Class JapaneseKatakanaStemFilter

All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>

public final class JapaneseKatakanaStemFilter extends TokenFilter
A TokenFilter that normalizes common katakana spelling variations ending in a long sound character (U+30FC) by removing that character. Only katakana words of at least a minimum length are stemmed (the default minimum is four).

Note that only full-width katakana characters are supported. Please use a CJKWidthFilter to convert half-width katakana to full-width before using this filter.

In order to prevent terms from being stemmed, use an instance of SetKeywordMarkerFilter or a custom TokenFilter that sets the KeywordAttribute before this TokenStream.
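The stemming rule described above can be illustrated without Lucene on the classpath. The following is a minimal standalone sketch, not the library's implementation: the class name KatakanaStemSketch and its helper methods are hypothetical, "full-width katakana" is assumed to mean the Unicode KATAKANA block (U+30A0 to U+30FF), and a term whose length equals the minimum is assumed to be stemmed. Keyword protection is not modeled.

```java
public class KatakanaStemSketch {
    // U+30FC, the katakana-hiragana prolonged sound mark named in the description.
    static final char PROLONGED_SOUND_MARK = '\u30FC';

    // Hypothetical constant mirroring the documented default of four.
    static final int DEFAULT_MINIMUM_LENGTH = 4;

    // Assumption: a term counts as full-width katakana when every char
    // belongs to the Unicode KATAKANA block. Half-width forms do not.
    static boolean isFullWidthKatakana(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (Character.UnicodeBlock.of(term.charAt(i)) != Character.UnicodeBlock.KATAKANA) {
                return false;
            }
        }
        return true;
    }

    // Drops one trailing prolonged sound mark from sufficiently long katakana terms.
    static String stem(String term, int minimumLength) {
        if (term.isEmpty() || term.length() < minimumLength) {
            return term; // too short: left unchanged
        }
        if (!isFullWidthKatakana(term)) {
            return term; // mixed script or half-width: left unchanged
        }
        if (term.charAt(term.length() - 1) == PROLONGED_SOUND_MARK) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    public static void main(String[] args) {
        String coffee = "\u30B3\u30FC\u30D2\u30FC";    // "koohii", four full-width katakana
        String copy = "\u30B3\u30D4\u30FC";            // "kopii", three full-width katakana
        String halfWidth = "\uFF7A\uFF70\uFF8B\uFF70"; // "koohii" in half-width katakana

        // Four-character term loses its trailing U+30FC.
        System.out.println(stem(coffee, DEFAULT_MINIMUM_LENGTH).equals("\u30B3\u30FC\u30D2"));
        // Three-character term is below the minimum and is kept as-is.
        System.out.println(stem(copy, DEFAULT_MINIMUM_LENGTH).equals(copy));
        // Half-width input is outside the KATAKANA block and is kept as-is.
        System.out.println(stem(halfWidth, DEFAULT_MINIMUM_LENGTH).equals(halfWidth));
    }
}
```

In an actual analysis chain, the width conversion and keyword marking mentioned above would happen in filters placed before this one; the sketch only shows the per-term decision.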

  • Field Details

    • DEFAULT_MINIMUM_LENGTH

      public static final int DEFAULT_MINIMUM_LENGTH
  • Constructor Details

    • JapaneseKatakanaStemFilter

      public JapaneseKatakanaStemFilter(TokenStream input, int minimumLength)
    • JapaneseKatakanaStemFilter

      public JapaneseKatakanaStemFilter(TokenStream input)
  • Method Details