Class CompoundWordTokenFilterBase

All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
Direct Known Subclasses:
DictionaryCompoundWordTokenFilter, HyphenationCompoundWordTokenFilter

public abstract class CompoundWordTokenFilterBase extends TokenFilter
Base class for decomposition token filters.
  • Field Details

    • DEFAULT_MIN_WORD_SIZE

      public static final int DEFAULT_MIN_WORD_SIZE
      The default minimum length a word must have in order to be decomposed.
    • DEFAULT_MIN_SUBWORD_SIZE

      public static final int DEFAULT_MIN_SUBWORD_SIZE
      The default minimum length of subwords that are propagated to the output of this filter.
    • DEFAULT_MAX_SUBWORD_SIZE

      public static final int DEFAULT_MAX_SUBWORD_SIZE
      The default maximum length of subwords that are propagated to the output of this filter.
    • dictionary

      protected final CharArraySet dictionary
    • tokens

    • minWordSize

      protected final int minWordSize
    • minSubwordSize

      protected final int minSubwordSize
    • maxSubwordSize

      protected final int maxSubwordSize
    • onlyLongestMatch

      protected final boolean onlyLongestMatch
    • termAtt

      protected final CharTermAttribute termAtt
    • offsetAtt

      protected final OffsetAttribute offsetAtt
  • Constructor Details

    • CompoundWordTokenFilterBase

      protected CompoundWordTokenFilterBase(TokenStream input, CharArraySet dictionary, boolean onlyLongestMatch)
    • CompoundWordTokenFilterBase

      protected CompoundWordTokenFilterBase(TokenStream input, CharArraySet dictionary)
    • CompoundWordTokenFilterBase

      protected CompoundWordTokenFilterBase(TokenStream input, CharArraySet dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch)
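      The following is a minimal, self-contained sketch of how the size limits and the onlyLongestMatch flag named in the constructor above could interact during dictionary-based decomposition. It is not Lucene's implementation and does not use the Lucene API: the class CompoundSketch, the method decompose, the plain Set<String> dictionary, and the argument values passed in main are all assumptions made for illustration. Here, onlyLongestMatch is interpreted as "keep only the longest dictionary match at each start position", which may differ from what a concrete subclass does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of Lucene.
public class CompoundSketch {

    /** Returns the dictionary subwords found in word, honoring the size limits. */
    static List<String> decompose(String word, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> subwords = new ArrayList<>();
        if (word.length() < minWordSize) {
            return subwords; // word is too short to be decomposed
        }
        String lower = word.toLowerCase(Locale.ROOT);
        for (int start = 0; start <= lower.length() - minSubwordSize; start++) {
            String longest = null;
            // Try every candidate length allowed by the subword size limits.
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= lower.length();
                 len++) {
                String candidate = lower.substring(start, start + len);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // len grows, so the last hit is the longest
                } else {
                    subwords.add(candidate);
                }
            }
            if (longest != null) {
                subwords.add(longest);
            }
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("fuss", "fussball", "ball", "pumpe");
        // All matches: [fuss, fussball, ball, pumpe]
        System.out.println(decompose("Fussballpumpe", dict, 5, 2, 15, false));
        // Longest match per start position: [fussball, ball, pumpe]
        System.out.println(decompose("Fussballpumpe", dict, 5, 2, 15, true));
    }
}
```

      In this sketch, a word shorter than minWordSize yields no subwords at all, and candidates shorter than minSubwordSize or longer than maxSubwordSize are never looked up in the dictionary.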
  • Method Details