Class EdgeNGramTokenFilter

All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>

public final class EdgeNGramTokenFilter extends TokenFilter
Tokenizes the given token into n-grams of given size(s).

This TokenFilter creates n-grams from the beginning edge of an input token.

As of Lucene 4.4, this filter correctly handles supplementary characters.
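To illustrate what handling supplementary characters involves, the following is a minimal stand-alone sketch, not Lucene's implementation. It builds prefixes by counting Unicode code points rather than Java chars, so a surrogate pair is never split. The class name CodePointPrefixSketch and the method prefixes are hypothetical and use only the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it does not use or mirror Lucene classes.
class CodePointPrefixSketch {

    /** Returns the prefixes of term that are 1 to maxGram code points long. */
    static List<String> prefixes(String term, int maxGram) {
        List<String> out = new ArrayList<>();
        // Count code points, not chars: a supplementary character is one code point but two chars.
        int codePoints = term.codePointCount(0, term.length());
        for (int n = 1; n <= Math.min(maxGram, codePoints); n++) {
            // Translate a length in code points into a char offset that never splits a surrogate pair.
            int end = term.offsetByCodePoints(0, n);
            out.add(term.substring(0, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // The first character is U+1D11E (musical symbol G clef), stored as a surrogate pair.
        String term = "\uD834\uDD1Eab";
        System.out.println(prefixes(term, 2));
    }
}
```

A naive substring(0, 1) on the same term would return an unpaired high surrogate, which is the kind of output a code-point-aware filter avoids.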

  • Field Details

    • DEFAULT_PRESERVE_ORIGINAL

      public static final boolean DEFAULT_PRESERVE_ORIGINAL
  • Constructor Details

    • EdgeNGramTokenFilter

      public EdgeNGramTokenFilter(TokenStream input, int minGram, int maxGram, boolean preserveOriginal)
      Creates an EdgeNGramTokenFilter that, for a given input term, produces all edge n-grams with lengths >= minGram and <= maxGram. If preserveOriginal is true, the original term is also kept when its length is outside that range.
      Parameters:
      input - TokenStream holding the input to be tokenized
      minGram - the minimum length of the generated n-grams
      maxGram - the maximum length of the generated n-grams
      preserveOriginal - whether to keep the original term when its length is outside the min/max size range
    • EdgeNGramTokenFilter

      public EdgeNGramTokenFilter(TokenStream input, int gramSize)
      Creates an EdgeNGramTokenFilter that produces edge n-grams of the given size.
      Parameters:
      input - TokenStream holding the input to be tokenized
      gramSize - the n-gram size to generate.
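The behavior described for the two constructors can be approximated with a plain-Java sketch. It works on a single term rather than a TokenStream. The class EdgeNGramSketch and its methods are hypothetical. The sketch makes three assumptions that the text above does not state: the original term is emitted after the grams, the single-size form is equivalent to minGram = maxGram = gramSize, and the single-size form does not preserve the original.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the documented behavior; not Lucene's implementation.
class EdgeNGramSketch {

    /** Models the four-argument constructor applied to a single term. */
    static List<String> edgeNGrams(String term, int minGram, int maxGram, boolean preserveOriginal) {
        if (minGram < 1 || minGram > maxGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> out = new ArrayList<>();
        int length = term.codePointCount(0, term.length());
        // Emit every prefix whose length lies in [minGram, maxGram] and fits within the term.
        for (int n = minGram; n <= maxGram && n <= length; n++) {
            out.add(term.substring(0, term.offsetByCodePoints(0, n)));
        }
        // Keep the original only when its length falls outside the configured range.
        // Assumption: it is emitted after the grams.
        if (preserveOriginal && (length < minGram || length > maxGram)) {
            out.add(term);
        }
        return out;
    }

    /** Models the single-size constructor. Assumption: min = max = gramSize, original not preserved. */
    static List<String> edgeNGrams(String term, int gramSize) {
        return edgeNGrams(term, gramSize, gramSize, false);
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("search", 2, 4, true));  // prints [se, sea, sear, search]
        System.out.println(edgeNGrams("a", 2, 4, true));       // prints [a]
        System.out.println(edgeNGrams("search", 3));           // prints [sea]
    }
}
```

With preserveOriginal set to false, a term shorter than minGram yields no output at all in this model. That matters for prefix-search indexes where very short terms should still be findable.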
  • Method Details