Class EdgeNGramTokenFilter

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class EdgeNGramTokenFilter
    extends TokenFilter
    Tokenizes the given token into n-grams of the given size(s).

    This TokenFilter creates n-grams from the beginning edge of an input token.

    As of Lucene 4.4, this filter correctly handles supplementary characters.
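    The plain-Java sketch below illustrates "beginning edge" and the supplementary-character note. It is not Lucene's implementation and uses no Lucene classes. `EdgePrefixSketch` and `prefixes` are hypothetical names. The sketch measures gram lengths in Unicode code points rather than UTF-16 chars. That is an assumption, chosen because it matches the note that supplementary characters are handled correctly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Lucene's implementation: computes the leading-edge
// n-grams of a single term, counting Unicode code points instead of UTF-16 chars.
class EdgePrefixSketch {

    static List<String> prefixes(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        // A supplementary character is one code point but two chars (a surrogate pair).
        int codePoints = term.codePointCount(0, term.length());
        int upper = Math.min(maxGram, codePoints);
        for (int n = minGram; n <= upper; n++) {
            // char index just past the n-th code point, so a surrogate pair is never split
            int end = term.offsetByCodePoints(0, n);
            grams.add(term.substring(0, end));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(prefixes("lucene", 1, 3)); // [l, lu, luc]
        // U+1D11E MUSICAL SYMBOL G CLEF followed by "ab"
        List<String> grams = prefixes("\uD834\uDD1Eab", 1, 2);
        // The first gram holds the whole surrogate pair (2 chars), not half of it.
        System.out.println(grams.get(0).length()); // 2
        System.out.println(grams.get(1).length()); // 3
    }
}
```

    A naive `term.substring(0, n)` would cut the G clef in half at n = 1. Stepping with `offsetByCodePoints` avoids that.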

    • Field Detail

      • DEFAULT_PRESERVE_ORIGINAL

        public static final boolean DEFAULT_PRESERVE_ORIGINAL
        See Also:
        Constant Field Values
    • Constructor Detail

      • EdgeNGramTokenFilter

        public EdgeNGramTokenFilter(TokenStream input,
                                    int minGram,
                                    int maxGram,
                                    boolean preserveOriginal)
        Creates an EdgeNGramTokenFilter that, for a given input term, produces all edge n-grams with lengths >= minGram and <= maxGram. If preserveOriginal is true, the original term is also emitted when its length is outside that range.
        Parameters:
        input - TokenStream holding the input to be tokenized
        minGram - the minimum length of the generated n-grams
        maxGram - the maximum length of the generated n-grams
        preserveOriginal - whether to keep the original term when its length is outside the min/max size range.
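        The following minimal plain-Java sketch shows which terms the four-argument form emits for a single input term, assuming lengths are counted in code points. `PreserveOriginalSketch` and `edgeNGrams` are hypothetical names, not Lucene API. The sketch ignores positions, offsets and other token attributes. The order in which a preserved long term appears relative to its grams is also an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Lucene's implementation: models which terms the
// filter would emit for one input term (positions/offsets are not modeled).
class PreserveOriginalSketch {

    static List<String> edgeNGrams(String term, int minGram, int maxGram, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        int codePoints = term.codePointCount(0, term.length());
        if (codePoints < minGram) {
            // Too short to yield any gram: keep the term only if asked to.
            if (preserveOriginal) {
                out.add(term);
            }
            return out;
        }
        // Grams of length minGram .. min(maxGram, term length), in code points.
        int upper = Math.min(maxGram, codePoints);
        for (int n = minGram; n <= upper; n++) {
            out.add(term.substring(0, term.offsetByCodePoints(0, n)));
        }
        // Longer than maxGram: no gram equals the whole term, so append it if asked to.
        if (codePoints > maxGram && preserveOriginal) {
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("a", 2, 3, false));     // []
        System.out.println(edgeNGrams("a", 2, 3, true));      // [a]
        System.out.println(edgeNGrams("abc", 2, 3, true));    // [ab, abc]
        System.out.println(edgeNGrams("abcde", 2, 3, false)); // [ab, abc]
        System.out.println(edgeNGrams("abcde", 2, 3, true));  // [ab, abc, abcde]
    }
}
```

        With minGram = 2 and maxGram = 3, the term "abc" is not emitted twice: its longest gram already equals the original, so it is not outside the range.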
      • EdgeNGramTokenFilter

        public EdgeNGramTokenFilter(TokenStream input,
                                    int gramSize)
        Creates an EdgeNGramTokenFilter that produces edge n-grams of the given size.
        Parameters:
        input - TokenStream holding the input to be tokenized
        gramSize - the n-gram size to generate.
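        The following plain-Java sketch illustrates the single-size constructor. It assumes, which this page does not state, that the constructor behaves like the four-argument form with minGram = maxGram = gramSize and preserveOriginal disabled. Under that assumption each input term yields at most one gram. `SingleSizeSketch` and `edgeNGram` are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Lucene's implementation. Assumes the two-argument
// constructor acts like minGram == maxGram == gramSize with preserveOriginal off.
class SingleSizeSketch {

    static List<String> edgeNGram(String term, int gramSize) {
        List<String> out = new ArrayList<>();
        // At most one gram per term: the first gramSize code points.
        if (term.codePointCount(0, term.length()) >= gramSize) {
            out.add(term.substring(0, term.offsetByCodePoints(0, gramSize)));
        }
        // Shorter terms produce nothing under the preserveOriginal-off assumption.
        return out;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGram("lucene", 3)); // [luc]
        System.out.println(edgeNGram("lu", 3));     // []
    }
}
```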