Class FuzzyQuery


public class FuzzyQuery extends MultiTermQuery
Implements the fuzzy search query. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false to the transpositions parameter.
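
To make the difference between the two measures concrete, here is a minimal, self-contained sketch of an edit-distance function with an optional transposition step. It is an illustration only and not how FuzzyQuery computes matches (the class works with automata); the class name EditDistanceSketch and the method distance are hypothetical, and the sketch compares UTF-16 chars rather than Unicode code points.

```java
// Hypothetical illustration; not part of Lucene.
final class EditDistanceSketch {

    /**
     * Dynamic-programming edit distance between a and b.
     * With transpositions == true, swapping two adjacent characters counts
     * as one edit (optimal string alignment); with false, this is classic
     * Levenshtein (insert, delete, substitute only).
     */
    static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent swap: "en" vs "ne" costs one edit instead of two.
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }
}
```

With this sketch, distance("lucene", "lucnee", true) is 1, while distance("lucene", "lucnee", false) is 2, because classic Levenshtein has to treat the swapped letters as two separate edits.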

This query uses MultiTermQuery.TopTermsBlendedFreqScoringRewrite as its default rewrite method, so terms are collected and scored according to their edit distance. Only the top terms are used to build the BooleanQuery. Changing the rewrite mode for fuzzy queries is not recommended.

This query matches terms that are at most 2 edits away. Higher distances (especially with transpositions enabled) are generally not useful and would match a significant portion of the term dictionary. If you really need this, consider using an n-gram indexing technique (such as the SpellChecker in the suggest module) instead.

NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled distance between two terms is computed. For a term to match, the edit distance between the two terms must be less than the length of the shorter term (either the input term or the candidate term). For example, FuzzyQuery on term "abcd" with maxEdits=2 will not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not match an indexed term "abc".
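
The rule in the note above can be written as a small predicate. This is a sketch of the rule as the note states it, not the library's code: the class LengthRuleSketch and the method canMatch are hypothetical, and the edit distance is passed in as an already computed number.

```java
// Hypothetical illustration of the short-term rule; not part of Lucene.
final class LengthRuleSketch {

    /**
     * Returns true if a candidate could match under the rule described in
     * the note: the edit distance must not exceed maxEdits, and it must be
     * strictly less than the length of the shorter of the two terms.
     */
    static boolean canMatch(int editDistance, int queryLength,
                            int candidateLength, int maxEdits) {
        int shorter = Math.min(queryLength, candidateLength);
        return editDistance <= maxEdits && editDistance < shorter;
    }
}
```

For the two examples in the note, "abcd" against "ab" has distance 2 and a shorter length of 2, and "a" against "abc" has distance 2 and a shorter length of 1, so canMatch(2, 4, 2, 2) and canMatch(2, 1, 3, 2) are both false.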

  • Field Details

  • Constructor Details

  • Method Details

    • defaultRewriteMethod

      public static MultiTermQuery.RewriteMethod defaultRewriteMethod(int maxExpansions)
      Creates a default top-terms blended frequency scoring rewrite with the given maximum number of expansions.
    • getMaxEdits

      public int getMaxEdits()
      Returns:
      the maximum edit distance allowed for a term to match this query.
    • getPrefixLength

      public int getPrefixLength()
      Returns the non-fuzzy prefix length. This is the number of characters at the start of a term that must be identical (not fuzzy) to the query term if the query is to match that term.
    • getTranspositions

      public boolean getTranspositions()
      Returns true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
    • getAutomata

      public CompiledAutomaton getAutomata()
      Returns the compiled automata used to match terms.
    • getFuzzyAutomaton

      public static CompiledAutomaton getFuzzyAutomaton(String term, int maxEdits, int prefixLength, boolean transpositions)
      Returns the CompiledAutomaton used internally by FuzzyQuery to match terms. This is a very low-level method and may be removed if the implementation of fuzzy matching changes in the future.
      Parameters:
      term - the term to search for
      maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
      prefixLength - length of common (non-fuzzy) prefix
      transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
      Returns:
      A CompiledAutomaton that matches terms that satisfy input parameters.
      NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
    • visit

      public void visit(QueryVisitor visitor)
      Description copied from class: Query
      Recurse through the query tree, visiting any child queries
      Specified by:
      visit in class Query
      Parameters:
      visitor - a QueryVisitor to be called by each query in the tree
    • getTermsEnum

      protected TermsEnum getTermsEnum(Terms terms, AttributeSource atts) throws IOException
      Description copied from class: MultiTermQuery
      Construct the enumeration to be used, expanding the pattern term. This method should only be called if the field exists (i.e., implementations can assume the field does exist). This method should not return null (it should instead return TermsEnum.EMPTY if no terms match). The TermsEnum must already be positioned to the first matching term. The given AttributeSource is passed by the MultiTermQuery.RewriteMethod to share information between segments; for example, TopTermsRewrite uses it to share maximum competitive boosts.
      Specified by:
      getTermsEnum in class MultiTermQuery
      Throws:
      IOException
    • getTerm

      public Term getTerm()
      Returns the pattern term.
    • toString

      public String toString(String field)
      Description copied from class: Query
      Prints a query to a string, with field assumed to be the default field and omitted.
      Specified by:
      toString in class Query
    • hashCode

      public int hashCode()
      Description copied from class: Query
      Override and implement query hash code properly in a subclass. This is required so that QueryCache works properly.
      Overrides:
      hashCode in class MultiTermQuery
    • equals

      public boolean equals(Object obj)
      Description copied from class: Query
      Override and implement query instance equivalence properly in a subclass. This is required so that QueryCache works properly.

      Typically a query will be equal to another only if it's an instance of the same class and its document-filtering properties are identical to those of the other instance. Utility methods are provided for certain repetitive code.

      Overrides:
      equals in class MultiTermQuery
    • floatToEdits

      public static int floatToEdits(float minimumSimilarity, int termLen)
      Helper function to convert from "minimumSimilarity" fractions to raw edit distances.
      Parameters:
      minimumSimilarity - scaled similarity
      termLen - length (in Unicode code points) of the term.
      Returns:
      equivalent number of maxEdits
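
As a rough illustration of what converting a "minimumSimilarity" fraction to a raw edit count can look like, the sketch below scales the allowed dissimilarity by the term length and caps the result at 2, the maximum distance mentioned in the class description. The formula is an assumption chosen for illustration and may not match what floatToEdits actually does; the names FloatToEditsSketch, MAX_EDITS and fractionToEdits are hypothetical, and only fractions strictly between 0 and 1 are handled.

```java
// Hypothetical illustration; the formula is an assumption, not Lucene's code.
final class FloatToEditsSketch {

    // Assumed cap, taken from the "at most 2 edits" statement in the class description.
    static final int MAX_EDITS = 2;

    /**
     * Converts a similarity fraction in (0, 1) to an edit count by allowing
     * (1 - minimumSimilarity) of the term's code points to differ, rounded
     * down and capped at MAX_EDITS.
     */
    static int fractionToEdits(float minimumSimilarity, int termLen) {
        if (!(minimumSimilarity > 0f && minimumSimilarity < 1f)) {
            throw new IllegalArgumentException(
                    "this sketch only handles fractions strictly between 0 and 1");
        }
        int edits = (int) ((1.0 - minimumSimilarity) * termLen);
        return Math.min(edits, MAX_EDITS);
    }
}
```

Under these assumptions, a similarity of 0.5 on a 4-code-point term allows 2 edits, the same similarity on a 3-code-point term allows 1, and a similarity of 0.25 on an 8-code-point term is capped at 2.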