org.apache.lucene.search
Class FuzzyQuery

java.lang.Object
  extended by org.apache.lucene.search.Query
      extended by org.apache.lucene.search.MultiTermQuery
          extended by org.apache.lucene.search.FuzzyQuery
All Implemented Interfaces:
Cloneable

public class FuzzyQuery
extends MultiTermQuery

Implements the fuzzy search query. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false to the transpositions parameter.
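The difference between the two distance measures can be illustrated with a small standalone sketch. It is not Lucene's implementation (which this page does not show); the class name EditDistanceSketch and its method are hypothetical, and it compares Java chars rather than Unicode code points for brevity.

```java
// Illustrative sketch only: a textbook dynamic-programming edit distance.
// EditDistanceSketch is a hypothetical name, not part of Lucene.
final class EditDistanceSketch {

    /**
     * Returns the edit distance between a and b.
     * With transpositions == true this is the optimal string alignment
     * variant of Damerau-Levenshtein; with false it is classic Levenshtein.
     */
    static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all of a's first i chars
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all of b's first j chars
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Treat a swap of two adjacent chars as a single edit.
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucnee", true));  // prints 1
        System.out.println(distance("lucene", "lucnee", false)); // prints 2
    }
}
```

With transpositions enabled, swapping two adjacent characters costs one edit; without them, the same change costs two.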

By default, this query uses MultiTermQuery.TopTermsScoringBooleanQueryRewrite as its rewrite method: terms are collected and scored according to their edit distance, and only the top-scoring terms are used to build the BooleanQuery. Changing the rewrite mode for fuzzy queries is not recommended.
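The idea of keeping only the closest terms can be sketched outside Lucene as follows. This is a simplified illustration, not the actual rewrite code: the class TopTermsSketch is hypothetical, the edit distances are supplied by the caller, and the alphabetical tie-break is an assumption made only to keep the result deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: TopTermsSketch is a hypothetical name, not part of Lucene.
final class TopTermsSketch {

    /**
     * Returns at most maxExpansions terms, closest (smallest edit distance) first.
     * Ties are broken alphabetically; that tie-break is an assumption of this sketch.
     */
    static List<String> topTerms(Map<String, Integer> editDistances, int maxExpansions) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(editDistances.entrySet());
        entries.sort((x, y) -> {
            int byDistance = Integer.compare(x.getValue(), y.getValue());
            return byDistance != 0 ? byDistance : x.getKey().compareTo(y.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            if (result.size() >= maxExpansions) {
                break; // remaining terms are no closer than the ones already kept
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```

Each surviving term would then become one clause of the resulting BooleanQuery, which is why the number of kept terms is bounded.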

This query matches terms that are at most 2 edits away. Higher distances (especially with transpositions enabled) are generally not useful and would match a significant portion of the term dictionary. If you really need higher distances, consider using an n-gram indexing technique (such as the SpellChecker in the suggest module) instead.

NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled distance between two terms is computed. For a term to match, the edit distance between the two terms must be less than the length of the shorter term (either the input term or the candidate term). For example, FuzzyQuery on term "abcd" with maxEdits=2 will not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not match an indexed term "abc".
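The rule in the note above can be checked with a short standalone sketch. It only models the rule as stated here; the class ShortTermRuleSketch and the method wouldMatch are hypothetical, and the distance used is classic Levenshtein over Java chars.

```java
// Illustrative sketch only: models the short-term rule described in the note.
// ShortTermRuleSketch is a hypothetical name, not part of Lucene.
final class ShortTermRuleSketch {

    /** Classic Levenshtein distance (no transpositions), using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows so prev always holds the latest row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True if the distance is within maxEdits AND less than the shorter term's length. */
    static boolean wouldMatch(String queryTerm, String indexedTerm, int maxEdits) {
        int distance = levenshtein(queryTerm, indexedTerm);
        int shorter = Math.min(queryTerm.length(), indexedTerm.length());
        return distance <= maxEdits && distance < shorter;
    }

    public static void main(String[] args) {
        System.out.println(wouldMatch("abcd", "ab", 2)); // prints false
        System.out.println(wouldMatch("a", "abc", 2));   // prints false
        System.out.println(wouldMatch("abcd", "abc", 2)); // prints true
    }
}
```

In both failing cases the distance is 2, which is within maxEdits but not smaller than the shorter term's length (2 and 1 respectively).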


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.search.MultiTermQuery
MultiTermQuery.ConstantScoreAutoRewrite, MultiTermQuery.RewriteMethod, MultiTermQuery.TopTermsBoostOnlyBooleanQueryRewrite, MultiTermQuery.TopTermsScoringBooleanQueryRewrite
 
Field Summary
static int defaultMaxEdits
           
static int defaultMaxExpansions
           
static float defaultMinSimilarity
          Deprecated. Pass integer edit distances instead.
static int defaultPrefixLength
           
static boolean defaultTranspositions
           
 
Fields inherited from class org.apache.lucene.search.MultiTermQuery
CONSTANT_SCORE_AUTO_REWRITE_DEFAULT, CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE, CONSTANT_SCORE_FILTER_REWRITE, field, rewriteMethod, SCORING_BOOLEAN_QUERY_REWRITE
 
Constructor Summary
FuzzyQuery(Term term)
          Calls FuzzyQuery(term, defaultMaxEdits).
FuzzyQuery(Term term, int maxEdits)
          Calls FuzzyQuery(term, maxEdits, defaultPrefixLength).
FuzzyQuery(Term term, int maxEdits, int prefixLength)
          Calls FuzzyQuery(term, maxEdits, prefixLength, defaultMaxExpansions, defaultTranspositions).
FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions)
          Create a new FuzzyQuery that will match terms with an edit distance of at most maxEdits to term.
 
Method Summary
 boolean equals(Object obj)
           
static int floatToEdits(float minimumSimilarity, int termLen)
          Deprecated. Pass integer edit distances instead.
 int getMaxEdits()
           
 int getPrefixLength()
          Returns the non-fuzzy prefix length.
 Term getTerm()
          Returns the pattern term.
protected  TermsEnum getTermsEnum(Terms terms, AttributeSource atts)
          Construct the enumeration to be used, expanding the pattern term.
 int hashCode()
           
 String toString(String field)
          Prints a query to a string, with field assumed to be the default field and omitted.
 
Methods inherited from class org.apache.lucene.search.MultiTermQuery
getField, getRewriteMethod, getTermsEnum, rewrite, setRewriteMethod
 
Methods inherited from class org.apache.lucene.search.Query
clone, createWeight, extractTerms, getBoost, setBoost, toString
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

defaultMaxEdits

public static final int defaultMaxEdits
See Also:
Constant Field Values

defaultPrefixLength

public static final int defaultPrefixLength
See Also:
Constant Field Values

defaultMaxExpansions

public static final int defaultMaxExpansions
See Also:
Constant Field Values

defaultTranspositions

public static final boolean defaultTranspositions
See Also:
Constant Field Values

defaultMinSimilarity

@Deprecated
public static final float defaultMinSimilarity
Deprecated. Pass integer edit distances instead.
See Also:
Constant Field Values

Constructor Detail

FuzzyQuery

public FuzzyQuery(Term term,
                  int maxEdits,
                  int prefixLength,
                  int maxExpansions,
                  boolean transpositions)
Create a new FuzzyQuery that will match terms with an edit distance of at most maxEdits to term. If a prefixLength > 0 is specified, a common prefix of that length is also required.

Parameters:
term - the term to search for
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
prefixLength - length of common (non-fuzzy) prefix
maxExpansions - the maximum number of terms to match. If this number is greater than BooleanQuery.getMaxClauseCount() when the query is rewritten, then the maxClauseCount will be used instead.
transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.

FuzzyQuery

public FuzzyQuery(Term term,
                  int maxEdits,
                  int prefixLength)
Calls FuzzyQuery(term, maxEdits, prefixLength, defaultMaxExpansions, defaultTranspositions).


FuzzyQuery

public FuzzyQuery(Term term,
                  int maxEdits)
Calls FuzzyQuery(term, maxEdits, defaultPrefixLength).


FuzzyQuery

public FuzzyQuery(Term term)
Calls FuzzyQuery(term, defaultMaxEdits).

Method Detail

getMaxEdits

public int getMaxEdits()
Returns:
the maximum edit distance allowed for a term to match this query.

getPrefixLength

public int getPrefixLength()
Returns the non-fuzzy prefix length. This is the number of characters at the start of a term that must be identical (not fuzzy) to the query term if the query is to match that term.
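A minimal sketch of the prefix requirement, independent of Lucene, could look like this. The class PrefixSketch is hypothetical; clamping prefixLength to the query term's length and counting Java chars rather than code points are assumptions of this sketch.

```java
// Illustrative sketch only: PrefixSketch is a hypothetical name, not part of Lucene.
final class PrefixSketch {

    /**
     * True if candidate starts with the first prefixLength chars of queryTerm.
     * Assumes prefixLength >= 0; a prefixLength longer than queryTerm is
     * clamped to the whole term (an assumption of this sketch).
     */
    static boolean sharesPrefix(String queryTerm, String candidate, int prefixLength) {
        int required = Math.min(prefixLength, queryTerm.length());
        return candidate.startsWith(queryTerm.substring(0, required));
    }
}
```

For example, with a prefix length of 1, "lucene" could match "lucent" but never "mucene", however small the edit distance.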


getTermsEnum

protected TermsEnum getTermsEnum(Terms terms,
                                 AttributeSource atts)
                          throws IOException
Description copied from class: MultiTermQuery
Construct the enumeration to be used, expanding the pattern term. This method should only be called if the field exists (i.e., implementations can assume the field does exist). This method should not return null (it should instead return TermsEnum.EMPTY if no terms match). The TermsEnum must already be positioned to the first matching term. The given AttributeSource is passed by the MultiTermQuery.RewriteMethod to provide attributes that the rewrite method uses to communicate information such as maximum competitive boosts. This is currently only used by TopTermsRewrite.

Specified by:
getTermsEnum in class MultiTermQuery
Throws:
IOException

getTerm

public Term getTerm()
Returns the pattern term.


toString

public String toString(String field)
Description copied from class: Query
Prints a query to a string, with field assumed to be the default field and omitted.

Specified by:
toString in class Query

hashCode

public int hashCode()
Overrides:
hashCode in class MultiTermQuery

equals

public boolean equals(Object obj)
Overrides:
equals in class MultiTermQuery

floatToEdits

@Deprecated
public static int floatToEdits(float minimumSimilarity,
                               int termLen)
Deprecated. Pass integer edit distances instead.

Helper function to convert from deprecated "minimumSimilarity" fractions to raw edit distances.

Parameters:
minimumSimilarity - scaled similarity
termLen - length (in Unicode code points) of the term.
Returns:
equivalent number of maxEdits
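
One plausible shape for such a conversion is sketched below. The branch structure is an assumption and is not documented on this page: values of 1 or more are read as raw edit counts, 0 is read as exact match, and fractions are scaled by the term length. The cap of 2 comes from the class description above; FloatToEditsSketch and MAX_EDITS are hypothetical names.

```java
// Illustrative sketch only: an assumed conversion, not confirmed by this page.
// FloatToEditsSketch and MAX_EDITS are hypothetical names.
final class FloatToEditsSketch {

    /** Upper bound on edits, taken from the "at most 2 edits" statement in the class description. */
    static final int MAX_EDITS = 2;

    static int floatToEdits(float minimumSimilarity, int termLen) {
        if (minimumSimilarity >= 1f) {
            // Assumption: values >= 1 are already raw edit counts; just cap them.
            return (int) Math.min(minimumSimilarity, MAX_EDITS);
        } else if (minimumSimilarity == 0.0f) {
            // Assumption: 0 means exact match, not an unlimited number of edits.
            return 0;
        } else {
            // Assumption: a fraction f allows roughly (1 - f) * termLen edits, capped.
            return Math.min((int) ((1D - minimumSimilarity) * termLen), MAX_EDITS);
        }
    }
}
```

Under these assumptions, a similarity of 0.5 on a 4-character term would translate to 2 edits, and on a 2-character term to 1 edit.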


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.