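public class FuzzyQuery extends MultiTermQuery
Implements the fuzzy search query. The similarity measurement is based on the
Damerau-Levenshtein (optimal string alignment) algorithm, though you can
explicitly choose classic Levenshtein by passing false
to the transpositions parameter.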
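To make the effect of the transpositions parameter concrete, here is a minimal, self-contained sketch of an edit-distance function with a switch for adjacent transpositions. It is not Lucene's implementation (which is automaton-based); the class and method names are hypothetical, and the sketch counts UTF-16 chars rather than code points for brevity.

```java
class EditDistanceSketch {

    /**
     * Edit distance between a and b (hypothetical helper, not a Lucene API).
     * With transpositions == true, swapping two adjacent characters costs 1
     * (optimal string alignment); with false this is classic Levenshtein.
     */
    public static int distance(String a, String b, boolean transpositions) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all of a's prefix
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all of b's prefix
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // deletion, insertion
                        d[i - 1][j - 1] + cost);                     // match or substitution
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);      // adjacent swap
                }
                d[i][j] = best;
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("abcd", "abdc", true));  // prints 1
        System.out.println(distance("abcd", "abdc", false)); // prints 2
    }
}
```

With transpositions enabled, "abdc" is one edit away from "abcd"; under classic Levenshtein the same pair needs two substitutions.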
This query uses MultiTermQuery.TopTermsScoringBooleanQueryRewrite
as its default rewrite method, so terms are collected and scored according to their
edit distance. Only the top terms are used to build the BooleanQuery.
Changing the rewrite mode for fuzzy queries is not recommended.
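The idea of keeping only the top terms can be sketched with a bounded priority queue. This is a conceptual illustration only, not the code of the rewrite method: it assumes the edit distance of each candidate term is already known, ranks purely by distance (ties broken alphabetically), and uses hypothetical names throughout.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopTermsSketch {

    /** Returns at most maxExpansions terms, closest (smallest distance) first. */
    public static List<String> topTerms(Map<String, Integer> distances, int maxExpansions) {
        // Assumption of this sketch: lower edit distance ranks higher, ties by term text.
        Comparator<Map.Entry<String, Integer>> bestFirst = (x, y) -> {
            int c = Integer.compare(x.getValue(), y.getValue());
            return c != 0 ? c : x.getKey().compareTo(y.getKey());
        };
        // Reversed ordering puts the worst kept candidate at the head of the heap.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(bestFirst.reversed());
        for (Map.Entry<String, Integer> e : distances.entrySet()) {
            heap.add(e);
            if (heap.size() > maxExpansions) {
                heap.poll(); // evict the current worst candidate
            }
        }
        List<Map.Entry<String, Integer>> kept = new ArrayList<>(heap);
        kept.sort(bestFirst);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : kept) {
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative distances from the query term "lucene".
        Map<String, Integer> distances = new HashMap<>();
        distances.put("lucene", 0);
        distances.put("lucent", 1);
        distances.put("lucerne", 1);
        distances.put("lupine", 2);
        System.out.println(topTerms(distances, 2)); // prints [lucene, lucent]
    }
}
```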
This query matches terms within at most 2 edits. Higher distances (especially with transpositions enabled) are generally not useful and will match a significant portion of the term dictionary. If you really need them, consider using an n-gram indexing technique (such as the SpellChecker in the suggest module) instead.
NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled distance between two terms is computed. For a term to match, the edit distance between the terms must be less than the length of the shorter term (either the input term or the candidate term). For example, FuzzyQuery on term "abcd" with maxEdits=2 will not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not match an indexed term "abc".
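The rule in the note can be written as a small predicate. This is a sketch of the rule as stated above, not Lucene's code; the class and method names are hypothetical, and the edit distance is passed in rather than computed.

```java
class ShortTermRuleSketch {

    /**
     * True if a candidate at the given edit distance can match: the distance
     * must be within maxEdits and strictly less than the shorter term's length.
     */
    public static boolean canMatch(int editDistance, int queryLen, int candidateLen, int maxEdits) {
        return editDistance <= maxEdits
                && editDistance < Math.min(queryLen, candidateLen);
    }

    public static void main(String[] args) {
        // "abcd" vs "ab": distance 2, shorter length 2, and 2 < 2 is false.
        System.out.println(canMatch(2, 4, 2, 2)); // prints false
        // "a" vs "abc": distance 2, shorter length 1, and 2 < 1 is false.
        System.out.println(canMatch(2, 1, 3, 2)); // prints false
        // "abcd" vs "abc": distance 1, shorter length 3, and 1 < 3 is true.
        System.out.println(canMatch(1, 4, 3, 2)); // prints true
    }
}
```

The third case is an inference from the rule, included to show a pair that does pass.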
Nested classes/interfaces inherited from class MultiTermQuery: MultiTermQuery.RewriteMethod, MultiTermQuery.TopTermsBoostOnlyBooleanQueryRewrite, MultiTermQuery.TopTermsScoringBooleanQueryRewrite

| Modifier and Type | Field and Description |
|---|---|
| static int | defaultMaxEdits |
| static int | defaultMaxExpansions |
| static float | defaultMinSimilarity Deprecated. pass integer edit distances instead. |
| static int | defaultPrefixLength |
| static boolean | defaultTranspositions |

Fields inherited from class MultiTermQuery: CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE, CONSTANT_SCORE_BOOLEAN_REWRITE, CONSTANT_SCORE_FILTER_REWRITE, CONSTANT_SCORE_REWRITE, field, rewriteMethod, SCORING_BOOLEAN_QUERY_REWRITE, SCORING_BOOLEAN_REWRITE

| Constructor and Description |
|---|
| FuzzyQuery(Term term) |
| FuzzyQuery(Term term, int maxEdits) |
| FuzzyQuery(Term term, int maxEdits, int prefixLength) |
| FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions) Create a new FuzzyQuery that will match terms with an edit distance of at most maxEdits to term. |

| Modifier and Type | Method and Description |
|---|---|
| boolean | equals(Object obj) |
| static int | floatToEdits(float minimumSimilarity, int termLen) Deprecated. pass integer edit distances instead. |
| int | getMaxEdits() |
| int | getPrefixLength() Returns the non-fuzzy prefix length. |
| Term | getTerm() Returns the pattern term. |
| protected TermsEnum | getTermsEnum(Terms terms, AttributeSource atts) Construct the enumeration to be used, expanding the pattern term. |
| boolean | getTranspositions() Returns true if transpositions should be treated as a primitive edit operation. |
| int | hashCode() |
| String | toString(String field) Prints a query to a string, with field assumed to be the default field and omitted. |

Methods inherited from class MultiTermQuery: getField, getRewriteMethod, getTermsEnum, rewrite, setRewriteMethod
Methods inherited from class Query: clone, createWeight, extractTerms, getBoost, setBoost, toString
public static final int defaultMaxEdits
public static final int defaultPrefixLength
public static final int defaultMaxExpansions
public static final boolean defaultTranspositions
@Deprecated public static final float defaultMinSimilarity
public FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions)
Create a new FuzzyQuery that will match terms with an edit distance of at most maxEdits to term.
If a prefixLength > 0 is specified, a common prefix of that length is also required.
Parameters:
term - the term to search for
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
prefixLength - length of common (non-fuzzy) prefix
maxExpansions - the maximum number of terms to match. If this number is greater than BooleanQuery.getMaxClauseCount() when the query is rewritten, then the maxClauseCount will be used instead.
transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
public FuzzyQuery(Term term, int maxEdits, int prefixLength)
public FuzzyQuery(Term term, int maxEdits)
public FuzzyQuery(Term term)
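Two of the constructor parameters lend themselves to a short illustration: the required common prefix and the cap on maxExpansions. The sketch below is not Lucene's code. The names are hypothetical, the clause limit of 1024 is only an illustrative value, and treating a term shorter than prefixLength as a non-match is an assumption of the sketch.

```java
class PrefixAndCapSketch {

    /** True if a and b share their first prefixLength Unicode code points. */
    public static boolean sharesPrefix(String a, String b, int prefixLength) {
        int[] ca = a.codePoints().toArray();
        int[] cb = b.codePoints().toArray();
        // Assumption: a term shorter than the prefix cannot share it.
        if (ca.length < prefixLength || cb.length < prefixLength) {
            return false;
        }
        for (int i = 0; i < prefixLength; i++) {
            if (ca[i] != cb[i]) {
                return false;
            }
        }
        return true;
    }

    /** The number of expansions actually usable once the clause limit applies. */
    public static int effectiveMaxExpansions(int maxExpansions, int maxClauseCount) {
        return Math.min(maxExpansions, maxClauseCount);
    }

    public static void main(String[] args) {
        System.out.println(sharesPrefix("lucene", "lucent", 3));  // prints true
        System.out.println(sharesPrefix("lucene", "ducene", 1));  // prints false
        System.out.println(effectiveMaxExpansions(2000, 1024));   // prints 1024
        System.out.println(effectiveMaxExpansions(50, 1024));     // prints 50
    }
}
```

A larger prefixLength narrows the set of candidate terms before any edit distance is considered, which is why it is often used to keep fuzzy queries cheap.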
public int getMaxEdits()
public int getPrefixLength()
public boolean getTranspositions()
protected TermsEnum getTermsEnum(Terms terms, AttributeSource atts) throws IOException
Description copied from class: MultiTermQuery
Construct the enumeration to be used, expanding the pattern term (returning TermsEnum.EMPTY if no
terms match). The TermsEnum must already be
positioned to the first matching term.
The given AttributeSource is passed by the MultiTermQuery.RewriteMethod to
provide attributes that the rewrite method uses to communicate information such as maximum competitive boosts.
This is currently only used by TopTermsRewrite.
Specified by: getTermsEnum in class MultiTermQuery
Throws: IOException
public Term getTerm()
public String toString(String field)
Description copied from class: Query
Prints a query to a string, with field assumed to be the default field and omitted.
public int hashCode()
Overrides: hashCode in class MultiTermQuery
public boolean equals(Object obj)
Overrides: equals in class MultiTermQuery
@Deprecated public static int floatToEdits(float minimumSimilarity, int termLen)
Parameters:
minimumSimilarity - scaled similarity
termLen - length (in unicode codepoints) of the term.
Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.