public class FuzzyQuery extends MultiTermQuery
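Implements the fuzzy search query. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false to the transpositions parameter.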
This query uses MultiTermQuery.TopTermsScoringBooleanQueryRewrite as its default rewrite method, so terms are collected and scored according to their edit distance. Only the top terms are used for building the BooleanQuery.
It is not recommended to change the rewrite mode for fuzzy queries.
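The top-terms idea can be pictured with a small sketch that does not depend on Lucene. It assumes the edit distance of each candidate term is already known (the values in `main` are hand-computed for illustration) and ranks candidates by edit count alone. The class name `TopTermsSketch` and the method `topTerms` are hypothetical; this is not how the rewrite method itself is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopTermsSketch {
    /** Returns the maxExpansions terms with the fewest edits; ties are broken alphabetically. */
    static List<String> topTerms(Map<String, Integer> editsByTerm, int maxExpansions) {
        Comparator<Map.Entry<String, Integer>> bestFirst = (a, b) -> {
            int byEdits = Integer.compare(a.getValue(), b.getValue());
            return byEdits != 0 ? byEdits : a.getKey().compareTo(b.getKey());
        };
        // The queue keeps the worst surviving candidate at its head.
        PriorityQueue<Map.Entry<String, Integer>> queue =
                new PriorityQueue<>(bestFirst.reversed());
        for (Map.Entry<String, Integer> entry : editsByTerm.entrySet()) {
            queue.offer(entry);
            if (queue.size() > maxExpansions) {
                queue.poll(); // drop the current worst candidate
            }
        }
        List<Map.Entry<String, Integer>> kept = new ArrayList<>(queue);
        kept.sort(bestFirst);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : kept) {
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical candidates for the query term "lucene" with hand-computed edit counts.
        Map<String, Integer> edits = new HashMap<>();
        edits.put("lucene", 0);
        edits.put("lucent", 1);
        edits.put("lucerne", 1);
        edits.put("lupine", 2);
        System.out.println(topTerms(edits, 2)); // [lucene, lucent]
    }
}
```

Only the surviving terms would then become clauses of the resulting BooleanQuery, which is why a small expansion limit keeps the rewritten query cheap.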
This query matches terms within at most 2 edits. Higher distances (especially with transpositions enabled) are generally not useful and will match a significant portion of the term dictionary. If you really want this, consider using an n-gram indexing technique (such as the SpellChecker in the suggest module) instead.
NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled distance between two terms is computed. For a term to match, the edit distance between the terms must be less than the length of the shorter term (either the input term or the candidate term). For example, FuzzyQuery on term "abcd" with maxEdits=2 will not match an indexed term "ab", and FuzzyQuery on term "a" with maxEdits=2 will not match an indexed term "abc".
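The rule in this note can be checked with a short, Lucene-free sketch. It applies the condition exactly as stated above (edit distance within `maxEdits` and strictly less than the shorter term's length), using a classic Levenshtein distance over Java chars. The class `ShortTermRuleSketch` and its methods are hypothetical names, and Lucene's own matching is automaton-based rather than computed this way.

```java
class ShortTermRuleSketch {
    /** Classic Levenshtein distance (insert, delete, substitute) using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Applies the rule from the note: within maxEdits and fewer edits than the shorter term's length. */
    static boolean matches(String query, String candidate, int maxEdits) {
        int edits = levenshtein(query, candidate);
        int minLength = Math.min(query.length(), candidate.length());
        return edits <= maxEdits && edits < minLength;
    }

    public static void main(String[] args) {
        System.out.println(matches("abcd", "ab", 2));   // false: 2 edits, shorter term has length 2
        System.out.println(matches("a", "abc", 2));     // false: 2 edits, shorter term has length 1
        System.out.println(matches("abcd", "abcf", 2)); // true: 1 edit, shorter term has length 4
    }
}
```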
See Also: MultiTermQuery.RewriteMethod, MultiTermQuery.TopTermsBlendedFreqScoringRewrite, MultiTermQuery.TopTermsBoostOnlyBooleanQueryRewrite, MultiTermQuery.TopTermsScoringBooleanQueryRewrite
Modifier and Type | Field and Description |
---|---|
static int | defaultMaxEdits |
static int | defaultMaxExpansions |
static float | defaultMinSimilarity Deprecated. Pass integer edit distances instead. |
static int | defaultPrefixLength |
static boolean | defaultTranspositions |
Fields inherited from class MultiTermQuery: CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE, CONSTANT_SCORE_BOOLEAN_REWRITE, CONSTANT_SCORE_FILTER_REWRITE, CONSTANT_SCORE_REWRITE, field, rewriteMethod, SCORING_BOOLEAN_QUERY_REWRITE, SCORING_BOOLEAN_REWRITE
Constructor and Description |
---|
FuzzyQuery(Term term) |
FuzzyQuery(Term term, int maxEdits) |
FuzzyQuery(Term term, int maxEdits, int prefixLength) |
FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions) Create a new FuzzyQuery that will match terms with an edit distance of at most maxEdits to term. |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object obj) |
static int | floatToEdits(float minimumSimilarity, int termLen) Deprecated. Pass integer edit distances instead. |
int | getMaxEdits() |
int | getPrefixLength() Returns the non-fuzzy prefix length. |
Term | getTerm() Returns the pattern term. |
protected TermsEnum | getTermsEnum(Terms terms, AttributeSource atts) Construct the enumeration to be used, expanding the pattern term. |
boolean | getTranspositions() Returns true if transpositions should be treated as a primitive edit operation. |
int | hashCode() |
String | toString(String field) Prints a query to a string, with field assumed to be the default field and omitted. |
Methods inherited from class MultiTermQuery: getField, getRewriteMethod, getTermsEnum, rewrite, setRewriteMethod
public static final int defaultMaxEdits
public static final int defaultPrefixLength
public static final int defaultMaxExpansions
public static final boolean defaultTranspositions
@Deprecated public static final float defaultMinSimilarity
public FuzzyQuery(Term term, int maxEdits, int prefixLength, int maxExpansions, boolean transpositions)
Create a new FuzzyQuery that will match terms with an edit distance of at most maxEdits to term. If a prefixLength > 0 is specified, a common prefix of that length is also required.
Parameters:
term - the term to search for
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
prefixLength - length of common (non-fuzzy) prefix
maxExpansions - the maximum number of terms to match. If this number is greater than BooleanQuery.getMaxClauseCount() when the query is rewritten, then the maxClauseCount will be used instead.
transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
public FuzzyQuery(Term term, int maxEdits, int prefixLength)
public FuzzyQuery(Term term, int maxEdits)
public FuzzyQuery(Term term)
public int getMaxEdits()
public int getPrefixLength()
public boolean getTranspositions()
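To make concrete what treating a transposition as a primitive edit operation means, the following plain-Java sketch computes an edit distance with and without that rule. With the flag set, swapping two adjacent characters costs one edit (the optimal string alignment variant); without it, the same swap costs two edits under classic Levenshtein. The class `TranspositionSketch` and the method `editDistance` are hypothetical names for illustration only, not part of Lucene, which builds Levenshtein automata instead of filling a table like this.

```java
class TranspositionSketch {
    /**
     * Edit distance over Java chars. When transpositions is true, an adjacent swap
     * counts as a single edit (optimal string alignment); otherwise classic Levenshtein.
     */
    static int editDistance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all i leading chars of a
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all j leading chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent swap: the last two chars of each prefix are each other's mirror.
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("lucene", "lucnee", true));  // 1
        System.out.println(editDistance("lucene", "lucnee", false)); // 2
    }
}
```

Because a swap is cheaper with the flag enabled, the same maxEdits value admits more candidate terms, which is consistent with the warning that higher distances are especially costly with transpositions enabled.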
protected TermsEnum getTermsEnum(Terms terms, AttributeSource atts) throws IOException
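Description copied from class: MultiTermQuery
Construct the enumeration to be used, expanding the pattern term. This method should not return null (it should instead return TermsEnum.EMPTY if no terms match). The TermsEnum must already be positioned to the first matching term. The given AttributeSource is passed by the MultiTermQuery.RewriteMethod to provide attributes that the rewrite method uses to communicate information such as the maximum competitive boost. This is currently only used by TopTermsRewrite.
Overrides: getTermsEnum in class MultiTermQuery
Throws: IOException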
public Term getTerm()
public String toString(String field)
Description copied from class: Query
Prints a query to a string, with field assumed to be the default field and omitted.
public int hashCode()
Overrides: hashCode in class MultiTermQuery
public boolean equals(Object obj)
Overrides: equals in class MultiTermQuery
@Deprecated public static int floatToEdits(float minimumSimilarity, int termLen)
Parameters:
minimumSimilarity - scaled similarity
termLen - length (in Unicode code points) of the term.
Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.