public final class FuzzySuggester extends AnalyzingSuggester
Implements a fuzzy AnalyzingSuggester. The similarity measurement is
based on the Damerau-Levenshtein (optimal string alignment) algorithm, though
you can explicitly choose classic Levenshtein by passing false
for the transpositions parameter.
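To make the effect of the transpositions flag concrete, here is a minimal, self-contained sketch of the two distance measures. It is not Lucene's implementation (Lucene builds automata rather than filling a table); the class name EditDistanceSketch and its method are hypothetical, and the comparison runs on Java chars rather than on analyzed bytes.

```java
final class EditDistanceSketch {
    /**
     * Edit distance between a and b. When transpositions is true, swapping two
     * adjacent characters counts as one edit (optimal string alignment);
     * when false, this is classic Levenshtein.
     */
    static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete everything
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert everything
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // deletion, insertion
                        d[i - 1][j - 1] + cost);                    // match or substitution
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);     // adjacent swap
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucnee", true));
        System.out.println(distance("lucene", "lucnee", false));
    }
}
```

With transpositions enabled, a swapped pair such as "en" and "ne" costs one edit, so it still fits within a maximum of one edit. With classic Levenshtein the same typo costs two edits.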
At most, this suggester will match terms up to 2 edits away. Higher distances
are not supported. Note that the fuzzy distance is measured in "byte space" on
the bytes returned by the TokenStream's TermToBytesRefAttribute, usually UTF-8.
By default, the analyzed form must be at least 3 bytes long
(DEFAULT_MIN_FUZZY_LENGTH) before any edits are considered. Furthermore, the
first 1 byte (DEFAULT_NON_FUZZY_PREFIX) is not allowed to be edited. By
default, up to 1 edit (DEFAULT_MAX_EDITS) is allowed.
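The byte-space rules above can be sketched as a small check. This is an illustration of the rule as described in the text, not Lucene's code: the class FuzzyEligibilitySketch, its constants, and its methods are hypothetical, and the values 3 and 1 are the defaults quoted in the description.

```java
import java.nio.charset.StandardCharsets;

final class FuzzyEligibilitySketch {
    // Default values quoted in the class description (assumed here, not read from Lucene).
    static final int MIN_FUZZY_LENGTH = 3;
    static final int NON_FUZZY_PREFIX = 1;

    /** Number of UTF-8 bytes in the analyzed form. */
    static int byteLength(String analyzed) {
        return analyzed.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True when the analyzed form is long enough for edits to be considered. */
    static boolean editsConsidered(String analyzed) {
        int len = byteLength(analyzed);
        return len >= MIN_FUZZY_LENGTH && len > NON_FUZZY_PREFIX;
    }

    /** Number of trailing bytes that stay open to edits (0 when no edits are considered). */
    static int editableBytes(String analyzed) {
        return editsConsidered(analyzed) ? byteLength(analyzed) - NON_FUZZY_PREFIX : 0;
    }

    public static void main(String[] args) {
        // "\u00e9a" is two letters but three UTF-8 bytes.
        for (String s : new String[] {"ab", "abc", "\u00e9a"}) {
            System.out.println(byteLength(s) + " bytes, edits considered: "
                    + editsConsidered(s) + ", editable bytes: " + editableBytes(s));
        }
    }
}
```

Because the thresholds count bytes, a two-letter key containing one accented letter already reaches the three-byte minimum, and a one-byte protected prefix can end in the middle of a multi-byte character.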
If the unicodeAware parameter in the constructor is set to true, maxEdits,
minFuzzyLength, transpositions and nonFuzzyPrefix are measured in Unicode code
points (actual letters) instead of bytes.
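The difference between the two measurement spaces shows up as soon as a key contains non-ASCII letters. The following sketch computes a plain Levenshtein distance over either UTF-8 bytes or Unicode code points. The class UnitSpaceSketch and its methods are hypothetical and only illustrate the idea; they do not reproduce how Lucene builds its automata.

```java
import java.nio.charset.StandardCharsets;

final class UnitSpaceSketch {
    /** Splits s into the units being compared: code points, or unsigned UTF-8 bytes. */
    static int[] units(String s, boolean unicodeAware) {
        if (unicodeAware) {
            return s.codePoints().toArray();
        }
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF;
        }
        return out;
    }

    /** Classic Levenshtein distance over the chosen units (two-row dynamic programming). */
    static int distance(String a, String b, boolean unicodeAware) {
        int[] x = units(a, unicodeAware);
        int[] y = units(b, unicodeAware);
        int[] prev = new int[y.length + 1];
        for (int j = 0; j <= y.length; j++) prev[j] = j;
        for (int i = 1; i <= x.length; i++) {
            int[] cur = new int[y.length + 1];
            cur[0] = i;
            for (int j = 1; j <= y.length; j++) {
                int cost = x[i - 1] == y[j - 1] ? 0 : 1;
                cur[j] = Math.min(Math.min(prev[j] + 1, cur[j - 1] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[y.length];
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9"; // 4 code points, 5 UTF-8 bytes
        System.out.println(distance(accented, "cafe", false)); // distance in bytes
        System.out.println(distance(accented, "cafe", true));  // distance in code points
    }
}
```

Replacing one accented letter with its unaccented form is a single edit in code points but two edits in bytes, so with a maximum of one edit the byte-space measurement would not match it.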
NOTE: This suggester does not boost suggestions that required no edits over suggestions that did require edits. This is a known limitation.
Note: complex query analyzers can have a significant impact on lookup performance. To keep the complexity of the prefix intersection low, avoid query analyzers that drop or inject terms (for example, synonym filters). At index time, complex analyzers can safely be used.
Nested classes/interfaces inherited from class Lookup: Lookup.LookupPriorityQueue, Lookup.LookupResult
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_MAX_EDITS: The default maximum number of edits for fuzzy suggestions. |
static int | DEFAULT_MIN_FUZZY_LENGTH: The default minimum length of the key passed to AnalyzingSuggester.lookup(java.lang.CharSequence, java.util.Set<org.apache.lucene.util.BytesRef>, boolean, int) before any edits are allowed. |
static int | DEFAULT_NON_FUZZY_PREFIX: The default prefix length where edits are not allowed. |
static boolean | DEFAULT_TRANSPOSITIONS: The default transposition value passed to LevenshteinAutomata. |
static boolean | DEFAULT_UNICODE_AWARE: Measure maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix parameters in Unicode code points (actual letters) instead of bytes. |
Fields inherited from class AnalyzingSuggester: EXACT_FIRST, PRESERVE_SEP
Fields inherited from class Lookup: CHARSEQUENCE_COMPARATOR
Constructor and Description |
---|
FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer analyzer): Creates a FuzzySuggester instance initialized with default values. |
FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer indexAnalyzer, Analyzer queryAnalyzer): Creates a FuzzySuggester instance with an index and query analyzer initialized with default values. |
FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, int maxEdits, boolean transpositions, int nonFuzzyPrefix, int minFuzzyLength, boolean unicodeAware): Creates a FuzzySuggester instance. |
Modifier and Type | Method and Description |
---|---|
protected Automaton | convertAutomaton(Automaton a): Used by subclass to change the lookup automaton, if necessary. |
protected List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> | getFullPrefixPaths(List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> prefixPaths, Automaton lookupAutomaton, FST<PairOutputs.Pair<Long,BytesRef>> fst): Returns all prefix paths to initialize the search. |
Methods inherited from class AnalyzingSuggester: build, get, getChildResources, getCount, load, lookup, ramBytesUsed, store
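public static final boolean DEFAULT_UNICODE_AWARE
Measure maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix parameters in Unicode code points (actual letters) instead of bytes.
public static final int DEFAULT_MIN_FUZZY_LENGTH
The default minimum length of the key passed to AnalyzingSuggester.lookup(java.lang.CharSequence, java.util.Set<org.apache.lucene.util.BytesRef>, boolean, int) before any edits are allowed.
public static final int DEFAULT_NON_FUZZY_PREFIX
The default prefix length where edits are not allowed.
public static final int DEFAULT_MAX_EDITS
The default maximum number of edits for fuzzy suggestions.
public static final boolean DEFAULT_TRANSPOSITIONS
The default transposition value passed to LevenshteinAutomata.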
public FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer analyzer)
Creates a FuzzySuggester instance initialized with default values.
Parameters:
analyzer - the analyzer used for this suggester
public FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer indexAnalyzer, Analyzer queryAnalyzer)
Creates a FuzzySuggester instance with an index and query analyzer initialized with default values.
Parameters:
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
public FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, int maxEdits, boolean transpositions, int nonFuzzyPrefix, int minFuzzyLength, boolean unicodeAware)
Creates a FuzzySuggester instance.
Parameters:
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
options - see AnalyzingSuggester.EXACT_FIRST, AnalyzingSuggester.PRESERVE_SEP
maxSurfaceFormsPerAnalyzedForm - Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms, the lowest weighted ones are discarded.
maxGraphExpansions - Maximum number of graph paths to expand from the analyzed form. Set this to -1 for no limit.
preservePositionIncrements - Whether position holes should appear in the automaton.
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
nonFuzzyPrefix - length of common (non-fuzzy) prefix (see default DEFAULT_NON_FUZZY_PREFIX)
minFuzzyLength - minimum length of lookup key before any edits are allowed (see default DEFAULT_MIN_FUZZY_LENGTH)
unicodeAware - operate on Unicode code points instead of bytes.
protected List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> getFullPrefixPaths(List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> prefixPaths, Automaton lookupAutomaton, FST<PairOutputs.Pair<Long,BytesRef>> fst) throws IOException
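Returns all prefix paths to initialize the search. (Description copied from class AnalyzingSuggester.)
Overrides: getFullPrefixPaths in class AnalyzingSuggester
Throws: IOException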
protected Automaton convertAutomaton(Automaton a)
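Used by subclass to change the lookup automaton, if necessary. (Description copied from class AnalyzingSuggester.)
Overrides: convertAutomaton in class AnalyzingSuggester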
Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.