Class FuzzySuggester
- All Implemented Interfaces:
Accountable
Implements a fuzzy AnalyzingSuggester. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false for the transpositions parameter.
This suggester matches terms within at most 2 edits. Higher distances are not supported. Note that the fuzzy distance is measured in "byte space" on the bytes returned by the TokenStream's TermToBytesRefAttribute, usually UTF-8. By default, the analyzed bytes must be at least 3 (DEFAULT_MIN_FUZZY_LENGTH) bytes long before any edits are considered. Furthermore, the first 1 (DEFAULT_NON_FUZZY_PREFIX) byte is not allowed to be edited. By default, up to 1 (DEFAULT_MAX_EDITS) edit is allowed. If the unicodeAware parameter in the constructor is set to true, maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix are measured in Unicode code points (actual letters) instead of bytes.
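To make the effect of the transpositions parameter concrete, here is a minimal, self-contained sketch of the two distance measures named above. It is not how the suggester computes matches (the class relies on Levenshtein automata); the class name EditDistanceSketch and the method distance are hypothetical, and the comparison is done on UTF-16 chars for simplicity.

```java
public class EditDistanceSketch {

    /**
     * Edit distance between a and b, counted on UTF-16 chars.
     * With transpositions == true this is the optimal string alignment
     * distance; with false it is the classic Levenshtein distance.
     */
    static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all of a's first i chars
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all of b's first j chars
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Swapping two adjacent chars counts as a single edit.
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // "lucnee" is "lucene" with the adjacent letters "en" swapped.
        System.out.println(distance("lucene", "lucnee", true));  // 1
        System.out.println(distance("lucene", "lucnee", false)); // 2
    }
}
```

Under these assumptions, a swapped pair of letters stays within the default of 1 edit only when transpositions are enabled; with classic Levenshtein the same typo needs 2 edits.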
NOTE: This suggester does not boost suggestions that required no edits over suggestions that did require edits. This is a known limitation.
Note: complex query analyzers can have a significant impact on the lookup performance. It's recommended to not use analyzers that drop or inject terms like synonyms to keep the complexity of the prefix intersection low for good lookup performance. At index time, complex analyzers can safely be used.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
Lookup.LookupPriorityQueue, Lookup.LookupResult
-
Field Summary
Modifier and Type | Field | Description
static final int | DEFAULT_MAX_EDITS | The default maximum number of edits for fuzzy suggestions.
static final int | DEFAULT_MIN_FUZZY_LENGTH | The default minimum length of the key passed to AnalyzingSuggester.lookup(java.lang.CharSequence, java.util.Set<org.apache.lucene.util.BytesRef>, boolean, int) before any edits are allowed.
static final int | DEFAULT_NON_FUZZY_PREFIX | The default prefix length where edits are not allowed.
static final boolean | DEFAULT_TRANSPOSITIONS | The default transposition value passed to LevenshteinAutomata
static final boolean | DEFAULT_UNICODE_AWARE | Measure maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix parameters in Unicode code points (actual letters) instead of bytes.
Fields inherited from class org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester
EXACT_FIRST, PRESERVE_SEP
Fields inherited from class org.apache.lucene.search.suggest.Lookup
CHARSEQUENCE_COMPARATOR
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
Constructor | Description
FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer analyzer) | Creates a FuzzySuggester instance initialized with default values.
FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer indexAnalyzer, Analyzer queryAnalyzer) | Creates a FuzzySuggester instance with an index and query analyzer initialized with default values.
FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, int maxEdits, boolean transpositions, int nonFuzzyPrefix, int minFuzzyLength, boolean unicodeAware) | Creates a FuzzySuggester instance.
-
Method Summary
Modifier and Type | Method | Description
protected Automaton | convertAutomaton(Automaton a) | Used by subclass to change the lookup automaton, if necessary.
protected List<FSTUtil.Path<PairOutputs.Pair<Long, BytesRef>>> | getFullPrefixPaths(List<FSTUtil.Path<PairOutputs.Pair<Long, BytesRef>>> prefixPaths, Automaton lookupAutomaton, FST<PairOutputs.Pair<Long, BytesRef>> fst) | Returns all prefix paths to initialize the search.
Methods inherited from class org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester
build, get, getChildResources, getCount, load, lookup, ramBytesUsed, store
-
Field Details
-
DEFAULT_UNICODE_AWARE
public static final boolean DEFAULT_UNICODE_AWARE
Measure maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix parameters in Unicode code points (actual letters) instead of bytes.
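The difference between the two units can be illustrated with a short sketch. It assumes the analyzer passes the key through unchanged and encodes it as UTF-8; the class LengthUnitsSketch and the helper editsConsidered are hypothetical names used only for illustration, not part of the Lucene API.

```java
import java.nio.charset.StandardCharsets;

public class LengthUnitsSketch {

    /** Length of the key in UTF-8 bytes (the default unit). */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Length of the key in Unicode code points (the unicodeAware unit). */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    /**
     * Hypothetical check mirroring the documented rule: edits are only
     * considered once the key reaches minFuzzyLength in the chosen unit.
     */
    static boolean editsConsidered(String key, int minFuzzyLength, boolean unicodeAware) {
        int length = unicodeAware ? codePointLength(key) : utf8Length(key);
        return length >= minFuzzyLength;
    }

    public static void main(String[] args) {
        String key = "n\u00e9"; // "n" followed by precomposed e-acute (U+00E9)
        System.out.println(utf8Length(key));                // 3
        System.out.println(codePointLength(key));           // 2
        System.out.println(editsConsidered(key, 3, false)); // true
        System.out.println(editsConsidered(key, 3, true));  // false
    }
}
```

With a minimum fuzzy length of 3, this two-letter key already qualifies for edits when lengths are counted in bytes, but not when they are counted in code points.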
-
DEFAULT_MIN_FUZZY_LENGTH
public static final int DEFAULT_MIN_FUZZY_LENGTH
The default minimum length of the key passed to AnalyzingSuggester.lookup(java.lang.CharSequence, java.util.Set<org.apache.lucene.util.BytesRef>, boolean, int) before any edits are allowed.
-
DEFAULT_NON_FUZZY_PREFIX
public static final int DEFAULT_NON_FUZZY_PREFIX
The default prefix length where edits are not allowed.
-
DEFAULT_MAX_EDITS
public static final int DEFAULT_MAX_EDITS
The default maximum number of edits for fuzzy suggestions.
-
DEFAULT_TRANSPOSITIONS
public static final boolean DEFAULT_TRANSPOSITIONS
The default transposition value passed to LevenshteinAutomata
-
-
Constructor Details
-
FuzzySuggester
public FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer analyzer)
Creates a FuzzySuggester instance initialized with default values.
- Parameters:
analyzer - the analyzer used for this suggester
-
FuzzySuggester
public FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer indexAnalyzer, Analyzer queryAnalyzer)
Creates a FuzzySuggester instance with an index and query analyzer initialized with default values.
- Parameters:
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup
-
FuzzySuggester
public FuzzySuggester(Directory tempDir, String tempFileNamePrefix, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements, int maxEdits, boolean transpositions, int nonFuzzyPrefix, int minFuzzyLength, boolean unicodeAware)
Creates a FuzzySuggester instance.
- Parameters:
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup
options - see AnalyzingSuggester.EXACT_FIRST, AnalyzingSuggester.PRESERVE_SEP
maxSurfaceFormsPerAnalyzedForm - Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms, the lowest weighted ones are discarded.
maxGraphExpansions - Maximum number of graph paths to expand from the analyzed form. Set this to -1 for no limit.
preservePositionIncrements - Whether position holes should appear in the automaton
maxEdits - must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
nonFuzzyPrefix - length of common (non-fuzzy) prefix (see default DEFAULT_NON_FUZZY_PREFIX)
minFuzzyLength - minimum length of lookup key before any edits are allowed (see default DEFAULT_MIN_FUZZY_LENGTH)
unicodeAware - operate on Unicode code points instead of bytes.
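The interaction of maxEdits, nonFuzzyPrefix and minFuzzyLength can be sketched in plain Java. This is an illustration of the documented rules only, not the suggester's automaton-based implementation. The class FuzzyParamsSketch, its methods, and the constant MAX_SUPPORTED_EDITS (a stand-in for LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE, taken as 2 from the class description) are all hypothetical, and lengths are counted in code points as if unicodeAware were true.

```java
public class FuzzyParamsSketch {

    // Stand-in for LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE (assumed 2,
    // since the class description says at most 2 edits are supported).
    static final int MAX_SUPPORTED_EDITS = 2;

    /** Rejects maxEdits values outside the documented range. */
    static void checkMaxEdits(int maxEdits) {
        if (maxEdits < 0 || maxEdits > MAX_SUPPORTED_EDITS) {
            throw new IllegalArgumentException(
                    "maxEdits must be between 0 and " + MAX_SUPPORTED_EDITS);
        }
    }

    /**
     * Splits a key into {exactPrefix, fuzzyRemainder}, counting code points.
     * The remainder is empty when no edits would be considered at all.
     */
    static String[] split(String key, int nonFuzzyPrefix, int minFuzzyLength) {
        int length = key.codePointCount(0, key.length());
        // Too short for fuzzing, or the protected prefix covers the whole key.
        if (length < minFuzzyLength || length <= nonFuzzyPrefix) {
            return new String[] {key, ""};
        }
        int cut = key.offsetByCodePoints(0, nonFuzzyPrefix);
        return new String[] {key.substring(0, cut), key.substring(cut)};
    }

    public static void main(String[] args) {
        checkMaxEdits(1); // the documented default passes

        // Defaults from the class description: nonFuzzyPrefix = 1, minFuzzyLength = 3.
        String[] longKey = split("lucene", 1, 3);
        System.out.println(longKey[0] + " | " + longKey[1]); // l | ucene

        String[] shortKey = split("lu", 1, 3);
        System.out.println(shortKey[0] + " | " + shortKey[1]); // lu | (empty remainder)
    }
}
```

Read this way, only the remainder after the protected prefix is eligible for edits, and keys shorter than minFuzzyLength are matched exactly.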
-
-
Method Details
-
getFullPrefixPaths
protected List<FSTUtil.Path<PairOutputs.Pair<Long, BytesRef>>> getFullPrefixPaths(List<FSTUtil.Path<PairOutputs.Pair<Long, BytesRef>>> prefixPaths, Automaton lookupAutomaton, FST<PairOutputs.Pair<Long, BytesRef>> fst) throws IOException
Description copied from class: AnalyzingSuggester
Returns all prefix paths to initialize the search.
- Overrides:
getFullPrefixPaths in class AnalyzingSuggester
- Throws:
IOException
-
convertAutomaton
protected Automaton convertAutomaton(Automaton a)
Description copied from class: AnalyzingSuggester
Used by subclass to change the lookup automaton, if necessary.
- Overrides:
convertAutomaton in class AnalyzingSuggester
-