org.apache.lucene.search.suggest.analyzing
Class FuzzySuggester

java.lang.Object
  extended by org.apache.lucene.search.suggest.Lookup
      extended by org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester
          extended by org.apache.lucene.search.suggest.analyzing.FuzzySuggester

public final class FuzzySuggester
extends AnalyzingSuggester

Implements a fuzzy AnalyzingSuggester. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false for the transpositions parameter.
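
The difference between the two similarity measures can be illustrated with a small, self-contained sketch. It is not the automaton-based code Lucene uses; the class name EditDistanceSketch and its distance method are hypothetical, and distances here are counted in Java chars rather than analyzed bytes.

```java
// Illustrative only: a plain dynamic-programming edit distance, not Lucene's
// LevenshteinAutomata. All names in this block are hypothetical.
final class EditDistanceSketch {
    private EditDistanceSketch() {}

    /**
     * Returns the edit distance between a and b.
     * transpositions == true  : optimal string alignment (adjacent swap costs 1)
     * transpositions == false : classic Levenshtein (insert, delete, substitute only)
     */
    static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete every char of a's prefix
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert every char of b's prefix
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // An adjacent swap counts as a single edit when enabled.
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucnee", true));  // prints 1
        System.out.println(distance("lucene", "lucnee", false)); // prints 2
    }
}
```

With transpositions enabled, the swapped "en"/"ne" pair is one edit, so a limit of one edit still matches; with classic Levenshtein the same typo needs two edits.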

At most, this suggester will match terms up to 2 edits away. Higher distances are not supported. Note that the fuzzy distance is measured in "byte space" on the bytes returned by the TokenStream's TermToBytesRefAttribute, usually UTF-8. By default, the analyzed bytes must be at least 3 bytes long (DEFAULT_MIN_FUZZY_LENGTH) before any edits are considered. Furthermore, the first 1 byte (DEFAULT_NON_FUZZY_PREFIX) is not allowed to be edited, and up to 1 edit (DEFAULT_MAX_EDITS) is allowed. If the unicodeAware parameter in the constructor is set to true, maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix are measured in Unicode code points (actual letters) instead of bytes.
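
The practical effect of measuring in bytes versus code points can be sketched as follows. This is an illustration under stated assumptions, not the suggester's implementation: FuzzyLengthSketch and its methods are hypothetical names, the input string stands in for the already-analyzed form, and UTF-8 is assumed as the byte encoding.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how the same key has different lengths in
// "byte space" and in Unicode code points. All names are hypothetical.
final class FuzzyLengthSketch {
    private FuzzyLengthSketch() {}

    /** Length of the (assumed already analyzed) key in the chosen unit. */
    static int length(String analyzed, boolean unicodeAware) {
        return unicodeAware
                ? analyzed.codePointCount(0, analyzed.length())     // code points
                : analyzed.getBytes(StandardCharsets.UTF_8).length; // UTF-8 bytes
    }

    /** True if the key is long enough for any edits to be considered. */
    static boolean editsConsidered(String analyzed, int minFuzzyLength, boolean unicodeAware) {
        return length(analyzed, unicodeAware) >= minFuzzyLength;
    }

    public static void main(String[] args) {
        String key = "d\u00e9"; // 'd' followed by e-acute: 2 code points, 3 UTF-8 bytes
        int minFuzzyLength = 3; // the default value documented above
        System.out.println(editsConsidered(key, minFuzzyLength, false)); // prints true
        System.out.println(editsConsidered(key, minFuzzyLength, true));  // prints false
    }
}
```

A two-letter key with one accented character already reaches the 3-byte threshold in byte space, but not when lengths are counted in code points.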

NOTE: This suggester does not boost suggestions that required no edits over suggestions that did require edits. This is a known limitation.

Note: complex query analyzers can have a significant impact on lookup performance. To keep the complexity of the prefix intersection low, it is recommended not to use query analyzers that drop or inject terms (for example, synonyms). At index time, complex analyzers can safely be used.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
Lookup.LookupPriorityQueue, Lookup.LookupResult
 
Field Summary
static int DEFAULT_MAX_EDITS
          The default maximum number of edits for fuzzy suggestions.
static int DEFAULT_MIN_FUZZY_LENGTH
          The default minimum length of the key passed to AnalyzingSuggester.lookup(java.lang.CharSequence, boolean, int) before any edits are allowed.
static int DEFAULT_NON_FUZZY_PREFIX
          The default prefix length where edits are not allowed.
static boolean DEFAULT_TRANSPOSITIONS
          The default transposition value passed to LevenshteinAutomata.
static boolean DEFAULT_UNICODE_AWARE
          Measure maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix parameters in Unicode code points (actual letters) instead of bytes.
 
Fields inherited from class org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester
EXACT_FIRST, PRESERVE_SEP
 
Fields inherited from class org.apache.lucene.search.suggest.Lookup
CHARSEQUENCE_COMPARATOR
 
Constructor Summary
FuzzySuggester(Analyzer analyzer)
          Creates a FuzzySuggester instance initialized with default values.
FuzzySuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer)
          Creates a FuzzySuggester instance with separate index and query analyzers, initialized with default values.
FuzzySuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, int maxEdits, boolean transpositions, int nonFuzzyPrefix, int minFuzzyLength, boolean unicodeAware)
          Creates a FuzzySuggester instance.
 
Method Summary
protected  Automaton convertAutomaton(Automaton a)
          Used by subclass to change the lookup automaton, if necessary.
protected  List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> getFullPrefixPaths(List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> prefixPaths, Automaton lookupAutomaton, FST<PairOutputs.Pair<Long,BytesRef>> fst)
          Returns all prefix paths to initialize the search.
 
Methods inherited from class org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester
build, get, load, lookup, setPreservePositionIncrements, sizeInBytes, store
 
Methods inherited from class org.apache.lucene.search.suggest.Lookup
build
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_UNICODE_AWARE

public static final boolean DEFAULT_UNICODE_AWARE
Measure maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix parameters in Unicode code points (actual letters) instead of bytes.

See Also:
Constant Field Values

DEFAULT_MIN_FUZZY_LENGTH

public static final int DEFAULT_MIN_FUZZY_LENGTH
The default minimum length of the key passed to AnalyzingSuggester.lookup(java.lang.CharSequence, boolean, int) before any edits are allowed.

See Also:
Constant Field Values

DEFAULT_NON_FUZZY_PREFIX

public static final int DEFAULT_NON_FUZZY_PREFIX
The default prefix length where edits are not allowed.

See Also:
Constant Field Values

DEFAULT_MAX_EDITS

public static final int DEFAULT_MAX_EDITS
The default maximum number of edits for fuzzy suggestions.

See Also:
Constant Field Values

DEFAULT_TRANSPOSITIONS

public static final boolean DEFAULT_TRANSPOSITIONS
The default transposition value passed to LevenshteinAutomata.

See Also:
Constant Field Values
Constructor Detail

FuzzySuggester

public FuzzySuggester(Analyzer analyzer)
Creates a FuzzySuggester instance initialized with default values.

Parameters:
analyzer - the analyzer used for this suggester

FuzzySuggester

public FuzzySuggester(Analyzer indexAnalyzer,
                      Analyzer queryAnalyzer)
Creates a FuzzySuggester instance with separate index and query analyzers, initialized with default values.

Parameters:
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.

FuzzySuggester

public FuzzySuggester(Analyzer indexAnalyzer,
                      Analyzer queryAnalyzer,
                      int options,
                      int maxSurfaceFormsPerAnalyzedForm,
                      int maxGraphExpansions,
                      int maxEdits,
                      boolean transpositions,
                      int nonFuzzyPrefix,
                      int minFuzzyLength,
                      boolean unicodeAware)
Creates a FuzzySuggester instance.

Parameters:
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
options - see AnalyzingSuggester.EXACT_FIRST, AnalyzingSuggester.PRESERVE_SEP
maxSurfaceFormsPerAnalyzedForm - Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms we discard the lowest weighted ones.
maxGraphExpansions - Maximum number of graph paths to expand from the analyzed form. Set this to -1 for no limit.
maxEdits - maximum number of edits allowed; must be >= 0 and <= LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.
transpositions - true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm.
nonFuzzyPrefix - length of common (non-fuzzy) prefix (see default DEFAULT_NON_FUZZY_PREFIX)
minFuzzyLength - minimum length of lookup key before any edits are allowed (see default DEFAULT_MIN_FUZZY_LENGTH)
unicodeAware - operate on Unicode code points instead of bytes.
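
The documented range for maxEdits can be checked by a caller before constructing the suggester. The sketch below is a hypothetical helper, not part of Lucene; it hard-codes 2 as the upper bound, taken from the class description ("up to 2 edits"), whereas real code would read LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE.

```java
// Illustrative only: a caller-side range check mirroring the documented
// constraint on maxEdits. FuzzyParamsSketch is a hypothetical name.
final class FuzzyParamsSketch {
    // Assumption: stands in for LevenshteinAutomata.MAXIMUM_SUPPORTED_DISTANCE,
    // using the limit of 2 edits stated in the class description.
    static final int MAX_SUPPORTED_DISTANCE = 2;

    private FuzzyParamsSketch() {}

    /** Returns maxEdits unchanged if it lies in [0, MAX_SUPPORTED_DISTANCE]. */
    static int checkMaxEdits(int maxEdits) {
        if (maxEdits < 0 || maxEdits > MAX_SUPPORTED_DISTANCE) {
            throw new IllegalArgumentException(
                    "maxEdits must be between 0 and " + MAX_SUPPORTED_DISTANCE
                            + " (got " + maxEdits + ")");
        }
        return maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(checkMaxEdits(1)); // prints 1
        try {
            checkMaxEdits(3);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Validating user-supplied values this way gives a clear error before a suggester is built.
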
Method Detail

getFullPrefixPaths

protected List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> getFullPrefixPaths(List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> prefixPaths,
                                                                                 Automaton lookupAutomaton,
                                                                                 FST<PairOutputs.Pair<Long,BytesRef>> fst)
                                                                          throws IOException
Description copied from class: AnalyzingSuggester
Returns all prefix paths to initialize the search.

Overrides:
getFullPrefixPaths in class AnalyzingSuggester
Throws:
IOException

convertAutomaton

protected Automaton convertAutomaton(Automaton a)
Description copied from class: AnalyzingSuggester
Used by subclass to change the lookup automaton, if necessary.

Overrides:
convertAutomaton in class AnalyzingSuggester


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.