org.apache.lucene.search.suggest.analyzing
Class AnalyzingSuggester

java.lang.Object
  extended by org.apache.lucene.search.suggest.Lookup
      extended by org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester
Direct Known Subclasses:
FuzzySuggester

public class AnalyzingSuggester
extends Lookup

Suggester that first analyzes the surface form, adds the analyzed form to a weighted FST, and then does the same thing at lookup time. This means lookup is based on the analyzed form while suggestions are still the surface form(s).

This can result in powerful suggester functionality. For example, if you use an analyzer that removes stop words, then the partial text "ghost chr..." could see the suggestion "The Ghost of Christmas Past". Note that position increments MUST NOT be preserved for this example to work, so you should call the constructor with the preservePositionIncrements parameter set to false.
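The idea of matching on the analyzed form while returning the surface form can be sketched in plain Java. This is a toy model under stated assumptions, not how AnalyzingSuggester is implemented: it scans a list instead of using a weighted FST, and the class ToyAnalyzingLookup, its stop-word list, and its analyze method are hypothetical stand-ins for a real Analyzer. Stop words are dropped without leaving a gap, which corresponds to not preserving position increments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model only: a linear scan over analyzed forms, not Lucene's weighted FST. */
final class ToyAnalyzingLookup {
    // Hypothetical stop-word list standing in for a stop-word-removing analyzer.
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "of", "a"));

    /** Lowercases, splits on whitespace, and drops stop words without leaving holes. */
    static String analyze(String text) {
        StringBuilder analyzed = new StringBuilder();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            if (analyzed.length() > 0) {
                analyzed.append(' ');
            }
            analyzed.append(token);
        }
        return analyzed.toString();
    }

    /** Returns the surface forms whose analyzed form starts with the analyzed query. */
    static List<String> lookup(List<String> surfaceForms, String query) {
        String analyzedQuery = analyze(query);
        List<String> matches = new ArrayList<>();
        for (String surface : surfaceForms) {
            if (analyze(surface).startsWith(analyzedQuery)) {
                matches.add(surface);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> titles =
            Arrays.asList("The Ghost of Christmas Past", "A Christmas Carol");
        // Prints [The Ghost of Christmas Past]
        System.out.println(lookup(titles, "ghost chr"));
    }
}
```

Because "the" and "of" are removed on both the index side and the query side, the query "ghost chr" is a prefix of the analyzed form "ghost christmas past", yet the original title is what gets returned.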

If SynonymFilter is used to map wifi and wireless network to hotspot, then the partial text "wirele..." could suggest "wifi router". Token normalization such as stemming or accent removal would allow suggestions to ignore such variations.
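As an illustration of how normalization lets a lookup ignore accents and case, the following sketch uses java.text.Normalizer in place of an analyzer's token filters. The class NormalizationSketch and its methods are hypothetical and are not part of Lucene; a real setup would configure the equivalent filters in the Analyzer passed to the suggester.

```java
import java.text.Normalizer;
import java.util.Locale;

final class NormalizationSketch {
    /** Strips accents and lowercases, a stand-in for an analyzer's normalizing filters. */
    static String normalize(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by NFD decomposition.
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    /** True if the normalized surface form starts with the normalized query. */
    static boolean matches(String surfaceForm, String query) {
        return normalize(surfaceForm).startsWith(normalize(query));
    }

    public static void main(String[] args) {
        // "Caf\u00e9 M\u00fcller" is "Café Müller" written with Unicode escapes.
        System.out.println(matches("Caf\u00e9 M\u00fcller", "cafe mu")); // prints true
    }
}
```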

When two matching suggestions have the same weight, they are tie-broken by the analyzed form. If their analyzed form is the same then the order is undefined.
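The ordering rule can be sketched as a comparator: higher weight first, then the analyzed form. The Entry class below is a hypothetical holder, not a Lucene type, and comparing analyzed forms with String.compareTo is an assumption made for the sketch. Note that Java's list sort is stable, so entries with identical weight and analyzed form keep their input order here, whereas the suggester leaves that order undefined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TieBreakSketch {
    /** Hypothetical holder for one suggestion; not a Lucene type. */
    static final class Entry {
        final String surface;
        final String analyzed;
        final long weight;

        Entry(String surface, String analyzed, long weight) {
            this.surface = surface;
            this.analyzed = analyzed;
            this.weight = weight;
        }
    }

    // Higher weight first; equal weights fall back to the analyzed form.
    static final Comparator<Entry> ORDER =
        Comparator.comparingLong((Entry e) -> e.weight)
            .reversed()
            .thenComparing(e -> e.analyzed);

    /** Returns the surface forms in suggestion order. */
    static List<String> sortedSurfaces(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(ORDER);
        List<String> surfaces = new ArrayList<>();
        for (Entry e : copy) {
            surfaces.add(e.surface);
        }
        return surfaces;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry("wireless mouse", "wireless mouse", 5),
            new Entry("wifi router", "wifi router", 5),
            new Entry("widget", "widget", 9));
        // Prints [widget, wifi router, wireless mouse]
        System.out.println(sortedSurfaces(entries));
    }
}
```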

There are some limitations:

WARNING: This API is experimental and might change in incompatible ways in the next release.

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
Lookup.LookupPriorityQueue, Lookup.LookupResult
 
Field Summary
static int EXACT_FIRST
          Include this flag in the options parameter to AnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean) to always return the exact match first, regardless of score.
static int PRESERVE_SEP
          Include this flag in the options parameter to AnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean) to preserve token separators when matching.
 
Fields inherited from class org.apache.lucene.search.suggest.Lookup
CHARSEQUENCE_COMPARATOR
 
Constructor Summary
AnalyzingSuggester(Analyzer analyzer)
          Calls AnalyzingSuggester(analyzer, analyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1, true)
AnalyzingSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer)
          Calls AnalyzingSuggester(indexAnalyzer, queryAnalyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1, true)
AnalyzingSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer, int options, int maxSurfaceFormsPerAnalyzedForm, int maxGraphExpansions, boolean preservePositionIncrements)
          Creates a new suggester.
 
Method Summary
 void build(InputIterator iterator)
          Builds up a new internal Lookup representation based on the given InputIterator.
protected  Automaton convertAutomaton(Automaton a)
          Used by subclass to change the lookup automaton, if necessary.
 Object get(CharSequence key)
          Returns the weight associated with an input string, or null if it does not exist.
 long getCount()
          Get the number of entries the lookup was built with.
protected  List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> getFullPrefixPaths(List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> prefixPaths, Automaton lookupAutomaton, FST<PairOutputs.Pair<Long,BytesRef>> fst)
          Returns all prefix paths to initialize the search.
 boolean load(DataInput input)
          Discard current lookup data and load it from a previously saved copy.
 List<Lookup.LookupResult> lookup(CharSequence key, boolean onlyMorePopular, int num)
          Look up a key and return possible completion for this key.
 long sizeInBytes()
          Returns byte size of the underlying FST.
 boolean store(DataOutput output)
          Persist the constructed lookup data to the given DataOutput.
 
Methods inherited from class org.apache.lucene.search.suggest.Lookup
build, load, store
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EXACT_FIRST

public static final int EXACT_FIRST
Include this flag in the options parameter to AnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean) to always return the exact match first, regardless of score. This has no performance impact but could result in low-quality suggestions.

See Also:
Constant Field Values

PRESERVE_SEP

public static final int PRESERVE_SEP
Include this flag in the options parameter to AnalyzingSuggester(Analyzer,Analyzer,int,int,int,boolean) to preserve token separators when matching.

See Also:
Constant Field Values
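
Since both flags are combined into a single int options parameter, they behave as a bit mask. The sketch below shows how such flags are typically combined, tested, and cleared. The class OptionFlagsSketch and the bit values 1 and 2 are placeholders assumed for illustration; real code should reference AnalyzingSuggester.EXACT_FIRST and AnalyzingSuggester.PRESERVE_SEP rather than literal numbers.

```java
final class OptionFlagsSketch {
    // Placeholder bit values, assumed for illustration only.
    static final int EXACT_FIRST = 1;
    static final int PRESERVE_SEP = 2;

    /** True if the given flag bit is present in the options mask. */
    static boolean isSet(int options, int flag) {
        return (options & flag) != 0;
    }

    /** Returns the options mask with the given flag bit removed. */
    static int clear(int options, int flag) {
        return options & ~flag;
    }

    public static void main(String[] args) {
        int defaults = EXACT_FIRST | PRESERVE_SEP;   // what the short constructors pass
        int withoutSep = clear(defaults, PRESERVE_SEP);
        System.out.println(isSet(defaults, PRESERVE_SEP));    // prints true
        System.out.println(isSet(withoutSep, PRESERVE_SEP));  // prints false
        System.out.println(isSet(withoutSep, EXACT_FIRST));   // prints true
    }
}
```

Passing 0 as options would disable both behaviors.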
Constructor Detail

AnalyzingSuggester

public AnalyzingSuggester(Analyzer analyzer)
Calls AnalyzingSuggester(analyzer, analyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1, true)


AnalyzingSuggester

public AnalyzingSuggester(Analyzer indexAnalyzer,
                          Analyzer queryAnalyzer)
Calls AnalyzingSuggester(indexAnalyzer, queryAnalyzer, EXACT_FIRST | PRESERVE_SEP, 256, -1, true)


AnalyzingSuggester

public AnalyzingSuggester(Analyzer indexAnalyzer,
                          Analyzer queryAnalyzer,
                          int options,
                          int maxSurfaceFormsPerAnalyzedForm,
                          int maxGraphExpansions,
                          boolean preservePositionIncrements)
Creates a new suggester.

Parameters:
indexAnalyzer - Analyzer that will be used for analyzing suggestions while building the index.
queryAnalyzer - Analyzer that will be used for analyzing query text during lookup.
options - see EXACT_FIRST, PRESERVE_SEP
maxSurfaceFormsPerAnalyzedForm - Maximum number of surface forms to keep for a single analyzed form. When there are too many surface forms we discard the lowest weighted ones.
maxGraphExpansions - Maximum number of graph paths to expand from the analyzed form. Set this to -1 for no limit.
preservePositionIncrements - Whether position holes (for example, those left by removed stop words) should appear in the automata.
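
The effect of maxSurfaceFormsPerAnalyzedForm can be pictured as keeping only the highest-weighted surface forms for each analyzed form. The following is a sketch of that pruning rule only, assuming a plain map from surface form to weight for one analyzed form; the class SurfaceFormCapSketch is hypothetical and does not reflect how the suggester stores its data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SurfaceFormCapSketch {
    /** Keeps at most maxSurfaceForms surface forms, discarding the lowest weighted ones. */
    static List<String> keepHeaviest(Map<String, Long> weightBySurface, int maxSurfaceForms) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(weightBySurface.entrySet());
        // Sort by weight, highest first.
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < maxSurfaceForms; i++) {
            kept.add(entries.get(i).getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        // Three surface forms assumed to share one analyzed form.
        Map<String, Long> forms = new LinkedHashMap<>();
        forms.put("WiFi", 3L);
        forms.put("Wi-Fi", 10L);
        forms.put("wifi", 7L);
        // Prints [Wi-Fi, wifi]
        System.out.println(keepHeaviest(forms, 2));
    }
}
```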
Method Detail

sizeInBytes

public long sizeInBytes()
Returns byte size of the underlying FST.

Specified by:
sizeInBytes in class Lookup
Returns:
RAM size of the lookup implementation in bytes

convertAutomaton

protected Automaton convertAutomaton(Automaton a)
Used by subclass to change the lookup automaton, if necessary.


build

public void build(InputIterator iterator)
           throws IOException
Description copied from class: Lookup
Builds up a new internal Lookup representation based on the given InputIterator. The implementation might re-sort the data internally.

Specified by:
build in class Lookup
Throws:
IOException

store

public boolean store(DataOutput output)
              throws IOException
Description copied from class: Lookup
Persist the constructed lookup data to the given DataOutput. Optional operation.

Specified by:
store in class Lookup
Parameters:
output - DataOutput to write the data to.
Returns:
true if successful, false if unsuccessful or not supported.
Throws:
IOException - when fatal IO error occurs.

load

public boolean load(DataInput input)
             throws IOException
Description copied from class: Lookup
Discard current lookup data and load it from a previously saved copy. Optional operation.

Specified by:
load in class Lookup
Parameters:
input - the DataInput to load the lookup data.
Returns:
true if completed successfully, false if unsuccessful or not supported.
Throws:
IOException - when fatal IO error occurs.

lookup

public List<Lookup.LookupResult> lookup(CharSequence key,
                                        boolean onlyMorePopular,
                                        int num)
Description copied from class: Lookup
Look up a key and return possible completion for this key.

Specified by:
lookup in class Lookup
Parameters:
key - lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
onlyMorePopular - return only more popular results
num - maximum number of results to return
Returns:
a list of possible completions, with their relative weight (e.g. popularity)

getCount

public long getCount()
Description copied from class: Lookup
Get the number of entries the lookup was built with.

Specified by:
getCount in class Lookup
Returns:
total number of suggester entries

getFullPrefixPaths

protected List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> getFullPrefixPaths(List<FSTUtil.Path<PairOutputs.Pair<Long,BytesRef>>> prefixPaths,
                                                                                 Automaton lookupAutomaton,
                                                                                 FST<PairOutputs.Pair<Long,BytesRef>> fst)
                                                                          throws IOException
Returns all prefix paths to initialize the search.

Throws:
IOException

get

public Object get(CharSequence key)
Returns the weight associated with an input string, or null if it does not exist.



Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.