Class FreeTextSuggester
- All Implemented Interfaces:
Accountable
Builds an ngram model from the text sent to build(org.apache.lucene.search.suggest.InputIterator) and predicts based on the last grams-1 tokens in the request sent to lookup(java.lang.CharSequence, boolean, int). This tries to handle the "long tail" of suggestions for when the incoming query is a never-before-seen query string.
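The "last grams-1 tokens" rule can be illustrated with a small standalone sketch. It is not the suggester's own code: the class and method names are hypothetical, whitespace splitting stands in for a real analyzer, and details such as how a partially typed final token is treated are left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows which tokens an
// n-gram model of a given order conditions on.
final class ContextWindowSketch {

    /** Returns at most grams - 1 tokens taken from the end of the request. */
    static List<String> context(List<String> tokens, int grams) {
        if (grams < 1) {
            throw new IllegalArgumentException("grams must be >= 1");
        }
        int keep = Math.min(grams - 1, tokens.size());
        return tokens.subList(tokens.size() - keep, tokens.size());
    }

    public static void main(String[] args) {
        // Whitespace splitting is an assumption standing in for analysis.
        List<String> tokens = Arrays.asList("cheap flights to new york".split(" "));
        System.out.println("bigram context:  " + context(tokens, 2));
        System.out.println("trigram context: " + context(tokens, 3));
    }
}
```

With a bigram model only the final token is used as context; a trigram model uses the final two, and so on.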
Likely this suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.
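One way to wire up such a fallback is sketched below. The two suggesters are modeled as plain functions so the example stays self-contained; in a real application they would wrap calls to a primary suggester and to this class. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical wiring, not a Lucene API: consult the fallback only when
// the primary suggester returns nothing.
final class FallbackSuggestSketch {

    static List<String> suggest(String query,
                                Function<String, List<String>> primary,
                                Function<String, List<String>> fallback) {
        List<String> results = primary.apply(query);
        return results.isEmpty() ? fallback.apply(query) : results;
    }

    public static void main(String[] args) {
        // Stand-in suggesters with canned answers, for illustration only.
        Function<String, List<String>> primary =
            q -> q.startsWith("luc") ? Arrays.asList("lucene") : Collections.<String>emptyList();
        Function<String, List<String>> fallback =
            q -> Arrays.asList(q + " tutorial");
        System.out.println(suggest("luc", primary, fallback));
        System.out.println(suggest("never seen before", primary, fallback));
    }
}
```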
Note that the weight for each suggestion is unused, and the suggestions are the analyzed forms (so your analysis process should normally be very "light").
This uses the stupid backoff language model to smooth scores across ngram models; see "Large language models in machine translation", http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.1126 for details.
From lookup(java.lang.CharSequence, boolean, int), the key of each result is the ngram token; the value is Long.MAX_VALUE * score (fixed point, cast to long). Divide by Long.MAX_VALUE to get the score back, which ranges from 0.0 to 1.0.
onlyMorePopular is unused.
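The fixed-point encoding of the result value can be sketched in plain Java as follows. The class and method names are hypothetical; only the arithmetic (multiply by Long.MAX_VALUE and cast to long, divide to decode) comes from the description above.

```java
// Hypothetical helper, not part of Lucene: mirrors the fixed-point
// arithmetic described for the value of each lookup result.
final class FixedPointScoreSketch {

    /** Encodes a score in [0.0, 1.0] as Long.MAX_VALUE * score, cast to long. */
    static long encode(double score) {
        return (long) (Long.MAX_VALUE * score);
    }

    /** Recovers the score by dividing the stored value by Long.MAX_VALUE. */
    static double decode(long value) {
        return (double) value / Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        long value = encode(0.4);
        System.out.println("stored value:  " + value);
        System.out.println("decoded score: " + decode(value));
    }
}
```

The round trip is only approximate, because a double cannot represent every long exactly.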
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
Lookup.LookupPriorityQueue, Lookup.LookupResult
-
Field Summary
| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final double | ALPHA | The constant used for backoff smoothing; during lookup, this means that if a given trigram did not occur, and we back off to the bigram, the overall score will be 0.4 times what the bigram model would have assigned. |
| static final String | CODEC_NAME | Codec name used in the header for the saved model. |
| static final int | DEFAULT_GRAMS | By default we use a bigram model. |
| static final byte | DEFAULT_SEPARATOR | The default character used to join multiple tokens into a single ngram token. |
| static final int | VERSION_CURRENT | Current version of the saved model file format. |
| static final int | VERSION_START | Initial version of the saved model file format. |

Fields inherited from class org.apache.lucene.search.suggest.Lookup
CHARSEQUENCE_COMPARATOR
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
| Constructor | Description |
| --- | --- |
| FreeTextSuggester(Analyzer analyzer) | Instantiate, using the provided analyzer for both indexing and lookup, using bigram model by default. |
| FreeTextSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer) | Instantiate, using the provided indexing and lookup analyzers, using bigram model by default. |
| FreeTextSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer, int grams) | Instantiate, using the provided indexing and lookup analyzers, with the specified model (2 = bigram, 3 = trigram, etc.). |
| FreeTextSuggester(Analyzer indexAnalyzer, Analyzer queryAnalyzer, int grams, byte separator) | Instantiate, using the provided indexing and lookup analyzers, and specified model (2 = bigram, 3 = trigram, etc.). |

-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | build(InputIterator iterator) | Builds up a new internal Lookup representation based on the given InputIterator. |
| void | build(InputIterator iterator, double ramBufferSizeMB) | Build the suggest index, using up to the specified amount of temporary RAM while building. |
| Object | get(CharSequence key) | Returns the weight associated with an input string, or null if it does not exist. |
| long | getCount() | Get the number of entries the lookup was built with. |
| boolean | load(DataInput input) | Discard current lookup data and load it from a previously saved copy. |
| List<Lookup.LookupResult> | lookup(CharSequence key, boolean onlyMorePopular, int num) | Look up a key and return possible completion for this key. |
| List<Lookup.LookupResult> | lookup(CharSequence key, int num) | Lookup, without any context. |
| List<Lookup.LookupResult> | lookup(CharSequence key, Set<BytesRef> contexts, boolean onlyMorePopular, int num) | Look up a key and return possible completion for this key. |
| List<Lookup.LookupResult> | lookup(CharSequence key, Set<BytesRef> contexts, int num) | Retrieve suggestions. |
| long | ramBytesUsed() | Returns byte size of the underlying FST. |
| boolean | store(DataOutput output) | Persist the constructed lookup data to a directory. |
-
Field Details
-
CODEC_NAME
Codec name used in the header for the saved model.
-
VERSION_START
public static final int VERSION_START
Initial version of the saved model file format.
-
VERSION_CURRENT
public static final int VERSION_CURRENT
Current version of the saved model file format.
-
DEFAULT_GRAMS
public static final int DEFAULT_GRAMS
By default we use a bigram model.
-
ALPHA
public static final double ALPHA
The constant used for backoff smoothing; during lookup, this means that if a given trigram did not occur, and we back off to the bigram, the overall score will be 0.4 times what the bigram model would have assigned.
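The effect of the backoff constant can be illustrated with a toy, in-memory version of stupid backoff scoring. This is a minimal sketch under stated assumptions, not the suggester's implementation: the class name, the HashMap of counts, and the space-joined keys are all hypothetical. Only the 0.4 multiplier applied at each backoff step is taken from the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stupid-backoff scorer over in-memory counts (illustrative only).
final class StupidBackoffSketch {
    static final double ALPHA = 0.4; // backoff multiplier quoted in the text

    private final Map<String, Long> counts = new HashMap<>();
    private final int grams;
    private long totalUnigrams = 0;

    StupidBackoffSketch(int grams) {
        this.grams = grams;
    }

    /** Counts every n-gram of order 1..grams. Tokens must not contain spaces. */
    void add(List<String> tokens) {
        for (int start = 0; start < tokens.size(); start++) {
            totalUnigrams++;
            for (int n = 1; n <= grams && start + n <= tokens.size(); n++) {
                String key = String.join(" ", tokens.subList(start, start + n));
                counts.merge(key, 1L, Long::sum);
            }
        }
    }

    /** Scores word after context; context should hold at most grams - 1 tokens. */
    double score(List<String> context, String word) {
        if (context.isEmpty()) {
            // Base case: relative unigram frequency.
            return totalUnigrams == 0
                ? 0.0
                : (double) counts.getOrDefault(word, 0L) / totalUnigrams;
        }
        String contextKey = String.join(" ", context);
        long ngramCount = counts.getOrDefault(contextKey + " " + word, 0L);
        if (ngramCount > 0) {
            // Seen n-gram: plain relative frequency, no discounting.
            return (double) ngramCount / counts.get(contextKey);
        }
        // Unseen n-gram: drop the oldest context token and multiply by ALPHA.
        return ALPHA * score(context.subList(1, context.size()), word);
    }

    public static void main(String[] args) {
        StupidBackoffSketch model = new StupidBackoffSketch(3);
        model.add(Arrays.asList("the", "quick", "brown", "fox"));
        model.add(Arrays.asList("the", "quick", "red", "fox"));
        System.out.println(model.score(Arrays.asList("the", "quick"), "brown"));
        System.out.println(model.score(Arrays.asList("red", "quick"), "brown"));
    }
}
```

In the second call the trigram "red quick brown" was never seen, so the score is ALPHA times the bigram estimate for "quick brown".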
-
DEFAULT_SEPARATOR
public static final byte DEFAULT_SEPARATOR
The default character used to join multiple tokens into a single ngram token. The input tokens produced by the analyzer must not contain this character.
-
-
Constructor Details
-
FreeTextSuggester
Instantiate, using the provided analyzer for both indexing and lookup, using bigram model by default.
-
FreeTextSuggester
Instantiate, using the provided indexing and lookup analyzers, using bigram model by default.
-
FreeTextSuggester
Instantiate, using the provided indexing and lookup analyzers, with the specified model (2 = bigram, 3 = trigram, etc.).
-
FreeTextSuggester
Instantiate, using the provided indexing and lookup analyzers, and specified model (2 = bigram, 3 = trigram, etc.). The separator is passed to ShingleFilter.setTokenSeparator(java.lang.String) to join multiple tokens into a single ngram token; it must be an ASCII (7-bit-clean) byte. No input tokens should have this byte, otherwise IllegalArgumentException is thrown.
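The separator constraints can be sketched in plain Java as follows. This is not how the suggester or ShingleFilter is implemented; the class name, the join method, and the '_' separator used in main are hypothetical and chosen only to keep the output readable.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: joins tokens into one ngram
// token and enforces the two constraints described above.
final class SeparatorJoinSketch {

    static String join(List<String> tokens, byte separator) {
        // A Java byte is signed, so values 0x80..0xFF show up as negatives.
        if (separator < 0) {
            throw new IllegalArgumentException("separator must be a 7-bit ASCII byte");
        }
        char sep = (char) separator;
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (token.indexOf(sep) >= 0) {
                throw new IllegalArgumentException("token contains the separator: " + token);
            }
            if (i > 0) {
                joined.append(sep);
            }
            joined.append(token);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // '_' is an illustrative separator, not the library default.
        System.out.println(join(Arrays.asList("new", "york"), (byte) '_'));
    }
}
```

Rejecting tokens that contain the separator keeps the joined form unambiguous: each ngram token can be split back into exactly the tokens it was built from.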
-
-
Method Details
-
ramBytesUsed
public long ramBytesUsed()
Returns byte size of the underlying FST.
-
getChildResources
-
build
Description copied from class: Lookup
Builds up a new internal Lookup representation based on the given InputIterator. The implementation might re-sort the data internally.
- Specified by:
build in class Lookup
- Throws:
IOException
-
build
Build the suggest index, using up to the specified amount of temporary RAM while building. Note that the weights for the suggestions are ignored.
- Throws:
IOException
-
store
Description copied from class: Lookup
Persist the constructed lookup data to a directory. Optional operation.
- Specified by:
store in class Lookup
- Parameters:
output - DataOutput to write the data to.
- Returns:
true if successful, false if unsuccessful or not supported.
- Throws:
IOException - when fatal IO error occurs.
-
load
Description copied from class: Lookup
Discard current lookup data and load it from a previously saved copy. Optional operation.
- Specified by:
load in class Lookup
- Parameters:
input - the DataInput to load the lookup data.
- Returns:
true if completed successfully, false if unsuccessful or not supported.
- Throws:
IOException - when fatal IO error occurs.
-
lookup
Description copied from class: Lookup
Look up a key and return possible completion for this key.
- Overrides:
lookup in class Lookup
- Parameters:
key - lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
onlyMorePopular - return only more popular results
num - maximum number of results to return
- Returns:
a list of possible completions, with their relative weight (e.g. popularity)
-
lookup
Lookup, without any context.
-
lookup
public List<Lookup.LookupResult> lookup(CharSequence key, Set<BytesRef> contexts, boolean onlyMorePopular, int num)
Description copied from class: Lookup
Look up a key and return possible completion for this key.
- Specified by:
lookup in class Lookup
- Parameters:
key - lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
contexts - contexts to filter the lookup by, or null if all contexts are allowed; if the suggestion contains any of the contexts, it's a match
onlyMorePopular - return only more popular results
num - maximum number of results to return
- Returns:
a list of possible completions, with their relative weight (e.g. popularity)
-
getCount
public long getCount()
Description copied from class: Lookup
Get the number of entries the lookup was built with.
-
lookup
public List<Lookup.LookupResult> lookup(CharSequence key, Set<BytesRef> contexts, int num) throws IOException
Retrieve suggestions.
- Throws:
IOException
-
get
Returns the weight associated with an input string, or null if it does not exist.
-