public class FSTCompletionLookup extends Lookup
Lookup API to FSTCompletion.
This adapter differs from FSTCompletion in that it attempts
to discretize the weights passed in from TermFreqIterator.weight()
to match the number of buckets. For the rationale for bucketing, see
FSTCompletion.
Note: Discretization requires an additional sorting pass.
The range of weights for bucketing/discretization is determined by sorting the input by weight and then dividing it into equal ranges. Scores within each range are then assigned to that range's bucket.
Note that this means even large differences in weights may be lost during automaton construction, but the overall distinction between "classes" of weights is preserved regardless of how the weights are distributed.
For fine-grained control over which weights are assigned to which buckets,
use FSTCompletion directly or TSTLookup, for example.
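The discretization described above can be illustrated with a small, library-free sketch. It assumes that "equal ranges" means equal-sized slices of the weight-sorted input (a rank-based split) and that equal weights share a bucket; both are assumptions of this example, and the class and method names (BucketingSketch, discretize) are hypothetical rather than part of Lucene.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BucketingSketch {
    /**
     * Hypothetical helper (not part of Lucene): maps each term to a bucket
     * in [0, buckets) according to its rank in the weight-sorted input.
     */
    static Map<String, Integer> discretize(Map<String, Float> weights, int buckets) {
        // The additional sorting pass: order the entries by ascending weight.
        List<Map.Entry<String, Float>> sorted = new ArrayList<>(weights.entrySet());
        sorted.sort((a, b) -> Float.compare(a.getValue(), b.getValue()));

        Map<String, Integer> result = new LinkedHashMap<>();
        int n = sorted.size();
        int previousBucket = 0;
        for (int rank = 0; rank < n; rank++) {
            // Split the sorted positions into `buckets` equal-sized slices.
            int bucket = (int) ((long) rank * buckets / n);
            // Assumption of this sketch: equal weights always share a bucket.
            if (rank > 0 && Float.compare(sorted.get(rank).getValue(),
                                          sorted.get(rank - 1).getValue()) == 0) {
                bucket = previousBucket;
            }
            result.put(sorted.get(rank).getKey(), bucket);
            previousBucket = bucket;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("a", 1f);
        weights.put("b", 2f);
        weights.put("c", 3f);
        weights.put("d", 1_000_000f);
        // prints {a=0, b=0, c=1, d=1}
        System.out.println(discretize(weights, 2));
    }
}
```

In this run "c" (weight 3) and "d" (weight 1,000,000) end up in the same bucket, which shows how a large difference in weight can be lost while the split between lower and higher "classes" of weights is kept.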
FSTCompletionLookup.LookupPriorityQueue, Lookup.LookupResult
CHARSEQUENCE_COMPARATOR
| Constructor and Description |
|---|
| FSTCompletionLookup(): This constructor prepares for creating a suggested FST using the build(TermFreqIterator) method. |
| FSTCompletionLookup(FSTCompletion completion, boolean exactMatchFirst): This constructor takes a pre-built automaton. |
| FSTCompletionLookup(int buckets, boolean exactMatchFirst): This constructor prepares for creating a suggested FST using the build(TermFreqIterator) method. |
| Modifier and Type | Method and Description |
|---|---|
| void | build(TermFreqIterator tfit): Builds up a new internal Lookup representation based on the given TermFreqIterator. |
| Object | get(CharSequence key) |
| boolean | load(InputStream input): Discard current lookup data and load it from a previously saved copy. |
| List<Lookup.LookupResult> | lookup(CharSequence key, boolean higherWeightsFirst, int num): Look up a key and return possible completions for this key. |
| boolean | store(OutputStream output): Persist the constructed lookup data to the given OutputStream. |
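The result ordering controlled by exactMatchFirst can be sketched without the library. The example below assumes prefix matching, that higher buckets come first, and that ties within a bucket are alphabetical; LookupOrderSketch and its Entry type are hypothetical stand-ins for illustration, not Lucene classes, and the higherWeightsFirst flag is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LookupOrderSketch {
    /** Hypothetical stand-in for a stored entry: a term plus its discretized bucket. */
    static final class Entry {
        final String term;
        final int bucket;

        Entry(String term, int bucket) {
            this.term = term;
            this.bucket = bucket;
        }
    }

    /** Returns up to num terms starting with key, ordered as described in the lead-in. */
    static List<String> lookup(List<Entry> entries, String key, boolean exactMatchFirst, int num) {
        List<Entry> matches = new ArrayList<>();
        for (Entry e : entries) {
            if (e.term.startsWith(key)) {
                matches.add(e);
            }
        }
        matches.sort((a, b) -> {
            if (exactMatchFirst) {
                // Promote the entry equal to the key ahead of everything else.
                boolean exactA = a.term.equals(key);
                boolean exactB = b.term.equals(key);
                if (exactA != exactB) {
                    return exactA ? -1 : 1;
                }
            }
            // Assumption: higher bucket first, then alphabetical within the bucket.
            if (a.bucket != b.bucket) {
                return Integer.compare(b.bucket, a.bucket);
            }
            return a.term.compareTo(b.term);
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < matches.size() && i < num; i++) {
            result.add(matches.get(i).term);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("car", 0), new Entry("card", 2), new Entry("care", 2),
                new Entry("cart", 1), new Entry("dog", 2));
        System.out.println(lookup(entries, "car", true, 3));  // prints [car, card, care]
        System.out.println(lookup(entries, "car", false, 3)); // prints [card, care, cart]
    }
}
```

With exactMatchFirst disabled, the exact match "car" sits in the lowest bucket and falls outside the top three results.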
public FSTCompletionLookup()
This constructor prepares for creating a suggested FST using the
build(TermFreqIterator) method. The number of weight
discretization buckets is set to FSTCompletion.DEFAULT_BUCKETS and
exact matches are promoted to the top of the suggestions list.

public FSTCompletionLookup(int buckets,
boolean exactMatchFirst)
This constructor prepares for creating a suggested FST using the
build(TermFreqIterator) method.
Parameters:
buckets - The number of weight discretization buckets (see
FSTCompletion for details).
exactMatchFirst - If true, exact matches are promoted to the top of the
suggestions list. Otherwise they appear in the order of
discretized weight and alphabetically within the bucket.

public FSTCompletionLookup(FSTCompletion completion, boolean exactMatchFirst)
This constructor takes a pre-built automaton.
Parameters:
completion - An instance of FSTCompletion.
exactMatchFirst - If true, exact matches are promoted to the top of the
suggestions list. Otherwise they appear in the order of
discretized weight and alphabetically within the bucket.

public void build(TermFreqIterator tfit) throws IOException
Builds up a new internal
Lookup representation based on the given TermFreqIterator.
The implementation might re-sort the data internally.
build in class Lookup
Throws:
IOException

public List<Lookup.LookupResult> lookup(CharSequence key, boolean higherWeightsFirst, int num)
Description copied from class: Lookup
Look up a key and return possible completions for this key.
lookup in class Lookup
Parameters:
key - lookup key. Depending on the implementation this may be
a prefix, misspelling, or even infix.
higherWeightsFirst - return only more popular results
num - maximum number of results to return

public Object get(CharSequence key)
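public boolean store(OutputStream output) throws IOException
Description copied from class: Lookup
Persist the constructed lookup data to the given OutputStream.
store in class Lookup
Parameters:
output - OutputStream to write the data to.
Throws:
IOException - when fatal IO error occurs.

public boolean load(InputStream input) throws IOException
Description copied from class: Lookup
Discard current lookup data and load it from a previously saved copy.
load in class Lookup
Parameters:
input - the InputStream to load the lookup data from.
Throws:
IOException - when fatal IO error occurs.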
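The store/load contract above (write the current data to a stream; on load, discard the current data and replace it with the saved copy) can be sketched with plain java.io streams. PersistenceSketch, its term list, and its on-stream format are hypothetical and chosen only for illustration; they say nothing about how FSTCompletionLookup actually serializes its automaton.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class PersistenceSketch {
    // Hypothetical lookup data: just a list of terms.
    private final List<String> terms = new ArrayList<>();

    void add(String term) {
        terms.add(term);
    }

    List<String> terms() {
        return new ArrayList<>(terms);
    }

    /** Writes the current data to the stream; the format is this sketch's own. */
    boolean store(OutputStream output) throws IOException {
        DataOutputStream out = new DataOutputStream(output);
        out.writeInt(terms.size());
        for (String term : terms) {
            out.writeUTF(term);
        }
        out.flush();
        return true;
    }

    /** Discards the current data and replaces it with the saved copy. */
    boolean load(InputStream input) throws IOException {
        DataInputStream in = new DataInputStream(input);
        int count = in.readInt();
        List<String> loaded = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            loaded.add(in.readUTF());
        }
        terms.clear();
        terms.addAll(loaded);
        return true;
    }

    /** Stores one instance, then loads the bytes into another that already holds data. */
    static List<String> roundTrip() {
        try {
            PersistenceSketch saved = new PersistenceSketch();
            saved.add("card");
            saved.add("care");
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            saved.store(bytes);

            PersistenceSketch other = new PersistenceSketch();
            other.add("stale");
            other.load(new ByteArrayInputStream(bytes.toByteArray()));
            return other.terms();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // prints [card, care]
    }
}
```

The "stale" entry disappears after load, matching the "discard current lookup data" wording in the method description.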