public class AnalyzingInfixSuggester extends Lookup implements Closeable
This suggester supports payloads. Matches are sorted only by the suggest weight; a blended score + weight sort is not yet supported. As a result, this suggester works best when there is a strong a priori ranking of all the suggestions.
This suggester supports contexts, including arbitrary binary terms.
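To make the weight-only ordering concrete, here is a minimal sketch in plain Java. It is not Lucene code: the `WeightOnlyRanking` class, its `Suggestion` type, and the whitespace tokenization are all assumptions made for illustration. It keeps every suggestion in which some token starts with the typed text, then orders the hits by weight alone, which is why a good a priori weight matters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: none of these names come from Lucene.
class WeightOnlyRanking {
  static final class Suggestion {
    final String text;
    final long weight;

    Suggestion(String text, long weight) {
      this.text = text;
      this.weight = weight;
    }
  }

  // True if any whitespace-separated token of text starts with prefix (case-insensitive).
  static boolean anyTokenStartsWith(String text, String prefix) {
    String p = prefix.toLowerCase(Locale.ROOT);
    for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
      if (token.startsWith(p)) {
        return true;
      }
    }
    return false;
  }

  // Keep the matching suggestions, order them by weight (descending) and
  // return the texts of the first num hits. No relevance score is involved.
  static List<String> topByWeight(List<Suggestion> all, String prefix, int num) {
    List<Suggestion> hits = new ArrayList<>();
    for (Suggestion s : all) {
      if (anyTokenStartsWith(s.text, prefix)) {
        hits.add(s);
      }
    }
    hits.sort(Comparator.comparingLong((Suggestion s) -> s.weight).reversed());
    List<String> out = new ArrayList<>();
    for (int i = 0; i < Math.min(num, hits.size()); i++) {
      out.add(hits.get(i).text);
    }
    return out;
  }
}
```

In this sketch a suggestion that matches the typed text only loosely still outranks a closer match if its weight is higher.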
Nested classes/interfaces inherited from class Lookup: Lookup.LookupPriorityQueue, Lookup.LookupResult
Modifier and Type | Field and Description |
---|---|
protected static String |
CONTEXTS_FIELD_NAME
Field name used for the indexed context, as a
StringField and a SortedSetDVField, for filtering.
|
static boolean |
DEFAULT_ALL_TERMS_REQUIRED
Default boolean clause option for multiple terms matching (all terms required).
|
protected static boolean |
DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
Default option to close the IndexWriter once the index has been built.
|
static boolean |
DEFAULT_HIGHLIGHT
Default highlighting option.
|
static int |
DEFAULT_MIN_PREFIX_CHARS
Default minimum number of leading characters before
PrefixQuery is used (4).
|
protected static String |
EXACT_TEXT_FIELD_NAME
Field name used for the indexed text, as a
StringField, for exact lookup.
|
protected Analyzer |
indexAnalyzer
Analyzer used at index time
|
protected Analyzer |
queryAnalyzer
Analyzer used at search time
|
protected SearcherManager |
searcherMgr
IndexSearcher used for lookups. |
protected Object |
searcherMgrLock
Used to manage concurrent access to searcherMgr
|
protected static String |
TEXT_FIELD_NAME
Field name used for the indexed text.
|
protected static String |
TEXTGRAMS_FIELD_NAME
Field name used for the edge n-grams that serve short prefixes without a PrefixQuery;
the cutoff is controlled by minPrefixChars.
|
protected IndexWriter |
writer
Used for ongoing NRT additions/updates.
|
Fields inherited from class Lookup: CHARSEQUENCE_COMPARATOR
Constructor and Description |
---|
AnalyzingInfixSuggester(Directory dir,
Analyzer analyzer)
Create a new instance, loading from a previously built
AnalyzingInfixSuggester directory, if it exists.
|
AnalyzingInfixSuggester(Directory dir,
Analyzer indexAnalyzer,
Analyzer queryAnalyzer,
int minPrefixChars,
boolean commitOnBuild)
Create a new instance, loading from a previously built
AnalyzingInfixSuggester directory, if it exists.
|
AnalyzingInfixSuggester(Directory dir,
Analyzer indexAnalyzer,
Analyzer queryAnalyzer,
int minPrefixChars,
boolean commitOnBuild,
boolean allTermsRequired,
boolean highlight)
Create a new instance, loading from a previously built
AnalyzingInfixSuggester directory, if it exists.
|
AnalyzingInfixSuggester(Directory dir,
Analyzer indexAnalyzer,
Analyzer queryAnalyzer,
int minPrefixChars,
boolean commitOnBuild,
boolean allTermsRequired,
boolean highlight,
boolean closeIndexWriterOnBuild)
Create a new instance, loading from a previously built
AnalyzingInfixSuggester directory, if it exists.
|
Modifier and Type | Method and Description |
---|---|
void |
add(BytesRef text,
Set<BytesRef> contexts,
long weight,
BytesRef payload)
Adds a new suggestion.
|
void |
addContextToQuery(BooleanQuery.Builder query,
BytesRef context,
BooleanClause.Occur clause)
This method is handy because callers do not need access to internal fields such as CONTEXTS_FIELD_NAME in order to build queries.
However, this class may not be the best location for it.
|
protected void |
addNonMatch(StringBuilder sb,
String text)
Called while highlighting a single result, to append a
non-matching chunk of text from the suggestion to the
provided fragments list.
|
protected void |
addPrefixMatch(StringBuilder sb,
String surface,
String analyzed,
String prefixToken)
Called while highlighting a single result, to append a
matched prefix token, to the provided fragments list.
|
protected void |
addWholeMatch(StringBuilder sb,
String surface,
String analyzed)
Called while highlighting a single result, to append
the whole matched token to the provided fragments list.
|
void |
build(InputIterator iter)
Builds up a new internal
Lookup representation based on the given InputIterator. |
void |
close() |
void |
commit()
Commits all pending changes made to this suggester to disk.
|
protected List<Lookup.LookupResult> |
createResults(IndexSearcher searcher,
TopFieldDocs hits,
int num,
CharSequence charSequence,
boolean doHighlight,
Set<String> matchedTokens,
String prefixToken)
Create the results based on the search hits.
|
protected Query |
finishQuery(BooleanQuery.Builder in,
boolean allTermsRequired)
Subclass can override this to tweak the Query before
searching.
|
Collection<Accountable> |
getChildResources() |
long |
getCount()
Get the number of entries the lookup was built with
|
protected Directory |
getDirectory(Path path)
Subclass can override to choose a specific
Directory implementation. |
protected IndexWriterConfig |
getIndexWriterConfig(Analyzer indexAnalyzer,
IndexWriterConfig.OpenMode openMode)
Override this to customize index settings, e.g.
|
protected Query |
getLastTokenQuery(String token)
This is called if the last token isn't ended
(e.g.
|
protected FieldType |
getTextFieldType()
Subclass can override this method to change the field type of the text field
e.g.
|
protected Object |
highlight(String text,
Set<String> matchedTokens,
String prefixToken)
Override this method to customize the Object
representing a single highlighted suggestion; the
result is set on each
Lookup.LookupResult.highlightKey member. |
boolean |
load(DataInput out)
Discard current lookup data and load it from a previously saved copy.
|
List<Lookup.LookupResult> |
lookup(CharSequence key,
BooleanQuery contextQuery,
int num,
boolean allTermsRequired,
boolean doHighlight)
This is an advanced method that lets callers pass any
arbitrary Lucene query to the suggester, to be used to filter its results.
|
List<Lookup.LookupResult> |
lookup(CharSequence key,
int num,
boolean allTermsRequired,
boolean doHighlight)
Lookup, without any context.
|
List<Lookup.LookupResult> |
lookup(CharSequence key,
Map<BytesRef,BooleanClause.Occur> contextInfo,
int num,
boolean allTermsRequired,
boolean doHighlight)
Retrieve suggestions, specifying whether all terms
must match (allTermsRequired) and whether the hits
should be highlighted (doHighlight). |
List<Lookup.LookupResult> |
lookup(CharSequence key,
Set<BytesRef> contexts,
boolean onlyMorePopular,
int num)
Look up a key and return possible completions for this key.
|
List<Lookup.LookupResult> |
lookup(CharSequence key,
Set<BytesRef> contexts,
int num,
boolean allTermsRequired,
boolean doHighlight)
Lookup, with context but without booleans.
|
long |
ramBytesUsed() |
void |
refresh()
Reopens the underlying searcher; it's best to "batch
up" many additions/updates, and then call refresh
once at the end.
|
boolean |
store(DataOutput in)
Persist the constructed lookup data to a directory.
|
void |
update(BytesRef text,
Set<BytesRef> contexts,
long weight,
BytesRef payload)
Updates a previous suggestion, matching the exact same
text as before.
|
protected static final String TEXTGRAMS_FIELD_NAME
protected static final String TEXT_FIELD_NAME
protected static final String EXACT_TEXT_FIELD_NAME
protected static final String CONTEXTS_FIELD_NAME
protected final Analyzer queryAnalyzer
protected final Analyzer indexAnalyzer
protected IndexWriter writer
protected SearcherManager searcherMgr
IndexSearcher used for lookups.
protected final Object searcherMgrLock
public static final int DEFAULT_MIN_PREFIX_CHARS
public static final boolean DEFAULT_ALL_TERMS_REQUIRED
public static final boolean DEFAULT_HIGHLIGHT
protected static final boolean DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
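public AnalyzingInfixSuggester(Directory dir, Analyzer analyzer) throws IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. Note that close() will also close the provided directory.
IOException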
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild) throws IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. Note that close() will also close the provided directory.
minPrefixChars - Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild - Call commit after the index has finished building. This persists the suggester index to disk, so future instances of this suggester can use this pre-built dictionary.
IOException
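The trade-off behind minPrefixChars can be sketched in plain Java. This is an illustration under stated assumptions, not Lucene's implementation: the `EdgeGramSketch` class, the strategy labels, and the exact range of gram lengths (1 up to minPrefixChars - 1) are hypothetical. The idea is that short prefixes are precomputed as leading n-grams at index time, so they can be answered with an exact term match, while longer prefixes fall back to a prefix scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene's implementation.
class EdgeGramSketch {
  // Leading-edge n-grams of token with lengths 1 .. minPrefixChars - 1.
  // The exact length range is an assumption of this sketch.
  static List<String> edgeGrams(String token, int minPrefixChars) {
    List<String> grams = new ArrayList<>();
    int max = Math.min(token.length(), minPrefixChars - 1);
    for (int len = 1; len <= max; len++) {
      grams.add(token.substring(0, len));
    }
    return grams;
  }

  // Short prefixes can be answered by an exact lookup of a precomputed gram;
  // longer ones need a prefix scan (the role PrefixQuery plays).
  static String strategyFor(String prefix, int minPrefixChars) {
    return prefix.length() < minPrefixChars ? "exact-gram" : "prefix-scan";
  }
}
```

Storing the grams costs index space for every token, which is the "increasing index size but making lookups faster" trade-off described for this parameter.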
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight) throws IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. Note that close() will also close the provided directory.
minPrefixChars - Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild - Call commit after the index has finished building. This persists the suggester index to disk, so future instances of this suggester can use this pre-built dictionary.
allTermsRequired - All terms in the suggest query must be matched.
highlight - Highlight suggest query in suggestions.
IOException
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight, boolean closeIndexWriterOnBuild) throws IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. Note that close() will also close the provided directory.
minPrefixChars - Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild - Call commit after the index has finished building. This persists the suggester index to disk, so future instances of this suggester can use this pre-built dictionary.
allTermsRequired - All terms in the suggest query must be matched.
highlight - Highlight suggest query in suggestions.
closeIndexWriterOnBuild - If true, the IndexWriter will be closed after the index has finished building.
IOException
protected IndexWriterConfig getIndexWriterConfig(Analyzer indexAnalyzer, IndexWriterConfig.OpenMode openMode)
protected Directory getDirectory(Path path) throws IOException
Subclass can override to choose a specific Directory implementation.
IOException
public void build(InputIterator iter) throws IOException
Builds up a new internal Lookup representation based on the given InputIterator. The implementation might re-sort the data internally.
build in class Lookup
IOException
public void commit() throws IOException
Commits all pending changes made to this suggester to disk.
IOException
See also: IndexWriter.commit()
public void add(BytesRef text, Set<BytesRef> contexts, long weight, BytesRef payload) throws IOException
Adds a new suggestion. Use update(BytesRef, Set<BytesRef>, long, BytesRef) instead if you want to replace a previous suggestion. After adding or updating a batch of new suggestions, you must call refresh() at the end in order to see the suggestions in lookup(CharSequence, Set<BytesRef>, boolean, int).
IOException
public void update(BytesRef text, Set<BytesRef> contexts, long weight, BytesRef payload) throws IOException
Updates a previous suggestion, matching the exact same text as before. If the text is not already present, you can use add(BytesRef, Set<BytesRef>, long, BytesRef) instead. After adding or updating a batch of new suggestions, you must call refresh() at the end in order to see the suggestions in lookup(CharSequence, Set<BytesRef>, boolean, int).
IOException
public void refresh() throws IOException
IOException
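The visibility rule for add, update and refresh can be pictured with a toy model in plain Java. Everything here is hypothetical (the `RefreshSketch` class and its in-memory lists stand in for the index writer and the searcher); it only shows why additions are invisible to lookups until a refresh, and why one refresh per batch is enough.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; it does not use or mirror Lucene's classes.
class RefreshSketch {
  private final List<String> pending = new ArrayList<>(); // written, not yet searchable
  private List<String> visible = new ArrayList<>();       // snapshot used by lookups

  // Buffers a new suggestion; lookups cannot see it yet.
  void add(String text) {
    pending.add(text);
  }

  // Publishes everything buffered so far in one step.
  void refresh() {
    List<String> next = new ArrayList<>(visible);
    next.addAll(pending);
    pending.clear();
    visible = next;
  }

  // Searches only the last published snapshot.
  List<String> lookup(String prefix) {
    List<String> out = new ArrayList<>();
    for (String s : visible) {
      if (s.startsWith(prefix)) {
        out.add(s);
      }
    }
    return out;
  }
}
```

In this model, calling refresh after every single add would rebuild the snapshot each time, which is the cost that batching avoids.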
protected FieldType getTextFieldType()
public List<Lookup.LookupResult> lookup(CharSequence key, Set<BytesRef> contexts, boolean onlyMorePopular, int num) throws IOException
Look up a key and return possible completions for this key.
lookup in class Lookup
key - lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
contexts - contexts to filter the lookup by, or null if all contexts are allowed; if the suggestion contains any of the contexts, it's a match
onlyMorePopular - return only more popular results
num - maximum number of results to return
IOException
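The "any of the contexts" rule for the contexts parameter can be sketched as a set-overlap test. This is a plain-Java illustration, not Lucene code: the `ContextFilterSketch` class is hypothetical and uses String where the real API uses BytesRef. How an empty (non-null) filter set is treated is not stated here, so the sketch gives it no special handling.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical illustration only; String stands in for BytesRef.
class ContextFilterSketch {
  // A null filter allows every suggestion; otherwise the suggestion matches
  // when it carries at least one of the requested contexts.
  static boolean matches(Set<String> suggestionContexts, Set<String> filter) {
    if (filter == null) {
      return true;
    }
    return !Collections.disjoint(suggestionContexts, filter);
  }
}
```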
public List<Lookup.LookupResult> lookup(CharSequence key, int num, boolean allTermsRequired, boolean doHighlight) throws IOException
IOException
public List<Lookup.LookupResult> lookup(CharSequence key, Set<BytesRef> contexts, int num, boolean allTermsRequired, boolean doHighlight) throws IOException
IOException
protected Query getLastTokenQuery(String token) throws IOException
IOException
public List<Lookup.LookupResult> lookup(CharSequence key, Map<BytesRef,BooleanClause.Occur> contextInfo, int num, boolean allTermsRequired, boolean doHighlight) throws IOException
Retrieve suggestions, specifying whether all terms must match (allTermsRequired) and whether the hits should be highlighted (doHighlight).
IOException
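The effect of allTermsRequired can be sketched in plain Java. The `TermsRequiredSketch` class below is hypothetical and makes simplifying assumptions: whitespace tokenization, lowercase comparison, and a final query term that is treated as a prefix unless the query ends with whitespace (this reference describes a final token with trailing characters, such as white-space, as finished). It is not how Lucene builds its query.

```java
import java.util.Locale;

// Hypothetical illustration only; not Lucene's query construction.
class TermsRequiredSketch {
  // True if some token equals term, or starts with it when asPrefix is set.
  static boolean anyToken(String[] tokens, String term, boolean asPrefix) {
    for (String t : tokens) {
      if (asPrefix ? t.startsWith(term) : t.equals(term)) {
        return true;
      }
    }
    return false;
  }

  static boolean matches(String suggestion, String query, boolean allTermsRequired) {
    String[] tokens = suggestion.toLowerCase(Locale.ROOT).split("\\s+");
    String q = query.toLowerCase(Locale.ROOT);
    // A trailing whitespace character means the last term is complete.
    boolean lastFinished = q.isEmpty() || Character.isWhitespace(q.charAt(q.length() - 1));
    String[] terms = q.trim().split("\\s+");
    int matched = 0;
    for (int i = 0; i < terms.length; i++) {
      boolean asPrefix = (i == terms.length - 1) && !lastFinished;
      if (anyToken(tokens, terms[i], asPrefix)) {
        matched++;
      }
    }
    // Every term must hit when allTermsRequired is set; otherwise one hit is enough.
    return allTermsRequired ? matched == terms.length : matched > 0;
  }
}
```

With the suggestion "a penny saved is a penny earned", the query "penny ear" matches with allTermsRequired set, while "penny ear " (trailing space) matches only when it is not set, because "ear" is then a complete term that the suggestion lacks.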
public void addContextToQuery(BooleanQuery.Builder query, BytesRef context, BooleanClause.Occur clause)
query - an instance of BooleanQuery
context - the context
clause - one of BooleanClause.Occur
public List<Lookup.LookupResult> lookup(CharSequence key, BooleanQuery contextQuery, int num, boolean allTermsRequired, boolean doHighlight) throws IOException
lookup in class Lookup
key - the keyword being looked for
contextQuery - an arbitrary Lucene query to be used to filter the result of the suggester. addContextToQuery(BooleanQuery.Builder, BytesRef, BooleanClause.Occur) could be used to build this contextQuery.
num - number of items to return
allTermsRequired - whether all searched terms must match
doHighlight - if true, the matching term will be highlighted in the search result
IOException - if there is an IO exception while reading data from the index
protected List<Lookup.LookupResult> createResults(IndexSearcher searcher, TopFieldDocs hits, int num, CharSequence charSequence, boolean doHighlight, Set<String> matchedTokens, String prefixToken) throws IOException
Create the results based on the search hits. The prefixToken argument will be null whenever the final token in the incoming request was in fact finished (had trailing characters, such as white-space).
IOException - If there are problems reading fields from the underlying Lucene index.
protected Query finishQuery(BooleanQuery.Builder in, boolean allTermsRequired)
protected Object highlight(String text, Set<String> matchedTokens, String prefixToken) throws IOException
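Override this method to customize the Object representing a single highlighted suggestion; the result is set on each Lookup.LookupResult.highlightKey member.
IOException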
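How the three highlighting hooks could cooperate is sketched below in plain Java. The `HighlightSketch` class is hypothetical: it mirrors the documented hook signatures, but the space-based splitting, the lowercase "analysis", and the `<b>` markup are assumptions of this sketch rather than confirmed behavior of the class.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; markup and tokenization are assumed.
class HighlightSketch {
  // Appends text that did not match, unchanged.
  static void addNonMatch(StringBuilder sb, String text) {
    sb.append(text);
  }

  // Wraps a fully matched token.
  static void addWholeMatch(StringBuilder sb, String surface, String analyzed) {
    sb.append("<b>").append(surface).append("</b>");
  }

  // Wraps only the leading part of the token covered by the typed prefix.
  static void addPrefixMatch(StringBuilder sb, String surface, String analyzed, String prefixToken) {
    if (prefixToken.length() >= surface.length()) {
      addWholeMatch(sb, surface, analyzed);
      return;
    }
    sb.append("<b>").append(surface, 0, prefixToken.length()).append("</b>");
    sb.append(surface.substring(prefixToken.length()));
  }

  // Walks the suggestion word by word and dispatches to the hooks above.
  static String highlight(String text, Set<String> matchedTokens, String prefixToken) {
    StringBuilder sb = new StringBuilder();
    String[] words = text.split(" ");
    for (int i = 0; i < words.length; i++) {
      if (i > 0) {
        addNonMatch(sb, " ");
      }
      String surface = words[i];
      String analyzed = surface.toLowerCase(Locale.ROOT);
      if (matchedTokens.contains(analyzed)) {
        addWholeMatch(sb, surface, analyzed);
      } else if (prefixToken != null && analyzed.startsWith(prefixToken)) {
        addPrefixMatch(sb, surface, analyzed, prefixToken);
      } else {
        addNonMatch(sb, surface);
      }
    }
    return sb.toString();
  }
}
```

A subclass that wants different markup would, in this model, only need to change the two match hooks; the walk over the text stays the same.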
protected void addNonMatch(StringBuilder sb, String text)
sb - The StringBuilder to append to
text - The text chunk to add
protected void addWholeMatch(StringBuilder sb, String surface, String analyzed)
sb - The StringBuilder to append to
surface - The surface form (original) text
analyzed - The analyzed token corresponding to the surface form text
protected void addPrefixMatch(StringBuilder sb, String surface, String analyzed, String prefixToken)
sb - The StringBuilder to append to
surface - The fragment of the surface form (indexed during build(InputIterator)), corresponding to this match
analyzed - The analyzed token that matched
prefixToken - The prefix of the token that matched
public boolean store(DataOutput in) throws IOException
Persist the constructed lookup data to a directory.
store in class Lookup
in - DataOutput to write the data to.
IOException - when fatal IO error occurs.
public boolean load(DataInput out) throws IOException
Discard current lookup data and load it from a previously saved copy.
load in class Lookup
out - the DataInput to load the lookup data.
IOException - when fatal IO error occurs.
public void close() throws IOException
close in interface Closeable
close in interface AutoCloseable
IOException
public long ramBytesUsed()
ramBytesUsed in interface Accountable
public Collection<Accountable> getChildResources()
getChildResources in interface Accountable
public long getCount() throws IOException
Get the number of entries the lookup was built with
getCount in class Lookup
IOException
Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.