Class AnalyzingInfixSuggester
- java.lang.Object
-
- org.apache.lucene.search.suggest.Lookup
-
- org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester
-
- All Implemented Interfaces:
Closeable, AutoCloseable, Accountable
- Direct Known Subclasses:
BlendedInfixSuggester
public class AnalyzingInfixSuggester extends Lookup implements Closeable
Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text. This also highlights the tokens that match.
This suggester supports payloads. Matches are sorted only by the suggest weight; a blended score + weight sort is not supported, although it would be nice to add in the future. As a result, this suggester works best when there is a strong a priori ranking of all the suggestions.
This suggester supports contexts, including arbitrary binary terms.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
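The matching rule described above can be pictured with a small stand-alone sketch. It is not Lucene code: the class name InfixSketch, the whitespace tokenizer, and the map of text to weight are assumptions made purely for illustration. In this simplified model, earlier query tokens must equal some token of the suggestion, the final token only needs to be a prefix of one, and hits are ordered by weight alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustration only: a tiny in-memory stand-in, not Lucene's implementation. */
final class InfixSketch {

    /** Lower-cases and splits on whitespace (a stand-in for a real Analyzer). */
    static String[] tokenize(String s) {
        return s.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    /** True if every query token matches a text token; only the last may match as a prefix. */
    static boolean matches(String[] textTokens, String[] queryTokens) {
        for (int i = 0; i < queryTokens.length; i++) {
            boolean last = (i == queryTokens.length - 1);
            boolean found = false;
            for (String t : textTokens) {
                if (last ? t.startsWith(queryTokens[i]) : t.equals(queryTokens[i])) {
                    found = true;
                    break;
                }
            }
            if (!found) return false;
        }
        return true;
    }

    /** Returns up to num matching texts, highest weight first (weight is the only sort key). */
    static List<String> suggest(Map<String, Long> weightedTexts, String query, int num) {
        String[] q = tokenize(query);
        List<Map.Entry<String, Long>> hits = new ArrayList<>();
        for (Map.Entry<String, Long> e : weightedTexts.entrySet()) {
            if (matches(tokenize(e.getKey()), q)) hits.add(e);
        }
        hits.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : hits.subList(0, Math.min(num, hits.size()))) {
            out.add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> data = Map.of(
            "lend me your ear", 8L,
            "a penny saved is a penny earned", 10L,
            "barely used", 12L);
        System.out.println(suggest(data, "ear", 5));
    }
}
```

Because weight is the only sort key in this model, a suggestion that matches the query more closely never outranks a heavier one, which is why a strong prior ranking matters.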
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
Lookup.LookupPriorityQueue, Lookup.LookupResult
-
-
Field Summary
Fields (Modifier and Type, Field, Description)
protected static String
CONTEXTS_FIELD_NAME
Field name used for the indexed context, as a StringField and a SortedSetDVField, for filtering.
static boolean
DEFAULT_ALL_TERMS_REQUIRED
Default boolean clause option for multiple terms matching (all terms required).
protected static boolean
DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
Default option to close the IndexWriter once the index has been built.
static boolean
DEFAULT_HIGHLIGHT
Default highlighting option.
static int
DEFAULT_MIN_PREFIX_CHARS
Default minimum number of leading characters before PrefixQuery is used (4).
protected static String
EXACT_TEXT_FIELD_NAME
Field name used for the indexed text, as a StringField, for exact lookup.
protected Analyzer
indexAnalyzer
Analyzer used at index time.
protected Analyzer
queryAnalyzer
Analyzer used at search time.
protected SearcherManager
searcherMgr
IndexSearcher used for lookups.
protected ReadWriteLock
searcherMgrLock
Used to manage concurrent access to searcherMgr.
protected static String
TEXT_FIELD_NAME
Field name used for the indexed text.
protected static String
TEXTGRAMS_FIELD_NAME
Edge n-grams for searching short prefixes without PrefixQuery; controlled by minPrefixChars.
protected IndexWriter
writer
Used for ongoing NRT additions/updates.
protected Object
writerLock
Used to manage concurrent access to writer.
-
Fields inherited from class org.apache.lucene.search.suggest.Lookup
CHARSEQUENCE_COMPARATOR
-
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
-
Constructor Summary
Constructors (Constructor, Description)
AnalyzingInfixSuggester(Directory dir, Analyzer analyzer)
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild)
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight)
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight, boolean closeIndexWriterOnBuild)
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
-
Method Summary
Methods (Modifier and Type, Method, Description)
void
add(BytesRef text, Set<BytesRef> contexts, long weight, BytesRef payload)
Adds a new suggestion.
void
addContextToQuery(BooleanQuery.Builder query, BytesRef context, BooleanClause.Occur clause)
This method is handy because callers do not need access to internal fields such as CONTEXTS_FIELD_NAME in order to build queries. However, this class may not be the best location for it.
protected void
addNonMatch(StringBuilder sb, String text)
Called while highlighting a single result, to append a non-matching chunk of text from the suggestion to the provided fragments list.
protected void
addPrefixMatch(StringBuilder sb, String surface, String analyzed, String prefixToken)
Called while highlighting a single result, to append a matched prefix token to the provided fragments list.
protected void
addWholeMatch(StringBuilder sb, String surface, String analyzed)
Called while highlighting a single result, to append the whole matched token to the provided fragments list.
void
build(InputIterator iter)
Builds up a new internal Lookup representation based on the given InputIterator.
void
close()
void
commit()
Commits all pending changes made to this suggester to disk.
protected List<Lookup.LookupResult>
createResults(IndexSearcher searcher, TopFieldDocs hits, int num, CharSequence charSequence, boolean doHighlight, Set<String> matchedTokens, String prefixToken)
Create the results based on the search hits.
protected Query
finishQuery(BooleanQuery.Builder in, boolean allTermsRequired)
Subclass can override this to tweak the Query before searching.
long
getCount()
Get the number of entries the lookup was built with.
protected Directory
getDirectory(Path path)
Subclass can override to choose a specific Directory implementation.
protected IndexWriterConfig
getIndexWriterConfig(Analyzer indexAnalyzer, IndexWriterConfig.OpenMode openMode)
Override this to customize index settings, e.g. which codec to use.
protected Query
getLastTokenQuery(String token)
This is called if the last token isn't ended (e.g. user did not type a space after it).
protected FieldType
getTextFieldType()
Subclass can override this method to change the field type of the text field, e.g. to change the index options.
protected Object
highlight(String text, Set<String> matchedTokens, String prefixToken)
Override this method to customize the Object representing a single highlighted suggestion; the result is set on each Lookup.LookupResult.highlightKey member.
boolean
load(DataInput out)
Discard current lookup data and load it from a previously saved copy.
List<Lookup.LookupResult>
lookup(CharSequence key, int num, boolean allTermsRequired, boolean doHighlight)
Lookup, without any context.
List<Lookup.LookupResult>
lookup(CharSequence key, Map<BytesRef,BooleanClause.Occur> contextInfo, int num, boolean allTermsRequired, boolean doHighlight)
Retrieve suggestions, specifying whether all terms must match (allTermsRequired) and whether the hits should be highlighted (doHighlight).
List<Lookup.LookupResult>
lookup(CharSequence key, Set<BytesRef> contexts, boolean onlyMorePopular, int num)
Look up a key and return possible completions for this key.
List<Lookup.LookupResult>
lookup(CharSequence key, Set<BytesRef> contexts, int num, boolean allTermsRequired, boolean doHighlight)
Lookup, with context but without booleans.
List<Lookup.LookupResult>
lookup(CharSequence key, BooleanQuery contextQuery, int num, boolean allTermsRequired, boolean doHighlight)
This is an advanced method that lets callers pass any arbitrary Lucene query down to the suggester to filter its results.
long
ramBytesUsed()
void
refresh()
Reopens the underlying searcher; it's best to "batch up" many additions/updates, and then call refresh once at the end.
boolean
store(DataOutput in)
Persist the constructed lookup data to a directory.
void
update(BytesRef text, Set<BytesRef> contexts, long weight, BytesRef payload)
Updates a previous suggestion, matching the exact same text as before.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
-
-
-
Field Detail
-
TEXTGRAMS_FIELD_NAME
protected static final String TEXTGRAMS_FIELD_NAME
Edge n-grams for searching short prefixes without PrefixQuery; controlled by minPrefixChars.
- See Also:
- Constant Field Values
-
TEXT_FIELD_NAME
protected static final String TEXT_FIELD_NAME
Field name used for the indexed text.
- See Also:
- Constant Field Values
-
EXACT_TEXT_FIELD_NAME
protected static final String EXACT_TEXT_FIELD_NAME
Field name used for the indexed text, as a StringField, for exact lookup.
- See Also:
- Constant Field Values
-
CONTEXTS_FIELD_NAME
protected static final String CONTEXTS_FIELD_NAME
Field name used for the indexed context, as a StringField and a SortedSetDVField, for filtering.
- See Also:
- Constant Field Values
-
queryAnalyzer
protected final Analyzer queryAnalyzer
Analyzer used at search time
-
indexAnalyzer
protected final Analyzer indexAnalyzer
Analyzer used at index time
-
writer
protected IndexWriter writer
Used for ongoing NRT additions/updates. May be null depending on the closeIndexWriterOnBuild constructor argument.
-
writerLock
protected final Object writerLock
Used to manage concurrent access to writer
-
searcherMgr
protected SearcherManager searcherMgr
IndexSearcher used for lookups. May be null if the Directory did not exist on instantiation and none of build(org.apache.lucene.search.suggest.InputIterator), add(org.apache.lucene.util.BytesRef, java.util.Set<org.apache.lucene.util.BytesRef>, long, org.apache.lucene.util.BytesRef), or update(org.apache.lucene.util.BytesRef, java.util.Set<org.apache.lucene.util.BytesRef>, long, org.apache.lucene.util.BytesRef) has been called.
-
searcherMgrLock
protected final ReadWriteLock searcherMgrLock
Used to manage concurrent access to searcherMgr
-
DEFAULT_MIN_PREFIX_CHARS
public static final int DEFAULT_MIN_PREFIX_CHARS
Default minimum number of leading characters before PrefixQuery is used (4).
- See Also:
- Constant Field Values
-
DEFAULT_ALL_TERMS_REQUIRED
public static final boolean DEFAULT_ALL_TERMS_REQUIRED
Default boolean clause option for multiple terms matching (all terms required).
- See Also:
- Constant Field Values
-
DEFAULT_HIGHLIGHT
public static final boolean DEFAULT_HIGHLIGHT
Default highlighting option.
- See Also:
- Constant Field Values
-
DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
protected static final boolean DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
Default option to close the IndexWriter once the index has been built.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer analyzer) throws IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Throws:
IOException
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild) throws IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Parameters:
minPrefixChars
- Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild
- Call commit after the index has finished building. This persists the suggester index to disk so that future instances of this suggester can use the pre-built dictionary.
- Throws:
IOException
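The trade-off behind minPrefixChars can be sketched without Lucene. The class below is a hypothetical, simplified model (the name EdgeGramSketch and its in-memory maps are assumptions, not the suggester's actual index layout): leading character n-grams shorter than minPrefixChars are stored up front, so a short prefix becomes a single map lookup, while longer prefixes fall back to a range scan that stands in for PrefixQuery.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustration only: models edge n-grams for short prefixes; not Lucene's index format. */
final class EdgeGramSketch {
    private final int minPrefixChars;
    // gram -> tokens starting with that gram (extra storage, bought for faster short lookups)
    private final Map<String, Set<String>> grams = new HashMap<>();
    // all tokens in sorted order, used for the longer-prefix scan
    private final TreeSet<String> tokens = new TreeSet<>();

    EdgeGramSketch(int minPrefixChars) {
        this.minPrefixChars = minPrefixChars;
    }

    /** Indexes a token plus its leading n-grams of length 1 .. minPrefixChars - 1. */
    void index(String token) {
        tokens.add(token);
        int max = Math.min(token.length(), minPrefixChars - 1);
        for (int len = 1; len <= max; len++) {
            grams.computeIfAbsent(token.substring(0, len), k -> new TreeSet<>()).add(token);
        }
    }

    /** Short prefixes hit the gram map directly; longer ones scan the sorted tokens. */
    Set<String> lookup(String prefix) {
        if (prefix.length() < minPrefixChars) {
            return grams.getOrDefault(prefix, Set.of());
        }
        Set<String> out = new TreeSet<>();
        for (String t : tokens.tailSet(prefix)) {
            if (!t.startsWith(prefix)) break; // sorted order: matches are contiguous
            out.add(t);
        }
        return out;
    }

    /** Number of distinct grams stored, to make the index-size cost visible. */
    int gramCount() {
        return grams.size();
    }

    public static void main(String[] args) {
        EdgeGramSketch s = new EdgeGramSketch(4);
        s.index("ear");
        s.index("earned");
        s.index("east");
        System.out.println(s.lookup("ea") + " " + s.lookup("earn") + " grams=" + s.gramCount());
    }
}
```

Raising minPrefixChars in this model stores more grams per token but answers more prefixes with a direct lookup, which mirrors the size versus speed note in the parameter description.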
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight) throws IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Parameters:
minPrefixChars
- Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild
- Call commit after the index has finished building. This persists the suggester index to disk so that future instances of this suggester can use the pre-built dictionary.
allTermsRequired
- All terms in the suggest query must be matched.
highlight
- Highlight suggest query in suggestions.
- Throws:
IOException
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight, boolean closeIndexWriterOnBuild) throws IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Parameters:
minPrefixChars
- Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild
- Call commit after the index has finished building. This persists the suggester index to disk so that future instances of this suggester can use the pre-built dictionary.
allTermsRequired
- All terms in the suggest query must be matched.
highlight
- Highlight suggest query in suggestions.
closeIndexWriterOnBuild
- If true, the IndexWriter will be closed after the index has finished building.
- Throws:
IOException
-
-
Method Detail
-
getIndexWriterConfig
protected IndexWriterConfig getIndexWriterConfig(Analyzer indexAnalyzer, IndexWriterConfig.OpenMode openMode)
Override this to customize index settings, e.g. which codec to use.
-
getDirectory
protected Directory getDirectory(Path path) throws IOException
Subclass can override to choose a specific Directory implementation.
- Throws:
IOException
-
build
public void build(InputIterator iter) throws IOException
Description copied from class: Lookup
Builds up a new internal Lookup representation based on the given InputIterator. The implementation might re-sort the data internally.
- Specified by:
build in class Lookup
- Throws:
IOException
-
commit
public void commit() throws IOException
Commits all pending changes made to this suggester to disk.
- Throws:
IOException
- See Also:
IndexWriter.commit()
-
add
public void add(BytesRef text, Set<BytesRef> contexts, long weight, BytesRef payload) throws IOException
Adds a new suggestion. Be sure to use update(org.apache.lucene.util.BytesRef, java.util.Set<org.apache.lucene.util.BytesRef>, long, org.apache.lucene.util.BytesRef) instead if you want to replace a previous suggestion. After adding or updating a batch of new suggestions, you must call refresh() at the end in order to see the suggestions in lookup(java.lang.CharSequence, java.util.Set<org.apache.lucene.util.BytesRef>, boolean, int).
- Throws:
IOException
-
update
public void update(BytesRef text, Set<BytesRef> contexts, long weight, BytesRef payload) throws IOException
Updates a previous suggestion, matching the exact same text as before. Use this to change the weight or payload of an already added suggestion. If you know this text is not already present you can use add(org.apache.lucene.util.BytesRef, java.util.Set<org.apache.lucene.util.BytesRef>, long, org.apache.lucene.util.BytesRef) instead. After adding or updating a batch of new suggestions, you must call refresh() at the end in order to see the suggestions in lookup(java.lang.CharSequence, java.util.Set<org.apache.lucene.util.BytesRef>, boolean, int).
- Throws:
IOException
-
refresh
public void refresh() throws IOException
Reopens the underlying searcher; it's best to "batch up" many additions/updates, and then call refresh once at the end.
- Throws:
IOException
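The visibility rule shared by add, update, and refresh can be modeled with a toy class. This is a hypothetical sketch, not the suggester's internals: RefreshSketch, its pending list, and its snapshot are all assumed names, and it further assumes that add simply appends (so repeated adds of the same text yield duplicates) while update replaces entries with the exact same text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustration only: models "changes become visible on refresh"; not Lucene code. */
final class RefreshSketch {
    // Writer side: every add/update lands here first.
    private final List<Map.Entry<String, Long>> pending = new ArrayList<>();
    // Searcher side: lookups only ever see this snapshot.
    private volatile List<Map.Entry<String, Long>> visible = List.of();

    /** Appends a suggestion without checking for an existing one. */
    synchronized void add(String text, long weight) {
        pending.add(Map.entry(text, weight));
    }

    /** Replaces any suggestion with exactly the same text. */
    synchronized void update(String text, long weight) {
        pending.removeIf(e -> e.getKey().equals(text));
        pending.add(Map.entry(text, weight));
    }

    /** Publishes all pending changes at once; cheap to call once per batch. */
    synchronized void refresh() {
        visible = List.copyOf(pending);
    }

    /** Counts visible suggestions with the given text. */
    long visibleCount(String text) {
        return visible.stream().filter(e -> e.getKey().equals(text)).count();
    }

    public static void main(String[] args) {
        RefreshSketch s = new RefreshSketch();
        s.add("apple", 1);
        System.out.println("before refresh: " + s.visibleCount("apple"));
        s.refresh();
        System.out.println("after refresh: " + s.visibleCount("apple"));
    }
}
```

Calling refresh once after a batch, rather than after every change, keeps the number of snapshot rebuilds low in this model.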
-
getTextFieldType
protected FieldType getTextFieldType()
Subclass can override this method to change the field type of the text field e.g. to change the index options
-
lookup
public List<Lookup.LookupResult> lookup(CharSequence key, Set<BytesRef> contexts, boolean onlyMorePopular, int num) throws IOException
Description copied from class: Lookup
Look up a key and return possible completions for this key.
- Specified by:
lookup in class Lookup
- Parameters:
key
- lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
contexts
- contexts to filter the lookup by, or null if all contexts are allowed; if the suggestion contains any of the contexts, it's a match
onlyMorePopular
- return only more popular results
num
- maximum number of results to return
- Returns:
- a list of possible completions, with their relative weight (e.g. popularity)
- Throws:
IOException
-
lookup
public List<Lookup.LookupResult> lookup(CharSequence key, int num, boolean allTermsRequired, boolean doHighlight) throws IOException
Lookup, without any context.
- Throws:
IOException
-
lookup
public List<Lookup.LookupResult> lookup(CharSequence key, Set<BytesRef> contexts, int num, boolean allTermsRequired, boolean doHighlight) throws IOException
Lookup, with context but without booleans. Context booleans default to SHOULD, so each suggestion must have at least one of the contexts.
- Throws:
IOException
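The "at least one of the contexts" rule amounts to a non-empty set intersection. The following is a minimal stand-alone sketch; the class ContextFilterSketch and the use of String contexts instead of BytesRef are assumptions for illustration, and treating an empty context set like null (no filtering) is also an assumption of this sketch rather than something stated above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustration only: SHOULD-style context filtering on plain strings; not Lucene code. */
final class ContextFilterSketch {

    /** Accepts when no filter is requested or when the two context sets overlap. */
    static boolean accepts(Set<String> suggestionContexts, Set<String> requested) {
        if (requested == null || requested.isEmpty()) {
            return true; // assumption: an empty set means "no filtering", like null
        }
        return !Collections.disjoint(suggestionContexts, requested);
    }

    /** Returns the texts whose contexts pass the filter, sorted for stable output. */
    static List<String> filter(Map<String, Set<String>> suggestions, Set<String> requested) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : suggestions.entrySet()) {
            if (accepts(e.getValue(), requested)) out.add(e.getKey());
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> data = Map.of(
            "apple", Set.of("fruit", "food"),
            "bread", Set.of("food"),
            "hammer", Set.of("tool"));
        System.out.println(filter(data, Set.of("food")));
    }
}
```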
-
getLastTokenQuery
protected Query getLastTokenQuery(String token) throws IOException
This is called if the last token isn't ended (e.g. user did not type a space after it). Return an appropriate Query clause to add to the BooleanQuery.
- Throws:
IOException
-
lookup
public List<Lookup.LookupResult> lookup(CharSequence key, Map<BytesRef,BooleanClause.Occur> contextInfo, int num, boolean allTermsRequired, boolean doHighlight) throws IOException
Retrieve suggestions, specifying whether all terms must match (allTermsRequired) and whether the hits should be highlighted (doHighlight).
- Throws:
IOException
-
addContextToQuery
public void addContextToQuery(BooleanQuery.Builder query, BytesRef context, BooleanClause.Occur clause)
This method is handy because callers do not need access to internal fields such as CONTEXTS_FIELD_NAME in order to build queries. However, this class may not be the best location for it.
- Parameters:
query
- a BooleanQuery.Builder instance (see BooleanQuery)
context
- the context
clause
- one of the BooleanClause.Occur values
-
lookup
public List<Lookup.LookupResult> lookup(CharSequence key, BooleanQuery contextQuery, int num, boolean allTermsRequired, boolean doHighlight) throws IOException
This is an advanced method that lets callers pass any arbitrary Lucene query down to the suggester to filter its results.
- Overrides:
lookup in class Lookup
- Parameters:
key
- the keyword being looked for
contextQuery
- an arbitrary Lucene query used to filter the results of the suggester. addContextToQuery(org.apache.lucene.search.BooleanQuery.Builder, org.apache.lucene.util.BytesRef, org.apache.lucene.search.BooleanClause.Occur) could be used to build this contextQuery.
num
- number of items to return
allTermsRequired
- whether all searched terms must match
doHighlight
- if true, the matching term will be highlighted in the search result
- Returns:
- the result of the suggester
- Throws:
IOException
- if there is an IO exception while reading data from the index
-
createResults
protected List<Lookup.LookupResult> createResults(IndexSearcher searcher, TopFieldDocs hits, int num, CharSequence charSequence, boolean doHighlight, Set<String> matchedTokens, String prefixToken) throws IOException
Create the results based on the search hits. Can be overridden by subclass to add particular behavior (e.g. weight transformation). Note that there is no prefix token (the prefixToken argument will be null) whenever the final token in the incoming request was in fact finished (had trailing characters, such as white-space).
- Throws:
IOException
- If there are problems reading fields from the underlying Lucene index.
-
finishQuery
protected Query finishQuery(BooleanQuery.Builder in, boolean allTermsRequired)
Subclass can override this to tweak the Query before searching.
-
highlight
protected Object highlight(String text, Set<String> matchedTokens, String prefixToken) throws IOException
Override this method to customize the Object representing a single highlighted suggestion; the result is set on each Lookup.LookupResult.highlightKey member.
- Throws:
IOException
-
addNonMatch
protected void addNonMatch(StringBuilder sb, String text)
Called while highlighting a single result, to append a non-matching chunk of text from the suggestion to the provided fragments list.
- Parameters:
sb
- The StringBuilder to append to
text
- The text chunk to add
-
addWholeMatch
protected void addWholeMatch(StringBuilder sb, String surface, String analyzed)
Called while highlighting a single result, to append the whole matched token to the provided fragments list.
- Parameters:
sb
- The StringBuilder to append to
surface
- The surface form (original) text
analyzed
- The analyzed token corresponding to the surface form text
-
addPrefixMatch
protected void addPrefixMatch(StringBuilder sb, String surface, String analyzed, String prefixToken)
Called while highlighting a single result, to append a matched prefix token to the provided fragments list.
- Parameters:
sb
- The StringBuilder to append to
surface
- The fragment of the surface form (indexed during build(org.apache.lucene.search.suggest.InputIterator)), corresponding to this match
analyzed
- The analyzed token that matched
prefixToken
- The prefix of the token that matched
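How the three hooks (addNonMatch, addWholeMatch, addPrefixMatch) could cooperate is sketched below as plain Java. This is an assumption-laden illustration, not the class's actual highlighter: the HighlightSketch class, the whitespace tokenization, lower-casing as the "analysis" step, and the <b> markup are all choices made for the example.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: a simplified highlighter mirroring the three hook methods. */
final class HighlightSketch {

    /** Text that matched nothing is appended unchanged. */
    static void addNonMatch(StringBuilder sb, String text) {
        sb.append(text);
    }

    /** A fully matched token is wrapped in bold markup (markup choice is an assumption). */
    static void addWholeMatch(StringBuilder sb, String surface, String analyzed) {
        sb.append("<b>").append(surface).append("</b>");
    }

    /** Only the leading part covered by the prefix is wrapped; the rest is plain. */
    static void addPrefixMatch(StringBuilder sb, String surface, String analyzed, String prefixToken) {
        if (prefixToken.length() >= surface.length()) {
            addWholeMatch(sb, surface, analyzed);
            return;
        }
        sb.append("<b>").append(surface, 0, prefixToken.length()).append("</b>");
        sb.append(surface, prefixToken.length(), surface.length());
    }

    /** Walks whitespace-separated tokens and dispatches each one to a hook. */
    static String highlight(String text, Set<String> matchedTokens, String prefixToken) {
        StringBuilder sb = new StringBuilder();
        Matcher m = Pattern.compile("\\S+").matcher(text);
        int upto = 0;
        while (m.find()) {
            if (m.start() > upto) addNonMatch(sb, text.substring(upto, m.start()));
            String surface = m.group();
            String analyzed = surface.toLowerCase(Locale.ROOT); // stand-in for analysis
            if (matchedTokens.contains(analyzed)) {
                addWholeMatch(sb, surface, analyzed);
            } else if (prefixToken != null && analyzed.startsWith(prefixToken)) {
                addPrefixMatch(sb, surface, analyzed, prefixToken);
            } else {
                addNonMatch(sb, surface);
            }
            upto = m.end();
        }
        if (upto < text.length()) addNonMatch(sb, text.substring(upto));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Lend me your ear", Set.of("me"), "ea"));
    }
}
```

A subclass that wants different markup would, in this model, only need to change the two match hooks and leave the token walk untouched.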
-
store
public boolean store(DataOutput in) throws IOException
Description copied from class: Lookup
Persist the constructed lookup data to a directory. Optional operation.
- Specified by:
store in class Lookup
- Parameters:
in
- DataOutput to write the data to.
- Returns:
- true if successful, false if unsuccessful or not supported.
- Throws:
IOException
- when fatal IO error occurs.
-
load
public boolean load(DataInput out) throws IOException
Description copied from class: Lookup
Discard current lookup data and load it from a previously saved copy. Optional operation.
- Specified by:
load in class Lookup
- Parameters:
out
- the DataInput to load the lookup data.
- Returns:
- true if completed successfully, false if unsuccessful or not supported.
- Throws:
IOException
- when fatal IO error occurs.
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
ramBytesUsed
public long ramBytesUsed()
- Specified by:
ramBytesUsed in interface Accountable
-
getCount
public long getCount() throws IOException
Description copied from class: Lookup
Get the number of entries the lookup was built with.
- Specified by:
getCount in class Lookup
- Returns:
- total number of suggester entries
- Throws:
IOException
-
-