public class CommonTermsQuery extends Query
This query builds two sub-queries from the added terms: low-frequency
terms are added to a required boolean clause and high-frequency terms are
added to an optional boolean clause. The optional clause is only executed if
the required "low-frequency" clause matches. Scores produced by this query
will differ slightly from those of a plain BooleanQuery scorer, mainly due to
differences in the number of leaf queries in the required boolean clause.
In most cases, high-frequency terms are unlikely to contribute significantly
to the document score unless at least one of the low-frequency terms is
matched, so this query can improve query execution times significantly
where applicable.
CommonTermsQuery has several advantages over stopword filtering at
index or query time, since a term can be "classified" based on its actual
document frequency in the index. This can prevent slow queries even across
domains without specialized stopword files.
Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e. all high-frequency terms must match for a document to match.
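The split described above can be modelled without Lucene. The following is a minimal standalone sketch, not the library's implementation: the class `CommonTermsSplitSketch`, its methods, the document frequencies, and the absolute `cutoff` are all hypothetical names and values chosen for illustration. It assumes a term counts as high-frequency when its document frequency is strictly greater than the cutoff.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Standalone model of the low/high-frequency split; this is not Lucene code.
class CommonTermsSplitSketch {

    /** Holds the two term groups produced by the split. */
    static final class Split {
        final List<String> lowFreq = new ArrayList<>();
        final List<String> highFreq = new ArrayList<>();
    }

    /**
     * Puts each term into the low- or high-frequency group.
     * Assumption: cutoff is an absolute document-frequency threshold and a
     * term is high-frequency when its document frequency exceeds it.
     */
    static Split split(Map<String, Integer> docFreqs, int cutoff) {
        Split s = new Split();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (e.getValue() > cutoff) {
                s.highFreq.add(e.getKey());
            } else {
                s.lowFreq.add(e.getKey());
            }
        }
        return s;
    }

    /** Describes the query shape that the class description implies. */
    static String describe(Split s) {
        if (s.lowFreq.isEmpty()) {
            // Only high-frequency terms: plain conjunction, every term required.
            return "ALL_OF" + s.highFreq;
        }
        // Low-frequency terms are required; high-frequency terms are optional.
        return "REQUIRED" + s.lowFreq + " OPTIONAL" + s.highFreq;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = new LinkedHashMap<>();
        df.put("the", 9500);   // made-up document frequencies
        df.put("lucene", 12);
        df.put("of", 9100);
        System.out.println(describe(split(df, 1000)));
        // prints "REQUIRED[lucene] OPTIONAL[the, of]"
    }
}
```

With only "the" and "of" in the map, the same call yields `ALL_OF[the, of]`, which mirrors the conjunction rewrite mentioned in the note.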
Modifier and Type | Field and Description |
---|---|
protected boolean | disableCoord |
protected float | highFreqBoost |
protected BooleanClause.Occur | highFreqOccur |
protected float | lowFreqBoost |
protected BooleanClause.Occur | lowFreqOccur |
protected float | maxTermFrequency |
protected float | minNrShouldMatch |
protected List<Term> | terms |
Constructor and Description |
---|
CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency) Creates a new CommonTermsQuery. |
CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency, boolean disableCoord) Creates a new CommonTermsQuery. |
Modifier and Type | Method and Description |
---|---|
void | add(Term term) Adds a term to the CommonTermsQuery. |
protected Query | buildQuery(int maxDoc, TermContext[] contextArray, Term[] queryTerms) |
protected int | calcLowFreqMinimumNumberShouldMatch(int numOptional) |
void | collectTermContext(IndexReader reader, List<AtomicReaderContext> leaves, TermContext[] contextArray, Term[] queryTerms) |
boolean | equals(Object obj) |
void | extractTerms(Set<Term> terms) |
float | getMinimumNumberShouldMatch() Gets the minimum number of the optional BooleanClauses which must be satisfied. |
int | hashCode() |
boolean | isCoordDisabled() Returns true iff Similarity.coord(int,int) is disabled in scoring for the high and low frequency query instance. |
Query | rewrite(IndexReader reader) |
void | setMinimumNumberShouldMatch(float min) Specifies a minimum number of the optional BooleanClauses which must be satisfied in order to produce a match on the low frequency terms query part. |
String | toString(String field) |
protected final boolean disableCoord
protected final float maxTermFrequency
protected final BooleanClause.Occur lowFreqOccur
protected final BooleanClause.Occur highFreqOccur
protected float lowFreqBoost
protected float highFreqBoost
protected float minNrShouldMatch
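The constructor parameters describe maxTermFrequency as either a value in [0..1) or an absolute number >= 1. One plausible reading is sketched below as a standalone model, not as Lucene's code. The class `MaxTermFrequencySketch` and the method `isHighFrequency` are hypothetical. The rounding (ceiling of the fraction times maxDoc) and the strict comparison are assumptions made for this illustration.

```java
// Standalone model of the maxTermFrequency threshold; this is not Lucene code.
class MaxTermFrequencySketch {

    /**
     * Returns true if a term with the given document frequency would count as
     * high-frequency. Assumptions: values >= 1 are absolute document counts;
     * values below 1 are fractions of maxDoc, rounded up; the comparison is
     * strict (docFreq must exceed the cutoff).
     */
    static boolean isHighFrequency(int docFreq, int maxDoc, float maxTermFrequency) {
        if (maxTermFrequency >= 1f) {
            // Absolute threshold: compare the document frequency directly.
            return docFreq > maxTermFrequency;
        }
        // Relative threshold: scale by the number of documents in the index.
        int cutoff = (int) Math.ceil(maxTermFrequency * (float) maxDoc);
        return docFreq > cutoff;
    }

    public static void main(String[] args) {
        int maxDoc = 1000; // made-up index size
        System.out.println(isHighFrequency(251, maxDoc, 0.25f)); // true: above 25% of 1000
        System.out.println(isHighFrequency(250, maxDoc, 0.25f)); // false: not above the cutoff
        System.out.println(isHighFrequency(101, maxDoc, 100f));  // true: above the absolute count
    }
}
```

Under this reading, a relative threshold adapts automatically as the index grows, while an absolute threshold stays fixed regardless of index size.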
public CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency)
Creates a new CommonTermsQuery.
Parameters:
highFreqOccur - BooleanClause.Occur used for high frequency terms
lowFreqOccur - BooleanClause.Occur used for low frequency terms
maxTermFrequency - a value in [0..1) (or absolute number >=1) representing the maximum threshold of a term's document frequency to be considered a low frequency term.
Throws:
IllegalArgumentException - if BooleanClause.Occur.MUST_NOT is passed as lowFreqOccur or highFreqOccur
public CommonTermsQuery(BooleanClause.Occur highFreqOccur, BooleanClause.Occur lowFreqOccur, float maxTermFrequency, boolean disableCoord)
Creates a new CommonTermsQuery.
Parameters:
highFreqOccur - BooleanClause.Occur used for high frequency terms
lowFreqOccur - BooleanClause.Occur used for low frequency terms
maxTermFrequency - a value in [0..1) (or absolute number >=1) representing the maximum threshold of a term's document frequency to be considered a low frequency term.
disableCoord - disables Similarity.coord(int,int) in scoring for the low / high frequency sub-queries
Throws:
IllegalArgumentException - if BooleanClause.Occur.MUST_NOT is passed as lowFreqOccur or highFreqOccur
public void add(Term term)
Adds a term to the CommonTermsQuery.
Parameters:
term - the term to add
public Query rewrite(IndexReader reader) throws IOException
Overrides:
rewrite in class Query
Throws:
IOException
protected int calcLowFreqMinimumNumberShouldMatch(int numOptional)
protected Query buildQuery(int maxDoc, TermContext[] contextArray, Term[] queryTerms)
public void collectTermContext(IndexReader reader, List<AtomicReaderContext> leaves, TermContext[] contextArray, Term[] queryTerms) throws IOException
Throws:
IOException
public boolean isCoordDisabled()
Returns true iff Similarity.coord(int,int) is disabled in scoring for the high and low frequency query instance. The top level query will always disable coords.
public void setMinimumNumberShouldMatch(float min)
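Specifies a minimum number of the optional BooleanClauses which must be satisfied in order to produce a match on the low frequency terms query part. By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.
Parameters:
min - the number of optional clauses that must match
public float getMinimumNumberShouldMatch()
Gets the minimum number of the optional BooleanClauses which must be satisfied.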
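The minimum-should-match setting is a float, which suggests that fractional values may be meaningful, although this page does not say so. The standalone sketch below models one possible conversion from the float setting to a clause count. It is not Lucene's code: the class `MinShouldMatchSketch`, the method `requiredOptionalClauses`, and the treatment of values between 0 and 1 as a fraction of the optional clauses are all assumptions.

```java
// Standalone model of a minimum-should-match calculation; this is not Lucene code.
class MinShouldMatchSketch {

    /**
     * Converts a float setting into a number of optional clauses that must match.
     * Assumptions: 0 or any value >= 1 is an absolute count (truncated to int);
     * a value strictly between 0 and 1 is a fraction of numOptional, rounded
     * to the nearest integer.
     */
    static int requiredOptionalClauses(float min, int numOptional) {
        if (min >= 1.0f || min == 0.0f) {
            return (int) min;
        }
        return Math.round(min * numOptional);
    }

    public static void main(String[] args) {
        System.out.println(requiredOptionalClauses(0f, 4));    // 0: no optional clause needed
        System.out.println(requiredOptionalClauses(2f, 4));    // 2: absolute count
        System.out.println(requiredOptionalClauses(0.75f, 4)); // 3: three quarters of four clauses
    }
}
```

A value of 0 corresponds to the documented default, where no optional clauses are necessary for a match.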
public void extractTerms(Set<Term> terms)
Overrides:
extractTerms in class Query
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.