Class CommonTermsQuery

java.lang.Object
org.apache.lucene.search.Query
org.apache.lucene.queries.CommonTermsQuery

public class CommonTermsQuery extends Query
A query that executes high-frequency terms in an optional sub-query to prevent slow queries caused by "common" terms such as stopwords. This query builds two queries from the added terms: low-frequency terms are added to a required boolean clause, and high-frequency terms are added to an optional boolean clause. The optional clause is only executed if the required "low-frequency" clause matches. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched. Where these conditions hold, this query can reduce query execution time significantly.

CommonTermsQuery has several advantages over stopword filtering at index or query time: a term is "classified" based on its actual document frequency in the index, so the query can prevent slow queries even across domains without specialized stopword files.

Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e., all high-frequency terms must match in order to match a document.
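The split into a required low-frequency clause and an optional high-frequency clause, including the rewrite into a conjunction when only common terms are present, can be modelled without Lucene. The following is a simplified, stdlib-only sketch and not Lucene's implementation: the class CommonTermsShapeSketch, its describe method, the plain document-count threshold maxDocFreq and the "+(...)" notation are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of how CommonTermsQuery partitions its terms (not Lucene code).
public class CommonTermsShapeSketch {

    // Returns an illustrative description of the boolean structure that would be built.
    // A term counts as high-frequency when its document frequency exceeds maxDocFreq.
    static String describe(List<String> terms, Map<String, Integer> docFreq, int maxDocFreq) {
        List<String> low = new ArrayList<>();
        List<String> high = new ArrayList<>();
        for (String t : terms) {
            // Terms missing from the index are treated as rare (document frequency 0).
            if (docFreq.getOrDefault(t, 0) > maxDocFreq) {
                high.add(t);
            } else {
                low.add(t);
            }
        }
        if (low.isEmpty()) {
            // Only common terms: rewritten into a conjunction, every term is required.
            return "+(" + join(high, "+") + ")";
        }
        // Rare terms form the required clause.
        String shape = "+(" + join(low, "") + ")";
        if (!high.isEmpty()) {
            // Common terms form an optional clause, so they influence scoring, not matching.
            shape += " (" + join(high, "") + ")";
        }
        return shape;
    }

    // Joins terms with spaces, prefixing each one (e.g. "+" for required terms).
    private static String join(List<String> terms, String prefix) {
        StringBuilder sb = new StringBuilder();
        for (String t : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(prefix).append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> df = Map.of("the", 950, "of", 900, "lucene", 12, "query", 40);
        // Mixed query: "the" is common, "lucene" and "query" are rare.
        System.out.println(describe(List.of("the", "lucene", "query"), df, 100)); // +(lucene query) (the)
        // Only common terms.
        System.out.println(describe(List.of("the", "of"), df, 100));              // +(+the +of)
    }
}
```

The model hard-codes a single threshold for clarity; in the real class the threshold is derived from maxTermFrequency and the index statistics.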

  • Field Details

    • terms

      protected final List<Term> terms
    • maxTermFrequency

      protected final float maxTermFrequency
    • lowFreqOccur

      protected final BooleanClause.Occur lowFreqOccur
    • highFreqOccur

      protected final BooleanClause.Occur highFreqOccur
    • lowFreqBoost

      protected float lowFreqBoost
    • highFreqBoost

      protected float highFreqBoost
    • lowFreqMinNrShouldMatch

      protected float lowFreqMinNrShouldMatch
    • highFreqMinNrShouldMatch

      protected float highFreqMinNrShouldMatch
  • Constructor Details

  • Method Details

    • add

      public void add(Term term)
      Adds a term to the CommonTermsQuery.
      Parameters:
      term - the term to add
    • rewrite

      public Query rewrite(IndexReader reader) throws IOException
      Overrides:
      rewrite in class Query
      Throws:
      IOException
    • visit

      public void visit(QueryVisitor visitor)
      Specified by:
      visit in class Query
    • calcLowFreqMinimumNumberShouldMatch

      protected int calcLowFreqMinimumNumberShouldMatch(int numOptional)
    • calcHighFreqMinimumNumberShouldMatch

      protected int calcHighFreqMinimumNumberShouldMatch(int numOptional)
    • buildQuery

      protected Query buildQuery(int maxDoc, TermStates[] contextArray, Term[] queryTerms)
    • collectTermStates

      public void collectTermStates(IndexReader reader, List<LeafReaderContext> leaves, TermStates[] contextArray, Term[] queryTerms) throws IOException
      Throws:
      IOException
    • setLowFreqMinimumNumberShouldMatch

      public void setLowFreqMinimumNumberShouldMatch(float min)
      Specifies a minimum number of the low-frequency optional BooleanClauses which must be satisfied in order to produce a match on the low-frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the low-frequency clause, or a number >=1 as an absolute number of clauses that need to match.

      By default, no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

      Parameters:
      min - the number of optional clauses that must match
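To illustrate the [0..1) versus >=1 rule for min, here is a hypothetical stdlib-only sketch. The class MinShouldMatchSketch and its minShouldMatch method are invented names; they stand in for what the protected calcLowFreqMinimumNumberShouldMatch hook might compute. Rounding a fractional value with Math.round is an assumption of this sketch, not documented behavior.

```java
// Hypothetical conversion of a float "minimum should match" setting into a clause count.
public class MinShouldMatchSketch {

    static int minShouldMatch(float min, int numOptional) {
        if (min >= 1.0f || min == 0.0f) {
            // Absolute number of clauses (or the default of zero).
            return (int) min;
        }
        // Fraction of the optional clauses actually present in the query (assumed rounding).
        return Math.round(min * numOptional);
    }

    public static void main(String[] args) {
        System.out.println(minShouldMatch(0.5f, 4));  // 2
        System.out.println(minShouldMatch(3f, 10));   // 3
        System.out.println(minShouldMatch(0f, 5));    // 0
        System.out.println(minShouldMatch(0.75f, 3)); // 2
    }
}
```

A fractional setting adapts to how many terms end up in the low-frequency clause, which is not known until the terms are classified against the index.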
    • getLowFreqMinimumNumberShouldMatch

      public float getLowFreqMinimumNumberShouldMatch()
      Gets the minimum number of the optional low-frequency BooleanClauses which must be satisfied.
    • setHighFreqMinimumNumberShouldMatch

      public void setHighFreqMinimumNumberShouldMatch(float min)
      Specifies a minimum number of the high-frequency optional BooleanClauses which must be satisfied in order to produce a match on the high-frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the high-frequency clause, or a number >=1 as an absolute number of clauses that need to match.

      By default, no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

      Parameters:
      min - the number of optional clauses that must match
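The high-frequency setting decides whether the high-frequency part matches. Because that part is the optional clause next to the required low-frequency clause, it should affect scoring rather than whether a document matches whenever low-frequency terms are present. The following hypothetical sketch models this for a single document, assuming both lowFreqOccur and highFreqOccur are BooleanClause.Occur.SHOULD; SubQueryMatchSketch and disjunctionMatches are invented names, not Lucene API.

```java
import java.util.List;
import java.util.Set;

// Hypothetical per-document model of the two sub-queries (not Lucene code).
public class SubQueryMatchSketch {

    // A disjunction over the given terms with a minimum-should-match threshold.
    static boolean disjunctionMatches(List<String> terms, int minShouldMatch, Set<String> doc) {
        int hits = 0;
        for (String t : terms) {
            if (doc.contains(t)) {
                hits++;
            }
        }
        // With no minimum configured, one matching clause is enough.
        return hits >= Math.max(1, minShouldMatch);
    }

    public static void main(String[] args) {
        List<String> low = List.of("lucene", "query");
        List<String> high = List.of("the", "of", "a");
        Set<String> doc = Set.of("lucene", "the");

        boolean lowPart = disjunctionMatches(low, 0, doc);   // required clause
        boolean highPart = disjunctionMatches(high, 2, doc); // optional clause, needs 2 of 3
        System.out.println("document matches: " + lowPart);
        System.out.println("high-frequency part contributes to score: " + (lowPart && highPart));
    }
}
```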
    • getHighFreqMinimumNumberShouldMatch

      public float getHighFreqMinimumNumberShouldMatch()
      Gets the minimum number of the optional high-frequency BooleanClauses which must be satisfied.
    • getTerms

      public List<Term> getTerms()
      Gets the list of terms.
    • getMaxTermFrequency

      public float getMaxTermFrequency()
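      Gets the maximum threshold of a term's document frequency for it to be considered a low-frequency term. A value in [0..1) is interpreted as a fraction of the documents in the index; a value >=1 is interpreted as an absolute document frequency.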
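The two interpretations of maxTermFrequency can be sketched as a single predicate. This is a hypothetical stdlib-only illustration: MaxTermFrequencySketch and isHighFrequency are invented names, and rounding the fractional threshold up with Math.ceil is an assumption of this sketch.

```java
// Hypothetical classification of a term as high- or low-frequency (not Lucene code).
public class MaxTermFrequencySketch {

    static boolean isHighFrequency(int docFreq, int maxDoc, float maxTermFrequency) {
        if (maxTermFrequency >= 1f) {
            // Absolute threshold: appearing in more than N documents makes the term "common".
            return docFreq > maxTermFrequency;
        }
        // Relative threshold: appearing in more than the given fraction of all documents.
        return docFreq > (int) Math.ceil(maxTermFrequency * (float) maxDoc);
    }

    public static void main(String[] args) {
        System.out.println(isHighFrequency(300, 1000, 0.25f)); // true: 300 > 250
        System.out.println(isHighFrequency(200, 1000, 0.25f)); // false: 200 <= 250
        System.out.println(isHighFrequency(60, 1000, 50f));    // true: 60 > 50
    }
}
```

A relative threshold scales with the index size, while an absolute one stays fixed as documents are added.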
    • getLowFreqOccur

      public BooleanClause.Occur getLowFreqOccur()
      Gets the BooleanClause.Occur used for low frequency terms.
    • getHighFreqOccur

      public BooleanClause.Occur getHighFreqOccur()
      Gets the BooleanClause.Occur used for high frequency terms.
    • getLowFreqBoost

      public float getLowFreqBoost()
      Gets the boost used for low frequency terms.
    • getHighFreqBoost

      public float getHighFreqBoost()
      Gets the boost used for high frequency terms.
    • toString

      public String toString(String field)
      Specified by:
      toString in class Query
    • hashCode

      public int hashCode()
      Specified by:
      hashCode in class Query
    • equals

      public boolean equals(Object other)
      Specified by:
      equals in class Query
    • newTermQuery

      protected Query newTermQuery(Term term, TermStates termStates)
      Builds a new TermQuery instance.

      This is intended for subclasses that wish to customize the generated queries.

      Parameters:
      term - term
      termStates - the TermStates to be used to create the low level term query. Can be null.
      Returns:
      new TermQuery instance