Class CommonTermsQuery


  • public class CommonTermsQuery
    extends Query
    A query that executes high-frequency terms in an optional sub-query to prevent slow queries caused by "common" terms such as stopwords. It builds two queries from the added terms: low-frequency terms are added to a required boolean clause and high-frequency terms are added to an optional boolean clause. The optional clause is executed only if the required "low-frequency" clause matches. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched. Where applicable, this query can improve query execution times significantly.

    CommonTermsQuery has several advantages over stopword filtering at index or query time: a term is "classified" based on its actual document frequency in the index, so slow queries can be prevented even across domains without specialized stopword files.

    Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e., all high-frequency terms must match for a document to match.
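
    The split described above can be illustrated with a small, self-contained sketch. It does not use Lucene and is not the library's implementation: the names CommonTermsSketch, Split, classify and describe are hypothetical, the docFreq map stands in for index statistics, and reading a cutoff below 1 as a fraction of maxDoc is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: mimics the low/high-frequency split, not Lucene's actual code. */
final class CommonTermsSketch {

    /** Holds the query terms after they were split by document frequency. */
    static final class Split {
        final List<String> lowFreq = new ArrayList<>();
        final List<String> highFreq = new ArrayList<>();
    }

    /**
     * Splits terms into low- and high-frequency groups.
     * Assumption: a cutoff below 1 is a fraction of maxDoc,
     * a cutoff of 1 or more is an absolute document count.
     */
    static Split classify(List<String> terms, Map<String, Integer> docFreq,
                          float maxTermFrequency, int maxDoc) {
        float cutoff = maxTermFrequency >= 1f ? maxTermFrequency : maxTermFrequency * maxDoc;
        Split split = new Split();
        for (String term : terms) {
            int df = docFreq.getOrDefault(term, 0);
            if (df > cutoff) {
                split.highFreq.add(term);
            } else {
                split.lowFreq.add(term);
            }
        }
        return split;
    }

    /** Describes the resulting clause structure in an informal query syntax. */
    static String describe(Split split) {
        if (split.lowFreq.isEmpty() && split.highFreq.isEmpty()) {
            return "";
        }
        if (split.lowFreq.isEmpty()) {
            // Only common terms: fall back to a plain conjunction.
            return "+" + String.join(" +", split.highFreq);
        }
        // Low-frequency terms form the required clause.
        String required = "+(" + String.join(" ", split.lowFreq) + ")";
        // High-frequency terms, if any, form the optional clause.
        return split.highFreq.isEmpty()
                ? required
                : required + " (" + String.join(" ", split.highFreq) + ")";
    }
}
```

    With a cutoff of 0.1 over 1000 documents and document frequencies of 950 for "the", 12 for "lucene" and 40 for "query", describe returns "+(lucene query) (the)". If every term is above the cutoff, the result is a conjunction such as "+the +of".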

    • Field Detail

      • terms

        protected final List<Term> terms
      • maxTermFrequency

        protected final float maxTermFrequency
      • lowFreqBoost

        protected float lowFreqBoost
      • highFreqBoost

        protected float highFreqBoost
      • lowFreqMinNrShouldMatch

        protected float lowFreqMinNrShouldMatch
      • highFreqMinNrShouldMatch

        protected float highFreqMinNrShouldMatch
    • Method Detail

      • add

        public void add(Term term)
        Adds a term to the CommonTermsQuery.
        Parameters:
        term - the term to add
      • calcLowFreqMinimumNumberShouldMatch

        protected int calcLowFreqMinimumNumberShouldMatch(int numOptional)
      • calcHighFreqMinimumNumberShouldMatch

        protected int calcHighFreqMinimumNumberShouldMatch(int numOptional)
      • buildQuery

        protected Query buildQuery(int maxDoc,
                                   TermStates[] contextArray,
                                   Term[] queryTerms)
      • setLowFreqMinimumNumberShouldMatch

        public void setLowFreqMinimumNumberShouldMatch(float min)
        Specifies the minimum number of optional low-frequency BooleanClauses that must be satisfied in order to produce a match on the low-frequency part of the query. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the low-frequency clause, or a number >= 1 as an absolute number of clauses that must match.

        By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

        Parameters:
        min - the number of optional clauses that must match
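
        The fraction-or-absolute convention described above can be sketched as follows. This is a minimal, self-contained illustration rather than the library's code: MinShouldMatchSketch and resolve are hypothetical names, and rounding the fractional case to the nearest integer is an assumption.

```java
/** Illustrative only: one way a fractional-or-absolute minimum could be resolved. */
final class MinShouldMatchSketch {

    /**
     * @param min         value passed to the setter: [0..1) is a fraction, >= 1 an absolute count
     * @param numOptional number of optional clauses actually present
     * @return the number of optional clauses that must match
     */
    static int resolve(float min, int numOptional) {
        if (min >= 1.0f || min == 0.0f) {
            // Absolute count, or 0 meaning no optional clause is required.
            return (int) min;
        }
        // Fraction of the available clauses; the rounding mode is an assumption.
        return Math.round(min * numOptional);
    }
}
```

        Under these assumptions, resolve(0.5f, 4) and resolve(2f, 4) both yield 2, while resolve(0f, 4) yields 0.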
      • getLowFreqMinimumNumberShouldMatch

        public float getLowFreqMinimumNumberShouldMatch()
        Gets the minimum number of optional low-frequency BooleanClauses that must be satisfied.
      • setHighFreqMinimumNumberShouldMatch

        public void setHighFreqMinimumNumberShouldMatch(float min)
        Specifies the minimum number of optional high-frequency BooleanClauses that must be satisfied in order to produce a match on the high-frequency part of the query. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the high-frequency clause, or a number >= 1 as an absolute number of clauses that must match.

        By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

        Parameters:
        min - the number of optional clauses that must match
      • getHighFreqMinimumNumberShouldMatch

        public float getHighFreqMinimumNumberShouldMatch()
        Gets the minimum number of optional high-frequency BooleanClauses that must be satisfied.
      • getTerms

        public List<Term> getTerms()
        Gets the list of terms.
      • getMaxTermFrequency

        public float getMaxTermFrequency()
        Gets the maximum threshold of a term's document frequency for it to be considered a low-frequency term.
      • getLowFreqBoost

        public float getLowFreqBoost()
        Gets the boost used for low frequency terms.
      • getHighFreqBoost

        public float getHighFreqBoost()
        Gets the boost used for high frequency terms.
      • hashCode

        public int hashCode()
        Specified by:
        hashCode in class Query
      • equals

        public boolean equals(Object other)
        Specified by:
        equals in class Query
      • newTermQuery

        protected Query newTermQuery(Term term,
                                     TermStates termStates)
        Builds a new TermQuery instance.

        This is intended for subclasses that wish to customize the generated queries.

        Parameters:
        term - term
        termStates - the TermStates to be used to create the low-level term query. Can be null.
        Returns:
        new TermQuery instance