Class DocValuesTermsQuery
- All Implemented Interfaces:
Accountable
A Query that only accepts documents whose term value in the specified field is contained in the provided set of allowed terms.
This is the same functionality as TermsQuery (from queries/), but because of drastically different implementations, they also have different performance characteristics, as described below.
NOTE: be very careful using this query: it is typically much slower than using TermsQuery, but in certain specialized cases may be faster.
With each search, this query translates the specified set of Terms into a private LongBitSet keyed by term number per unique IndexReader (normally one reader per segment). Then, during matching, the term number for each docID is retrieved from the cache and then checked for inclusion using the LongBitSet. Since all testing is done using RAM resident data structures, performance should be very fast, most likely fast enough to not require further caching of the DocIdSet for each possible combination of terms. However, because docIDs are simply scanned linearly, an index with a great many small documents may find this linear scan too costly.
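The two steps described above (translate the allowed terms into a bit set keyed by term number, then scan docIDs linearly) can be illustrated with a minimal JDK-only sketch. This is not Lucene code: `java.util.BitSet` stands in for LongBitSet, and the class name `OrdinalBitSetSketch`, the method `matchingDocs`, and the arrays `sortedTerms` and `docOrds` are hypothetical stand-ins for a segment's term dictionary and per-document term numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OrdinalBitSetSketch {

    /**
     * sortedTerms: sorted, de-duplicated terms of one segment (array index = term number).
     * docOrds[docID]: term number of that document's value, or -1 if it has none.
     */
    static List<Integer> matchingDocs(String[] sortedTerms, int[] docOrds, Set<String> allowed) {
        // Step 1: translate the allowed terms into a bit set keyed by term number.
        BitSet allowedOrds = new BitSet(sortedTerms.length);
        for (String term : allowed) {
            int ord = Arrays.binarySearch(sortedTerms, term);
            if (ord >= 0) {            // terms absent from this segment are ignored
                allowedOrds.set(ord);
            }
        }
        // Step 2: linear scan over every docID, checking its term number against the bit set.
        List<Integer> hits = new ArrayList<>();
        for (int docID = 0; docID < docOrds.length; docID++) {
            int ord = docOrds[docID];
            if (ord >= 0 && allowedOrds.get(ord)) {
                hits.add(docID);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] sortedTerms = {"blue", "green", "red"};
        int[] docOrds = {2, 0, -1, 1, 2};   // doc 2 has no value
        Set<String> allowed = new HashSet<>(Arrays.asList("red", "blue", "purple"));
        System.out.println(matchingDocs(sortedTerms, docOrds, allowed)); // prints [0, 1, 4]
    }
}
```

Step 1 costs time proportional to the number of allowed terms, while step 2 always visits every document in the segment, which is the linear scan the paragraph above warns about.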
In contrast, TermsQuery builds up a FixedBitSet, keyed by docID, every time it's created, by enumerating through all matching docs using PostingsEnum to seek and scan through each term's docID list. While there is no linear scan of all docIDs, besides the allocation of the underlying array in the FixedBitSet, this approach requires a number of "disk seeks" in proportion to the number of terms, which can be exceptionally costly when there are cache misses in the OS's IO cache.
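The postings-driven approach can likewise be illustrated with a minimal JDK-only sketch. It is not Lucene code: `java.util.BitSet` stands in for FixedBitSet, a plain `Map` from term to docID array stands in for the postings lists that PostingsEnum would iterate, and the names `PostingsBitSetSketch` and `matchingDocs` are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PostingsBitSetSketch {

    /**
     * postings: term -> docIDs containing that term (a toy inverted index).
     * maxDoc: number of documents in the segment, used to size the bit set.
     */
    static BitSet matchingDocs(Map<String, int[]> postings, List<String> allowed, int maxDoc) {
        BitSet hits = new BitSet(maxDoc);        // keyed by docID
        for (String term : allowed) {
            int[] docIDs = postings.get(term);   // one "seek" per allowed term
            if (docIDs == null) {
                continue;                        // term not present in this segment
            }
            for (int docID : docIDs) {           // scan only that term's docID list
                hits.set(docID);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("blue", new int[] {1});
        postings.put("green", new int[] {3});
        postings.put("red", new int[] {0, 4});
        BitSet hits = matchingDocs(postings, Arrays.asList("red", "blue", "purple"), 5);
        System.out.println(hits); // prints {0, 1, 4}
    }
}
```

Here the work is proportional to the number of allowed terms plus the number of matching documents, rather than to the total number of documents in the segment.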
Generally, this query will be slower on the first invocation for a given field, but subsequent invocations, even if you change the allowed set of Terms, should be faster than TermsQuery, especially as the number of Terms being matched increases. If you are matching only a very small number of terms, and those terms in turn match a very small number of documents, TermsQuery may perform faster.
Which query is best depends heavily on the application.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Field Summary
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
Constructor
DocValuesTermsQuery(String field, String... terms)
DocValuesTermsQuery(String field, Collection<BytesRef> terms)
DocValuesTermsQuery(String field, BytesRef... terms)
-
Method Summary
Modifier and Type    Method
Weight               createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost)
boolean              equals(Object other)
String               getField()
PrefixCodedTerms     getTerms()
int                  hashCode()
long                 ramBytesUsed()
String               toString(String defaultField)
void                 visit(QueryVisitor visitor)
Methods inherited from class org.apache.lucene.search.Query
classHash, rewrite, sameClassAs, toString
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
Constructor Details
-
DocValuesTermsQuery
-
DocValuesTermsQuery
-
DocValuesTermsQuery
-
-
Method Details
-
equals
-
hashCode
public int hashCode()
-
toString
-
getField
- Returns:
- the name of the field searched by this query.
-
getTerms
- Returns:
- the terms looked up by this query, prefix-encoded.
-
ramBytesUsed
public long ramBytesUsed()
- Specified by:
ramBytesUsed in interface Accountable
-
visit
-
createWeight
public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) throws IOException
- Overrides:
createWeight in class Query
- Throws:
IOException
-