Deprecated. Use DocValuesTermsQuery and boolean BooleanClause.Occur.FILTER clauses instead.
@Deprecated
public class DocValuesTermsFilter
extends Filter
A Filter that only accepts documents whose single term value in the specified field is contained in the provided set of allowed terms.
This is the same functionality as TermsFilter (from queries/), except that this filter requires the field to contain only a single term per document. Because their implementations differ drastically, the two filters also have different performance characteristics, as described below.
With each search, this filter translates the specified set of Terms into a private FixedBitSet keyed by term number per unique IndexReader (normally one reader per segment). Then, during matching, the term number for each docID is retrieved from the cache and checked for inclusion using the FixedBitSet.
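As a rough illustration of the ordinal-based approach just described, the sketch below simulates one segment without Lucene. Assumptions, all hypothetical: the segment's unique terms are a sorted String[] whose indices act as term numbers, an int[] stands in for the cached per-document term number (-1 meaning the document has no value), and java.util.BitSet stands in for FixedBitSet. It is not the filter's actual source.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical, Lucene-free simulation of term-number (ordinal) matching for one segment. */
final class OrdinalTermsMatcher {

    /**
     * @param sortedTerms  the segment's unique terms, sorted; a term's index is its term number
     * @param docOrds      per-document term number; -1 means the document has no value
     * @param allowedTerms terms the filter should accept
     * @return a bit set of matching docIDs
     */
    static BitSet match(String[] sortedTerms, int[] docOrds, String... allowedTerms) {
        // Step 1: translate the allowed terms into a bit set keyed by term number.
        BitSet allowedOrds = new BitSet(sortedTerms.length);
        for (String term : allowedTerms) {
            int ord = Arrays.binarySearch(sortedTerms, term);
            if (ord >= 0) { // terms absent from this segment are skipped
                allowedOrds.set(ord);
            }
        }
        // Step 2: scan every docID linearly, look up its term number, test membership.
        BitSet result = new BitSet(docOrds.length);
        for (int doc = 0; doc < docOrds.length; doc++) {
            int ord = docOrds[doc];
            if (ord >= 0 && allowedOrds.get(ord)) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] terms = {"blue", "green", "red"}; // term numbers 0, 1, 2
        int[] docOrds = {2, 0, -1, 1, 2};          // doc 2 has no value
        System.out.println(match(terms, docOrds, "red", "purple")); // {0, 4}
    }
}
```

Step 2 touches every docID in the segment no matter how few terms are allowed, which is the linear-scan cost discussed below.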
Since all testing is done using RAM-resident data structures, performance should be very fast, most likely fast enough to not require further caching of the DocIdSet for each possible combination of terms. However, because docIDs are simply scanned linearly, an index with a great many small documents may find this linear scan too costly.
In contrast, TermsFilter builds up a FixedBitSet, keyed by docID, every time it's created, by enumerating through all matching docs using PostingsEnum to seek and scan through each term's docID list. While there is no linear scan of all docIDs, besides the allocation of the underlying array in the FixedBitSet, this approach requires a number of "disk seeks" in proportion to the number of terms, which can be exceptionally costly when there are cache misses in the OS's IO cache.
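For contrast, here is a similarly hedged, Lucene-free sketch of the postings-based strategy attributed to TermsFilter above. A Map from term to a sorted docID array stands in for the index's postings (an assumption for illustration; in a real index each lookup is a terms-dictionary seek), and java.util.BitSet again stands in for FixedBitSet. The class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;

/** Hypothetical, Lucene-free simulation of postings-based matching for one segment. */
final class PostingsTermsMatcher {

    static BitSet match(int maxDoc, Map<String, int[]> postings, String... allowedTerms) {
        // One bit per document slot: allocating this is the only per-docID cost;
        // no document is visited unless some allowed term's postings contain it.
        BitSet result = new BitSet(maxDoc);
        for (String term : allowedTerms) {
            // In a real index this lookup is a seek, which may hit disk on an IO-cache miss.
            int[] docs = postings.get(term);
            if (docs == null) {
                continue; // term does not occur in this segment
            }
            for (int doc : docs) {
                result.set(doc);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = Map.of(
            "blue", new int[] {1},
            "green", new int[] {3},
            "red", new int[] {0, 4});
        System.out.println(match(5, postings, "red", "purple")); // {0, 4}
    }
}
```

Here the work grows with the number of allowed terms and the length of their postings rather than with maxDoc, apart from allocating the bit set.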
Generally, this filter will be slower on the first invocation for a given field, but subsequent invocations, even if you change the allowed set of Terms, should be faster than TermsFilter, especially as the number of Terms being matched increases. If you are matching only a very small number of terms, and those terms in turn match a very small number of documents, TermsFilter may perform faster.
Which filter is best is highly application-dependent.
Constructor and Description |
---|
DocValuesTermsFilter(String field, BytesRef... terms) Deprecated. |
DocValuesTermsFilter(String field, String... terms) Deprecated. |
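The two constructors differ only in how the terms are supplied. Lucene's BytesRef(CharSequence) constructor stores text as UTF-8 bytes, so the String... variant presumably yields the same terms as passing pre-encoded BytesRef values (an assumption; the source does not show the conversion). The Lucene-free sketch below, with a hypothetical helper name, shows that encoding and why it matters for non-ASCII terms: the byte length can differ from the character count.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper illustrating UTF-8 term encoding, as BytesRef(CharSequence) performs. */
final class TermBytesSketch {

    static byte[][] toTermBytes(String... terms) {
        byte[][] out = new byte[terms.length][];
        for (int i = 0; i < terms.length; i++) {
            out[i] = terms[i].getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] encoded = toTermBytes("red", "caf\u00e9");
        // "red" is 3 bytes; "café" is 5 bytes because 'é' takes two bytes in UTF-8.
        System.out.println(encoded[0].length + " " + encoded[1].length); // 3 5
    }
}
```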
Modifier and Type | Method and Description |
---|---|
DocIdSet | getDocIdSet(LeafReaderContext context, Bits acceptDocs) Deprecated. Creates a DocIdSet enumerating the documents that should be permitted in search results. |
String | toString(String defaultField) Deprecated. Prints a query to a string, with field assumed to be the default field and omitted. |
Inherited methods: createWeight, equals, hashCode
public DocIdSet getDocIdSet(LeafReaderContext context, Bits acceptDocs) throws IOException
Description copied from class: Filter
Creates a DocIdSet enumerating the documents that should be permitted in search results. NOTE: null can be returned if no documents are accepted by this Filter.
Note: This method will be called once per segment in the index during searching. The returned DocIdSet must refer to document IDs for that segment, not for the top-level reader.
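To make the per-segment requirement concrete: in Lucene, LeafReaderContext exposes a docBase field, and a segment-local docID plus that docBase gives the top-level docID. The Lucene-free sketch below assumes three hypothetical segments and shows the mapping; a DocIdSet returned from getDocIdSet should contain the local values, not the top-level ones.

```java
/** Hypothetical sketch of segment-local vs top-level docIDs. */
final class SegmentDocIds {

    /** Each segment's docBase is the sum of maxDoc over all segments before it. */
    static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = base;
            base += segmentMaxDocs[i];
        }
        return bases;
    }

    /** Maps a segment-local docID (what getDocIdSet works with) to a top-level docID. */
    static int toTopLevel(int[] bases, int segment, int localDoc) {
        return bases[segment] + localDoc;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {100, 50, 25}); // {0, 100, 150}
        // Local doc 7 in the second segment is top-level doc 107.
        System.out.println(toTopLevel(bases, 1, 7)); // 107
    }
}
```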
Specified by:
getDocIdSet in class Filter
Parameters:
context - a LeafReaderContext instance opened on the index currently searched on. Note that the provided reader info most likely does not represent the whole underlying index: if the index has more than one segment, the given reader represents only a single segment. The provided context is always an atomic context, so you can call LeafReader.fields() on the context's reader, for example.
acceptDocs - Bits that represent the allowable docs to match (typically excluding deleted docs, but possibly filtering other documents)
Returns:
NOTE: null should be returned if the filter doesn't accept any documents; otherwise, internal optimizations might not apply when an empty DocIdSet is returned.
Throws:
IOException
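The parameter and return conventions above can be sketched without Lucene as follows. Assumptions: java.util.BitSet represents the segment's matching docs, java.util.function.IntPredicate stands in for Bits, and, in this sketch only, a null acceptDocs accepts every document. The class name is hypothetical.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical sketch of honoring acceptDocs and returning null when nothing matches. */
final class AcceptDocsSketch {

    static BitSet docIdSet(BitSet matches, IntPredicate acceptDocs) {
        BitSet result = new BitSet();
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            // Keep a matching doc only if acceptDocs allows it (e.g. it is not deleted).
            if (acceptDocs == null || acceptDocs.test(doc)) {
                result.set(doc);
            }
        }
        // Documented convention: null, not an empty set, signals "no documents accepted".
        return result.isEmpty() ? null : result;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(4);
        BitSet deleted = new BitSet();
        deleted.set(4);
        IntPredicate live = doc -> !deleted.get(doc);
        System.out.println(docIdSet(matches, live)); // {0}
        deleted.set(0);
        System.out.println(docIdSet(matches, live)); // null
    }
}
```

Returning null for the empty case, rather than an empty set, mirrors the NOTE in the return description.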
Copyright © 2000-2015 Apache Software Foundation. All Rights Reserved.