java.lang.Object
  org.apache.lucene.search.DocIdSet
    org.apache.lucene.util.WAH8DocIdSet
public final class WAH8DocIdSet extends DocIdSet
DocIdSet implementation based on word-aligned hybrid encoding on words of 8 bits.
This implementation doesn't support random access but has a fast DocIdSetIterator which can advance in logarithmic time thanks to an index.
The compression scheme is simplistic and should work well with sparse and very dense doc id sets while being only slightly larger than a FixedBitSet for incompressible sets (overhead < 2% in the worst case) in spite of the index.
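The index-assisted advance mentioned above can be pictured with a small sketch. It is not the layout WAH8DocIdSet actually uses: the class name SampledIndexSketch, the plain int[] storage and the sampling scheme are all assumptions made for illustration. It only shows the general idea of sampling the set at a fixed interval, binary-searching the samples, and then scanning a short block.

```java
final class SampledIndexSketch {
    // Hypothetical stand-in storage: sorted doc IDs in a plain array.
    private final int[] docs;
    // Every interval-th doc ID, kept so a search can skip ahead.
    private final int[] samples;
    private final int interval;

    SampledIndexSketch(int[] sortedDocs, int interval) {
        this.docs = sortedDocs;
        this.interval = interval;
        this.samples = new int[(sortedDocs.length + interval - 1) / interval];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = sortedDocs[i * interval];
        }
    }

    /** Returns the first doc ID >= target, or -1 if there is none. */
    int advance(int target) {
        // Binary search for the last sample that is <= target.
        int lo = 0, hi = samples.length - 1, block = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (samples[mid] <= target) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Scan linearly from the start of that block.
        for (int i = block * interval; i < docs.length; i++) {
            if (docs[i] >= target) {
                return docs[i];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        SampledIndexSketch set =
            new SampledIndexSketch(new int[] {3, 8, 15, 16, 42, 99, 100}, 2);
        System.out.println(set.advance(16));
        System.out.println(set.advance(43));
        System.out.println(set.advance(101));
    }
}
```

A smaller interval means more samples (more memory) but shorter linear scans, which is the trade-off an index interval parameter controls in a scheme like this.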
Format: The format is byte-aligned. An 8-bits word is either clean, meaning composed only of zeros or ones, or dirty, meaning that it contains between 1 and 7 bits set. The idea is to encode sequences of clean words using run-length encoding and to leave sequences of dirty words as-is.
Token | Clean length+ | Dirty length+ | Dirty words |
---|---|---|---|
1 byte | 0-n bytes | 0-n bytes | 0-n bytes |
Clean length+: if the clean length has its higher-order bit set, you need to read a vint, shift it by 3 bits on the left side and add it to the 3 bits which have been read in the token.
This format cannot encode a sequence that has fewer than 2 clean words and no dirty words. The reason is that if you find a single clean word, you should rather encode it as a dirty word. This takes the same space as starting a new sequence (since you need one byte for the token) but will be lighter to decode. There is however an exception for the first sequence. Since the first sequence may start directly with a dirty word, its clean length is encoded directly, without the subtraction of 2 that applies to the other sequences.
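The length extension described above (a few bits in the token, the remainder in a vint shifted left by 3) can be sketched as a round trip. This takes the text at face value and is not the class's own code: the names ExtendedLengthSketch, TOKEN_BITS and combine are hypothetical, the implementation may split the token bits differently (for example by reserving one of them as the continuation flag), and the subtraction of 2 for clean lengths is left out.

```java
import java.io.ByteArrayOutputStream;

final class ExtendedLengthSketch {
    /** Number of length bits assumed to sit in the token (3, per the text). */
    static final int TOKEN_BITS = 3;

    /** Writes a variable-length int: 7 bits per byte, low-order group first,
     *  high bit set when more bytes follow. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Reads a variable-length int starting at offset. */
    static int readVInt(byte[] bytes, int offset) {
        int value = 0;
        int shift = 0;
        byte b;
        do {
            b = bytes[offset++];
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    /** Rebuilds a length from the token bits and the vint that follows. */
    static int combine(int tokenBits, int vint) {
        return (vint << TOKEN_BITS) + tokenBits;
    }

    public static void main(String[] args) {
        int length = 70003;
        // The low-order bits stay in the token.
        int tokenBits = length & ((1 << TOKEN_BITS) - 1);
        // The remaining high-order bits spill into a vint.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, length >>> TOKEN_BITS);
        int decoded = combine(tokenBits, readVInt(out.toByteArray(), 0));
        System.out.println(decoded == length);
    }
}
```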
There is an additional restriction on the format: the sequence of dirty words is not allowed to contain two consecutive clean words. This restriction exists to make sure no space is wasted and to make sure iterators can read the next doc ID by reading at most 2 dirty words.
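The clean/dirty distinction and the restriction on dirty sequences can be checked with a few lines of code. This is a minimal sketch, assuming the 8-bit words are held in a byte[]; the class and method names (CleanDirtySketch, isClean, isValidDirtySequence) are hypothetical and not part of WAH8DocIdSet.

```java
final class CleanDirtySketch {
    /** A word is clean when all 8 bits are equal: 0x00 or 0xFF. */
    static boolean isClean(byte word) {
        return word == 0 || word == (byte) 0xFF;
    }

    /** True if a run of words stored as-is never contains two
     *  consecutive clean words. */
    static boolean isValidDirtySequence(byte[] words) {
        for (int i = 1; i < words.length; i++) {
            if (isClean(words[i - 1]) && isClean(words[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A single clean word between dirty words is allowed.
        byte[] ok = {0x12, 0x00, (byte) 0x80};
        // Two clean words in a row should have been run-length encoded.
        byte[] bad = {0x12, 0x00, (byte) 0xFF};
        System.out.println(isValidDirtySequence(ok));
        System.out.println(isValidDirtySequence(bad));
    }
}
```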
Nested Class Summary | |
---|---|
static class | WAH8DocIdSet.Builder: A builder for WAH8DocIdSets. |
Field Summary | |
---|---|
static int | DEFAULT_INDEX_INTERVAL: Default index interval. |
Method Summary | |
---|---|
int | cardinality(): Return the number of documents in this DocIdSet in constant time. |
static WAH8DocIdSet | intersect(Collection<WAH8DocIdSet> docIdSets): Same as intersect(Collection, int) with the default index interval. |
static WAH8DocIdSet | intersect(Collection<WAH8DocIdSet> docIdSets, int indexInterval): Compute the intersection of the provided sets. |
boolean | isCacheable(): This method is a hint for CachingWrapperFilter, if this DocIdSet should be cached without copying it. |
org.apache.lucene.util.WAH8DocIdSet.Iterator | iterator(): Provides a DocIdSetIterator to access the set. |
long | ramBytesUsed(): Return the memory usage of this class in bytes. |
static WAH8DocIdSet | union(Collection<WAH8DocIdSet> docIdSets): Same as union(Collection, int) with the default index interval. |
static WAH8DocIdSet | union(Collection<WAH8DocIdSet> docIdSets, int indexInterval): Compute the union of the provided sets. |
Methods inherited from class org.apache.lucene.search.DocIdSet |
---|
bits |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int DEFAULT_INDEX_INTERVAL
Default index interval.
Method Detail |
---|
public static WAH8DocIdSet intersect(Collection<WAH8DocIdSet> docIdSets)
Same as intersect(Collection, int) with the default index interval.
public static WAH8DocIdSet intersect(Collection<WAH8DocIdSet> docIdSets, int indexInterval)
Compute the intersection of the provided sets.
public static WAH8DocIdSet union(Collection<WAH8DocIdSet> docIdSets)
Same as union(Collection, int) with the default index interval.
public static WAH8DocIdSet union(Collection<WAH8DocIdSet> docIdSets, int indexInterval)
Compute the union of the provided sets.
public boolean isCacheable()
Description copied from class: DocIdSet
This method is a hint for CachingWrapperFilter, if this DocIdSet should be cached without copying it. The default is to return false. If you have your own DocIdSet implementation that iterates efficiently and quickly without doing disk I/O, override this method and return true.
Overrides: isCacheable in class DocIdSet
public org.apache.lucene.util.WAH8DocIdSet.Iterator iterator()
Description copied from class: DocIdSet
Provides a DocIdSetIterator to access the set. This implementation can return null if there are no docs that match.
Specified by: iterator in class DocIdSet
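Because iterator() is documented as possibly returning null, callers generally need a guard before looping. The sketch below shows that pattern with hypothetical stand-in interfaces (DocIds and DocIterator) rather than the Lucene classes, so that it compiles on its own; the NO_MORE_DOCS sentinel mirrors the usual doc ID iterator convention, and the stand-in nextDoc() omits any checked exception a real iterator may declare.

```java
final class NullSafeIterationSketch {
    /** Hypothetical stand-in for a doc ID iterator; not a Lucene type. */
    interface DocIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int nextDoc();
    }

    /** Hypothetical stand-in for a doc ID set whose iterator() may be null. */
    interface DocIds {
        DocIterator iterator();
    }

    /** Counts the documents in a set, treating a null iterator as "no matches". */
    static int count(DocIds set) {
        DocIterator it = set.iterator();
        if (it == null) {
            return 0;
        }
        int n = 0;
        while (it.nextDoc() != DocIterator.NO_MORE_DOCS) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        DocIds empty = () -> null;
        int[] docs = {2, 5, 9};
        DocIds three = () -> new DocIterator() {
            private int i = 0;
            @Override public int nextDoc() {
                return i < docs.length ? docs[i++] : NO_MORE_DOCS;
            }
        };
        System.out.println(count(empty));
        System.out.println(count(three));
    }
}
```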
public int cardinality()
Return the number of documents in this DocIdSet in constant time.
public long ramBytesUsed()
Return the memory usage of this class in bytes.