Class Lucene70DocValuesFormat
 java.lang.Object

 org.apache.lucene.codecs.DocValuesFormat

 org.apache.lucene.backward_codecs.lucene70.Lucene70DocValuesFormat

 All Implemented Interfaces:
NamedSPILoader.NamedSPI
public final class Lucene70DocValuesFormat extends DocValuesFormat
Lucene 7.0 DocValues format.
Documents that have a value for the field are encoded in a way that it is always possible to know the ordinal of the current document in the set of documents that have a value. For instance, say the set of documents that have a value for the field is {1, 5, 6, 11}. When the iterator is on 6, it knows that this is the 3rd item of the set. This way, values can be stored densely and accessed based on their index at search time. If all documents in a segment have a value for the field, the index is the same as the doc ID, so this case is encoded implicitly and is very fast at query time. On the other hand, if some documents are missing a value for the field, then the set of documents that have a value is encoded into blocks. All doc IDs that share the same upper 16 bits are encoded into the same block with the following strategies:
 SPARSE: This strategy is used when a block contains at most 4095 documents. The lower 16 bits of doc IDs are stored as shorts while the upper 16 bits are given by the block ID.
 DENSE: This strategy is used when a block contains between 4096 and 65535 documents. The lower bits of doc IDs are stored in a bit set. Advancing is performed using ntz operations while the index is computed by accumulating the bit counts of the visited longs.
 ALL: This strategy is used when a block contains exactly 65536 documents, meaning that the block is full. In that case doc IDs do not need to be stored explicitly. This is typically faster than both SPARSE and DENSE, which is a reason why it is preferable to have all documents that have a value for a field using contiguous doc IDs, for instance by using index sorting.
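The block strategies described above can be illustrated with a small, self-contained sketch. This is not the Lucene implementation: the class name DocIdBlockSketch and all of its methods are hypothetical, the binary search in the SPARSE lookup is a choice made for this example only, and the bit set is assumed to hold one 65536-document block as 1024 longs.

```java
// Illustrative sketch only; names and layout are assumptions, not Lucene's code.
final class DocIdBlockSketch {

    enum Strategy { SPARSE, DENSE, ALL }

    /** Block ID of a doc ID: its upper 16 bits. */
    static int blockId(int docId) {
        return docId >>> 16;
    }

    /** Picks a strategy from the number of documents with a value in one block. */
    static Strategy strategyFor(int cardinality) {
        if (cardinality < 1 || cardinality > 65536) {
            throw new IllegalArgumentException("cardinality out of range: " + cardinality);
        }
        if (cardinality == 65536) return Strategy.ALL;
        return cardinality <= 4095 ? Strategy.SPARSE : Strategy.DENSE;
    }

    /** SPARSE: lower 16 bits kept as sorted shorts; returns the zero-based index or -1. */
    static int sparseIndex(short[] lowerBits, int docId) {
        int target = docId & 0xFFFF;
        int lo = 0, hi = lowerBits.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int v = lowerBits[mid] & 0xFFFF; // read the short as unsigned
            if (v < target) lo = mid + 1;
            else if (v > target) hi = mid - 1;
            else return mid;
        }
        return -1;
    }

    /** DENSE: index = number of set bits before the target bit, or -1 if the bit is clear. */
    static int denseIndex(long[] bits, int docId) {
        int target = docId & 0xFFFF;
        int word = target >>> 6;
        long mask = 1L << (target & 63);
        if ((bits[word] & mask) == 0) return -1;
        int index = 0;
        for (int i = 0; i < word; i++) {
            index += Long.bitCount(bits[i]); // accumulate bit counts of visited longs
        }
        return index + Long.bitCount(bits[word] & (mask - 1));
    }

    /** DENSE: first set bit at or after fromLower, found with a trailing-zero count; -1 if none. */
    static int denseAdvance(long[] bits, int fromLower) {
        int word = fromLower >>> 6;
        long current = bits[word] & (-1L << (fromLower & 63));
        while (true) {
            if (current != 0) return (word << 6) + Long.numberOfTrailingZeros(current);
            if (++word == bits.length) return -1;
            current = bits[word];
        }
    }

    public static void main(String[] args) {
        long[] bits = new long[1024];
        for (int d : new int[] {1, 5, 6, 11}) bits[d >>> 6] |= 1L << (d & 63);
        System.out.println("index of doc 6: " + denseIndex(bits, 6));
        System.out.println("next doc at or after 7: " + denseAdvance(bits, 7));
    }
}
```

The indexes returned here are zero-based, so the "3rd item" in the example set {1, 5, 6, 11} corresponds to index 2.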
Then the five per-document value types (Numeric, Binary, Sorted, SortedSet, SortedNumeric) are encoded using the following strategies:
 Delta-compressed: per-document integers written as deltas from the minimum value, compressed with bit-packing. For more information, see LegacyDirectWriter.
 Table-compressed: when the number of unique values is very small (< 256), and when there are unused "gaps" in the range of values used (such as SmallFloat), a lookup table is written instead. Each per-document entry is instead the ordinal to this table, and those ordinals are compressed with bit-packing (LegacyDirectWriter).
 GCD-compressed: when all numbers share a common divisor, such as dates, the greatest common divisor (GCD) is computed, and quotients are stored using Delta-compressed Numerics.
 Monotonic-compressed: when all numbers are monotonically increasing offsets, they are written as blocks of bit-packed integers, encoding the deviation from the expected delta.
 Const-compressed: when there is only one possible value, no per-document data is needed and this value is encoded alone.
 Fixed-width Binary: one large concatenated byte[] is written, along with the fixed length. Each document's value can be addressed directly with multiplication (docID * length).
 Variable-width Binary: one large concatenated byte[] is written, along with end addresses for each document. The addresses are written as Monotonic-compressed numerics.
 Prefix-compressed Binary: values are written in chunks of 16, with the first value written completely and other values sharing prefixes. Chunk addresses are written as Monotonic-compressed numerics. A reverse lookup index is written from a portion of every 1024th term.
 Sorted: a mapping of ordinals to deduplicated terms is written as Prefix-compressed Binary, along with the per-document ordinals written using one of the numeric strategies above.
 Single (SortedSet): if all documents have 0 or 1 value, then data are written like SORTED.
 SortedSet: a mapping of ordinals to deduplicated terms is written as Binary; an ordinal list and a per-document index into this list are written using the numeric strategies above.
 Single (SortedNumeric): if all documents have 0 or 1 value, then data are written like NUMERIC.
 SortedNumeric: a value list and a per-document index into this list are written using the numeric strategies above.
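As a rough illustration of how the numeric strategies above combine, the following sketch subtracts the minimum, divides by the GCD of the resulting deltas, and reports how many bits each quotient would need for bit-packing. It is a simplified sketch under stated assumptions, not the Lucene writer: the class NumericEncodingSketch and its methods are hypothetical, the input array is assumed to be non-empty, and the difference between the largest and smallest value is assumed to fit in a long.

```java
// Illustrative sketch only; it does not write any Lucene file format.
final class NumericEncodingSketch {

    /** Greatest common divisor of two non-negative longs; gcd(0, x) is x. */
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /**
     * Returns {min, gcd, bitsPerValue} for a non-empty array.
     * A gcd of 0 means every value equals min (the constant case), so no
     * per-document data would be needed.
     */
    static long[] plan(long[] values) {
        long min = values[0], max = values[0];
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long g = 0;
        for (long v : values) {
            g = gcd(g, v - min); // deltas from the minimum are non-negative
        }
        if (g == 0) return new long[] {min, 0, 0};
        long maxQuotient = (max - min) / g;
        int bits = 64 - Long.numberOfLeadingZeros(maxQuotient);
        return new long[] {min, g, bits};
    }

    /** Encodes each value as the quotient (value - min) / gcd; requires gcd > 0. */
    static long[] encode(long[] values, long min, long g) {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (values[i] - min) / g;
        }
        return out;
    }

    /** Reverses encode for a single quotient. */
    static long decode(long quotient, long min, long g) {
        return min + quotient * g;
    }

    public static void main(String[] args) {
        long day = 86_400_000L; // milliseconds per day, as in day-granularity dates
        long[] values = {3 * day, 5 * day, 10 * day};
        long[] p = plan(values);
        long[] q = encode(values, p[0], p[1]);
        System.out.println("min=" + p[0] + " gcd=" + p[1] + " bits=" + p[2]);
        System.out.println("quotients=" + java.util.Arrays.toString(q));
    }
}
```

With day-granularity timestamps, the quotients are small integers, which is why date-like fields benefit from this combination.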
Files:
 .dvd: DocValues data
 .dvm: DocValues metadata
 WARNING: This API is experimental and might change in incompatible ways in the next release.


Constructor Summary
Constructors
Constructor                  Description
Lucene70DocValuesFormat()    Sole Constructor

Method Summary
Modifier and Type    Method
DocValuesConsumer    fieldsConsumer(SegmentWriteState state)
DocValuesProducer    fieldsProducer(SegmentReadState state)

Methods inherited from class org.apache.lucene.codecs.DocValuesFormat
availableDocValuesFormats, forName, getName, reloadDocValuesFormats, toString




Method Detail

fieldsConsumer
public DocValuesConsumer fieldsConsumer(SegmentWriteState state) throws IOException
 Specified by:
fieldsConsumer in class DocValuesFormat
 Throws:
IOException

fieldsProducer
public DocValuesProducer fieldsProducer(SegmentReadState state) throws IOException
 Specified by:
fieldsProducer in class DocValuesFormat
 Throws:
IOException

