public class Lucene49DocValuesFormat extends DocValuesFormat
Encodes the five per-document value types (Numeric,Binary,Sorted,SortedSet,SortedNumeric) with these strategies:
... DirectWriter.
... SmallFloat), a lookup table is written instead. Each per-document entry is instead the ordinal to this table, and those ordinals are compressed with bitpacking (DirectWriter).
... (docID * length).
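The table-based strategy above (a lookup table plus bit-packed ordinals) can be sketched roughly as follows. This is an illustration only, not Lucene's implementation: the class `TableCompressionSketch` and its members are hypothetical names, the input is assumed to be non-empty, and the ordinals are kept in a plain `int[]` instead of being bit-packed by DirectWriter.

```java
import java.util.Arrays;

/** Hypothetical sketch of table compression; not Lucene's actual code. */
final class TableCompressionSketch {
    final long[] table;   // sorted unique values (the lookup table)
    final int[] ordinals; // per-document index into the table

    /** Assumes values is non-empty. */
    TableCompressionSketch(long[] values) {
        // Build the lookup table from the distinct values.
        table = Arrays.stream(values).distinct().sorted().toArray();
        ordinals = new int[values.length];
        for (int doc = 0; doc < values.length; doc++) {
            // Every value is present in the table, so the search result is >= 0.
            ordinals[doc] = Arrays.binarySearch(table, values[doc]);
        }
    }

    /** Bits needed to store any ordinal in [0, table.length - 1], at least 1. */
    int bitsPerValue() {
        int maxOrd = table.length - 1;
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxOrd));
    }

    /** Decodes the value for a document by following its ordinal. */
    long get(int doc) {
        return table[ordinals[doc]];
    }
}
```

With the input `{100, 7, 100, 5000, 7}` the table holds three entries, so each ordinal fits in 2 bits instead of the 13 bits that the largest raw value would need.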
Files:
The DocValues metadata or .dvm file.
For each DocValues field, this stores metadata, such as the offset into the DocValues data (.dvd).
DocValues metadata (.dvm) --> Header,<Entry>^NumFields,Footer
... --> ...,Int64^TableSize,BitsPerValue
... --> VInt
... --> Byte
Header --> CodecHeader
... --> Int64
... --> vInt
Footer --> CodecFooter
Sorted fields have two entries: a BinaryEntry with the value metadata, and an ordinary NumericEntry for the document-to-ord metadata.
SortedSet fields have three entries: a BinaryEntry with the value metadata, and two NumericEntries for the document-to-ord-index and ordinal list metadata.
SortedNumeric fields have two entries: a NumericEntry with the value metadata, and a NumericEntry with the document-to-value index metadata.
A FieldNumber of -1 indicates the end of metadata.
EntryType is 0 (NumericEntry) or 1 (BinaryEntry).
DataOffset is the pointer to the start of the data in the DocValues data (.dvd).
EndOffset is the pointer to the end of the data in the DocValues data (.dvd).
NumericType indicates how Numeric values will be compressed:
BinaryType indicates how Binary values will be stored:
MinLength and MaxLength represent the min and max byte[] value lengths for Binary values. If they are equal, then all values are of a fixed size, and can be addressed as DataOffset + (docID * length). Otherwise, the binary values are of variable size, and packed integer metadata (PackedVersion,BlockSize) is written for the addresses.
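The two addressing modes can be sketched as below. This is a minimal illustration under stated assumptions, not Lucene's reader code: `BinaryAddressingSketch` and its methods are hypothetical names, and the variable-width case assumes an already decoded `addresses` array of numDocs + 1 offsets relative to DataOffset, whereas the real addresses are stored as packed integers.

```java
/** Hypothetical sketch of binary value addressing; not Lucene's actual code. */
final class BinaryAddressingSketch {
    /** Fixed-width case: every value has the same length. */
    static long fixedStart(long dataOffset, int docID, int length) {
        // Widen to long before multiplying so docID * length cannot overflow int.
        return dataOffset + (long) docID * length;
    }

    /**
     * Variable-width case. Assumption: addresses holds numDocs + 1 offsets
     * relative to dataOffset, so value i spans [addresses[i], addresses[i + 1]).
     * Returns {start, end} as absolute file positions.
     */
    static long[] variableRange(long dataOffset, long[] addresses, int docID) {
        long start = dataOffset + addresses[docID];
        long end = dataOffset + addresses[docID + 1];
        return new long[] {start, end};
    }
}
```

For example, `fixedStart(1000, 3, 16)` yields 1048, and a document whose two surrounding addresses are equal has an empty value.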
MissingOffset points to a byte[] containing a bitset of all documents that had a value for the field. If it is -1, then there are no missing values.
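A lookup against such a bitset might look like the following sketch. The class `MissingBitsSketch` is a hypothetical name, and the bit order (one bit per document, least-significant bit first within each byte) is an assumption made for illustration rather than something stated above.

```java
/** Hypothetical sketch of the "documents with a value" bitset; not Lucene's actual code. */
final class MissingBitsSketch {
    /** Sentinel from the metadata: no bitset was written, so no values are missing. */
    static final long NO_MISSING = -1L;

    /**
     * Assumption: one bit per document, least-significant bit first within each byte.
     * A set bit means the document has a value for the field.
     */
    static boolean hasValue(long missingOffset, byte[] bits, int docID) {
        if (missingOffset == NO_MISSING) {
            return true; // every document has a value
        }
        return (bits[docID >> 3] & (1 << (docID & 7))) != 0;
    }
}
```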
Checksum contains the CRC32 checksum of all bytes in the .dvm file up until the checksum. This is used to verify integrity of the file on opening the index.
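The integrity check can be illustrated with the JDK's `java.util.zip.CRC32`. This is a sketch under assumptions, not the codec's footer handling: `ChecksumSketch` is a hypothetical name, the whole file is assumed to be in memory, and the stored checksum is assumed to occupy the final 8 bytes in big-endian order.

```java
import java.util.zip.CRC32;

/** Hypothetical sketch of checksum verification; not Lucene's actual footer code. */
final class ChecksumSketch {
    /** CRC32 over the first length bytes of the file contents. */
    static long crc32(byte[] fileBytes, int length) {
        CRC32 crc = new CRC32();
        crc.update(fileBytes, 0, length);
        return crc.getValue();
    }

    /**
     * Assumption: the stored checksum is the final 8 bytes, big-endian,
     * and covers every byte before it.
     */
    static boolean verify(byte[] fileBytes) {
        int payloadLength = fileBytes.length - 8;
        long stored = 0;
        for (int i = payloadLength; i < fileBytes.length; i++) {
            stored = (stored << 8) | (fileBytes[i] & 0xFF);
        }
        return stored == crc32(fileBytes, payloadLength);
    }
}
```

Flipping any bit before the stored checksum makes `verify` return false, which is the condition a reader would treat as a corrupt file.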
The DocValues data or .dvd file.
For each DocValues field, this stores the actual per-document data (the heavy lifting).
DocValues data (.dvd) --> Header,<NumericData | BinaryData | SortedData>^NumFields,Footer
BinaryData --> Byte^DataLength,Addresses
SortedData --> FST<Int64>
NumericData --> PackedInts
Addresses --> MonotonicBlockPackedInts(blockSize=16k)
Footer --> CodecFooter
Constructor and Description |
---|
Lucene49DocValuesFormat() Sole Constructor |
Modifier and Type | Method and Description |
---|---|
DocValuesConsumer | fieldsConsumer(SegmentWriteState state) Returns a DocValuesConsumer to write docvalues to the index. |
DocValuesProducer | fieldsProducer(SegmentReadState state) Returns a DocValuesProducer to read docvalues from the index. |
Methods inherited from class DocValuesFormat: availableDocValuesFormats, forName, getName, reloadDocValuesFormats, toString
public DocValuesConsumer fieldsConsumer(SegmentWriteState state) throws IOException
Description copied from class: DocValuesFormat
Returns a DocValuesConsumer to write docvalues to the index.
Specified by: fieldsConsumer in class DocValuesFormat
Throws: IOException
public DocValuesProducer fieldsProducer(SegmentReadState state) throws IOException
Description copied from class: DocValuesFormat
Returns a DocValuesProducer to read docvalues from the index.
NOTE: by the time this call returns, it must hold open any files it will need to use; else, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an IOException should be thrown by the implementation. IOExceptions are expected and will automatically cause a retry of the segment opening logic with the newly revised segments.
Specified by: fieldsProducer in class DocValuesFormat
Throws: IOException
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.