public class Lucene40DocValuesFormat extends DocValuesFormat
Files:
- compound container
- compound entries
There are several types of DocValues with different encodings. From the perspective of filenames, all types store their values in .dat entries within the compound file. In the case of dereferenced/sorted types, the .dat actually contains only the unique values, and an additional .idx file contains pointers to these unique values.
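The split between unique values and per-document pointers can be illustrated with a small in-memory model. This is only a sketch: the class `DerefSketch` and its fields are hypothetical names standing in for the .dat and .idx entries, and it does not reproduce Lucene's on-disk encoding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of a dereferenced layout; not Lucene's implementation. */
final class DerefSketch {
    /** Unique values only, standing in for the .dat entry. */
    final List<String> uniqueValues = new ArrayList<>();
    /** One pointer per document, standing in for the .idx entry. */
    final int[] pointers;

    DerefSketch(String[] perDocValues) {
        pointers = new int[perDocValues.length];
        Map<String, Integer> seen = new HashMap<>();
        for (int docId = 0; docId < perDocValues.length; docId++) {
            String value = perDocValues[docId];
            Integer slot = seen.get(value);
            if (slot == null) {
                // First occurrence: store the value once and remember its slot.
                slot = uniqueValues.size();
                uniqueValues.add(value);
                seen.put(value, slot);
            }
            pointers[docId] = slot;
        }
    }

    /** Resolves a document's value through its pointer. */
    String valueFor(int docId) {
        return uniqueValues.get(pointers[docId]);
    }

    public static void main(String[] args) {
        DerefSketch sketch = new DerefSketch(new String[] {"red", "blue", "red", "red"});
        System.out.println(sketch.uniqueValues.size()); // 2 unique values for 4 documents
        System.out.println(sketch.valueFor(2));         // red
    }
}
```

Four documents share two stored values here, which is the space saving the dereferenced types aim for when many documents carry the same value.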
Formats:
- VAR_INTS .dat --> Header, PackedType, MinValue, DefaultValue, PackedStream
- FIXED_INTS_8 .dat --> Header, ValueSize, Byte<sup>maxdoc</sup>
- FIXED_INTS_16 .dat --> Header, ValueSize, Short<sup>maxdoc</sup>
- FIXED_INTS_32 .dat --> Header, ValueSize, Int32<sup>maxdoc</sup>
- FIXED_INTS_64 .dat --> Header, ValueSize, Int64<sup>maxdoc</sup>
- FLOAT_32 .dat --> Header, ValueSize, Float32<sup>maxdoc</sup>
- FLOAT_64 .dat --> Header, ValueSize, Float64<sup>maxdoc</sup>
- BYTES_FIXED_STRAIGHT .dat --> Header, ValueSize, (Byte * ValueSize)<sup>maxdoc</sup>
- BYTES_VAR_STRAIGHT .idx --> Header, MaxAddress, Addresses
- BYTES_VAR_STRAIGHT .dat --> Header, TotalBytes, Addresses, (Byte * variable ValueSize)<sup>maxdoc</sup>
- BYTES_FIXED_DEREF .idx --> Header, NumValues, Addresses
- BYTES_FIXED_DEREF .dat --> Header, ValueSize, (Byte * ValueSize)<sup>NumValues</sup>
- BYTES_VAR_DEREF .idx --> Header, TotalVarBytes, Addresses
- BYTES_VAR_DEREF .dat --> Header, (LengthPrefix + Byte * variable ValueSize)<sup>NumValues</sup>
- BYTES_FIXED_SORTED .idx --> Header, NumValues, Ordinals
- BYTES_FIXED_SORTED .dat --> Header, ValueSize, (Byte * ValueSize)<sup>NumValues</sup>
- BYTES_VAR_SORTED .idx --> Header, TotalVarBytes, Addresses, Ordinals
- BYTES_VAR_SORTED .dat --> Header, (Byte * variable ValueSize)<sup>NumValues</sup>

Data Types:
- Header --> CodecHeader
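- PackedType --> Byte
- MaxAddress, MinValue, DefaultValue --> Int64
- PackedStream, Addresses, Ordinals --> PackedInts
- ValueSize, NumValues --> Int32
- Float32 --> 32-bit float encoded with Float.floatToRawIntBits(float) then written as Int32
- Float64 --> 64-bit float encoded with Double.doubleToRawLongBits(double) then written as Int64
- TotalBytes --> VLong
- TotalVarBytes --> Int64
- LengthPrefix --> Length of the data value as VInt (maximum of 2 bytes)

Notes:
- In the VAR_DEREF case, each value's length is encoded as a prefix to the data itself as a VInt (maximum of 2 bytes).
- In the FIXED_SORTED case, the address into the .dat can be computed from the ordinal as Header+ValueSize+(ordinal*ValueSize) because the byte length is fixed.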
- In the VAR_SORTED case, there is double indirection (docid -> ordinal -> address), but an additional sentinel ordinal+address is always written (so there are NumValues+1 ordinals). To determine the length, ord+1's address is looked up as well.
- BYTES_VAR_STRAIGHT, in contrast to the other straight variants, uses a .idx file to improve lookup performance. In contrast to BYTES_VAR_DEREF, it doesn't apply deduplication of the document values.
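The two sorted lookups described in the notes can be sketched as follows. This is an illustration under stated assumptions, not Lucene's reader code: `SortedLookupSketch`, its method names, and the sample header length are hypothetical; the ValueSize field is assumed to occupy 4 bytes (an Int32); and the variable-length addresses are assumed to be relative to the start of the value bytes.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the FIXED_SORTED and VAR_SORTED lookups; not Lucene's code. */
final class SortedLookupSketch {
    /** Assumed byte width of the ValueSize field (written as an Int32). */
    static final int VALUE_SIZE_FIELD_BYTES = 4;

    /** FIXED_SORTED: Header + ValueSize + (ordinal * ValueSize), all in bytes. */
    static long fixedSortedAddress(long headerBytes, int valueSize, int ordinal) {
        return headerBytes + VALUE_SIZE_FIELD_BYTES + (long) ordinal * valueSize;
    }

    /**
     * VAR_SORTED: docid -> ordinal -> address. The length comes from the
     * address of ord + 1, which always exists because of the sentinel entry.
     */
    static String varSortedValue(int[] docToOrd, long[] addresses, byte[] data, int docId) {
        int ord = docToOrd[docId];
        int start = (int) addresses[ord];
        int length = (int) (addresses[ord + 1] - addresses[ord]);
        return new String(data, start, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three sorted unique values "a", "bcd", "ef" stored back to back.
        byte[] data = "abcdef".getBytes(StandardCharsets.UTF_8);
        // NumValues + 1 addresses: the last entry is the sentinel.
        long[] addresses = {0, 1, 4, 6};
        int[] docToOrd = {2, 0, 1, 0};

        System.out.println(varSortedValue(docToOrd, addresses, data, 0)); // ef
        System.out.println(varSortedValue(docToOrd, addresses, data, 2)); // bcd
        // With a hypothetical 10-byte header and 8-byte values, ordinal 3 starts at 38.
        System.out.println(fixedSortedAddress(10, 8, 3));
    }
}
```

The sentinel removes the special case for the last value: every ordinal, including the highest, has a following address to subtract from.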
Constructor and Description |
---|
Lucene40DocValuesFormat(): Sole constructor. |
Modifier and Type | Method and Description |
---|---|
PerDocConsumer | docsConsumer(PerDocWriteState state): Consumes (writes) doc values during indexing. |
PerDocProducer | docsProducer(SegmentReadState state): Produces (reads) doc values during reading/searching. |
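public PerDocConsumer docsConsumer(PerDocWriteState state) throws IOException
Description copied from class: DocValuesFormat
Consumes (writes) doc values during indexing.
Specified by: docsConsumer in class DocValuesFormat
Throws: IOException

public PerDocProducer docsProducer(SegmentReadState state) throws IOException
Description copied from class: DocValuesFormat
Produces (reads) doc values during reading/searching.
Specified by: docsProducer in class DocValuesFormat
Throws: IOException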
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.