public class Lucene40DocValuesFormat extends DocValuesFormat
Files:
- compound container
- compound entries
There are several types of DocValues with different encodings.
From the perspective of filenames, all types store their values in .dat
entries within the compound file. In the case of dereferenced/sorted types, the .dat
actually contains only the unique values, and an additional .idx file contains
pointers to these unique values.
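
To make the split between unique values and pointers concrete, here is a minimal sketch of the idea in plain Java. It is not Lucene's writer code: the class `DerefSketch` and its fields are hypothetical names chosen for illustration, and it keeps everything in memory instead of writing .dat and .idx entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of deduplicated storage; not part of Lucene.
final class DerefSketch {
    // Plays the role of the .dat entry: each distinct value stored once.
    final List<String> uniqueValues = new ArrayList<>();
    // Plays the role of the .idx entry: one pointer per document.
    final int[] pointers;

    DerefSketch(String[] perDocValues) {
        Map<String, Integer> seen = new HashMap<>();
        pointers = new int[perDocValues.length];
        for (int doc = 0; doc < perDocValues.length; doc++) {
            String value = perDocValues[doc];
            Integer slot = seen.get(value);
            if (slot == null) {
                // First occurrence: append to the unique values.
                slot = uniqueValues.size();
                seen.put(value, slot);
                uniqueValues.add(value);
            }
            pointers[doc] = slot;
        }
    }

    // Resolves a document's value through its pointer.
    String valueOf(int doc) {
        return uniqueValues.get(pointers[doc]);
    }

    public static void main(String[] args) {
        DerefSketch s = new DerefSketch(new String[] {"red", "blue", "red", "red"});
        System.out.println(s.uniqueValues);              // [red, blue]
        System.out.println(Arrays.toString(s.pointers)); // [0, 1, 0, 0]
    }
}
```

Four documents share two distinct values here, so only two values are stored and each document costs one small pointer.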
Formats:
- VAR_INTS .dat --> Header, PackedType, MinValue, DefaultValue, PackedStream
- FIXED_INTS_8 .dat --> Header, ValueSize, Byte^maxdoc
- FIXED_INTS_16 .dat --> Header, ValueSize, Short^maxdoc
- FIXED_INTS_32 .dat --> Header, ValueSize, Int32^maxdoc
- FIXED_INTS_64 .dat --> Header, ValueSize, Int64^maxdoc
- FLOAT_32 .dat --> Header, ValueSize, Float32^maxdoc
- FLOAT_64 .dat --> Header, ValueSize, Float64^maxdoc
- BYTES_FIXED_STRAIGHT .dat --> Header, ValueSize, (Byte * ValueSize)^maxdoc
- BYTES_VAR_STRAIGHT .idx --> Header, MaxAddress, Addresses
- BYTES_VAR_STRAIGHT .dat --> Header, TotalBytes, Addresses, (Byte * variable ValueSize)^maxdoc
- BYTES_FIXED_DEREF .idx --> Header, NumValues, Addresses
- BYTES_FIXED_DEREF .dat --> Header, ValueSize, (Byte * ValueSize)^NumValues
- BYTES_VAR_DEREF .idx --> Header, TotalVarBytes, Addresses
- BYTES_VAR_DEREF .dat --> Header, (LengthPrefix + Byte * variable ValueSize)^NumValues
- BYTES_FIXED_SORTED .idx --> Header, NumValues, Ordinals
- BYTES_FIXED_SORTED .dat --> Header, ValueSize, (Byte * ValueSize)^NumValues
- BYTES_VAR_SORTED .idx --> Header, TotalVarBytes, Addresses, Ordinals
- BYTES_VAR_SORTED .dat --> Header, (Byte * variable ValueSize)^NumValues

Data Types:
- Header --> CodecHeader
- PackedType --> Byte
- MaxAddress, MinValue, DefaultValue --> Int64
- PackedStream, Addresses, Ordinals --> PackedInts
- ValueSize, NumValues --> Int32
- Float32 --> 32-bit float encoded with Float.floatToRawIntBits(float) then written as Int32
- Float64 --> 64-bit float encoded with Double.doubleToRawLongBits(double) then written as Int64
- TotalBytes --> VLong
- TotalVarBytes --> Int64
- LengthPrefix --> length of the data value as VInt (maximum of 2 bytes)

Notes:
- In the VAR_DEREF case, the length of each value is stored as a prefix to the value itself, as a VInt (maximum of 2 bytes).
- In the FIXED_SORTED case, the address into the .dat can be computed from the ordinal as Header+ValueSize+(ordinal*ValueSize) because the byte length is fixed. In the VAR_SORTED case, there is double indirection (docid -> ordinal -> address), but an additional sentinel ordinal+address is always written (so there are NumValues+1 ordinals). To determine the length, ord+1's address is looked up as well.
- BYTES_VAR_STRAIGHT, in contrast to the other straight variants, uses a .idx file to improve lookup performance. In contrast to BYTES_VAR_DEREF, it doesn't apply deduplication of the document values.
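
The two sorted lookups described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Lucene's reader code: `SortedLookupSketch`, `fixedSortedOffset`, and `varSortedValue` are hypothetical names, `dataStart` stands for the file position just past Header and ValueSize, and the packed Ordinals and Addresses streams are modeled as ordinary arrays, with the address array holding one sentinel entry at the end.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of the sorted-type lookups; not part of Lucene.
final class SortedLookupSketch {

    // FIXED_SORTED: every value has the same length, so the offset is pure arithmetic.
    // dataStart is assumed to be the position right after Header and ValueSize.
    static long fixedSortedOffset(long dataStart, int valueSize, int ordinal) {
        return dataStart + (long) ordinal * valueSize;
    }

    // VAR_SORTED: docid -> ordinal -> address. The end of the value is the
    // address of ord + 1, which exists even for the last value because of the sentinel.
    static byte[] varSortedValue(int docId, int[] ordinals, long[] addresses, byte[] data) {
        int ord = ordinals[docId];
        int start = (int) addresses[ord];
        int end = (int) addresses[ord + 1];
        return Arrays.copyOfRange(data, start, end);
    }

    public static void main(String[] args) {
        // Three unique values stored back to back in sorted order.
        byte[] data = "applebananacherry".getBytes(StandardCharsets.US_ASCII);
        long[] addresses = {0, 5, 11, 17}; // last entry is the sentinel
        int[] ordinals = {2, 0, 1, 0};     // one ordinal per document

        byte[] value = varSortedValue(0, ordinals, addresses, data);
        System.out.println(new String(value, StandardCharsets.US_ASCII)); // cherry
        System.out.println(fixedSortedOffset(20, 8, 3));                  // 44
    }
}
```

The sentinel is what lets the last value be read with the same two-address lookup as every other value, with no special case for the end of the data.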
| Constructor and Description |
|---|
| Lucene40DocValuesFormat()<br>Sole constructor. |
| Modifier and Type | Method and Description |
|---|---|
| PerDocConsumer | docsConsumer(PerDocWriteState state)<br>Consumes (writes) doc values during indexing. |
| PerDocProducer | docsProducer(SegmentReadState state)<br>Produces (reads) doc values during reading/searching. |
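
As a rough picture of what the write and read sides exchange, the sketch below round-trips a FLOAT_32-style payload (ValueSize followed by one Float32 per document) through a byte array. It deliberately does not use PerDocConsumer or PerDocProducer: `Float32RoundTripSketch` and its methods are hypothetical, the Header is omitted, and a big-endian `ByteBuffer` stands in for the real index output as an assumption.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a FLOAT_32-style layout; not part of Lucene.
final class Float32RoundTripSketch {

    // Write side: ValueSize, then one value per document as raw int bits.
    static byte[] write(float[] perDocValues) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + perDocValues.length * Float.BYTES);
        buf.putInt(Float.BYTES); // ValueSize
        for (float v : perDocValues) {
            buf.putInt(Float.floatToRawIntBits(v)); // Float32 written as Int32
        }
        return buf.array();
    }

    // Read side: fixed-width values allow direct addressing by docid.
    static float read(byte[] file, int docId) {
        ByteBuffer buf = ByteBuffer.wrap(file);
        int valueSize = buf.getInt(0);
        int offset = Integer.BYTES + docId * valueSize;
        return Float.intBitsToFloat(buf.getInt(offset));
    }

    public static void main(String[] args) {
        byte[] file = write(new float[] {1.5f, -0.0f, 42.0f});
        System.out.println(file.length);   // 16
        System.out.println(read(file, 2)); // 42.0
    }
}
```

Because the raw bit pattern is stored, the stored bits are preserved exactly, including the sign of negative zero.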
public PerDocConsumer docsConsumer(PerDocWriteState state) throws IOException
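Description copied from class: DocValuesFormat
Consumes (writes) doc values during indexing.
Specified by: docsConsumer in class DocValuesFormat
Throws: IOException

public PerDocProducer docsProducer(SegmentReadState state) throws IOException
Description copied from class: DocValuesFormat
Produces (reads) doc values during reading/searching.
Specified by: docsProducer in class DocValuesFormat
Throws: IOException

Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.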