Class Lucene60PointsFormat


  • public class Lucene60PointsFormat
    extends PointsFormat
    Lucene 6.0 point format, which encodes dimensional values in a block KD-tree structure for fast 1D range and N-dimensional shape intersection filtering. See this paper for details.

    This data structure is written as a series of blocks on disk, with an in-memory perfectly balanced binary tree of split values referencing those blocks at the leaves.
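
    As an illustration of how a balanced tree of split values can lead to a leaf block, the following is a minimal 1D sketch. It assumes an implicit heap-style layout (root at node 1, children at 2n and 2n+1) and a power-of-two leaf count; the class name, method name, and array layout are hypothetical and are not taken from the Lucene source.

```java
class SplitTreeSketch {

    // splitValues[n] holds the split value of inner node n, for n in 1..numLeaves-1
    // (index 0 is unused). leafBlockFPs[i] is the file pointer of leaf block i.
    // Returns the file pointer of the leaf block whose cell contains the value.
    // Assumes leafBlockFPs.length is a power of two (a perfectly balanced tree).
    static long findLeafBlock(int[] splitValues, long[] leafBlockFPs, int value) {
        int numLeaves = leafBlockFPs.length;
        int node = 1; // start at the root
        while (node < numLeaves) {
            // Values below the split go left, all others go right.
            node = value < splitValues[node] ? 2 * node : 2 * node + 1;
        }
        // Node ids numLeaves..2*numLeaves-1 are the leaves, in left-to-right order.
        return leafBlockFPs[node - numLeaves];
    }
}
```

    With split values 50 (root), 25 and 75, and four leaf blocks at file pointers 100, 220, 340 and 460, a lookup for the value 60 descends right and then left, and returns 340. Only the split values stay in memory; the points themselves are read from the referenced block.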

    The .dim file has both blocks and the index split values, for each field. The file starts with CodecUtil.writeIndexHeader(org.apache.lucene.store.DataOutput, java.lang.String, int, byte[], java.lang.String).

    The blocks are written like this:

    • count (vInt)
    • delta-docID (vInt), repeated count times (delta-coded docIDs, in sorted order)
    • packedValue, repeated count times (the byte[] value of each dimension packed into a single byte[])
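
    The block layout above can be sketched as follows. This is a simplified illustration, not the codec's actual writer: the class and method names are hypothetical, the first docID is assumed to be delta-coded against 0, and the packed values are assumed to be written in the same order as the docIDs. The vInt encoding shown is the usual 7-bits-per-byte form, low-order bits first, with the high bit marking a continuation.

```java
import java.io.ByteArrayOutputStream;

class BlockLayoutSketch {

    // Writes a variable-length int: 7 data bits per byte, high bit set on
    // every byte except the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encodes one block: count, then count delta-coded docIDs, then count packed values.
    // docIDs must be sorted ascending; each packedValues[i] is the per-dimension
    // bytes of one point concatenated into a single byte[].
    static byte[] encodeBlock(int[] docIDs, byte[][] packedValues) {
        if (docIDs.length != packedValues.length) {
            throw new IllegalArgumentException("one packed value per docID is required");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, docIDs.length);
        int previous = 0;
        for (int docID : docIDs) {
            writeVInt(out, docID - previous); // small deltas need fewer bytes
            previous = docID;
        }
        for (byte[] packed : packedValues) {
            out.write(packed, 0, packed.length);
        }
        return out.toByteArray();
    }
}
```

    For docIDs 3, 7 and 200 with two-byte packed values, the deltas are 3, 4 and 193, so the block takes 1 byte for the count, 4 bytes for the docIDs (193 needs two) and 6 bytes for the values, 11 bytes in total.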

    After all blocks for a field are written, the index is written:

    • numDims (vInt)
    • maxPointsInLeafNode (vInt)
    • bytesPerDim (vInt)
    • count (vInt)
    • packed index (byte[])

    The packed index uses hierarchical delta and prefix coding to compactly encode the file pointer for all leaf blocks, once the tree is traversed, as well as the split dimension and split value for each inner node of the tree.

    After the blocks and index data for all fields are written, CodecUtil.writeFooter(org.apache.lucene.store.IndexOutput) writes the checksum.

    The .dii file records the file pointer in the .dim file where each field's index data was written. It starts with CodecUtil.writeIndexHeader(org.apache.lucene.store.DataOutput, java.lang.String, int, byte[], java.lang.String), then has:

    • fieldCount (vInt)
    • (fieldNumber (vInt), fieldFilePointer (vLong)), repeated fieldCount times
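
    A reader for the entries listed above might look like the sketch below. It assumes the buffer is already positioned just past the index header and ignores the footer; the class and method names are hypothetical, and a plain ByteBuffer stands in for the codec's own input classes.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

class FieldIndexSketch {

    // Reads a variable-length long: 7 data bits per byte, low-order bits first,
    // high bit set on every byte except the last. A vInt uses the same encoding.
    static long readVLong(ByteBuffer in) {
        long value = 0;
        int shift = 0;
        while (true) {
            byte b = in.get();
            value |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    // Reads fieldCount, then fieldCount (fieldNumber, fieldFilePointer) pairs.
    // Returns a map from field number to the file pointer of that field's index data.
    static Map<Integer, Long> readFieldPointers(ByteBuffer in) {
        int fieldCount = (int) readVLong(in);
        Map<Integer, Long> pointers = new LinkedHashMap<>();
        for (int i = 0; i < fieldCount; i++) {
            int fieldNumber = (int) readVLong(in);
            long fieldFilePointer = readVLong(in);
            pointers.put(fieldNumber, fieldFilePointer);
        }
        return pointers;
    }
}
```

    Given the bytes 02 00 2C 05 AC 02, this yields two entries: field 0 at file pointer 44 and field 5 at file pointer 300. A reader would then seek to each pointer in the .dim file to load that field's index.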

    After the blocks and index data for all fields are written, CodecUtil.writeFooter(org.apache.lucene.store.IndexOutput) writes the checksum.

    WARNING: This API is experimental and might change in incompatible ways in the next release.