Class BKDWriter

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class BKDWriter
    extends Object
    implements Closeable
    Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= maxPointsInLeafNode. The tree is partially balanced: every leaf node holds exactly maxPointsInLeafNode points except possibly one, which may hold fewer. Leaf nodes may straddle the two bottom levels of the binary tree. Values that fall exactly on a cell boundary may be in either cell.
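    A simplified, in-memory sketch of the recursive, partially balanced partitioning described above. It assumes points are int[] tuples held entirely in heap (the real writer works on packed byte[] values and can spill to disk), and the helpers `build`, `numLeftLeaves` and `widestDim`, as well as the widest-range split heuristic, are hypothetical illustrations rather than BKDWriter's API. The key idea is that the left child always receives a whole number of full leaves, which pushes the single possibly-partial leaf to the far right.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Simplified sketch of partially balanced KD partitioning; not Lucene's implementation.
class KdLeafSketch {

  /** Recursively partitions points[from, to) into cells and appends each leaf cell to leaves. */
  static void build(int[][] points, int from, int to, int maxPointsInLeafNode, List<int[][]> leaves) {
    int count = to - from;
    if (count <= maxPointsInLeafNode) {
      // Small enough: this cell becomes a leaf.
      leaves.add(Arrays.copyOfRange(points, from, to));
      return;
    }
    int numLeaves = (count + maxPointsInLeafNode - 1) / maxPointsInLeafNode;
    // The left child always gets a whole number of full leaves.
    int leftCount = numLeftLeaves(numLeaves) * maxPointsInLeafNode;
    int dim = widestDim(points, from, to);
    // Order this slice by the split dimension so the left cell holds the smaller values.
    Arrays.sort(points, from, to, Comparator.comparingInt(p -> p[dim]));
    build(points, from, from + leftCount, maxPointsInLeafNode, leaves);
    build(points, from + leftCount, to, maxPointsInLeafNode, leaves);
  }

  /** Leaves to place on the left so that only the right-most leaf can be partial. */
  static int numLeftLeaves(int numLeaves) {
    int fullLevel = Integer.highestOneBit(numLeaves); // leaves on the deepest complete level
    int left = fullLevel / 2;
    int overflow = numLeaves - fullLevel;             // leaves that spill one level deeper
    return left + Math.min(overflow, left);
  }

  /** Hypothetical split heuristic: the dimension with the largest value range. */
  static int widestDim(int[][] points, int from, int to) {
    int best = 0;
    long bestSpread = -1;
    for (int d = 0; d < points[from].length; d++) {
      int min = Integer.MAX_VALUE;
      int max = Integer.MIN_VALUE;
      for (int i = from; i < to; i++) {
        min = Math.min(min, points[i][d]);
        max = Math.max(max, points[i][d]);
      }
      long spread = (long) max - min;
      if (spread > bestSpread) {
        bestSpread = spread;
        best = d;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    int[][] points = new int[1000][2];
    for (int[] p : points) {
      p[0] = random.nextInt(10_000);
      p[1] = random.nextInt(10_000);
    }
    List<int[][]> leaves = new ArrayList<>();
    build(points, 0, points.length, 64, leaves);
    for (int[][] leaf : leaves) {
      System.out.print(leaf.length + " ");
    }
    System.out.println();
  }
}
```

    With 1,000 points and maxPointsInLeafNode = 64 this yields 16 leaves: fifteen holding 64 points and a last one holding 40. Because each cut is placed at a leaf-count boundary in sorted order, points with equal values on the split dimension can end up on both sides of a cut.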

    The number of dimensions can be 1 to 8, but every byte[] value is fixed length.

    This consumes heap during writing: it allocates a Long[numLeaves], a byte[numLeaves*(1+bytesPerDim)] and then uses up to the specified maxMBSortInHeap heap space for writing.
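    The heap note can be made concrete with a little arithmetic. The sketch below assumes numLeaves = ceil(totalPointCount / maxPointsInLeafNode), which follows from all leaves being full except possibly one, and it counts only array slots (a Long[] holds references to boxed values, so its real footprint is larger). The per-point cost used to turn maxMBSortInHeap into a point budget (the packed value plus a 4-byte docID) is a hypothetical figure, not one stated on this page; all method names are illustrative.

```java
// Back-of-the-envelope heap estimate; names and the per-point cost are assumptions.
class HeapEstimateSketch {

  /** Assumed leaf count when every leaf but one is full: ceil(pointCount / maxPointsInLeafNode). */
  static long numLeaves(long pointCount, int maxPointsInLeafNode) {
    return (pointCount + maxPointsInLeafNode - 1) / maxPointsInLeafNode;
  }

  /** Length of the byte[numLeaves * (1 + bytesPerDim)] array from the class description. */
  static long splitBytesLength(long numLeaves, int bytesPerDim) {
    return numLeaves * (1 + bytesPerDim);
  }

  /** Points that fit in the sort budget, assuming packedBytesLength + 4 bytes (docID) per point. */
  static long pointsThatFitInHeap(double maxMBSortInHeap, int numDataDims, int bytesPerDim) {
    long bytesPerPoint = (long) numDataDims * bytesPerDim + Integer.BYTES;
    return (long) (maxMBSortInHeap * 1024 * 1024) / bytesPerPoint;
  }

  public static void main(String[] args) {
    long leaves = numLeaves(1_000_000L, 1024);
    System.out.println("numLeaves      = " + leaves);
    System.out.println("Long[] slots   = " + leaves);
    System.out.println("byte[] length  = " + splitBytesLength(leaves, 4));
    System.out.println("points in 16MB = " + pointsThatFitInHeap(16.0, 2, 4));
  }
}
```

    For 1,000,000 points with maxPointsInLeafNode = 1024 and bytesPerDim = 4 this gives 977 leaves and a 4,885-byte byte[]; a 16 MB budget at 12 bytes per point holds 1,398,101 points.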

    NOTE: This can write at most Integer.MAX_VALUE * maxPointsInLeafNode / bytesPerDim total points.
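    The bound in the NOTE is easy to miscompute, because in 32-bit int arithmetic the product Integer.MAX_VALUE * maxPointsInLeafNode overflows. A small sketch (the method name is hypothetical) that widens to long before multiplying:

```java
// Evaluates the documented point limit without int overflow.
class PointLimitSketch {

  /** Integer.MAX_VALUE * maxPointsInLeafNode / bytesPerDim, computed in long arithmetic. */
  static long maxTotalPoints(int maxPointsInLeafNode, int bytesPerDim) {
    // Widen first: the product overflows int for any maxPointsInLeafNode > 1.
    return (long) Integer.MAX_VALUE * maxPointsInLeafNode / bytesPerDim;
  }

  public static void main(String[] args) {
    System.out.println(maxTotalPoints(1024, 4));
  }
}
```

    With maxPointsInLeafNode = 1024 and bytesPerDim = 4 the bound is 549,755,813,632 points; evaluated in int, the same expression wraps around to -256.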

    WARNING: This API is experimental and might change in incompatible ways in the next release.
    • Field Detail

      • VERSION_LEAF_STORES_BOUNDS

        public static final int VERSION_LEAF_STORES_BOUNDS
        See Also:
        Constant Field Values
      • VERSION_SELECTIVE_INDEXING

        public static final int VERSION_SELECTIVE_INDEXING
        See Also:
        Constant Field Values
      • VERSION_LOW_CARDINALITY_LEAVES

        public static final int VERSION_LOW_CARDINALITY_LEAVES
        See Also:
        Constant Field Values
      • DEFAULT_MAX_POINTS_IN_LEAF_NODE

        public static final int DEFAULT_MAX_POINTS_IN_LEAF_NODE
        Default maximum number of points in each leaf block
        See Also:
        Constant Field Values
      • DEFAULT_MAX_MB_SORT_IN_HEAP

        public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
        Default maximum heap to use, in MB, before spilling to (slower) disk
        See Also:
        Constant Field Values
      • MAX_DIMS

        public static final int MAX_DIMS
        Maximum number of data dimensions (2 * MAX_INDEX_DIMS)
        See Also:
        Constant Field Values
      • MAX_INDEX_DIMS

        public static final int MAX_INDEX_DIMS
        Maximum number of index dimensions
        See Also:
        Constant Field Values
      • numDataDims

        protected final int numDataDims
        How many dimensions we are storing at the leaf (data) nodes
      • numIndexDims

        protected final int numIndexDims
        How many dimensions we are indexing in the internal nodes
      • bytesPerDim

        protected final int bytesPerDim
        How many bytes each value in each dimension takes.
      • packedBytesLength

        protected final int packedBytesLength
        numDataDims * bytesPerDim
      • packedIndexBytesLength

        protected final int packedIndexBytesLength
        numIndexDims * bytesPerDim
      • maxPointsInLeafNode

        protected final int maxPointsInLeafNode
      • minPackedValue

        protected final byte[] minPackedValue
        Minimum per-dim values, packed
      • maxPackedValue

        protected final byte[] maxPackedValue
        Maximum per-dim values, packed
      • pointCount

        protected long pointCount
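    The packed fields above can be illustrated with a sketch of how a point might be laid out and how per-dimension bounds accumulate. It assumes each dimension is a 4-byte int encoded big-endian with the sign bit flipped (the same idea as Lucene's NumericUtils.intToSortableBytes), so that unsigned byte comparison matches numeric order, and that the numIndexDims indexed dimensions form the leading prefix of the packed value. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Sketch of packed point values and per-dimension min/max bounds; names are hypothetical.
class PackedValueSketch {

  static final int BYTES_PER_DIM = Integer.BYTES; // assumption: each dimension is a 4-byte int

  /** Writes value big-endian with the sign bit flipped, so unsigned byte order matches int order. */
  static void encodeSortableInt(int value, byte[] dest, int offset) {
    int flipped = value ^ 0x80000000;
    dest[offset] = (byte) (flipped >>> 24);
    dest[offset + 1] = (byte) (flipped >>> 16);
    dest[offset + 2] = (byte) (flipped >>> 8);
    dest[offset + 3] = (byte) flipped;
  }

  /** Packs one point; dimension d occupies bytes [d * BYTES_PER_DIM, (d + 1) * BYTES_PER_DIM). */
  static byte[] pack(int... point) {
    byte[] packed = new byte[point.length * BYTES_PER_DIM]; // numDataDims * bytesPerDim
    for (int d = 0; d < point.length; d++) {
      encodeSortableInt(point[d], packed, d * BYTES_PER_DIM);
    }
    return packed;
  }

  /** Widens minPacked/maxPacked dimension by dimension so they also cover one more packed point. */
  static void updateBounds(byte[] packed, byte[] minPacked, byte[] maxPacked) {
    for (int from = 0; from < packed.length; from += BYTES_PER_DIM) {
      int to = from + BYTES_PER_DIM;
      if (Arrays.compareUnsigned(packed, from, to, minPacked, from, to) < 0) {
        System.arraycopy(packed, from, minPacked, from, BYTES_PER_DIM);
      }
      if (Arrays.compareUnsigned(packed, from, to, maxPacked, from, to) > 0) {
        System.arraycopy(packed, from, maxPacked, from, BYTES_PER_DIM);
      }
    }
  }

  public static void main(String[] args) {
    byte[][] points = {pack(3, -9, 100), pack(-7, 1, 50), pack(5, 5, 75)}; // numDataDims = 3
    byte[] min = points[0].clone();
    byte[] max = points[0].clone();
    for (byte[] p : points) {
      updateBounds(p, min, max);
    }
    // The bounds form a box that need not be an actual point: (-7, -9, 50) to (5, 5, 100).
    System.out.println(Arrays.equals(min, pack(-7, -9, 50)) + " " + Arrays.equals(max, pack(5, 5, 100)));
    // Assumption: the indexed dimensions are the leading packedIndexBytesLength bytes.
    int numIndexDims = 2;
    byte[] indexPart = Arrays.copyOf(min, numIndexDims * BYTES_PER_DIM);
    System.out.println(indexPart.length);
  }
}
```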
    • Constructor Detail

      • BKDWriter

        public BKDWriter(int maxDoc,
                         Directory tempDir,
                         String tempFileNamePrefix,
                         int numDataDims,
                         int numIndexDims,
                         int bytesPerDim,
                         int maxPointsInLeafNode,
                         double maxMBSortInHeap,
                         long totalPointCount)
                  throws IOException
        Throws:
        IOException
    • Method Detail

      • verifyParams

        public static void verifyParams(int numDims,
                                        int numIndexDims,
                                        int maxPointsInLeafNode,
                                        double maxMBSortInHeap,
                                        long totalPointCount)
      • split

        protected int split(byte[] minPackedValue,
                            byte[] maxPackedValue,
                            int[] parentSplits)
        Pick the next dimension to split.
        Parameters:
        minPackedValue - the min values for all dimensions
        maxPackedValue - the max values for all dimensions
        parentSplits - how many times each dim has been split on the parent levels
        Returns:
        the dimension to split
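    One way to pick a split dimension from these inputs, offered as a hypothetical heuristic and not necessarily what BKDWriter does: first favor an indexed dimension that has been split fewer than half as many times as the most-split one (and whose range is not a single value), so that every dimension gets indexed; otherwise choose the dimension whose max minus min span is widest. The sketch assumes min/max values are packed as in the fields above (dimension d at bytes [d * bytesPerDim, (d + 1) * bytesPerDim)) and compares spans as unsigned big-endian integers via BigInteger; the class name and extra parameters are illustrative.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical split-dimension heuristic over packed min/max values.
class SplitDimSketch {

  static int split(byte[] minPacked, byte[] maxPacked, int[] parentSplits, int numIndexDims, int bytesPerDim) {
    int maxSplits = 0;
    for (int s : parentSplits) {
      maxSplits = Math.max(maxSplits, s);
    }
    // 1) Prefer an under-split dimension that still has a non-empty range.
    for (int d = 0; d < numIndexDims; d++) {
      int from = d * bytesPerDim;
      int to = from + bytesPerDim;
      boolean hasRange = Arrays.compareUnsigned(minPacked, from, to, maxPacked, from, to) != 0;
      if (parentSplits[d] < maxSplits / 2 && hasRange) {
        return d;
      }
    }
    // 2) Otherwise pick the dimension with the widest span, bytes read as unsigned big-endian.
    int best = -1;
    BigInteger bestSpan = null;
    for (int d = 0; d < numIndexDims; d++) {
      int from = d * bytesPerDim;
      int to = from + bytesPerDim;
      BigInteger span = new BigInteger(1, Arrays.copyOfRange(maxPacked, from, to))
          .subtract(new BigInteger(1, Arrays.copyOfRange(minPacked, from, to)));
      if (best == -1 || span.compareTo(bestSpan) > 0) {
        best = d;
        bestSpan = span;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    byte[] min = {0, 0};
    byte[] max = {100, 10};
    System.out.println(split(min, max, new int[] {0, 0}, 2, 1)); // widest range wins
    System.out.println(split(min, max, new int[] {4, 1}, 2, 1)); // under-split dimension wins
  }
}
```

    In the usage above, dimension 0 is chosen first because its span (100) is widest; once it has been split four times and dimension 1 only once, dimension 1 is preferred even though its span is narrower.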