Class BKDWriter

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public class BKDWriter
    extends Object
    implements Closeable
    Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= maxPointsInLeafNode. The tree is fully balanced, which means the leaf nodes will have between 50% and 100% of the requested maxPointsInLeafNode. Values that fall exactly on a cell boundary may be in either cell.
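
The recursion described above can be sketched in a few lines. This is a minimal in-memory illustration, not the BKDWriter implementation: the class name KdPartitionSketch, the use of int coordinates instead of packed byte[] values, and the choice to split on the widest dimension at the median are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative in-memory partitioner; hypothetical, NOT Lucene's BKDWriter.
final class KdPartitionSketch {

  // Returns the leaf cells for the given points (one int per dimension per point).
  static List<int[][]> partition(int[][] points, int maxPointsInLeafNode) {
    if (maxPointsInLeafNode < 1) {
      throw new IllegalArgumentException("maxPointsInLeafNode must be >= 1");
    }
    int[][] work = points.clone(); // shallow copy so the caller's ordering is untouched
    List<int[][]> leaves = new ArrayList<>();
    if (work.length > 0) {
      build(work, 0, work.length, maxPointsInLeafNode, leaves);
    }
    return leaves;
  }

  // Splits points[from, to) until each cell holds <= maxPointsInLeafNode points.
  private static void build(int[][] points, int from, int to,
                            int maxPointsInLeafNode, List<int[][]> leaves) {
    int count = to - from;
    if (count <= maxPointsInLeafNode) {
      leaves.add(Arrays.copyOfRange(points, from, to));
      return;
    }
    final int splitDim = widestDim(points, from, to);
    // Sort this slice on the split dimension and cut it at the median, so both
    // halves stay balanced. Equal values may land on either side of the cut.
    Arrays.sort(points, from, to, Comparator.comparingInt((int[] p) -> p[splitDim]));
    int mid = from + count / 2;
    build(points, from, mid, maxPointsInLeafNode, leaves);
    build(points, mid, to, maxPointsInLeafNode, leaves);
  }

  // Assumed heuristic: pick the dimension whose values span the widest range.
  private static int widestDim(int[][] points, int from, int to) {
    int best = 0;
    long bestSpan = -1;
    for (int d = 0; d < points[from].length; d++) {
      int min = Integer.MAX_VALUE;
      int max = Integer.MIN_VALUE;
      for (int i = from; i < to; i++) {
        min = Math.min(min, points[i][d]);
        max = Math.max(max, points[i][d]);
      }
      long span = (long) max - min;
      if (span > bestSpan) {
        bestSpan = span;
        best = d;
      }
    }
    return best;
  }
}
```

Because every cell with more than maxPointsInLeafNode points is halved at the median, each leaf in this sketch ends up at least half full whenever the input is larger than a single leaf, which matches the 50% to 100% range stated above.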

    The number of dimensions can be 1 to 8, but every byte[] value is fixed length.

    This consumes heap during writing: it allocates a Long[numLeaves], a byte[numLeaves*(1+bytesPerDim)] and then uses up to the specified maxMBSortInHeap heap space for writing.

    NOTE: This can write at most Integer.MAX_VALUE * maxPointsInLeafNode / (1+bytesPerDim) total points.
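
The two size statements above are plain arithmetic, sketched here for concreteness. The class and method names are hypothetical. Estimating numLeaves as the ceiling of totalPoints / maxPointsInLeafNode and counting 8 bytes per leaf entry are assumptions, so treat the heap figure as a rough lower bound rather than what the writer actually allocates.

```java
// Hypothetical helper that evaluates the documented size expressions.
final class BkdBoundsSketch {

  // Integer.MAX_VALUE * maxPointsInLeafNode / (1 + bytesPerDim), evaluated in long arithmetic.
  static long maxTotalPoints(int maxPointsInLeafNode, int bytesPerDim) {
    return (long) Integer.MAX_VALUE * maxPointsInLeafNode / (1 + bytesPerDim);
  }

  // Assumption: roughly one leaf per maxPointsInLeafNode points, rounded up.
  static long estimatedLeafCount(long totalPoints, int maxPointsInLeafNode) {
    return (totalPoints + maxPointsInLeafNode - 1) / maxPointsInLeafNode;
  }

  // One 8-byte entry per leaf plus byte[numLeaves * (1 + bytesPerDim)];
  // excludes the maxMBSortInHeap budget and object headers.
  static long fixedHeapBytes(long numLeaves, int bytesPerDim) {
    return numLeaves * Long.BYTES + numLeaves * (1 + bytesPerDim);
  }
}
```

For example, with 1024 points per leaf and 4 bytes per dimension, maxTotalPoints gives 439804650905, and one million points map to an estimated 977 leaves and about 12.7 KB of fixed per-leaf heap.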

    WARNING: This API is experimental and might change in incompatible ways in the next release.
    • Field Detail

      • VERSION_LEAF_STORES_BOUNDS

        public static final int VERSION_LEAF_STORES_BOUNDS
        See Also:
        Constant Field Values
      • VERSION_SELECTIVE_INDEXING

        public static final int VERSION_SELECTIVE_INDEXING
        See Also:
        Constant Field Values
      • VERSION_LOW_CARDINALITY_LEAVES

        public static final int VERSION_LOW_CARDINALITY_LEAVES
        See Also:
        Constant Field Values
      • DEFAULT_MAX_POINTS_IN_LEAF_NODE

        public static final int DEFAULT_MAX_POINTS_IN_LEAF_NODE
        Default maximum number of points in each leaf block
        See Also:
        Constant Field Values
      • DEFAULT_MAX_MB_SORT_IN_HEAP

        public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
        Default maximum heap to use, before spilling to (slower) disk
        See Also:
        Constant Field Values
      • MAX_DIMS

        public static final int MAX_DIMS
        Maximum number of data dimensions (2 * max index dimensions)
        See Also:
        Constant Field Values
      • MAX_INDEX_DIMS

        public static final int MAX_INDEX_DIMS
        Maximum number of index dimensions
        See Also:
        Constant Field Values
      • numDataDims

        protected final int numDataDims
        How many dimensions we are storing at the leaf (data) nodes
      • numIndexDims

        protected final int numIndexDims
        How many dimensions we are indexing in the internal nodes
      • bytesPerDim

        protected final int bytesPerDim
        How many bytes each value in each dimension takes.
      • packedBytesLength

        protected final int packedBytesLength
        numDataDims * bytesPerDim
      • packedIndexBytesLength

        protected final int packedIndexBytesLength
        numIndexDims * bytesPerDim
      • maxPointsInLeafNode

        protected final int maxPointsInLeafNode
      • minPackedValue

        protected final byte[] minPackedValue
        Minimum per-dim values, packed
      • maxPackedValue

        protected final byte[] maxPackedValue
        Maximum per-dim values, packed
      • pointCount

        protected long pointCount
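
The relationship between the length fields and the packed min/max values above can be sketched as follows. This is an illustration under stated assumptions, not the writer's code: PackedValueSketch, pack and updateBounds are hypothetical names, and the sign-flipped big-endian int encoding is one common way to make unsigned byte order match numeric order. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.util.Arrays;

// Hypothetical sketch of fixed-length packed values and per-dimension bounds.
final class PackedValueSketch {

  // Packs one int per dimension into numDims * bytesPerDim bytes (bytesPerDim = 4 here).
  // The sign bit is flipped so that unsigned byte comparison agrees with int order.
  static byte[] pack(int... values) {
    int bytesPerDim = Integer.BYTES;
    byte[] packed = new byte[values.length * bytesPerDim];
    for (int dim = 0; dim < values.length; dim++) {
      int v = values[dim] ^ 0x80000000;
      int off = dim * bytesPerDim;
      packed[off] = (byte) (v >>> 24);
      packed[off + 1] = (byte) (v >>> 16);
      packed[off + 2] = (byte) (v >>> 8);
      packed[off + 3] = (byte) v;
    }
    return packed;
  }

  // Widens minPacked/maxPacked in place so they cover the new packed value, one dimension at a time.
  static void updateBounds(byte[] packed, int numDims, int bytesPerDim,
                           byte[] minPacked, byte[] maxPacked) {
    for (int dim = 0; dim < numDims; dim++) {
      int off = dim * bytesPerDim;
      int end = off + bytesPerDim;
      if (Arrays.compareUnsigned(packed, off, end, minPacked, off, end) < 0) {
        System.arraycopy(packed, off, minPacked, off, bytesPerDim);
      }
      if (Arrays.compareUnsigned(packed, off, end, maxPacked, off, end) > 0) {
        System.arraycopy(packed, off, maxPacked, off, bytesPerDim);
      }
    }
  }
}
```

With two dimensions of 4 bytes each, every packed value is 8 bytes long, which corresponds to numDataDims * bytesPerDim. Note that the minimum and maximum are tracked independently per dimension, so neither bound has to equal any single point.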
    • Constructor Detail

      • BKDWriter

        public BKDWriter(int maxDoc,
                         Directory tempDir,
                         String tempFileNamePrefix,
                         int numDataDims,
                         int numIndexDims,
                         int bytesPerDim,
                         int maxPointsInLeafNode,
                         double maxMBSortInHeap,
                         long totalPointCount)
                  throws IOException
        Throws:
        IOException
    • Method Detail

      • verifyParams

        public static void verifyParams(int numDims,
                                        int numIndexDims,
                                        int maxPointsInLeafNode,
                                        double maxMBSortInHeap,
                                        long totalPointCount)
      • getPointCount

        public long getPointCount()
        How many points have been added so far
      • split

        protected int split(byte[] minPackedValue,
                            byte[] maxPackedValue,
                            int[] parentSplits)
        Pick the next dimension to split.
        Parameters:
        minPackedValue - the min values for all dimensions
        maxPackedValue - the max values for all dimensions
        parentSplits - how many times each dim has been split on the parent levels
        Returns:
        the dimension to split
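
The description of split does not say how the dimension is chosen. The sketch below shows one plausible heuristic built only from the documented inputs: prefer a dimension that has been split much less often than the others, and otherwise take the dimension with the largest span between its min and max values. The class name, the explicit numIndexDims and bytesPerDim parameters, and the "less than half the maximum split count" threshold are assumptions for illustration. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical stand-alone version of a split-dimension heuristic.
final class SplitDimSketch {

  static int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits,
                   int numIndexDims, int bytesPerDim) {
    int maxNumSplits = 0;
    for (int numSplits : parentSplits) {
      maxNumSplits = Math.max(maxNumSplits, numSplits);
    }

    // First pass: pick a dimension that is lagging behind in splits, as long as
    // its min and max differ (otherwise splitting on it separates nothing).
    for (int dim = 0; dim < numIndexDims; dim++) {
      int off = dim * bytesPerDim;
      int end = off + bytesPerDim;
      boolean hasRange =
          Arrays.compareUnsigned(minPackedValue, off, end, maxPackedValue, off, end) != 0;
      if (parentSplits[dim] < maxNumSplits / 2 && hasRange) {
        return dim;
      }
    }

    // Second pass: fall back to the dimension with the largest unsigned span.
    int splitDim = -1;
    BigInteger bestSpan = null;
    for (int dim = 0; dim < numIndexDims; dim++) {
      int off = dim * bytesPerDim;
      int end = off + bytesPerDim;
      BigInteger span = unsigned(maxPackedValue, off, end).subtract(unsigned(minPackedValue, off, end));
      if (bestSpan == null || span.compareTo(bestSpan) > 0) {
        bestSpan = span;
        splitDim = dim;
      }
    }
    return splitDim;
  }

  // Interprets bytes[from, to) as an unsigned big-endian integer.
  private static BigInteger unsigned(byte[] bytes, int from, int to) {
    return new BigInteger(1, Arrays.copyOfRange(bytes, from, to));
  }
}
```

Taking the split history into account keeps a single wide dimension from absorbing every split, so narrower dimensions still get partitioned somewhere in the tree.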