Package org.apache.lucene.util.bkd
Class BKDWriter
- java.lang.Object
  - org.apache.lucene.util.bkd.BKDWriter
- All Implemented Interfaces:
Closeable, AutoCloseable
public class BKDWriter extends Object implements Closeable
Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= maxPointsInLeafNode. The tree is fully balanced, which means the leaf nodes will have between 50% and 100% of the requested maxPointsInLeafNode. Values that fall exactly on a cell boundary may be in either cell.
The number of dimensions can be 1 to 8, but every byte[] value is fixed length.
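The recursive partitioning described above can be sketched in plain Java. This is a minimal illustration, not the BKDWriter implementation: the class name LeafSplitSketch, the int[][] point representation, and the choice to cycle through dimensions by depth are all assumptions made for the example. It only shows why halving each cell until it holds at most maxPointsInLeafNode points yields leaves that are roughly 50% to 100% full.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; this is NOT how BKDWriter is implemented. */
final class LeafSplitSketch {

  /** Recursively halves points[from, to) until a cell holds at most maxPointsInLeafNode points. */
  static void build(int[][] points, int from, int to, int depth,
                    int maxPointsInLeafNode, List<Integer> leafSizes) {
    int count = to - from;
    if (count <= maxPointsInLeafNode) {
      leafSizes.add(count); // this cell becomes a leaf
      return;
    }
    // Assumption: cycle through the dimensions by depth. The real writer
    // delegates the choice of dimension to its split() method.
    final int dim = depth % points[0].length;
    Arrays.sort(points, from, to, Comparator.comparingInt((int[] p) -> p[dim]));
    int mid = from + count / 2; // splitting at the median keeps the tree balanced
    build(points, from, mid, depth + 1, maxPointsInLeafNode, leafSizes);
    build(points, mid, to, depth + 1, maxPointsInLeafNode, leafSizes);
  }

  /** Returns the number of points that ended up in each leaf. */
  static List<Integer> leafSizes(int[][] points, int maxPointsInLeafNode) {
    if (maxPointsInLeafNode < 1) {
      throw new IllegalArgumentException("maxPointsInLeafNode must be >= 1");
    }
    List<Integer> sizes = new ArrayList<>();
    build(points, 0, points.length, 0, maxPointsInLeafNode, sizes);
    return sizes;
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    int[][] points = new int[1000][2];
    for (int[] p : points) {
      p[0] = random.nextInt(10_000);
      p[1] = random.nextInt(10_000);
    }
    System.out.println(leafSizes(points, 64));
  }
}
```

Because a leaf is always one half of a cell that held more than maxPointsInLeafNode points, no leaf in this sketch can fall much below half of the requested size.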
This consumes heap during writing: it allocates a Long[numLeaves], a byte[numLeaves*(1+bytesPerDim)] and then uses up to the specified maxMBSortInHeap heap space for writing.
NOTE: This can write at most Integer.MAX_VALUE * maxPointsInLeafNode / (1+bytesPerDim) total points.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
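The limits in the class notes are simple arithmetic, which the following sketch works through. The class name BkdCapacitySketch and both helper methods are hypothetical, and the values 1024 and 4 are example inputs rather than defaults taken from this page. The multiplication is done in long so the intermediate product cannot overflow an int.

```java
/** Illustrative arithmetic only; the helper names are hypothetical. */
final class BkdCapacitySketch {

  /** Upper bound from the class notes: Integer.MAX_VALUE * maxPointsInLeafNode / (1 + bytesPerDim). */
  static long maxTotalPoints(int maxPointsInLeafNode, int bytesPerDim) {
    return (long) Integer.MAX_VALUE * maxPointsInLeafNode / (1 + bytesPerDim);
  }

  /** Size in bytes of the byte[numLeaves * (1 + bytesPerDim)] array mentioned in the class notes. */
  static long perLeafByteArraySize(long numLeaves, int bytesPerDim) {
    return numLeaves * (1L + bytesPerDim);
  }

  public static void main(String[] args) {
    // Example inputs (assumptions): 1024 points per leaf, 4 bytes per dimension.
    System.out.println("max total points: " + maxTotalPoints(1024, 4));
    System.out.println("bytes for 1,000,000 leaves: " + perLeafByteArraySize(1_000_000L, 4));
  }
}
```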
-
-
Field Summary
Fields
Modifier and Type      Field                            Description
protected int          bytesPerDim                      How many bytes each value in each dimension takes.
static String          CODEC_NAME
static float           DEFAULT_MAX_MB_SORT_IN_HEAP      Default maximum heap to use, before spilling to (slower) disk
static int             DEFAULT_MAX_POINTS_IN_LEAF_NODE  Default maximum number of points in each leaf block
protected FixedBitSet  docsSeen
static int             MAX_DIMS                         Maximum number of dimensions
protected byte[]       maxPackedValue                   Maximum per-dim values, packed
protected int          maxPointsInLeafNode
protected byte[]       minPackedValue                   Minimum per-dim values, packed
protected int          numDataDims                      How many dimensions we are storing at the leaf (data) nodes
protected int          numIndexDims                     How many dimensions we are indexing in the internal nodes
protected int          packedBytesLength                numDataDims * bytesPerDim
protected int          packedIndexBytesLength           numIndexDims * bytesPerDim
protected long         pointCount
static int             VERSION_CURRENT
static int             VERSION_LEAF_STORES_BOUNDS
static int             VERSION_SELECTIVE_INDEXING
static int             VERSION_START
-
Method Summary
Modifier and Type      Method
void                   add(byte[] packedValue, int docID)
void                   close()
long                   finish(IndexOutput out)
                           Writes the BKD tree to the provided IndexOutput and returns the file offset where index was written.
long                   getPointCount()
                           How many points have been added so far
long                   merge(IndexOutput out, List<MergeState.DocMap> docMaps, List<BKDReader> readers)
                           More efficient bulk-add for incoming BKDReaders.
protected int          split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits)
                           Pick the next dimension to split.
static void            verifyParams(int numDataDims, int numIndexDims, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount)
long                   writeField(IndexOutput out, String fieldName, MutablePointValues reader)
                           Write a field from a MutablePointValues.
-
-
-
Field Detail
-
CODEC_NAME
public static final String CODEC_NAME
- See Also:
- Constant Field Values
-
VERSION_START
public static final int VERSION_START
- See Also:
- Constant Field Values
-
VERSION_LEAF_STORES_BOUNDS
public static final int VERSION_LEAF_STORES_BOUNDS
- See Also:
- Constant Field Values
-
VERSION_SELECTIVE_INDEXING
public static final int VERSION_SELECTIVE_INDEXING
- See Also:
- Constant Field Values
-
VERSION_CURRENT
public static final int VERSION_CURRENT
- See Also:
- Constant Field Values
-
DEFAULT_MAX_POINTS_IN_LEAF_NODE
public static final int DEFAULT_MAX_POINTS_IN_LEAF_NODE
Default maximum number of points in each leaf block
- See Also:
- Constant Field Values
-
DEFAULT_MAX_MB_SORT_IN_HEAP
public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
Default maximum heap to use, before spilling to (slower) disk
- See Also:
- Constant Field Values
-
MAX_DIMS
public static final int MAX_DIMS
Maximum number of dimensions
- See Also:
- Constant Field Values
-
numDataDims
protected final int numDataDims
How many dimensions we are storing at the leaf (data) nodes
-
numIndexDims
protected final int numIndexDims
How many dimensions we are indexing in the internal nodes
-
bytesPerDim
protected final int bytesPerDim
How many bytes each value in each dimension takes.
-
packedBytesLength
protected final int packedBytesLength
numDataDims * bytesPerDim
-
packedIndexBytesLength
protected final int packedIndexBytesLength
numIndexDims * bytesPerDim
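To make the relationship between bytesPerDim and the packed lengths concrete, here is a small sketch of packing one int per dimension into a single fixed-length byte[]. The class PackedValueSketch and its pack helper are hypothetical and not part of this API; the sign-bit flip is one common way to make unsigned byte order agree with numeric order, shown here as an assumption about how a caller might encode values.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative sketch only; pack() is a hypothetical helper, not part of BKDWriter. */
final class PackedValueSketch {

  /** Packs one int per dimension, 4 bytes each, so the result has numDims * bytesPerDim bytes. */
  static byte[] pack(int... values) {
    final int bytesPerDim = Integer.BYTES;
    byte[] packed = new byte[values.length * bytesPerDim];
    ByteBuffer buffer = ByteBuffer.wrap(packed); // ByteBuffer is big-endian by default
    for (int value : values) {
      // Flip the sign bit so that comparing the bytes as unsigned matches int ordering.
      buffer.putInt(value ^ 0x80000000);
    }
    return packed;
  }

  public static void main(String[] args) {
    byte[] packed = pack(1, -1);
    System.out.println("packed length: " + packed.length);
    System.out.println("-5 sorts before 3: "
        + (Arrays.compareUnsigned(pack(-5), pack(3)) < 0));
  }
}
```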
-
docsSeen
protected final FixedBitSet docsSeen
-
maxPointsInLeafNode
protected final int maxPointsInLeafNode
-
minPackedValue
protected final byte[] minPackedValue
Minimum per-dim values, packed
-
maxPackedValue
protected final byte[] maxPackedValue
Maximum per-dim values, packed
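The packed minimum and maximum are tracked per dimension, so neither array has to equal any single point that was added. The sketch below shows one way such bounds could be computed with unsigned byte comparisons; PackedBoundsSketch and its bounds method are hypothetical names for illustration, and Arrays.compareUnsigned requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the BKDWriter implementation. */
final class PackedBoundsSketch {

  /** Returns {min, max}, each holding the per-dimension extreme of the (non-empty) input. */
  static byte[][] bounds(List<byte[]> packedValues, int numDims, int bytesPerDim) {
    byte[] min = packedValues.get(0).clone();
    byte[] max = packedValues.get(0).clone();
    for (byte[] value : packedValues) {
      for (int dim = 0; dim < numDims; dim++) {
        int start = dim * bytesPerDim;
        int end = start + bytesPerDim;
        // Compare only this dimension's slice, treating bytes as unsigned.
        if (Arrays.compareUnsigned(value, start, end, min, start, end) < 0) {
          System.arraycopy(value, start, min, start, bytesPerDim);
        }
        if (Arrays.compareUnsigned(value, start, end, max, start, end) > 0) {
          System.arraycopy(value, start, max, start, bytesPerDim);
        }
      }
    }
    return new byte[][] {min, max};
  }

  public static void main(String[] args) {
    // Two dimensions of one byte each (example data).
    List<byte[]> values = List.of(
        new byte[] {5, (byte) 200},
        new byte[] {(byte) 250, 3},
        new byte[] {100, 100});
    byte[][] result = bounds(values, 2, 1);
    System.out.println("min dim0=" + (result[0][0] & 0xFF) + " dim1=" + (result[0][1] & 0xFF));
    System.out.println("max dim0=" + (result[1][0] & 0xFF) + " dim1=" + (result[1][1] & 0xFF));
  }
}
```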
-
pointCount
protected long pointCount
-
-
Constructor Detail
-
BKDWriter
public BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDataDims, int numIndexDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount) throws IOException
- Throws:
IOException
-
-
Method Detail
-
verifyParams
public static void verifyParams(int numDataDims, int numIndexDims, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount)
-
add
public void add(byte[] packedValue, int docID) throws IOException
- Throws:
IOException
-
getPointCount
public long getPointCount()
How many points have been added so far
-
writeField
public long writeField(IndexOutput out, String fieldName, MutablePointValues reader) throws IOException
Write a field from a MutablePointValues. This way of writing points is faster than regular writes with add(byte[], int) since there is opportunity for reordering points before writing them to disk. This method does not use transient disk in order to reorder points.
- Throws:
IOException
-
merge
public long merge(IndexOutput out, List<MergeState.DocMap> docMaps, List<BKDReader> readers) throws IOException
More efficient bulk-add for incoming BKDReaders. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns -1 if all documents containing dimensional values were deleted.
- Throws:
IOException
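The merge sort of already sorted one-dimensional values mentioned for merge can be pictured as a k-way merge driven by a priority queue. The sketch below is an assumption-laden illustration, not the method's actual code: SortedMergeSketch is a hypothetical class, the inputs are plain lists of byte[] rather than BKDReaders, and document IDs, doc maps, and deletions are ignored entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative k-way merge of sorted 1-D values; not the BKDWriter.merge implementation. */
final class SortedMergeSketch {

  /** Merges inputs that are each already sorted in unsigned byte order. */
  static List<byte[]> merge(List<List<byte[]>> sortedInputs) {
    // Each queue entry is {input index, position within that input}.
    PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) ->
        Arrays.compareUnsigned(
            sortedInputs.get(a[0]).get(a[1]),
            sortedInputs.get(b[0]).get(b[1])));
    for (int i = 0; i < sortedInputs.size(); i++) {
      if (!sortedInputs.get(i).isEmpty()) {
        queue.add(new int[] {i, 0});
      }
    }
    List<byte[]> merged = new ArrayList<>();
    while (!queue.isEmpty()) {
      int[] head = queue.poll();
      List<byte[]> input = sortedInputs.get(head[0]);
      merged.add(input.get(head[1]));
      if (head[1] + 1 < input.size()) {
        queue.add(new int[] {head[0], head[1] + 1}); // advance within the same input
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<byte[]> first = List.of(new byte[] {1}, new byte[] {4}, new byte[] {9});
    List<byte[]> second = List.of(new byte[] {2}, new byte[] {3}, new byte[] {(byte) 200});
    for (byte[] value : merge(List.of(first, second))) {
      System.out.println(value[0] & 0xFF);
    }
  }
}
```

Since every input is already sorted, each value is visited once and only the current head of each input competes in the queue, which is what makes a bulk merge cheaper than re-adding points one at a time.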
-
finish
public long finish(IndexOutput out) throws IOException
Writes the BKD tree to the provided IndexOutput and returns the file offset where index was written.
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
split
protected int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits)
Pick the next dimension to split.
- Parameters:
minPackedValue - the min values for all dimensions
maxPackedValue - the max values for all dimensions
parentSplits - how many times each dim has been split on the parent levels
- Returns:
the dimension to split
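This page does not say how split chooses a dimension. As one plausible heuristic, offered only as an assumption, a subclass overriding this method could pick the dimension whose packed range (max minus min) is widest. The sketch below does that with BigInteger over each dimension's byte slice; SplitDimSketch and widestDim are hypothetical names, and the sketch deliberately ignores parentSplits.

```java
import java.math.BigInteger;
import java.util.Arrays;

/** Illustrative "widest span" heuristic; an assumption, not the documented behavior of split(). */
final class SplitDimSketch {

  /** Returns the dimension with the largest (max - min) span, treating each slice as unsigned. */
  static int widestDim(byte[] minPackedValue, byte[] maxPackedValue, int numDims, int bytesPerDim) {
    int splitDim = -1;
    BigInteger widest = null;
    for (int dim = 0; dim < numDims; dim++) {
      int start = dim * bytesPerDim;
      int end = start + bytesPerDim;
      // signum 1 makes BigInteger interpret the big-endian bytes as a non-negative magnitude.
      BigInteger min = new BigInteger(1, Arrays.copyOfRange(minPackedValue, start, end));
      BigInteger max = new BigInteger(1, Arrays.copyOfRange(maxPackedValue, start, end));
      BigInteger span = max.subtract(min);
      if (widest == null || span.compareTo(widest) > 0) {
        widest = span;
        splitDim = dim; // ties keep the lowest dimension
      }
    }
    return splitDim;
  }

  public static void main(String[] args) {
    // Two dimensions of two bytes each (example data): spans are 10 and 256.
    byte[] min = {0, 10, 0, 0};
    byte[] max = {0, 20, 1, 0};
    System.out.println("split on dim " + widestDim(min, max, 2, 2));
  }
}
```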
-
-