Package org.apache.lucene.util.bkd
Class BKDWriter
- java.lang.Object
-
- org.apache.lucene.util.bkd.BKDWriter
-
- All Implemented Interfaces:
Closeable, AutoCloseable
public class BKDWriter extends Object implements Closeable
Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= maxPointsInLeafNode. The tree is fully balanced, which means the leaf nodes will have between 50% and 100% of the requested maxPointsInLeafNode. Values that fall exactly on a cell boundary may be in either cell.
The number of dimensions can be 1 to 8, but every byte[] value is fixed length.
See this paper for details.
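The 50% to 100% leaf occupancy follows from repeatedly halving the point set until a leaf fits. The sketch below illustrates that arithmetic only; `LeafLayoutSketch` and its methods are hypothetical names, and the halving loop is an assumption about how a fully balanced tree could be sized, not a copy of the writer's code.

```java
/** Hypothetical helper (not part of Lucene): sizes the leaves of a fully balanced tree. */
final class LeafLayoutSketch {

  /** Doubles the leaf count until each leaf holds at most maxPointsInLeafNode points. */
  static long numLeaves(long pointCount, int maxPointsInLeafNode) {
    long countPerLeaf = pointCount;
    long leaves = 1;
    while (countPerLeaf > maxPointsInLeafNode) {
      countPerLeaf = (countPerLeaf + 1) / 2; // halve the cell, rounding up
      leaves *= 2;
    }
    return leaves;
  }

  /** Upper bound on the points in a single leaf after the halving above. */
  static long pointsPerLeaf(long pointCount, int maxPointsInLeafNode) {
    long countPerLeaf = pointCount;
    while (countPerLeaf > maxPointsInLeafNode) {
      countPerLeaf = (countPerLeaf + 1) / 2;
    }
    return countPerLeaf;
  }

  public static void main(String[] args) {
    long points = 10_000;
    int maxPerLeaf = 1024;
    // prints "16 leaves, up to 625 points each"
    System.out.println(numLeaves(points, maxPerLeaf) + " leaves, up to "
        + pointsPerLeaf(points, maxPerLeaf) + " points each");
  }
}
```

With 10,000 points and a limit of 1024, each leaf ends up with at most 625 points, which is above half the limit because one fewer halving would have exceeded it.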
This consumes heap during writing: it allocates a LongBitSet(numPoints), and then uses up to the specified maxMBSortInHeap heap space for writing.
NOTE: This can write at most Integer.MAX_VALUE * maxPointsInLeafNode total points.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
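To make the limits above concrete, the following sketch computes the stated point ceiling and a rough heap estimate for a bit set with one bit per point. `BkdLimitsSketch` is a hypothetical name, and the assumption that the bit set is backed by 64-bit words is mine; treat the byte count as an approximation.

```java
/** Hypothetical helper (not part of Lucene): back-of-the-envelope limits for a BKD write. */
final class BkdLimitsSketch {

  /** The documented ceiling: Integer.MAX_VALUE * maxPointsInLeafNode. */
  static long maxTotalPoints(int maxPointsInLeafNode) {
    return (long) Integer.MAX_VALUE * maxPointsInLeafNode;
  }

  /** Approximate heap bytes for one bit per point, assuming storage in 64-bit words. */
  static long bitSetBytes(long numPoints) {
    long words = (numPoints + 63) / 64; // round up to whole words
    return words * Long.BYTES;
  }

  public static void main(String[] args) {
    System.out.println(maxTotalPoints(1024));        // 2199023254528
    System.out.println(bitSetBytes(1_000_000_000L)); // 125000000
  }
}
```

Under these assumptions, one billion points need roughly 125 MB for the bit set alone, before any of the maxMBSortInHeap budget is used.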
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected int | bytesPerDim | How many bytes each value in each dimension takes.
static String | CODEC_NAME |
static float | DEFAULT_MAX_MB_SORT_IN_HEAP | Default maximum heap to use, before spilling to (slower) disk
static int | DEFAULT_MAX_POINTS_IN_LEAF_NODE | Default maximum number of points in each leaf block
protected FixedBitSet | docsSeen |
protected boolean | longOrds | true if we have so many values that we must write ords using long (8 bytes) instead of int (4 bytes)
static int | MAX_DIMS | Maximum number of dimensions
protected byte[] | maxPackedValue | Maximum per-dim values, packed
protected int | maxPointsInLeafNode |
protected byte[] | minPackedValue | Minimum per-dim values, packed
protected int | numDataDims | How many dimensions we are storing at the leaf (data) nodes
protected int | numIndexDims | How many dimensions we are indexing in the internal nodes
protected OfflineSorter.BufferSize | offlineSorterBufferMB | How much heap OfflineSorter is allowed to use
protected int | offlineSorterMaxTempFiles | How many temporary files OfflineSorter is allowed to use
protected int | packedBytesLength | numDataDims * bytesPerDim
protected int | packedIndexBytesLength | numIndexDims * bytesPerDim
protected long | pointCount |
protected boolean | singleValuePerDoc | True if every document has at most one value.
static int | VERSION_COMPRESSED_DOC_IDS |
static int | VERSION_COMPRESSED_VALUES |
static int | VERSION_CURRENT |
static int | VERSION_IMPLICIT_SPLIT_DIM_1D |
static int | VERSION_LEAF_STORES_BOUNDS |
static int | VERSION_PACKED_INDEX |
static int | VERSION_SELECTIVE_INDEXING |
static int | VERSION_START |
-
Constructor Summary
Constructors
Modifier | Constructor | Description
 | BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDataDims, int numIndexDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount, boolean singleValuePerDoc) |
protected | BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDataDims, int numIndexDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount, boolean singleValuePerDoc, boolean longOrds, long offlineSorterBufferMB, int offlineSorterMaxTempFiles) |
-
Method Summary
Modifier and Type | Method | Description
void | add(byte[] packedValue, int docID) |
void | close() |
long | finish(IndexOutput out) | Writes the BKD tree to the provided IndexOutput and returns the file offset where index was written.
long | getPointCount() | How many points have been added so far
long | merge(IndexOutput out, List<MergeState.DocMap> docMaps, List<BKDReader> readers) | More efficient bulk-add for incoming BKDReaders.
protected int | split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits) | Pick the next dimension to split.
static void | verifyParams(int numDataDims, int numIndexDims, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount) |
long | writeField(IndexOutput out, String fieldName, MutablePointValues reader) | Write a field from a MutablePointValues.
-
-
-
Field Detail
-
CODEC_NAME
public static final String CODEC_NAME
- See Also:
- Constant Field Values
-
VERSION_START
public static final int VERSION_START
- See Also:
- Constant Field Values
-
VERSION_COMPRESSED_DOC_IDS
public static final int VERSION_COMPRESSED_DOC_IDS
- See Also:
- Constant Field Values
-
VERSION_COMPRESSED_VALUES
public static final int VERSION_COMPRESSED_VALUES
- See Also:
- Constant Field Values
-
VERSION_IMPLICIT_SPLIT_DIM_1D
public static final int VERSION_IMPLICIT_SPLIT_DIM_1D
- See Also:
- Constant Field Values
-
VERSION_PACKED_INDEX
public static final int VERSION_PACKED_INDEX
- See Also:
- Constant Field Values
-
VERSION_LEAF_STORES_BOUNDS
public static final int VERSION_LEAF_STORES_BOUNDS
- See Also:
- Constant Field Values
-
VERSION_SELECTIVE_INDEXING
public static final int VERSION_SELECTIVE_INDEXING
- See Also:
- Constant Field Values
-
VERSION_CURRENT
public static final int VERSION_CURRENT
- See Also:
- Constant Field Values
-
DEFAULT_MAX_POINTS_IN_LEAF_NODE
public static final int DEFAULT_MAX_POINTS_IN_LEAF_NODE
Default maximum number of points in each leaf block
- See Also:
- Constant Field Values
-
DEFAULT_MAX_MB_SORT_IN_HEAP
public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
Default maximum heap to use, before spilling to (slower) disk
- See Also:
- Constant Field Values
-
MAX_DIMS
public static final int MAX_DIMS
Maximum number of dimensions
- See Also:
- Constant Field Values
-
numDataDims
protected final int numDataDims
How many dimensions we are storing at the leaf (data) nodes
-
numIndexDims
protected final int numIndexDims
How many dimensions we are indexing in the internal nodes
-
bytesPerDim
protected final int bytesPerDim
How many bytes each value in each dimension takes.
-
packedBytesLength
protected final int packedBytesLength
numDataDims * bytesPerDim
-
packedIndexBytesLength
protected final int packedIndexBytesLength
numIndexDims * bytesPerDim
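The packed lengths above describe a fixed-width layout: each dimension occupies bytesPerDim consecutive bytes. The sketch below packs int dimensions into such an array and compares one dimension byte by byte. `PackedValueSketch` is a hypothetical class; the sign-bit flip is the usual trick for making unsigned byte order agree with signed int order (Lucene's `NumericUtils.intToSortableBytes` serves this purpose), and its use here is an assumption about how callers typically encode values.

```java
/** Hypothetical helper (not part of Lucene): fixed-width packing of int dimensions. */
final class PackedValueSketch {

  /** Packs each int into 4 big-endian bytes; result length is dims.length * Integer.BYTES. */
  static byte[] pack(int[] dims) {
    byte[] packed = new byte[dims.length * Integer.BYTES];
    for (int d = 0; d < dims.length; d++) {
      int sortable = dims[d] ^ 0x80000000; // flip sign bit: unsigned byte order == signed int order
      int off = d * Integer.BYTES;
      packed[off] = (byte) (sortable >>> 24);
      packed[off + 1] = (byte) (sortable >>> 16);
      packed[off + 2] = (byte) (sortable >>> 8);
      packed[off + 3] = (byte) sortable;
    }
    return packed;
  }

  /** Compares one dimension of two packed values as unsigned bytes. */
  static int compareDim(byte[] a, byte[] b, int dim, int bytesPerDim) {
    int off = dim * bytesPerDim;
    for (int i = 0; i < bytesPerDim; i++) {
      int cmp = Integer.compare(a[off + i] & 0xFF, b[off + i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return 0;
  }
}
```

For two int dimensions, `pack` returns 8 bytes, which matches numDataDims * bytesPerDim with bytesPerDim = 4.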
-
docsSeen
protected final FixedBitSet docsSeen
-
maxPointsInLeafNode
protected final int maxPointsInLeafNode
-
minPackedValue
protected final byte[] minPackedValue
Minimum per-dim values, packed
-
maxPackedValue
protected final byte[] maxPackedValue
Maximum per-dim values, packed
-
pointCount
protected long pointCount
-
longOrds
protected final boolean longOrds
true if we have so many values that we must write ords using long (8 bytes) instead of int (4 bytes)
-
singleValuePerDoc
protected final boolean singleValuePerDoc
True if every document has at most one value. We specialize this case by not bothering to store the ord since it's redundant with docID.
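The longOrds and singleValuePerDoc flags both affect how much space a buffered point needs. The sketch below shows one plausible accounting; `RecordSizeSketch`, the Integer.MAX_VALUE threshold, and the record layout (packed value, then docID, then an optional ord) are assumptions for illustration and are not stated in this page.

```java
/** Hypothetical helper (not part of Lucene): per-point record size under assumed layout. */
final class RecordSizeSketch {

  /** Assumption: ords need 8 bytes once the point count no longer fits in an int. */
  static boolean needsLongOrds(long totalPointCount) {
    return totalPointCount > Integer.MAX_VALUE;
  }

  /** Packed value + docID, plus an ord unless every document has at most one value. */
  static int bytesPerRecord(int packedBytesLength, boolean singleValuePerDoc, boolean longOrds) {
    int size = packedBytesLength + Integer.BYTES; // value bytes + docID
    if (!singleValuePerDoc) {
      size += longOrds ? Long.BYTES : Integer.BYTES; // ord is redundant with docID otherwise
    }
    return size;
  }
}
```

With an 8-byte packed value, this gives 12 bytes per point when singleValuePerDoc is true, 16 bytes with int ords, and 20 bytes with long ords.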
-
offlineSorterBufferMB
protected final OfflineSorter.BufferSize offlineSorterBufferMB
How much heap OfflineSorter is allowed to use
-
offlineSorterMaxTempFiles
protected final int offlineSorterMaxTempFiles
How many temporary files OfflineSorter is allowed to use
-
-
Constructor Detail
-
BKDWriter
public BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDataDims, int numIndexDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount, boolean singleValuePerDoc) throws IOException
- Throws:
IOException
-
BKDWriter
protected BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDataDims, int numIndexDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount, boolean singleValuePerDoc, boolean longOrds, long offlineSorterBufferMB, int offlineSorterMaxTempFiles) throws IOException
- Throws:
IOException
-
-
Method Detail
-
verifyParams
public static void verifyParams(int numDataDims, int numIndexDims, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount)
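This page does not document what verifyParams checks. As a rough illustration, the sketch below validates the same five arguments against constraints that follow from the class description (1 to 8 dimensions) plus a few that are assumptions on my part: numIndexDims no larger than numDataDims, and non-negative or positive numeric arguments. `VerifyParamsSketch` is a hypothetical class and its messages are invented.

```java
/** Hypothetical stand-in (not Lucene's implementation) for parameter validation. */
final class VerifyParamsSketch {

  static final int MAX_DIMS = 8; // the class description allows 1 to 8 dimensions

  static void verifyParams(int numDataDims, int numIndexDims, int maxPointsInLeafNode,
                           double maxMBSortInHeap, long totalPointCount) {
    if (numDataDims < 1 || numDataDims > MAX_DIMS) {
      throw new IllegalArgumentException("numDataDims must be 1 .. " + MAX_DIMS + " (got: " + numDataDims + ")");
    }
    // Assumption: only a prefix of the data dimensions can be indexed.
    if (numIndexDims < 1 || numIndexDims > numDataDims) {
      throw new IllegalArgumentException("numIndexDims must be 1 .. " + numDataDims + " (got: " + numIndexDims + ")");
    }
    if (maxPointsInLeafNode <= 0) {
      throw new IllegalArgumentException("maxPointsInLeafNode must be > 0 (got: " + maxPointsInLeafNode + ")");
    }
    if (maxMBSortInHeap < 0.0) {
      throw new IllegalArgumentException("maxMBSortInHeap must be >= 0.0 (got: " + maxMBSortInHeap + ")");
    }
    if (totalPointCount < 0) {
      throw new IllegalArgumentException("totalPointCount must be >= 0 (got: " + totalPointCount + ")");
    }
  }
}
```

Failing fast with IllegalArgumentException keeps bad configuration from surfacing later as a corrupt or unbalanced tree.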
-
add
public void add(byte[] packedValue, int docID) throws IOException
- Throws:
IOException
-
getPointCount
public long getPointCount()
How many points have been added so far
-
writeField
public long writeField(IndexOutput out, String fieldName, MutablePointValues reader) throws IOException
Write a field from a MutablePointValues. This way of writing points is faster than regular writes with add(byte[], int) since there is opportunity for reordering points before writing them to disk. This method does not use transient disk in order to reorder points.
- Throws:
IOException
-
merge
public long merge(IndexOutput out, List<MergeState.DocMap> docMaps, List<BKDReader> readers) throws IOException
More efficient bulk-add for incoming BKDReaders. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns -1 if all documents containing dimensional values were deleted.
- Throws:
IOException
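The merge sort mentioned for the one-dimensional case can be pictured as a k-way merge over inputs that are each already sorted. The sketch below uses plain lists of byte[] values in place of BKDReaders; `SortedMergeSketch` and `Cursor` are hypothetical names, and doc ID remapping and deletions are left out entirely.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper (not part of Lucene): k-way merge of sorted byte[] values. */
final class SortedMergeSketch {

  /** Tracks the current head of one sorted input. */
  private static final class Cursor {
    final Iterator<byte[]> it;
    byte[] current;

    Cursor(Iterator<byte[]> it) {
      this.it = it;
      this.current = it.next();
    }

    boolean advance() {
      if (it.hasNext()) {
        current = it.next();
        return true;
      }
      return false;
    }
  }

  /** Lexicographic comparison treating bytes as unsigned. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(a.length, b.length);
  }

  /** Repeatedly emits the smallest head among all inputs. */
  static List<byte[]> merge(List<List<byte[]>> sortedInputs) {
    PriorityQueue<Cursor> queue =
        new PriorityQueue<>((x, y) -> compareUnsigned(x.current, y.current));
    for (List<byte[]> input : sortedInputs) {
      if (!input.isEmpty()) {
        queue.add(new Cursor(input.iterator()));
      }
    }
    List<byte[]> out = new ArrayList<>();
    while (!queue.isEmpty()) {
      Cursor top = queue.poll();
      out.add(top.current);
      if (top.advance()) {
        queue.add(top); // re-insert with its new head
      }
    }
    return out;
  }
}
```

Because every input is already sorted, the merge touches each value once and needs no temporary files, which is what makes the bulk path cheaper than adding points one by one.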
-
finish
public long finish(IndexOutput out) throws IOException
Writes the BKD tree to the provided IndexOutput and returns the file offset where index was written.
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
split
protected int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits)
Pick the next dimension to split.
- Parameters:
minPackedValue - the min values for all dimensions
maxPackedValue - the max values for all dimensions
parentSplits - how many times each dim has been split on the parent levels
- Returns:
- the dimension to split
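The description says only that this method picks the next dimension to split. One plausible heuristic, sketched below, first favors a dimension that has been split far less often than the others and still has a non-empty range, and otherwise picks the dimension with the widest range. `SplitDimSketch` is a hypothetical class, the extra numIndexDims and bytesPerDim parameters stand in for the writer's fields, and the heuristic itself is an assumption rather than the documented behavior.

```java
import java.math.BigInteger;
import java.util.Arrays;

/** Hypothetical helper (not Lucene's implementation): choose a split dimension. */
final class SplitDimSketch {

  static int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits,
                   int numIndexDims, int bytesPerDim) {
    int maxNumSplits = 0;
    for (int n : parentSplits) {
      maxNumSplits = Math.max(maxNumSplits, n);
    }

    // Per-dimension range (max - min), reading each slice as an unsigned big-endian number.
    BigInteger[] span = new BigInteger[numIndexDims];
    for (int dim = 0; dim < numIndexDims; dim++) {
      int from = dim * bytesPerDim;
      int to = from + bytesPerDim;
      BigInteger min = new BigInteger(1, Arrays.copyOfRange(minPackedValue, from, to));
      BigInteger max = new BigInteger(1, Arrays.copyOfRange(maxPackedValue, from, to));
      span[dim] = max.subtract(min);
    }

    // Prefer a dimension split less than half as often as the most-split one.
    for (int dim = 0; dim < numIndexDims; dim++) {
      if (parentSplits[dim] < maxNumSplits / 2 && span[dim].signum() != 0) {
        return dim;
      }
    }

    // Otherwise split the widest dimension.
    int best = 0;
    for (int dim = 1; dim < numIndexDims; dim++) {
      if (span[dim].compareTo(span[best]) > 0) {
        best = dim;
      }
    }
    return best;
  }
}
```

Since the method is protected, a subclass could override it with a different policy; the sketch only shows how parentSplits and the min/max bounds can feed such a decision.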
-
-