Package org.apache.lucene.util.bkd
Class BKDWriter
- java.lang.Object
-
- org.apache.lucene.util.bkd.BKDWriter
-
- All Implemented Interfaces:
Closeable, AutoCloseable
public class BKDWriter extends Object implements Closeable
Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= maxPointsInLeafNode. The tree is partially balanced, which means the leaf nodes will have the requested maxPointsInLeafNode points, except one that might have fewer. Leaf nodes may straddle the two bottom levels of the binary tree. Values that fall exactly on a cell boundary may be in either cell.
The number of dimensions can be 1 to 8, but every byte[] value is fixed length.
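The recursive partitioning described above can be illustrated with a toy, in-heap sketch. It is not the writer's actual algorithm: it works on plain int coordinates rather than packed byte[] values, always splits the widest dimension at the median, and therefore produces leaves of varying size rather than the partially balanced layout described above. The class name ToyBlockKdTree and all of its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy in-heap block KD-tree partitioner; illustrative only, not the on-disk builder. */
final class ToyBlockKdTree {

  /** Partitions the points into leaf cells of at most maxPointsInLeafNode points each. */
  static List<int[][]> build(int[][] points, int maxPointsInLeafNode) {
    List<int[][]> leaves = new ArrayList<>();
    if (points.length > 0) {
      build(points, 0, points.length, maxPointsInLeafNode, leaves);
    }
    return leaves;
  }

  /** Recursively splits points[from, to) until the cell is small enough to become a leaf. */
  private static void build(int[][] points, int from, int to, int maxPointsInLeafNode,
                            List<int[][]> leaves) {
    int count = to - from;
    if (count <= maxPointsInLeafNode) {
      leaves.add(Arrays.copyOfRange(points, from, to)); // this cell becomes a leaf block
      return;
    }
    int splitDim = widestDim(points, from, to);
    // Sort this cell's slice along the split dimension, then cut it at the median.
    Arrays.sort(points, from, to, Comparator.comparingInt((int[] p) -> p[splitDim]));
    int mid = from + count / 2;
    build(points, from, mid, maxPointsInLeafNode, leaves);
    build(points, mid, to, maxPointsInLeafNode, leaves);
  }

  /** Returns the dimension whose values span the largest range within points[from, to). */
  private static int widestDim(int[][] points, int from, int to) {
    int best = 0;
    long bestSpan = -1;
    for (int dim = 0; dim < points[from].length; dim++) {
      int min = Integer.MAX_VALUE;
      int max = Integer.MIN_VALUE;
      for (int i = from; i < to; i++) {
        min = Math.min(min, points[i][dim]);
        max = Math.max(max, points[i][dim]);
      }
      long span = (long) max - min; // widen to long to avoid int overflow
      if (span > bestSpan) {
        bestSpan = span;
        best = dim;
      }
    }
    return best;
  }
}
```

Calling ToyBlockKdTree.build(points, 3) on ten 2-dim points yields cells of at most three points each, and every input point lands in exactly one cell.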
This consumes heap during writing: it allocates a Long[numLeaves], a byte[numLeaves*(1+bytesPerDim)], and then uses up to the specified maxMBSortInHeap heap space for writing.
NOTE: This can write at most Integer.MAX_VALUE * maxPointsInLeafNode / bytesPerDim total points.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
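The heap figures and the point-count limit quoted above reduce to simple arithmetic, sketched below. This is a rough estimate under stated assumptions: each entry of the long array is counted as 8 bytes, object headers are ignored, and the maxMBSortInHeap budget is not included. The class BkdSizing and its method names are hypothetical.

```java
/** Back-of-the-envelope sizing from the figures in the class description; estimates only. */
final class BkdSizing {

  /** Number of leaf blocks when every leaf holds up to maxPointsInLeafNode points. */
  static long numLeaves(long totalPoints, int maxPointsInLeafNode) {
    return (totalPoints + maxPointsInLeafNode - 1) / maxPointsInLeafNode; // ceiling division
  }

  /** Bytes for the two per-leaf arrays, assuming 8 bytes per long entry (assumption). */
  static long fixedHeapBytes(long numLeaves, int bytesPerDim) {
    long longArrayBytes = numLeaves * Long.BYTES;
    long byteArrayBytes = numLeaves * (1L + bytesPerDim);
    return longArrayBytes + byteArrayBytes;
  }

  /** Upper bound from the NOTE: Integer.MAX_VALUE * maxPointsInLeafNode / bytesPerDim. */
  static long maxTotalPoints(int maxPointsInLeafNode, int bytesPerDim) {
    return (long) Integer.MAX_VALUE * maxPointsInLeafNode / bytesPerDim; // computed in long
  }
}
```

With hypothetical inputs of one million points, maxPointsInLeafNode = 512 and bytesPerDim = 4, numLeaves comes to 1954 and the two arrays to about 25 KB, while the point-count bound is 274,877,906,816.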
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected int | bytesPerDim | How many bytes each value in each dimension takes.
static String | CODEC_NAME |
static float | DEFAULT_MAX_MB_SORT_IN_HEAP | Default maximum heap to use, before spilling to (slower) disk
static int | DEFAULT_MAX_POINTS_IN_LEAF_NODE | Default maximum number of points in each leaf block
protected FixedBitSet | docsSeen |
static int | MAX_DIMS | Maximum number of dimensions (2 * max index dimensions)
static int | MAX_INDEX_DIMS | Maximum number of index dimensions
protected byte[] | maxPackedValue | Maximum per-dim values, packed
protected int | maxPointsInLeafNode |
protected byte[] | minPackedValue | Minimum per-dim values, packed
protected int | numDataDims | How many dimensions we are storing at the leaf (data) nodes
protected int | numIndexDims | How many dimensions we are indexing in the internal nodes
protected int | packedBytesLength | numDataDims * bytesPerDim
protected int | packedIndexBytesLength | numIndexDims * bytesPerDim
protected long | pointCount |
static int | VERSION_CURRENT |
static int | VERSION_LEAF_STORES_BOUNDS |
static int | VERSION_LOW_CARDINALITY_LEAVES |
static int | VERSION_META_FILE |
static int | VERSION_SELECTIVE_INDEXING |
static int | VERSION_START |
-
Method Summary
Modifier and Type | Method | Description
void | add(byte[] packedValue, int docID) |
void | close() |
Runnable | finish(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut) | Writes the BKD tree to the provided IndexOutputs and returns a Runnable that writes the index of the tree if at least one point has been added, or null otherwise.
Runnable | merge(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, List<MergeState.DocMap> docMaps, List<BKDReader> readers) | More efficient bulk-add for incoming BKDReaders.
protected int | split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits) | Pick the next dimension to split.
static void | verifyParams(int numDims, int numIndexDims, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount) |
Runnable | writeField(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointValues reader) | Write a field from a MutablePointValues.
-
-
-
Field Detail
-
CODEC_NAME
public static final String CODEC_NAME
- See Also:
- Constant Field Values
-
VERSION_START
public static final int VERSION_START
- See Also:
- Constant Field Values
-
VERSION_LEAF_STORES_BOUNDS
public static final int VERSION_LEAF_STORES_BOUNDS
- See Also:
- Constant Field Values
-
VERSION_SELECTIVE_INDEXING
public static final int VERSION_SELECTIVE_INDEXING
- See Also:
- Constant Field Values
-
VERSION_LOW_CARDINALITY_LEAVES
public static final int VERSION_LOW_CARDINALITY_LEAVES
- See Also:
- Constant Field Values
-
VERSION_META_FILE
public static final int VERSION_META_FILE
- See Also:
- Constant Field Values
-
VERSION_CURRENT
public static final int VERSION_CURRENT
- See Also:
- Constant Field Values
-
DEFAULT_MAX_POINTS_IN_LEAF_NODE
public static final int DEFAULT_MAX_POINTS_IN_LEAF_NODE
Default maximum number of points in each leaf block
- See Also:
- Constant Field Values
-
DEFAULT_MAX_MB_SORT_IN_HEAP
public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
Default maximum heap to use, before spilling to (slower) disk
- See Also:
- Constant Field Values
-
MAX_DIMS
public static final int MAX_DIMS
Maximum number of dimensions (2 * max index dimensions)
- See Also:
- Constant Field Values
-
MAX_INDEX_DIMS
public static final int MAX_INDEX_DIMS
Maximum number of index dimensions
- See Also:
- Constant Field Values
-
numDataDims
protected final int numDataDims
How many dimensions we are storing at the leaf (data) nodes
-
numIndexDims
protected final int numIndexDims
How many dimensions we are indexing in the internal nodes
-
bytesPerDim
protected final int bytesPerDim
How many bytes each value in each dimension takes.
-
packedBytesLength
protected final int packedBytesLength
numDataDims * bytesPerDim
-
packedIndexBytesLength
protected final int packedIndexBytesLength
numIndexDims * bytesPerDim
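To make the packed layout concrete, the sketch below packs one int per dimension into a single byte[] of length numDims * bytesPerDim. The sign-bit flip is an assumption about how a caller might encode ints so that unsigned byte order matches numeric order (similar in spirit to Lucene's NumericUtils.intToSortableBytes); the page itself does not prescribe an encoding. PackedPointSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

/** Illustrative packing of int coordinates into one fixed-length byte[]; not part of BKDWriter. */
final class PackedPointSketch {
  static final int BYTES_PER_DIM = Integer.BYTES; // 4 bytes per dimension in this sketch

  /** Packs one int per dimension; the result has length values.length * BYTES_PER_DIM. */
  static byte[] pack(int... values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length * BYTES_PER_DIM); // big-endian by default
    for (int v : values) {
      // Flip the sign bit so that unsigned byte-wise order matches signed int order (assumption).
      buf.putInt(v ^ 0x80000000);
    }
    return buf.array();
  }

  /** Reads back the value stored for the given dimension. */
  static int unpack(byte[] packed, int dim) {
    return ByteBuffer.wrap(packed).getInt(dim * BYTES_PER_DIM) ^ 0x80000000;
  }
}
```

With three dimensions the packed array is 12 bytes long, and dimension d occupies bytes [d * 4, d * 4 + 4).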
-
docsSeen
protected final FixedBitSet docsSeen
-
maxPointsInLeafNode
protected final int maxPointsInLeafNode
-
minPackedValue
protected final byte[] minPackedValue
Minimum per-dim values, packed
-
maxPackedValue
protected final byte[] maxPackedValue
Maximum per-dim values, packed
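One way to picture how packed per-dimension minimum and maximum values could be maintained is the sketch below, which compares each dimension's byte slice as an unsigned big-endian number. It is an illustration under that assumption, not the writer's code, and PackedBoundsSketch is a hypothetical class.

```java
import java.util.Arrays;

/** Tracks packed per-dimension bounds over a stream of packed values; illustrative only. */
final class PackedBoundsSketch {
  final int numDims;
  final int bytesPerDim;
  final byte[] minPackedValue;
  final byte[] maxPackedValue;
  private boolean empty = true;

  PackedBoundsSketch(int numDims, int bytesPerDim) {
    this.numDims = numDims;
    this.bytesPerDim = bytesPerDim;
    this.minPackedValue = new byte[numDims * bytesPerDim];
    this.maxPackedValue = new byte[numDims * bytesPerDim];
  }

  /** Folds one packed value into the running bounds. */
  void accept(byte[] packedValue) {
    if (packedValue.length != numDims * bytesPerDim) {
      throw new IllegalArgumentException("expected " + numDims * bytesPerDim + " bytes");
    }
    if (empty) {
      // The first value initializes both bounds.
      System.arraycopy(packedValue, 0, minPackedValue, 0, packedValue.length);
      System.arraycopy(packedValue, 0, maxPackedValue, 0, packedValue.length);
      empty = false;
      return;
    }
    for (int dim = 0; dim < numDims; dim++) {
      int start = dim * bytesPerDim;
      int end = start + bytesPerDim;
      // Compare this dimension's slice as unsigned bytes (Java 9+ range overload).
      if (Arrays.compareUnsigned(packedValue, start, end, minPackedValue, start, end) < 0) {
        System.arraycopy(packedValue, start, minPackedValue, start, bytesPerDim);
      }
      if (Arrays.compareUnsigned(packedValue, start, end, maxPackedValue, start, end) > 0) {
        System.arraycopy(packedValue, start, maxPackedValue, start, bytesPerDim);
      }
    }
  }
}
```

Note that the bounds are tracked per dimension, so minPackedValue and maxPackedValue need not correspond to any single point that was added.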
-
pointCount
protected long pointCount
-
-
Constructor Detail
-
BKDWriter
public BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDataDims, int numIndexDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount) throws IOException
- Throws:
IOException
-
-
Method Detail
-
verifyParams
public static void verifyParams(int numDims, int numIndexDims, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount)
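This page does not document which conditions verifyParams enforces. The sketch below shows the kind of range checks such a method might plausibly perform, based only on the limits mentioned elsewhere on this page (index dimensions from 1 to 8, MAX_DIMS equal to twice that). Every check, the constant values, and the ParamCheckSketch class itself are assumptions rather than the library's behavior.

```java
/** Hypothetical parameter validation in the spirit of verifyParams; the checks are guesses. */
final class ParamCheckSketch {
  static final int MAX_INDEX_DIMS = 8;            // assumed from "1 to 8" in the class description
  static final int MAX_DIMS = 2 * MAX_INDEX_DIMS; // per the MAX_DIMS field description

  static void check(int numDims, int numIndexDims, int maxPointsInLeafNode,
                    double maxMBSortInHeap, long totalPointCount) {
    if (numDims < 1 || numDims > MAX_DIMS) {
      throw new IllegalArgumentException("numDims must be 1 .. " + MAX_DIMS + " (got: " + numDims + ")");
    }
    if (numIndexDims < 1 || numIndexDims > MAX_INDEX_DIMS) {
      throw new IllegalArgumentException("numIndexDims must be 1 .. " + MAX_INDEX_DIMS + " (got: " + numIndexDims + ")");
    }
    if (numIndexDims > numDims) {
      throw new IllegalArgumentException("numIndexDims cannot exceed numDims");
    }
    if (maxPointsInLeafNode <= 0) {
      throw new IllegalArgumentException("maxPointsInLeafNode must be > 0 (got: " + maxPointsInLeafNode + ")");
    }
    if (maxMBSortInHeap < 0.0) {
      throw new IllegalArgumentException("maxMBSortInHeap must be >= 0.0 (got: " + maxMBSortInHeap + ")");
    }
    if (totalPointCount < 0) {
      throw new IllegalArgumentException("totalPointCount must be >= 0 (got: " + totalPointCount + ")");
    }
  }
}
```

Validating up front lets a caller fail fast with a descriptive message before any temporary files or heap buffers are allocated.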
-
add
public void add(byte[] packedValue, int docID) throws IOException
- Throws:
IOException
-
writeField
public Runnable writeField(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointValues reader) throws IOException
Write a field from a MutablePointValues. This way of writing points is faster than regular writes with add(byte[], int) since there is opportunity for reordering points before writing them to disk. This method does not use transient disk in order to reorder points.
- Throws:
IOException
-
merge
public Runnable merge(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, List<MergeState.DocMap> docMaps, List<BKDReader> readers) throws IOException
More efficient bulk-add for incoming BKDReaders. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns -1 if all documents containing dimensional values were deleted (note that the declared return type is Runnable, so this statement may be outdated).
- Throws:
IOException
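The single-dimension merge described above is essentially a k-way merge over already sorted inputs. The sketch below shows that general technique with a priority queue, using plain long values and an IntUnaryOperator as a stand-in for MergeState.DocMap; treating a mapped docID of -1 as "deleted" is an assumption made for the example. None of the names here (OneDimMergeSketch, Point, Cursor) come from Lucene.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntUnaryOperator;

/** K-way merge of already sorted one-dimensional sources; illustrative only. */
final class OneDimMergeSketch {

  /** One (value, docID) pair, standing in for a single-dimension point. */
  static final class Point {
    final long value;
    final int docID;

    Point(long value, int docID) {
      this.value = value;
      this.docID = docID;
    }
  }

  /** Cursor over one sorted source together with its docID remapping. */
  private static final class Cursor {
    final List<Point> source;
    final IntUnaryOperator docMap;
    int pos;

    Cursor(List<Point> source, IntUnaryOperator docMap) {
      this.source = source;
      this.docMap = docMap;
    }

    Point current() {
      return source.get(pos);
    }
  }

  /** Merges the sources in value order, remapping docIDs and dropping those mapped to -1. */
  static List<Point> merge(List<List<Point>> sortedSources, List<IntUnaryOperator> docMaps) {
    PriorityQueue<Cursor> queue =
        new PriorityQueue<>(Comparator.comparingLong((Cursor c) -> c.current().value));
    for (int i = 0; i < sortedSources.size(); i++) {
      if (!sortedSources.get(i).isEmpty()) {
        queue.add(new Cursor(sortedSources.get(i), docMaps.get(i)));
      }
    }
    List<Point> merged = new ArrayList<>();
    while (!queue.isEmpty()) {
      Cursor top = queue.poll();       // source holding the smallest pending value
      Point p = top.current();
      top.pos++;
      int mapped = top.docMap.applyAsInt(p.docID);
      if (mapped != -1) {              // assumption: -1 marks a deleted document
        merged.add(new Point(p.value, mapped));
      }
      if (top.pos < top.source.size()) {
        queue.add(top);                // re-queue under the source's next value
      }
    }
    return merged;
  }
}
```

Because each source is already sorted, the merged output is produced in a single pass without re-sorting, which is what makes this path cheaper than adding points one at a time.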
-
finish
public Runnable finish(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut) throws IOException
Writes the BKD tree to the provided IndexOutputs and returns a Runnable that writes the index of the tree if at least one point has been added, or null otherwise.
- Throws:
IOException
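The contract above, a Runnable that performs the index write later or null when nothing was added, means callers need a null check before running it. The sketch below mimics that deferred-write pattern with in-memory streams standing in for the IndexOutputs; the byte payloads are placeholders and DeferredIndexWriteSketch is a hypothetical class, not the real file format or API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Deferred-write pattern: data is written immediately, the index only when the Runnable runs. */
final class DeferredIndexWriteSketch {

  /** Returns a Runnable that writes the index part, or null if no points were added. */
  static Runnable finish(int pointCount, ByteArrayOutputStream dataOut,
                         ByteArrayOutputStream indexOut) {
    if (pointCount == 0) {
      return null; // nothing to write, so there is no deferred step either
    }
    byte[] data = ("data:" + pointCount).getBytes(StandardCharsets.UTF_8); // placeholder payload
    dataOut.write(data, 0, data.length);
    return () -> {
      byte[] index = ("index:" + pointCount).getBytes(StandardCharsets.UTF_8); // placeholder payload
      indexOut.write(index, 0, index.length);
    };
  }
}
```

A caller would typically write something like: Runnable writeIndex = finish(...); if (writeIndex != null) { writeIndex.run(); }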
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
split
protected int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits)
Pick the next dimension to split.
- Parameters:
minPackedValue - the min values for all dimensions
maxPackedValue - the max values for all dimensions
parentSplits - how many times each dim has been split on the parent levels
- Returns:
the dimension to split
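The description only says that split picks the next dimension, so the following is one plausible heuristic rather than a statement of what the method does: first favor a dimension that has been split far less often than the most-split one (as long as its min and max differ), otherwise choose the dimension with the widest span. The extra bytesPerDim parameter, the SplitDimSketch class, and the "less than half as often" threshold are all assumptions of this sketch.

```java
import java.math.BigInteger;
import java.util.Arrays;

/** One plausible split-dimension heuristic over packed min/max values; illustrative only. */
final class SplitDimSketch {

  static int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits,
                   int bytesPerDim) {
    int numDims = parentSplits.length;
    int maxNumSplits = 0;
    for (int numSplits : parentSplits) {
      maxNumSplits = Math.max(maxNumSplits, numSplits);
    }
    // First pass: prefer a rarely split dimension, provided it is not constant in this cell.
    for (int dim = 0; dim < numDims; dim++) {
      int start = dim * bytesPerDim;
      int end = start + bytesPerDim;
      if (parentSplits[dim] < maxNumSplits / 2
          && !Arrays.equals(minPackedValue, start, end, maxPackedValue, start, end)) {
        return dim;
      }
    }
    // Second pass: fall back to the dimension with the largest (max - min) span.
    int best = -1;
    BigInteger bestSpan = null;
    for (int dim = 0; dim < numDims; dim++) {
      int start = dim * bytesPerDim;
      int end = start + bytesPerDim;
      // Interpret each slice as an unsigned big-endian integer.
      BigInteger min = new BigInteger(1, Arrays.copyOfRange(minPackedValue, start, end));
      BigInteger max = new BigInteger(1, Arrays.copyOfRange(maxPackedValue, start, end));
      BigInteger span = max.subtract(min);
      if (best == -1 || span.compareTo(bestSpan) > 0) {
        best = dim;
        bestSpan = span;
      }
    }
    return best;
  }
}
```

The first pass keeps any one dimension from being neglected, while the second pass tends to produce cells that are closer to square.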
-
-