Package org.apache.lucene.util.bkd
Class BKDWriter
java.lang.Object
org.apache.lucene.util.bkd.BKDWriter
- All Implemented Interfaces:
Closeable, AutoCloseable
Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and
smaller N-dim rectangles (cells) until the number of points in a given rectangle is <=
config.maxPointsInLeafNode. The tree is partially balanced, which means every leaf node holds
the requested config.maxPointsInLeafNode points, except one that might hold fewer.
Leaf nodes may straddle the two bottom levels of the binary tree. Values that fall exactly on a
cell boundary may be in either cell.
The number of dimensions can be 1 to 8, but every byte[] value is fixed length.
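As an illustration of a fixed-length packed value, the sketch below packs one int per dimension into a single byte[]. The class name, the helper names, and the sign-bit-flipping big-endian encoding are assumptions made for this example; the text above only requires that every value has the same length.

```java
import java.util.Arrays;

// Hypothetical helper, not part of BKDWriter: shows one way to build a packedValue.
final class PackedPointSketch {

    /** Writes value as 4 big-endian bytes with the sign bit flipped, so unsigned byte order matches int order. */
    static void encodeInt(int value, byte[] dest, int offset) {
        int flipped = value ^ 0x80000000;
        dest[offset] = (byte) (flipped >>> 24);
        dest[offset + 1] = (byte) (flipped >>> 16);
        dest[offset + 2] = (byte) (flipped >>> 8);
        dest[offset + 3] = (byte) flipped;
    }

    /** Packs one int per dimension; the result always has dims.length * 4 bytes. */
    static byte[] pack(int... dims) {
        byte[] packed = new byte[dims.length * Integer.BYTES];
        for (int d = 0; d < dims.length; d++) {
            encodeInt(dims[d], packed, d * Integer.BYTES);
        }
        return packed;
    }

    public static void main(String[] args) {
        byte[] a = pack(-5, 7);
        byte[] b = pack(3, 7);
        // Compare only the first dimension (bytes 0..4) as unsigned bytes.
        System.out.println(Arrays.compareUnsigned(a, 0, 4, b, 0, 4) < 0);
    }
}
```

With this encoding, a two-dimensional int point always occupies 8 bytes, and comparing one dimension's slice as unsigned bytes gives the same order as comparing the original ints.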
This consumes heap during writing: it allocates a Long[numLeaves], a
byte[numLeaves*(1+config.bytesPerDim)], and then uses up to the specified maxMBSortInHeap
of heap space for writing.
NOTE: This can write at most
Integer.MAX_VALUE * config.maxPointsInLeafNode / config.bytesPerDim total points.
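The sizing statements above can be checked with some quick arithmetic. The following is a minimal sketch, not code from BKDWriter: the class and method names are hypothetical, the leaf count assumes all leaves but one are full, and the array estimate counts 8 bytes per Long[] entry and ignores JVM object overhead.

```java
// Hypothetical sizing helper; all names here are illustrative assumptions.
final class BkdSizingSketch {

    /** Number of leaves when every leaf but (at most) one is full. */
    static long numLeaves(long pointCount, int maxPointsInLeafNode) {
        return (pointCount + maxPointsInLeafNode - 1) / maxPointsInLeafNode;
    }

    /** Size of the one leaf that may hold fewer points than the rest. */
    static long smallestLeafSize(long pointCount, int maxPointsInLeafNode) {
        long remainder = pointCount % maxPointsInLeafNode;
        return remainder == 0 ? Math.min(pointCount, maxPointsInLeafNode) : remainder;
    }

    /** Payload bytes of Long[numLeaves] plus byte[numLeaves*(1+bytesPerDim)], assuming 8 bytes per long entry. */
    static long perLeafArrayBytes(long numLeaves, int bytesPerDim) {
        long longArrayBytes = numLeaves * Long.BYTES;
        long byteArrayBytes = numLeaves * (1L + bytesPerDim);
        return longArrayBytes + byteArrayBytes;
    }

    /** The documented upper bound on total points, evaluated in long arithmetic. */
    static long maxTotalPoints(int maxPointsInLeafNode, int bytesPerDim) {
        return (long) Integer.MAX_VALUE * maxPointsInLeafNode / bytesPerDim;
    }

    public static void main(String[] args) {
        long leaves = numLeaves(1_000_000L, 512);
        System.out.println(leaves);
        System.out.println(smallestLeafSize(1_000_000L, 512));
        System.out.println(perLeafArrayBytes(leaves, 4));
        System.out.println(maxTotalPoints(512, 4));
    }
}
```

For the example values used in main (1,000,000 points, 512 points per leaf, 4 bytes per dimension, all chosen for illustration), this gives 1954 leaves, a smallest leaf of 64 points, about 25 KB for the two per-leaf arrays, and a limit of 274,877,906,816 total points.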
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Field Summary
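Modifier and Type | Field | Description
static final String | CODEC_NAME
protected final BKDConfig | config | BKD tree configuration
static final float | DEFAULT_MAX_MB_SORT_IN_HEAP | Default maximum heap to use, before spilling to (slower) disk
protected final FixedBitSet | docsSeen
protected final byte[] | maxPackedValue | Maximum per-dim values, packed
protected final byte[] | minPackedValue | Minimum per-dim values, packed
protected long | pointCount
static final int | VERSION_CURRENT
static final int | VERSION_LEAF_STORES_BOUNDS
static final int | VERSION_LOW_CARDINALITY_LEAVES
static final int | VERSION_META_FILE
static final int | VERSION_SELECTIVE_INDEXING
static final int | VERSION_START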
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
void | add(byte[] packedValue, int docID)
void | close()
Runnable | finish(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut) | Writes the BKD tree to the provided IndexOutputs and returns a Runnable that writes the index of the tree if at least one point has been added, or null otherwise.
Runnable | merge(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, List<MergeState.DocMap> docMaps, List<PointValues> readers) | More efficient bulk-add for incoming PointValues.
protected int | split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits) | Pick the next dimension to split.
Runnable | writeField(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree reader) | Write a field from a MutablePointTree.
-
Field Details
-
CODEC_NAME
public static final String CODEC_NAME
-
VERSION_START
public static final int VERSION_START
-
VERSION_LEAF_STORES_BOUNDS
public static final int VERSION_LEAF_STORES_BOUNDS
-
VERSION_SELECTIVE_INDEXING
public static final int VERSION_SELECTIVE_INDEXING
-
VERSION_LOW_CARDINALITY_LEAVES
public static final int VERSION_LOW_CARDINALITY_LEAVES
-
VERSION_META_FILE
public static final int VERSION_META_FILE
-
VERSION_CURRENT
public static final int VERSION_CURRENT
-
DEFAULT_MAX_MB_SORT_IN_HEAP
public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
Default maximum heap to use, before spilling to (slower) disk
-
config
protected final BKDConfig config
BKD tree configuration
-
docsSeen
protected final FixedBitSet docsSeen
-
minPackedValue
protected final byte[] minPackedValue
Minimum per-dim values, packed
-
maxPackedValue
protected final byte[] maxPackedValue
Maximum per-dim values, packed
-
pointCount
protected long pointCount
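To make the packed per-dimension bounds concrete, here is a minimal sketch of how minimum and maximum values could be tracked as points arrive. It is an assumption-laden illustration, not BKDWriter's code: the class name and the accept method are hypothetical, and each dimension is compared as unsigned bytes.

```java
import java.util.Arrays;

// Hypothetical tracker mirroring the minPackedValue, maxPackedValue and pointCount fields.
final class PackedBoundsSketch {
    final int numDims;
    final int bytesPerDim;
    final byte[] minPackedValue;
    final byte[] maxPackedValue;
    long pointCount;

    PackedBoundsSketch(int numDims, int bytesPerDim) {
        this.numDims = numDims;
        this.bytesPerDim = bytesPerDim;
        this.minPackedValue = new byte[numDims * bytesPerDim];
        this.maxPackedValue = new byte[numDims * bytesPerDim];
    }

    /** Updates the per-dimension bounds with one packed point. */
    void accept(byte[] packedValue) {
        if (packedValue.length != numDims * bytesPerDim) {
            throw new IllegalArgumentException("packedValue must have a fixed length");
        }
        if (pointCount == 0) {
            // The first point defines both bounds in every dimension.
            System.arraycopy(packedValue, 0, minPackedValue, 0, packedValue.length);
            System.arraycopy(packedValue, 0, maxPackedValue, 0, packedValue.length);
        } else {
            for (int dim = 0; dim < numDims; dim++) {
                int from = dim * bytesPerDim;
                int to = from + bytesPerDim;
                if (Arrays.compareUnsigned(packedValue, from, to, minPackedValue, from, to) < 0) {
                    System.arraycopy(packedValue, from, minPackedValue, from, bytesPerDim);
                }
                if (Arrays.compareUnsigned(packedValue, from, to, maxPackedValue, from, to) > 0) {
                    System.arraycopy(packedValue, from, maxPackedValue, from, bytesPerDim);
                }
            }
        }
        pointCount++;
    }
}
```

Note that the bounds are kept per dimension, so the minimum and maximum arrays need not correspond to any single point that was added.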
-
-
Constructor Details
-
BKDWriter
-
-
Method Details
-
add
public void add(byte[] packedValue, int docID) throws IOException
- Throws:
IOException
-
writeField
public Runnable writeField(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointTree reader) throws IOException
Write a field from a MutablePointTree. This way of writing points is faster than regular writes with add(byte[], int) since there is opportunity for reordering points before writing them to disk. This method does not use transient disk in order to reorder points.
- Throws:
IOException
-
merge
public Runnable merge(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, List<MergeState.DocMap> docMaps, List<PointValues> readers) throws IOException
More efficient bulk-add for incoming PointValues. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns -1 if all documents containing dimensional values were deleted.
- Throws:
IOException
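The merge sort of already sorted one-dimensional values can be pictured as a k-way merge driven by a priority queue. The sketch below is a simplified stand-in under stated assumptions: it works on plain lists of byte[] rather than PointValues, compares values as unsigned bytes, and leaves out docID remapping and deleted documents entirely. All names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of a k-way merge; not the BKDWriter implementation.
final class SortedMergeSketch {

    /** Cursor over one already sorted input. */
    private static final class Cursor {
        final Iterator<byte[]> it;
        byte[] current;

        Cursor(Iterator<byte[]> it) {
            this.it = it;
            this.current = it.next();
        }

        /** Moves to the next value; returns false when the input is exhausted. */
        boolean advance() {
            if (it.hasNext()) {
                current = it.next();
                return true;
            }
            return false;
        }
    }

    /** Merges inputs that are each sorted in unsigned byte order into one sorted list. */
    static List<byte[]> merge(List<List<byte[]>> sortedInputs) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((x, y) -> Arrays.compareUnsigned(x.current, y.current));
        for (List<byte[]> input : sortedInputs) {
            if (!input.isEmpty()) {
                queue.add(new Cursor(input.iterator()));
            }
        }
        List<byte[]> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();   // input whose head value is smallest
            out.add(top.current);
            if (top.advance()) {
                queue.add(top);          // re-insert with its next value
            }
        }
        return out;
    }
}
```

Because every input is already sorted, each output value costs one queue operation, which is why this kind of bulk merge can avoid re-sorting all points from scratch.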
-
finish
public Runnable finish(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut) throws IOException
Writes the BKD tree to the provided IndexOutputs and returns a Runnable that writes the index of the tree if at least one point has been added, or null otherwise.
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
split
protected int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits)
Pick the next dimension to split.
- Parameters:
minPackedValue - the min values for all dimensions
maxPackedValue - the max values for all dimensions
parentSplits - how many times each dim has been split on the parent levels
- Returns:
the dimension to split
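The description above does not say how the dimension is chosen, so the following is only one plausible heuristic, sketched under assumptions: prefer a dimension that has been split far less often than the most-split one (as long as its range is not empty), otherwise take the dimension with the widest span. The class name and the extra bytesPerDim parameter are hypothetical; a subclass overriding split would presumably read that from config instead.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical heuristic for illustration; not necessarily what BKDWriter.split does.
final class SplitDimSketch {

    static int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, int bytesPerDim) {
        int numDims = parentSplits.length;

        int maxNumSplits = 0;
        for (int numSplits : parentSplits) {
            maxNumSplits = Math.max(maxNumSplits, numSplits);
        }

        // First choice: a dimension split less than half as often as the most-split one,
        // provided its min and max differ so a split can actually separate points.
        for (int dim = 0; dim < numDims; dim++) {
            int from = dim * bytesPerDim;
            int to = from + bytesPerDim;
            boolean hasRange =
                Arrays.compareUnsigned(minPackedValue, from, to, maxPackedValue, from, to) != 0;
            if (parentSplits[dim] < maxNumSplits / 2 && hasRange) {
                return dim;
            }
        }

        // Fallback: the dimension with the largest (max - min), read as unsigned big-endian numbers.
        int bestDim = -1;
        BigInteger bestSpan = null;
        for (int dim = 0; dim < numDims; dim++) {
            int from = dim * bytesPerDim;
            int to = from + bytesPerDim;
            BigInteger min = new BigInteger(1, Arrays.copyOfRange(minPackedValue, from, to));
            BigInteger max = new BigInteger(1, Arrays.copyOfRange(maxPackedValue, from, to));
            BigInteger span = max.subtract(min);
            if (bestDim == -1 || span.compareTo(bestSpan) > 0) {
                bestDim = dim;
                bestSpan = span;
            }
        }
        return bestDim;
    }
}
```

The first rule keeps any one dimension from being neglected when another has a much wider range; the second rule tends to produce cells that are closer to square.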
-