public class BKDWriter extends Object implements Closeable
Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= maxPointsInLeafNode. The tree is partially balanced, which means every leaf node holds the requested maxPointsInLeafNode points, except one that might hold fewer. Leaf nodes may straddle the two bottom levels of the binary tree. Values that fall exactly on a cell boundary may be in either cell.
The number of dimensions can be 1 to 8, but every byte[] value is fixed length.
This consumes heap during writing: it allocates a Long[numLeaves], a byte[numLeaves*(1+bytesPerDim)], and then uses up to the specified maxMBSortInHeap heap space for writing.
NOTE: This can write at most Integer.MAX_VALUE * maxPointsInLeafNode / bytesPerDim total points.
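
The sizing statements above are plain arithmetic, so a small worked example may help. The sketch below is not part of BKDWriter: the class name `BkdSizingSketch`, its method names, and the sample values (512 points per leaf, 4 bytes per dimension, one million points) are assumptions chosen for illustration. It only evaluates the expressions quoted in the description, using 64-bit arithmetic so that the upper bound does not overflow an int.

```java
/** Back-of-the-envelope sizing for the limits quoted in the class description (illustrative only). */
final class BkdSizingSketch {

    /** Leaves needed if every leaf holds maxPointsInLeafNode points, except possibly one. */
    static long numLeaves(long totalPointCount, int maxPointsInLeafNode) {
        return (totalPointCount + maxPointsInLeafNode - 1) / maxPointsInLeafNode;
    }

    /** Length of the byte[numLeaves*(1+bytesPerDim)] array named in the description. */
    static long splitArrayLength(long numLeaves, int bytesPerDim) {
        return numLeaves * (1L + bytesPerDim);
    }

    /** Integer.MAX_VALUE * maxPointsInLeafNode / bytesPerDim, evaluated as a long. */
    static long maxTotalPoints(int maxPointsInLeafNode, int bytesPerDim) {
        return (long) Integer.MAX_VALUE * maxPointsInLeafNode / bytesPerDim;
    }

    public static void main(String[] args) {
        int maxPointsInLeafNode = 512; // assumed sample value
        int bytesPerDim = 4;           // assumed: one int per dimension
        long leaves = numLeaves(1_000_000L, maxPointsInLeafNode);
        System.out.println("leaves            = " + leaves);
        System.out.println("split array bytes = " + splitArrayLength(leaves, bytesPerDim));
        System.out.println("max total points  = " + maxTotalPoints(maxPointsInLeafNode, bytesPerDim));
    }
}
```

With these sample values, one million points need 1954 leaves, and the per-leaf byte array stays under 10 KB; the configurable maxMBSortInHeap budget is the dominant heap cost.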
Modifier and Type | Field and Description |
---|---|
protected int | bytesPerDim: How many bytes each value in each dimension takes. |
static String | CODEC_NAME |
static float | DEFAULT_MAX_MB_SORT_IN_HEAP: Default maximum heap to use, before spilling to (slower) disk. |
static int | DEFAULT_MAX_POINTS_IN_LEAF_NODE: Default maximum number of points in each leaf block. |
protected FixedBitSet | docsSeen |
static int | MAX_DIMS: Maximum number of dimensions (2 * max index dimensions). |
static int | MAX_INDEX_DIMS: Maximum number of index dimensions. |
protected byte[] | maxPackedValue: Maximum per-dim values, packed. |
protected int | maxPointsInLeafNode |
protected byte[] | minPackedValue: Minimum per-dim values, packed. |
protected int | numDataDims: How many dimensions we are storing at the leaf (data) nodes. |
protected int | numIndexDims: How many dimensions we are indexing in the internal nodes. |
protected int | packedBytesLength: numDataDims * bytesPerDim |
protected int | packedIndexBytesLength: numIndexDims * bytesPerDim |
protected long | pointCount |
static int | VERSION_CURRENT |
static int | VERSION_LEAF_STORES_BOUNDS |
static int | VERSION_LOW_CARDINALITY_LEAVES |
static int | VERSION_META_FILE |
static int | VERSION_SELECTIVE_INDEXING |
static int | VERSION_START |
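
To make the relationship packedBytesLength = numDataDims * bytesPerDim concrete, here is a minimal sketch of packing several int dimensions into one fixed-length byte[]. Everything in it is an assumption for illustration: the class `PackedValueSketch` and its helpers are hypothetical, and the encoding (big-endian with the sign bit flipped, so that unsigned byte order matches numeric order) is one common choice, not necessarily the one a given caller uses.

```java
/** Illustrative packing of int dimensions into a fixed-length byte[] (hypothetical helper). */
final class PackedValueSketch {

    static final int BYTES_PER_DIM = Integer.BYTES; // 4 bytes per dimension in this sketch

    /** Writes v as 4 big-endian bytes with the sign bit flipped, so unsigned byte order matches int order. */
    static void encodeInt(int v, byte[] dest, int offset) {
        int flipped = v ^ 0x80000000;
        dest[offset] = (byte) (flipped >>> 24);
        dest[offset + 1] = (byte) (flipped >>> 16);
        dest[offset + 2] = (byte) (flipped >>> 8);
        dest[offset + 3] = (byte) flipped;
    }

    /** Packs one value per dimension; the result length is dims.length * BYTES_PER_DIM. */
    static byte[] pack(int... dims) {
        byte[] packed = new byte[dims.length * BYTES_PER_DIM];
        for (int d = 0; d < dims.length; d++) {
            encodeInt(dims[d], packed, d * BYTES_PER_DIM);
        }
        return packed;
    }

    /** Compares a single dimension of two packed values as unsigned bytes. */
    static int compareDim(byte[] a, byte[] b, int dim) {
        int start = dim * BYTES_PER_DIM;
        for (int i = start; i < start + BYTES_PER_DIM; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] p = pack(-5, 7);
        byte[] q = pack(3, 7);
        System.out.println("packed length = " + p.length);
        System.out.println("dim 0 compare = " + compareDim(p, q, 0));
        System.out.println("dim 1 compare = " + compareDim(p, q, 1));
    }
}
```

Because every dimension occupies the same number of bytes, dimension d always starts at offset d * bytesPerDim, which is what allows per-dimension comparisons without decoding the value.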
Constructor and Description |
---|
BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDataDims, int numIndexDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount) |
Modifier and Type | Method and Description |
---|---|
void | add(byte[] packedValue, int docID) |
void | close() |
Runnable | finish(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut): Writes the BKD tree to the provided IndexOutputs and returns a Runnable that writes the index of the tree if at least one point has been added, or null otherwise. |
Runnable | merge(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, List<MergeState.DocMap> docMaps, List<BKDReader> readers): More efficient bulk-add for incoming BKDReaders. |
protected int | split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits): Pick the next dimension to split. |
static void | verifyParams(int numDims, int numIndexDims, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount) |
Runnable | writeField(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointValues reader): Write a field from a MutablePointValues. |
public static final String CODEC_NAME
public static final int VERSION_START
public static final int VERSION_LEAF_STORES_BOUNDS
public static final int VERSION_SELECTIVE_INDEXING
public static final int VERSION_LOW_CARDINALITY_LEAVES
public static final int VERSION_META_FILE
public static final int VERSION_CURRENT
public static final int DEFAULT_MAX_POINTS_IN_LEAF_NODE
public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
public static final int MAX_DIMS
public static final int MAX_INDEX_DIMS
protected final int numDataDims
protected final int numIndexDims
protected final int bytesPerDim
protected final int packedBytesLength
protected final int packedIndexBytesLength
protected final FixedBitSet docsSeen
protected final int maxPointsInLeafNode
protected final byte[] minPackedValue
protected final byte[] maxPackedValue
protected long pointCount
public BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDataDims, int numIndexDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount) throws IOException
Throws: IOException
public static void verifyParams(int numDims, int numIndexDims, int maxPointsInLeafNode, double maxMBSortInHeap, long totalPointCount)
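
This page gives no description for verifyParams, so the following is only a sketch of the kind of validation its signature suggests. The specific checks are assumptions, not the library's actual rules: the limits are inferred from the class description ("1 to 8" dimensions) and from the MAX_DIMS field description (twice the index-dimension limit), and the class name `VerifyParamsSketch` is hypothetical.

```java
/** Hypothetical parameter validation modeled on the verifyParams signature (assumed rules). */
final class VerifyParamsSketch {

    static final int MAX_INDEX_DIMS = 8;            // assumed from "1 to 8" in the class description
    static final int MAX_DIMS = 2 * MAX_INDEX_DIMS; // assumed from the MAX_DIMS field description

    static void verifyParams(int numDims, int numIndexDims, int maxPointsInLeafNode,
                             double maxMBSortInHeap, long totalPointCount) {
        if (numDims < 1 || numDims > MAX_DIMS) {
            throw new IllegalArgumentException("numDims must be 1 .. " + MAX_DIMS + " (got: " + numDims + ")");
        }
        if (numIndexDims < 1 || numIndexDims > MAX_INDEX_DIMS) {
            throw new IllegalArgumentException("numIndexDims must be 1 .. " + MAX_INDEX_DIMS + " (got: " + numIndexDims + ")");
        }
        if (numIndexDims > numDims) {
            throw new IllegalArgumentException("numIndexDims cannot exceed numDims");
        }
        if (maxPointsInLeafNode <= 0) {
            throw new IllegalArgumentException("maxPointsInLeafNode must be > 0");
        }
        if (maxMBSortInHeap < 0.0) {
            throw new IllegalArgumentException("maxMBSortInHeap must be >= 0.0");
        }
        if (totalPointCount < 0) {
            throw new IllegalArgumentException("totalPointCount must be >= 0");
        }
    }

    public static void main(String[] args) {
        verifyParams(2, 2, 512, 16.0, 1_000L); // passes under the assumed rules
        try {
            verifyParams(1, 2, 512, 16.0, 1_000L); // more index dims than total dims
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```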
public void add(byte[] packedValue, int docID) throws IOException
Throws: IOException
public Runnable writeField(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, String fieldName, MutablePointValues reader) throws IOException
Write a field from a MutablePointValues. This way of writing points is faster than regular writes with add(byte[], int) since there is opportunity for reordering points before writing them to disk. This method does not use transient disk in order to reorder points.
Throws: IOException
public Runnable merge(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, List<MergeState.DocMap> docMaps, List<BKDReader> readers) throws IOException
More efficient bulk-add for incoming BKDReaders. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns null if all documents containing dimensional values were deleted.
Throws: IOException
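
The "merge sort of the already sorted values" can be pictured as a k-way merge over sorted inputs. The sketch below is a generic illustration using a priority queue over plain int arrays; it is not the library's implementation, and the names `SortedMergeSketch` and `Cursor` are hypothetical. It also ignores document remapping and deletions, which the real method handles through docMaps.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Generic k-way merge of already sorted inputs (illustration only). */
final class SortedMergeSketch {

    /** Tracks the read position inside one sorted input. */
    private static final class Cursor {
        final int[] values;
        int pos;

        Cursor(int[] values) {
            this.values = values;
        }

        int current() {
            return values[pos];
        }

        boolean advance() {
            return ++pos < values.length;
        }
    }

    /** Merges the inputs into one sorted array by repeatedly taking the smallest head value. */
    static int[] merge(List<int[]> sortedInputs) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>(Comparator.comparingInt(Cursor::current));
        int total = 0;
        for (int[] input : sortedInputs) {
            total += input.length;
            if (input.length > 0) {
                queue.add(new Cursor(input));
            }
        }
        int[] out = new int[total];
        int n = 0;
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            out[n++] = top.current();
            if (top.advance()) {
                queue.add(top); // re-insert so the queue re-orders by the new head value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] merged = merge(Arrays.asList(new int[] {1, 4, 9}, new int[] {2, 3}, new int[] {}));
        System.out.println(Arrays.toString(merged));
    }
}
```

Since each input is already sorted, the merge never needs to sort the combined data; it only compares the current head of each input, which is why a one-dimensional merge can be cheaper than re-adding every point.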
public Runnable finish(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut) throws IOException
Writes the BKD tree to the provided IndexOutputs and returns a Runnable that writes the index of the tree if at least one point has been added, or null otherwise.
Throws: IOException
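
Because finish may return null, callers need a null check before running the deferred index-writing step. The sketch below shows that calling pattern with a stand-in; the `finish` method here is a hypothetical substitute that takes a point count and a StringBuilder instead of IndexOutputs, so it demonstrates only the "Runnable or null" contract described above.

```java
/** Illustrates handling a "Runnable or null" result (stand-in, not the real finish method). */
final class FinishResultSketch {

    /** Hypothetical stand-in: returns a deferred step, or null when no point was added. */
    static Runnable finish(long pointCount, StringBuilder indexOut) {
        if (pointCount == 0) {
            return null;
        }
        return () -> indexOut.append("index for ").append(pointCount).append(" points");
    }

    /** Runs the deferred step if there is one and reports whether anything was written. */
    static boolean writeIndexIfPresent(Runnable indexWriter) {
        if (indexWriter == null) {
            return false; // nothing was added, so there is no index to write
        }
        indexWriter.run();
        return true;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        System.out.println(writeIndexIfPresent(finish(0, out)) + " -> \"" + out + "\"");
        System.out.println(writeIndexIfPresent(finish(3, out)) + " -> \"" + out + "\"");
    }
}
```

Returning the index-writing step as a Runnable lets the caller decide when to run it, for example after the data for all fields has been written.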
public void close() throws IOException
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Throws: IOException
protected int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits)
Pick the next dimension to split.
Parameters:
minPackedValue - the min values for all dimensions
maxPackedValue - the max values for all dimensions
parentSplits - how many times each dim has been split on the parent levels
Copyright © 2000-2020 Apache Software Foundation. All Rights Reserved.