public class BKDWriter extends Object implements Closeable
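Recursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <= maxPointsInLeafNode. The tree is fully balanced, which means the leaf nodes will have between 50% and 100% of the requested maxPointsInLeafNode. Values that fall exactly on a cell boundary may be in either cell.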
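The 50% to 100% fill range follows from halving the point count until it fits in one leaf. The sketch below illustrates that arithmetic only; the class `LeafSizing` and its method names are hypothetical and are not part of BKDWriter, and the leaf capacity of 1024 is an assumed value chosen for the example.

```java
// Sketch: why a fully balanced tree leaves each leaf between 50% and 100% full.
// LeafSizing is a hypothetical helper, not part of the BKDWriter API.
class LeafSizing {

    /** Halves the per-leaf count (rounding up) until it fits in one leaf. */
    static long pointsPerLeaf(long pointCount, int maxPointsInLeafNode) {
        long countPerLeaf = pointCount;
        while (countPerLeaf > maxPointsInLeafNode) {
            countPerLeaf = (countPerLeaf + 1) / 2;
        }
        return countPerLeaf;
    }

    /** Each halving doubles the number of leaves, so the result is a power of two. */
    static long leafCount(long pointCount, int maxPointsInLeafNode) {
        long countPerLeaf = pointCount;
        long leaves = 1;
        while (countPerLeaf > maxPointsInLeafNode) {
            countPerLeaf = (countPerLeaf + 1) / 2;
            leaves *= 2;
        }
        return leaves;
    }

    public static void main(String[] args) {
        long points = 10_000;
        int maxPerLeaf = 1024; // assumed leaf capacity for this example
        System.out.println(leafCount(points, maxPerLeaf) + " leaves, about "
            + pointsPerLeaf(points, maxPerLeaf) + " points each");
    }
}
```

Because the loop stops at the first halving that fits, the per-leaf count always ends up above half of the requested maximum whenever at least one split was needed.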
The number of dimensions can be 1 to 255, but every byte[] value is fixed length.
See this paper for details.
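As an illustration of the fixed-length byte[] requirement, the following sketch packs int coordinates into a single array of numDims * bytesPerDim bytes. The encoding shown (sign bit flipped, big-endian) is one common way to make unsigned byte order match numeric order and is an assumption here; `PackedPoint` and its methods are hypothetical names, not part of BKDWriter.

```java
// Sketch: packing N int dimensions into one fixed-length byte[].
// PackedPoint is a hypothetical helper; the encoding is an assumed convention.
class PackedPoint {
    static final int BYTES_PER_DIM = Integer.BYTES;

    /** Packs each dimension into BYTES_PER_DIM big-endian bytes. */
    static byte[] pack(int... dims) {
        byte[] packed = new byte[dims.length * BYTES_PER_DIM];
        for (int d = 0; d < dims.length; d++) {
            // Flip the sign bit so negative values sort before positive ones
            // when the bytes are compared as unsigned.
            int sortable = dims[d] ^ 0x80000000;
            int offset = d * BYTES_PER_DIM;
            packed[offset] = (byte) (sortable >>> 24);
            packed[offset + 1] = (byte) (sortable >>> 16);
            packed[offset + 2] = (byte) (sortable >>> 8);
            packed[offset + 3] = (byte) sortable;
        }
        return packed;
    }

    /** Compares one dimension of two packed values as unsigned bytes. */
    static int compareDim(byte[] a, byte[] b, int dim) {
        int from = dim * BYTES_PER_DIM;
        for (int i = from; i < from + BYTES_PER_DIM; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] p = pack(-5, 42);
        byte[] q = pack(3, 42);
        System.out.println("packed length = " + p.length);
        System.out.println("dim 0 order = " + compareDim(p, q, 0));
    }
}
```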
This consumes heap during writing: it allocates a LongBitSet(numPoints), and then uses up to the specified maxMBSortInHeap heap space for writing.
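A rough way to size the heap cost described above is one bit per point for the bit set, plus the sort budget. The sketch below assumes the bit set stores its bits in 64-bit words and ignores object overhead; `HeapEstimate` is a hypothetical helper, and the 16 MB sort budget is an arbitrary example value.

```java
// Sketch: back-of-the-envelope heap estimate for writing numPoints points.
// HeapEstimate is hypothetical; it assumes 64-bit words and ignores object headers.
class HeapEstimate {

    /** Bytes needed for one bit per point, rounded up to whole 64-bit words. */
    static long bitSetBytes(long numPoints) {
        long words = (numPoints + 63) / 64;
        return words * Long.BYTES;
    }

    /** Bit set size in MB plus the configured sort budget. */
    static double totalMB(long numPoints, double maxMBSortInHeap) {
        return bitSetBytes(numPoints) / (1024.0 * 1024.0) + maxMBSortInHeap;
    }

    public static void main(String[] args) {
        long numPoints = 1_000_000_000L;
        double sortBudgetMB = 16.0; // example value, not a documented default
        System.out.printf("approx. %.1f MB%n", totalMB(numPoints, sortBudgetMB));
    }
}
```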
NOTE: This can write at most Integer.MAX_VALUE * maxPointsInLeafNode total points.
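The limit in the note is easy to get wrong in code because the product overflows int. A minimal sketch, using a hypothetical `PointLimit` helper and an assumed leaf size of 1024 for the example:

```java
// Sketch: computing the point limit from the note without int overflow.
// PointLimit is a hypothetical helper, not part of BKDWriter.
class PointLimit {

    /** Widen to long before multiplying; an int product would overflow. */
    static long maxTotalPoints(int maxPointsInLeafNode) {
        return (long) Integer.MAX_VALUE * maxPointsInLeafNode;
    }

    public static void main(String[] args) {
        int maxPointsInLeafNode = 1024; // assumed value for this example
        System.out.println(maxTotalPoints(maxPointsInLeafNode));
    }
}
```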
Modifier and Type | Field and Description |
---|---|
protected int | bytesPerDim: How many bytes each value in each dimension takes. |
static String | CODEC_NAME |
static float | DEFAULT_MAX_MB_SORT_IN_HEAP: Default maximum heap to use, before spilling to (slower) disk. |
static int | DEFAULT_MAX_POINTS_IN_LEAF_NODE: Default maximum number of points in each leaf block. |
protected FixedBitSet | docsSeen |
static int | MAX_DIMS: Maximum number of dimensions. |
protected byte[] | maxPackedValue: Maximum per-dim values, packed. |
protected int | maxPointsInLeafNode |
protected byte[] | minPackedValue: Minimum per-dim values, packed. |
protected int | numDims: How many dimensions we are indexing. |
protected int | packedBytesLength: numDims * bytesPerDim |
protected long | pointCount |
static int | VERSION_CURRENT |
static int | VERSION_START |

Constructor and Description |
---|
BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDims, int bytesPerDim) |
BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap) |

Modifier and Type | Method and Description |
---|---|
void | add(byte[] packedValue, int docID) |
void | close() |
long | finish(IndexOutput out): Writes the BKD tree to the provided IndexOutput and returns the file offset where the index was written. |
long | getPointCount(): How many points have been added so far. |
long | merge(IndexOutput out, List<MergeState.DocMap> docMaps, List<BKDReader> readers, List<Integer> docIDBases): More efficient bulk-add for incoming BKDReaders. |
protected int | split(byte[] minPackedValue, byte[] maxPackedValue) |
static void | verifyParams(int numDims, int maxPointsInLeafNode, double maxMBSortInHeap) |
protected void | writeCommonPrefixes(IndexOutput out, int[] commonPrefixes, byte[] packedValue) |
protected void | writeIndex(IndexOutput out, long[] leafBlockFPs, byte[] splitPackedValues): Subclass can change how it writes the index. |
protected void | writeLeafBlockDocs(IndexOutput out, int[] docIDs, int start, int count) |
protected void | writeLeafBlockPackedValue(IndexOutput out, int[] commonPrefixLengths, byte[] bytes) |

public static final String CODEC_NAME
public static final int VERSION_START
public static final int VERSION_CURRENT
public static final int DEFAULT_MAX_POINTS_IN_LEAF_NODE
public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
public static final int MAX_DIMS
protected final int numDims
protected final int bytesPerDim
protected final int packedBytesLength
protected final FixedBitSet docsSeen
protected final int maxPointsInLeafNode
protected final byte[] minPackedValue
protected final byte[] maxPackedValue
protected long pointCount
public BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDims, int bytesPerDim) throws IOException
Throws: IOException
public BKDWriter(int maxDoc, Directory tempDir, String tempFileNamePrefix, int numDims, int bytesPerDim, int maxPointsInLeafNode, double maxMBSortInHeap) throws IOException
Throws: IOException
public static void verifyParams(int numDims, int maxPointsInLeafNode, double maxMBSortInHeap)
public void add(byte[] packedValue, int docID) throws IOException
Throws: IOException
public long getPointCount()
public long merge(IndexOutput out, List<MergeState.DocMap> docMaps, List<BKDReader> readers, List<Integer> docIDBases) throws IOException
More efficient bulk-add for incoming BKDReaders. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns -1 if all documents containing dimensional values were deleted.
Throws: IOException
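The merge sort over already sorted one-dimensional inputs can be pictured as a k-way merge driven by a priority queue. The sketch below is a standalone illustration under simplifying assumptions: each segment is modeled as a sorted long[] with parallel docIDs, deletions and doc maps are ignored, and `SortedSegmentMerge`, `Point`, and `Cursor` are hypothetical names rather than Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch: k-way merge of sorted 1D segments, rebasing docIDs per segment.
// All class names here are hypothetical stand-ins, not Lucene types.
class SortedSegmentMerge {

    /** One merged (value, docID) pair. */
    static final class Point {
        final long value;
        final int docID;

        Point(long value, int docID) {
            this.value = value;
            this.docID = docID;
        }
    }

    /** Read position within one sorted segment. */
    static final class Cursor {
        final long[] values;
        final int[] docIDs;
        final int docIDBase;
        int pos;

        Cursor(long[] values, int[] docIDs, int docIDBase) {
            this.values = values;
            this.docIDs = docIDs;
            this.docIDBase = docIDBase;
        }
    }

    static List<Point> merge(long[][] values, int[][] docIDs, int[] docIDBases) {
        // Order cursors by the value they currently point at.
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
            (a, b) -> Long.compare(a.values[a.pos], b.values[b.pos]));
        for (int i = 0; i < values.length; i++) {
            if (values[i].length > 0) {
                queue.add(new Cursor(values[i], docIDs[i], docIDBases[i]));
            }
        }
        List<Point> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();
            merged.add(new Point(top.values[top.pos], top.docIDBase + top.docIDs[top.pos]));
            top.pos++;
            if (top.pos < top.values.length) {
                queue.add(top); // re-insert with its next value as the key
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Point> merged = merge(
            new long[][] {{1, 5, 9}, {2, 5}},
            new int[][] {{0, 1, 2}, {0, 1}},
            new int[] {0, 3});
        for (Point p : merged) {
            System.out.println(p.value + " -> doc " + p.docID);
        }
    }
}
```

Since every input is already sorted, each point is touched once and the queue never holds more than one cursor per segment, which is what makes this cheaper than adding points one at a time and re-sorting.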
public long finish(IndexOutput out) throws IOException
Writes the BKD tree to the provided IndexOutput and returns the file offset where the index was written.
Throws: IOException
protected void writeIndex(IndexOutput out, long[] leafBlockFPs, byte[] splitPackedValues) throws IOException
Subclass can change how it writes the index.
Throws: IOException
protected void writeLeafBlockDocs(IndexOutput out, int[] docIDs, int start, int count) throws IOException
Throws: IOException
protected void writeLeafBlockPackedValue(IndexOutput out, int[] commonPrefixLengths, byte[] bytes) throws IOException
Throws: IOException
protected void writeCommonPrefixes(IndexOutput out, int[] commonPrefixes, byte[] packedValue) throws IOException
Throws: IOException
public void close() throws IOException
Specified by: close in interface Closeable
Specified by: close in interface AutoCloseable
Throws: IOException
protected int split(byte[] minPackedValue, byte[] maxPackedValue)
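This page gives no description for split. Judging only from its signature (a min and max packed value in, a single int out), one plausible reading is that it picks the dimension to split on, for example the one with the widest range. The sketch below shows that strategy as an assumption, not as the documented behavior; `SplitChooser` and `widestDim` are hypothetical names, and the extra numDims and bytesPerDim parameters exist only to keep the example standalone.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Sketch: choose the dimension whose (max - min) span is largest.
// SplitChooser is hypothetical; this is an assumed strategy, not the documented one.
class SplitChooser {

    static int widestDim(byte[] minPacked, byte[] maxPacked, int numDims, int bytesPerDim) {
        int best = -1;
        BigInteger bestSpan = null;
        for (int dim = 0; dim < numDims; dim++) {
            int from = dim * bytesPerDim;
            int to = from + bytesPerDim;
            // Signum 1 makes BigInteger treat the bytes as an unsigned big-endian number.
            BigInteger min = new BigInteger(1, Arrays.copyOfRange(minPacked, from, to));
            BigInteger max = new BigInteger(1, Arrays.copyOfRange(maxPacked, from, to));
            BigInteger span = max.subtract(min);
            if (bestSpan == null || span.compareTo(bestSpan) > 0) {
                bestSpan = span;
                best = dim;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] min = {0, 0, 0, 10}; // two dims, two bytes each
        byte[] max = {0, 5, 1, 0};
        System.out.println("split on dim " + widestDim(min, max, 2, 2));
    }
}
```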
Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.