public class RangeTreeDocValuesFormat extends DocValuesFormat
A DocValuesFormat to efficiently index numeric values from SortedNumericDocValuesField or BytesRef values from SortedSetDocValuesField, for numeric range queries using NumericRangeTreeQuery and arbitrary binary range queries using SortedSetRangeTreeQuery.
This wraps Lucene54DocValuesFormat, but saves its own numeric tree structures to disk for fast query-time intersection. See this paper for details.
The numeric tree slices 1D space into smaller and smaller ranges until the smallest ranges hold approximately between X/2 and X values (X defaults to 1024). Each such leaf cell is then written to disk as a block, while the index tree structure, which records how the space was subdivided, is loaded into heap at search time. At search time, the tree is traversed recursively, descending into each left or right child that overlaps the query range. Once a leaf block is reached, all documents in it are collected if the cell is fully enclosed by the query range; otherwise the documents are filtered first and then collected.
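The build and search steps described above can be sketched in plain Java. This is a minimal in-memory illustration, not the format's actual implementation: the class name RangeTreeSketch and all of its members are hypothetical, it keeps everything in heap instead of writing leaf blocks to disk, and it assumes one value per document and input that is already sorted by value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a 1D range tree; not part of Lucene.
final class RangeTreeSketch {
    // A node covers the slice [from, to) of the sorted arrays.
    private static final class Node {
        final int from, to;
        final Node left, right; // both null for a leaf cell
        Node(int from, int to, Node left, Node right) {
            this.from = from; this.to = to; this.left = left; this.right = right;
        }
    }

    private final long[] values; // sorted ascending (assumption)
    private final int[] docIDs;  // docIDs[i] owns values[i]
    private final Node root;

    RangeTreeSketch(long[] sortedValues, int[] docIDs, int maxPointsInLeaf) {
        if (maxPointsInLeaf < 1 || sortedValues.length != docIDs.length) {
            throw new IllegalArgumentException("bad arguments");
        }
        this.values = sortedValues;
        this.docIDs = docIDs;
        this.root = build(0, sortedValues.length, maxPointsInLeaf);
    }

    // Halve the slice until a cell holds at most maxPointsInLeaf values.
    private Node build(int from, int to, int maxPointsInLeaf) {
        if (to - from <= maxPointsInLeaf) {
            return new Node(from, to, null, null);
        }
        int mid = (from + to) >>> 1;
        return new Node(from, to,
            build(from, mid, maxPointsInLeaf), build(mid, to, maxPointsInLeaf));
    }

    // Collect doc IDs whose value lies in [min, max], both inclusive.
    List<Integer> query(long min, long max) {
        List<Integer> hits = new ArrayList<>();
        if (values.length > 0) {
            collect(root, min, max, hits);
        }
        return hits;
    }

    private void collect(Node n, long min, long max, List<Integer> hits) {
        long cellMin = values[n.from];
        long cellMax = values[n.to - 1];
        if (cellMax < min || cellMin > max) {
            return; // cell does not overlap the query range
        }
        if (n.left == null) {
            // Leaf: take everything if fully enclosed, else filter per value.
            boolean enclosed = min <= cellMin && cellMax <= max;
            for (int i = n.from; i < n.to; i++) {
                if (enclosed || (values[i] >= min && values[i] <= max)) {
                    hits.add(docIDs[i]);
                }
            }
            return;
        }
        collect(n.left, min, max, hits);
        collect(n.right, min, max, hits);
    }
}
```

With values 1, 3, 5, ..., 15 owned by docs 0 through 7 and a leaf size of 2, calling query(4, 10) skips the first and last leaves, takes the leaf holding 5 and 7 whole, and filters the leaf holding 9 and 11, returning docs 2, 3 and 4.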
The index is also quite compact, because docs only appear once in the tree (no "prefix terms").
In addition to the files written by Lucene54DocValuesFormat, this format writes:
The disk format is experimental and free to change suddenly, and this code likely has new and exciting bugs!
Constructor and Description |
---|
RangeTreeDocValuesFormat(): Default constructor. |
RangeTreeDocValuesFormat(int maxPointsInLeafNode, int maxPointsSortInHeap): Creates this with custom configuration. |
Modifier and Type | Method and Description |
---|---|
DocValuesConsumer | fieldsConsumer(SegmentWriteState state) |
DocValuesProducer | fieldsProducer(SegmentReadState state) |
Methods inherited from class DocValuesFormat: availableDocValuesFormats, forName, getName, reloadDocValuesFormats, toString
public RangeTreeDocValuesFormat()
public RangeTreeDocValuesFormat(int maxPointsInLeafNode, int maxPointsSortInHeap)
maxPointsInLeafNode - Maximum number of points in each leaf cell. Smaller values create a deeper tree with a larger in-heap index and possibly faster searching. The default is 1024.
maxPointsSortInHeap - Maximum number of points where in-heap sort can be used. When the number of points exceeds this, a (slower) offline sort is used. The default is 128 * 1024.
public DocValuesConsumer fieldsConsumer(SegmentWriteState state) throws IOException
Specified by: fieldsConsumer in class DocValuesFormat
Throws: IOException
public DocValuesProducer fieldsProducer(SegmentReadState state) throws IOException
Specified by: fieldsProducer in class DocValuesFormat
Throws: IOException
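The trade-off noted for maxPointsInLeafNode in the constructor documentation (smaller leaves give a deeper tree and a larger in-heap index) can be estimated with a short calculation. The following is a rough sketch under the assumption that each cell is split into two halves by point count until it fits in a leaf; the class and method names are hypothetical, and the real writer may split cells differently, so the numbers are an upper-bound estimate rather than what the format actually produces.

```java
// Hypothetical helper; not part of Lucene.
final class LeafSizeEstimate {
    /**
     * Returns {leafCount, depth} for a tree that halves cells until each holds
     * at most maxPointsInLeafNode points. leafCount assumes every branch
     * reaches the same depth, so it is an upper bound.
     */
    static long[] estimate(long pointCount, int maxPointsInLeafNode) {
        if (maxPointsInLeafNode < 1) {
            throw new IllegalArgumentException("maxPointsInLeafNode must be >= 1");
        }
        long leaves = 1;
        long depth = 0;
        long cell = pointCount;
        while (cell > maxPointsInLeafNode) {
            cell = (cell + 1) / 2; // size of the larger half after a split
            leaves *= 2;
            depth++;
        }
        return new long[] {leaves, depth};
    }
}
```

For 1,000,000 points this estimate gives 1024 leaves at depth 10 with the default leaf size of 1024, and 16384 leaves at depth 14 with a leaf size of 64, which illustrates how the in-heap index grows as leaves shrink.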
Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.