java.lang.Object
  org.apache.lucene.search.Query
    org.apache.lucene.search.MultiTermQuery
      org.apache.lucene.search.NumericRangeQuery<T>
public final class NumericRangeQuery<T extends Number>
A Query that matches numeric values within a specified range. To use this, you must first index the
numeric values using IntField, FloatField, LongField or DoubleField (expert: NumericTokenStream).
If your terms are instead textual, you should use TermRangeQuery. NumericRangeFilter is the filter
equivalent of this query.
You create a new NumericRangeQuery with the static factory methods, e.g.:
    Query q = NumericRangeQuery.newFloatRange("weight", 0.03f, 0.10f, true, true);
This matches all documents whose float-valued "weight" field ranges from 0.03 to 0.10, inclusive.
The performance of NumericRangeQuery is much better than the corresponding TermRangeQuery because the
number of terms that must be searched is usually far fewer, thanks to trie indexing, described below.
You can optionally specify a precisionStep when creating this query. This is necessary if you've
changed this configuration from its default (4) during indexing. Lower values consume more disk space
but speed up searching. Suitable values are between 1 and 8. A good starting point to test is 4,
which is the default value for all Numeric* classes. See below for details.
This query defaults to MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT.
With precision steps of ≤4, this query can be run with
one of the BooleanQuery rewrite methods without changing
BooleanQuery's default max clause count.
See the publication about panFMP, where this algorithm was described (referred to as TrieRangeQuery):
Schindler, U, Diepenbroek, M, 2008. Generic XML-based Framework for Metadata Portals. Computers & Geosciences 34 (12), 1947-1955. doi:10.1016/j.cageo.2008.02.023
A quote from this paper: Because Apache Lucene is a full-text
search engine and not a conventional database, it cannot handle numerical ranges
(e.g., field value is inside user defined bounds, even dates are numerical values).
We have developed an extension to Apache Lucene that stores
the numerical values in a special string-encoded format with variable precision
(all numerical values like doubles, longs, floats, and ints are converted to
lexicographic sortable string representations and stored with different precisions
(for a more detailed description of how the values are stored, see NumericUtils)).
A range is then divided recursively into multiple intervals for searching:
The center of the range is searched only with the lowest possible precision in the trie,
while the boundaries are matched more exactly. This reduces the number of terms dramatically.
For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that
uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the
lowest precision. Overall, a range could consist of a theoretical maximum of 7*255*2 + 255 = 3825
distinct terms (when there is a term for every distinct value of an
8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used
because it would always be possible to reduce the full 256 values to one term with degraded precision).
In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records
and a uniform value distribution).
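The recursive splitting described in the quote can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the Lucene implementation: the class name TrieRangeSplitSketch, the SubRange type and the split method are hypothetical, values are restricted to non-negative numbers of at most 32 bits, and sign handling is left out. At each precision level the edges of the range are covered with fine-grained terms, and the remaining center is handed to the next, coarser level.

```java
import java.util.ArrayList;
import java.util.List;

final class TrieRangeSplitSketch {

    /** A run of consecutive terms at one precision level (hypothetical helper type). */
    static final class SubRange {
        final long lo;    // smallest value covered
        final long hi;    // largest value covered
        final int shift;  // number of low bits dropped at this precision
        final long terms; // number of index terms needed to cover [lo, hi]

        SubRange(long minPrefix, long maxPrefix, int shift) {
            this.lo = minPrefix;
            // A term at this shift stands for every value sharing its upper bits.
            this.hi = maxPrefix | ((1L << shift) - 1L);
            this.shift = shift;
            this.terms = (maxPrefix >>> shift) - (minPrefix >>> shift) + 1;
        }
    }

    /** Splits [min, max] into sub-ranges, coarsest precision in the center. */
    static List<SubRange> split(int valSize, int precisionStep, long min, long max) {
        if (valSize < 1 || valSize > 32 || precisionStep < 1 || precisionStep > 32) {
            throw new IllegalArgumentException("sketch supports 1..32 bits and steps of 1..32");
        }
        if (min < 0 || max >= (1L << valSize)) {
            throw new IllegalArgumentException("bounds must fit into valSize unsigned bits");
        }
        List<SubRange> out = new ArrayList<>();
        if (min > max) {
            return out; // empty range
        }
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            boolean hasLower = (min & mask) != 0L;   // lower edge not aligned to next level
            boolean hasUpper = (max & mask) != mask; // upper edge not aligned to next level
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            if (shift + precisionStep >= valSize || nextMin > nextMax) {
                // Lowest precision reached, or nothing is left for a coarser level.
                out.add(new SubRange(min, max, shift));
                break;
            }
            if (hasLower) out.add(new SubRange(min, min | mask, shift));
            if (hasUpper) out.add(new SubRange(max & ~mask, max, shift));
            min = nextMin;
            max = nextMax;
        }
        return out;
    }

    public static void main(String[] args) {
        // 8-bit values, precision step 4, range [3, 200]
        for (SubRange r : split(8, 4, 3, 200)) {
            System.out.println("[" + r.lo + ".." + r.hi + "] shift=" + r.shift + " terms=" + r.terms);
        }
    }
}
```

For the range [3, 200] on 8-bit values with a step of 4, the sketch yields two edge sub-ranges at full precision (3..15 and 192..200) and one center sub-range at the coarser precision (16..191), 33 terms in total instead of 198 individual values.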
You can choose any precisionStep when encoding values.
Lower step values mean more precisions and therefore more terms in the index (and a larger index).
The number of indexed terms per value (these are generated by NumericTokenStream) is:
indexedTermsPerValue = ceil(bitsPerValue / precisionStep)
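As a quick check of the formula, here is a minimal sketch in plain Java. The class and method names are illustrative and not part of the Lucene API. Long arithmetic is used so that a precisionStep of Integer.MAX_VALUE does not overflow.

```java
final class IndexedTermsSketch {

    /** ceil(bitsPerValue / precisionStep), computed with integer arithmetic. */
    static int indexedTermsPerValue(int bitsPerValue, int precisionStep) {
        if (bitsPerValue < 1 || precisionStep < 1) {
            throw new IllegalArgumentException("bitsPerValue and precisionStep must be >= 1");
        }
        // Widen to long first: bitsPerValue + Integer.MAX_VALUE would overflow an int.
        return (int) (((long) bitsPerValue + precisionStep - 1) / precisionStep);
    }

    public static void main(String[] args) {
        System.out.println(indexedTermsPerValue(64, 4));                 // long/double, step 4
        System.out.println(indexedTermsPerValue(32, 4));                 // int/float, step 4
        System.out.println(indexedTermsPerValue(64, Integer.MAX_VALUE)); // one term per value
    }
}
```

With 64-bit values and a step of 4 every value is indexed as 16 terms; with 32-bit values and the same step, 8 terms.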
As the lower precision terms are shared by many values, the additional terms only slightly grow the
term dictionary (approx. 7% for precisionStep=4), but have a larger impact on the postings (the
postings file will have more entries, as every document is linked to indexedTermsPerValue terms
instead of one). The formula to estimate the growth of the term dictionary in comparison to one term per value:
On the other hand, if the precisionStep is smaller, the maximum number of terms to match reduces,
which optimizes query speed. The formula to calculate the maximum number of terms that will be visited while
executing the query is:
maxQueryTerms = (bitsPerValue / precisionStep - 1) * (2^precisionStep - 1) * 2 + (2^precisionStep - 1)
For longs stored using a precision step of 4, maxQueryTerms = 15*15*2 + 15 = 465, and for a precision
step of 2, maxQueryTerms = 31*3*2 + 3 = 189. But the gain in search speed is partly offset by more seeking
in the term enum of the index. Because of this, the ideal precisionStep value can only
be found by testing. Important: You can index with a lower precision step value and test search speed
using a multiple of the original step value.
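The worked figures in this description (465 and 189 for longs, and 3825 for the 8-bit step in the quoted paper) all follow one pattern: two edge sub-ranges per precision level except the lowest, plus one center sub-range, each holding at most 2^precisionStep - 1 terms. A minimal sketch of that arithmetic, assuming precisionStep divides bitsPerValue evenly; the class and method names are illustrative only:

```java
final class MaxQueryTermsSketch {

    /** Upper bound on the number of terms visited, for steps that divide bitsPerValue. */
    static long maxQueryTerms(int bitsPerValue, int precisionStep) {
        if (precisionStep < 1 || precisionStep > 30 || bitsPerValue < precisionStep
                || bitsPerValue % precisionStep != 0) {
            throw new IllegalArgumentException("sketch assumes precisionStep divides bitsPerValue");
        }
        long levels = bitsPerValue / precisionStep;     // number of precision levels
        long perSubRange = (1L << precisionStep) - 1L;  // at most 2^step - 1 terms per sub-range
        // Two edge sub-ranges on every level but the lowest, one center sub-range there.
        return (levels - 1) * perSubRange * 2 + perSubRange;
    }

    public static void main(String[] args) {
        System.out.println(maxQueryTerms(64, 4)); // 15*15*2 + 15
        System.out.println(maxQueryTerms(64, 2)); // 31*3*2 + 3
        System.out.println(maxQueryTerms(64, 8)); // 7*255*2 + 255
    }
}
```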
Good values for precisionStep depend on usage and data type:
- The default for all data types is 4, which is used when no precisionStep is given.
- For low-cardinality fields, larger precision steps are good. If the cardinality is < 100, it is fair to use Integer.MAX_VALUE (see below).
- Steps ≥64 for long/double and ≥32 for int/float produce one token per value in the index, and querying is as slow as a conventional TermRangeQuery. But it can be used to produce fields that are solely used for sorting (in this case simply use Integer.MAX_VALUE as precisionStep). Using IntField, LongField, FloatField or DoubleField for sorting is ideal, because building the field cache is much faster than with text-only numbers. These fields have one term per value and therefore also work with term enumeration for building distinct lists (e.g. facets / preselected values to search for). Sorting is also possible with range query optimized fields using one of the above precisionSteps.
Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed
that TermRangeQuery in boolean rewrite mode (with raised BooleanQuery clause count)
took about 30-40 secs to complete, TermRangeQuery in constant score filter rewrite mode took 5 secs
and executing this class took <100ms to complete (on an Opteron64 machine, Java 1.5, 8 bit
precision step). This query type was developed for a geographic portal, where the performance for
e.g. bounding boxes or exact date/time stamps is important.
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.apache.lucene.search.MultiTermQuery |
---|
MultiTermQuery.ConstantScoreAutoRewrite, MultiTermQuery.RewriteMethod, MultiTermQuery.TopTermsBoostOnlyBooleanQueryRewrite, MultiTermQuery.TopTermsScoringBooleanQueryRewrite |
Field Summary |
---|
Fields inherited from class org.apache.lucene.search.MultiTermQuery |
---|
CONSTANT_SCORE_AUTO_REWRITE_DEFAULT, CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE, CONSTANT_SCORE_FILTER_REWRITE, field, rewriteMethod, SCORING_BOOLEAN_QUERY_REWRITE |
Method Summary | |
---|---|
boolean | equals(Object o) |
T | getMax(): Returns the upper value of this range query. |
T | getMin(): Returns the lower value of this range query. |
int | getPrecisionStep(): Returns the precision step. |
protected TermsEnum | getTermsEnum(Terms terms, AttributeSource atts): Construct the enumeration to be used, expanding the pattern term. |
int | hashCode() |
boolean | includesMax(): Returns true if the upper endpoint is inclusive. |
boolean | includesMin(): Returns true if the lower endpoint is inclusive. |
static NumericRangeQuery<Double> | newDoubleRange(String field, Double min, Double max, boolean minInclusive, boolean maxInclusive): Factory that creates a NumericRangeQuery that queries a double range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4). |
static NumericRangeQuery<Double> | newDoubleRange(String field, int precisionStep, Double min, Double max, boolean minInclusive, boolean maxInclusive): Factory that creates a NumericRangeQuery that queries a double range using the given precisionStep. |
static NumericRangeQuery<Float> | newFloatRange(String field, Float min, Float max, boolean minInclusive, boolean maxInclusive): Factory that creates a NumericRangeQuery that queries a float range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4). |
static NumericRangeQuery<Float> | newFloatRange(String field, int precisionStep, Float min, Float max, boolean minInclusive, boolean maxInclusive): Factory that creates a NumericRangeQuery that queries a float range using the given precisionStep. |
static NumericRangeQuery<Integer> | newIntRange(String field, Integer min, Integer max, boolean minInclusive, boolean maxInclusive): Factory that creates a NumericRangeQuery that queries an int range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4). |
static NumericRangeQuery<Integer> | newIntRange(String field, int precisionStep, Integer min, Integer max, boolean minInclusive, boolean maxInclusive): Factory that creates a NumericRangeQuery that queries an int range using the given precisionStep. |
static NumericRangeQuery<Long> | newLongRange(String field, int precisionStep, Long min, Long max, boolean minInclusive, boolean maxInclusive): Factory that creates a NumericRangeQuery that queries a long range using the given precisionStep. |
static NumericRangeQuery<Long> | newLongRange(String field, Long min, Long max, boolean minInclusive, boolean maxInclusive): Factory that creates a NumericRangeQuery that queries a long range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4). |
String | toString(String field): Prints a query to a string, with field assumed to be the default field and omitted. |
Methods inherited from class org.apache.lucene.search.MultiTermQuery |
---|
getField, getRewriteMethod, getTermsEnum, rewrite, setRewriteMethod |
Methods inherited from class org.apache.lucene.search.Query |
---|
clone, createWeight, extractTerms, getBoost, setBoost, toString |
Methods inherited from class java.lang.Object |
---|
finalize, getClass, notify, notifyAll, wait, wait, wait |
Method Detail |
---|
public static NumericRangeQuery<Long> newLongRange(String field, int precisionStep, Long min, Long max, boolean minInclusive, boolean maxInclusive)
Factory that creates a NumericRangeQuery that queries a long range using the given precisionStep.
You can have half-open ranges (which are in fact </≤ or >/≥ queries)
by setting the min or max value to null. By setting inclusive to false, it will
match all documents excluding the bounds; with inclusive on, the boundaries are hits, too.
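The bound semantics shared by all factory methods can be modeled in plain Java. This is only a sketch of the matching rule as described here (null means an open end, the inclusive flags decide whether a bound itself is a hit); the class RangeBoundsSketch and its matches method are hypothetical and do not exist in Lucene.

```java
final class RangeBoundsSketch {

    /** Returns true if value lies in the range described by the four bound arguments. */
    static boolean matches(long value, Long min, Long max,
                           boolean minInclusive, boolean maxInclusive) {
        // A null bound leaves that side of the range open.
        boolean aboveMin = (min == null) || (minInclusive ? value >= min : value > min);
        boolean belowMax = (max == null) || (maxInclusive ? value <= max : value < max);
        return aboveMin && belowMax;
    }

    public static void main(String[] args) {
        // Half-open range "value <= 5": min is null, max is inclusive.
        System.out.println(matches(5L, null, 5L, true, true));
        // Same upper bound, but exclusive: "value < 5".
        System.out.println(matches(5L, null, 5L, true, false));
    }
}
```

The first call models a ≤ query and accepts the bound itself; the second models a < query and rejects it.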
public static NumericRangeQuery<Long> newLongRange(String field, Long min, Long max, boolean minInclusive, boolean maxInclusive)
Factory that creates a NumericRangeQuery that queries a long range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4).
You can have half-open ranges (which are in fact </≤ or >/≥ queries)
by setting the min or max value to null. By setting inclusive to false, it will
match all documents excluding the bounds; with inclusive on, the boundaries are hits, too.
public static NumericRangeQuery<Integer> newIntRange(String field, int precisionStep, Integer min, Integer max, boolean minInclusive, boolean maxInclusive)
Factory that creates a NumericRangeQuery that queries an int range using the given precisionStep.
You can have half-open ranges (which are in fact </≤ or >/≥ queries)
by setting the min or max value to null. By setting inclusive to false, it will
match all documents excluding the bounds; with inclusive on, the boundaries are hits, too.
public static NumericRangeQuery<Integer> newIntRange(String field, Integer min, Integer max, boolean minInclusive, boolean maxInclusive)
Factory that creates a NumericRangeQuery that queries an int range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4).
You can have half-open ranges (which are in fact </≤ or >/≥ queries)
by setting the min or max value to null. By setting inclusive to false, it will
match all documents excluding the bounds; with inclusive on, the boundaries are hits, too.
public static NumericRangeQuery<Double> newDoubleRange(String field, int precisionStep, Double min, Double max, boolean minInclusive, boolean maxInclusive)
Factory that creates a NumericRangeQuery that queries a double range using the given precisionStep.
You can have half-open ranges (which are in fact </≤ or >/≥ queries)
by setting the min or max value to null. Double.NaN will never match a half-open range; to hit NaN,
use a query with min == max == Double.NaN. By setting inclusive to false, it will
match all documents excluding the bounds; with inclusive on, the boundaries are hits, too.
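One way to see why NaN falls outside half-open ranges is to look at how floating-point values can be mapped to sortable longs. The sketch below assumes an encoding in the spirit of the sortable representation mentioned for NumericUtils; the class SortableDoubleSketch and its toSortableLong helper are illustrative and are not the Lucene method itself. Under this mapping NaN sorts above positive infinity, so a range whose open end is bounded by infinity does not reach it.

```java
final class SortableDoubleSketch {

    /** Maps a double to a long whose signed order matches the numeric order of doubles. */
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value); // NaN is canonicalized to one bit pattern
        // Negative doubles have the sign bit set and sort in reverse by magnitude;
        // flipping the remaining 63 bits restores ascending order among them.
        if (bits < 0) {
            bits ^= 0x7fffffffffffffffL;
        }
        return bits;
    }

    public static void main(String[] args) {
        long inf = toSortableLong(Double.POSITIVE_INFINITY);
        long nan = toSortableLong(Double.NaN);
        // NaN lies beyond the largest bound a half-open range would use in this model.
        System.out.println(nan > inf);
    }
}
```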
public static NumericRangeQuery<Double> newDoubleRange(String field, Double min, Double max, boolean minInclusive, boolean maxInclusive)
Factory that creates a NumericRangeQuery that queries a double range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4).
You can have half-open ranges (which are in fact </≤ or >/≥ queries)
by setting the min or max value to null. Double.NaN will never match a half-open range; to hit NaN,
use a query with min == max == Double.NaN. By setting inclusive to false, it will
match all documents excluding the bounds; with inclusive on, the boundaries are hits, too.
public static NumericRangeQuery<Float> newFloatRange(String field, int precisionStep, Float min, Float max, boolean minInclusive, boolean maxInclusive)
Factory that creates a NumericRangeQuery that queries a float range using the given precisionStep.
You can have half-open ranges (which are in fact </≤ or >/≥ queries)
by setting the min or max value to null. Float.NaN will never match a half-open range; to hit NaN,
use a query with min == max == Float.NaN. By setting inclusive to false, it will
match all documents excluding the bounds; with inclusive on, the boundaries are hits, too.
public static NumericRangeQuery<Float> newFloatRange(String field, Float min, Float max, boolean minInclusive, boolean maxInclusive)
Factory that creates a NumericRangeQuery that queries a float range using the default precisionStep NumericUtils.PRECISION_STEP_DEFAULT (4).
You can have half-open ranges (which are in fact </≤ or >/≥ queries)
by setting the min or max value to null. Float.NaN will never match a half-open range; to hit NaN,
use a query with min == max == Float.NaN. By setting inclusive to false, it will
match all documents excluding the bounds; with inclusive on, the boundaries are hits, too.
protected TermsEnum getTermsEnum(Terms terms, AttributeSource atts) throws IOException
Description copied from class: MultiTermQuery
Construct the enumeration to be used, expanding the pattern term. This method should not return null
(it should instead return TermsEnum.EMPTY if no terms match). The TermsEnum must already be
positioned to the first matching term.
The given AttributeSource is passed by the MultiTermQuery.RewriteMethod to provide attributes
that the rewrite method uses to pass information such as maximum competitive boosts.
This is currently only used by TopTermsRewrite.
Specified by: getTermsEnum in class MultiTermQuery
Throws: IOException
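public boolean includesMin()
Returns true if the lower endpoint is inclusive.
public boolean includesMax()
Returns true if the upper endpoint is inclusive.
public T getMin()
Returns the lower value of this range query.
public T getMax()
Returns the upper value of this range query.
public int getPrecisionStep()
Returns the precision step.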
public String toString(String field)
Description copied from class: Query
Prints a query to a string, with field assumed to be the default field and omitted.
Specified by: toString in class Query
public final boolean equals(Object o)
Overrides: equals in class MultiTermQuery
public final int hashCode()
Overrides: hashCode in class MultiTermQuery