public final class NumericUtils extends Object
To quickly execute range queries in Apache Lucene, a range is divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically.
This class generates the terms needed for this. First, the integer values are
converted to bytes: each value (32 bit or 64 bit) is made unsigned, and its bits
are written as ASCII chars carrying 7 bits each. The resulting byte[] sorts like
the original integer value (even under UTF-8 sort order). Each value is also
prefixed (in the first char) by the shift value (number of bits removed) used
during encoding.
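The encoding steps described above can be sketched in plain Java. This is a minimal illustration rather than the Lucene source: the class `PrefixCodedLongSketch`, its methods, and the marker value `0x20` used for the first byte are assumptions made for this example.

```java
/** Illustrative sketch of the 7-bit prefix coding; not the Lucene implementation. */
final class PrefixCodedLongSketch {
    // Assumed marker value for the first byte (stands in for SHIFT_START_LONG).
    static final int SHIFT_START_LONG = 0x20;

    /** Encodes val after removing its lowest `shift` bits, storing 7 bits per byte. */
    static byte[] encode(long val, int shift) {
        if (shift < 0 || shift > 63) {
            throw new IllegalArgumentException("shift must be 0..63");
        }
        int nChars = (63 - shift) / 7 + 1;   // payload bytes needed for the remaining bits
        byte[] out = new byte[nChars + 1];   // one extra leading byte records the shift
        out[0] = (byte) (SHIFT_START_LONG + shift);
        // Flipping the sign bit makes the value sort correctly as an unsigned number.
        long sortableBits = (val ^ 0x8000000000000000L) >>> shift;
        for (int i = nChars; i > 0; i--) {
            out[i] = (byte) (sortableBits & 0x7f); // 7 bits keep every byte in ASCII range
            sortableBits >>>= 7;
        }
        return out;
    }

    /** Compares two encoded terms byte by byte, treating bytes as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

With a shift of 0 the sketch produces 11 bytes for a long, and two values that differ only in the removed bits encode to the same term.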
To also index floating point numbers, this class supplies two methods that convert
them to integer values by changing their bit layout: doubleToSortableLong(double)
and floatToSortableInt(float). No precision is lost when converting floating point
numbers to integers and back (only that the integer form is not usable as a
numeric value). Other data types such as dates can easily be converted to longs or
ints (e.g. date to long: Date.getTime()).
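As a rough illustration of the bit-layout change, the following plain-Java sketch maps doubles to longs whose signed order matches the order of the doubles, then maps them back. The class `SortableDoubleSketch` and the helper `sortViaLongs` are hypothetical names for this example; the bit manipulation is one common way to achieve this and is not claimed to be the exact library code.

```java
import java.util.Arrays;

/** Illustrative sketch: order-preserving double <-> long conversion. */
final class SortableDoubleSketch {
    /** Maps a double to a long that compares (as a signed long) like the double. */
    static long doubleToSortableLong(double val) {
        long bits = Double.doubleToLongBits(val);
        // For negative doubles (sign bit set), flip the lower 63 bits so that
        // a larger magnitude yields a smaller long.
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    /** Inverse of doubleToSortableLong; the same flip restores the original bits. */
    static double sortableLongToDouble(long val) {
        long bits = val ^ ((val >> 63) & 0x7fffffffffffffffL);
        return Double.longBitsToDouble(bits);
    }

    /** Sorts doubles by sorting their long keys, then converting back. */
    static double[] sortViaLongs(double[] values) {
        long[] keys = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            keys[i] = doubleToSortableLong(values[i]);
        }
        Arrays.sort(keys);
        double[] out = new double[values.length];
        for (int i = 0; i < keys.length; i++) {
            out[i] = sortableLongToDouble(keys[i]);
        }
        return out;
    }
}
```

Sorting through the long keys should give the same order as Arrays.sort on the doubles themselves, including -0.0 before 0.0 and NaN after positive infinity.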
For easy usage, the trie algorithm is implemented for indexing inside
NumericTokenStream, which can index int, long, float, and double. For querying,
NumericRangeQuery and NumericRangeFilter implement the query part for the same
data types.
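To make the role of the precision step more concrete, here is a small plain-Java sketch. It assumes that indexing emits one term per shift value, starting at 0 and increasing by the precision step while the shift stays below the value size; the class `TrieTermsSketch` and both methods are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: which precisions a 64-bit value is indexed at. */
final class TrieTermsSketch {
    /** Returns the shift values used for a 64-bit value with the given precision step. */
    static List<Integer> shiftsFor(int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<Integer> shifts = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            shifts.add(shift);
        }
        return shifts;
    }

    /** Clears the lowest `shift` bits: the start of the bucket a term at this precision covers. */
    static long bucketStart(long value, int shift) {
        return (value >>> shift) << shift;
    }
}
```

Under this assumption a smaller precision step means more terms per value in the index, but fewer terms to visit per range query.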
This class can also be used to generate lexicographically sortable (according to
BytesRef.getUTF8SortedAsUTF16Comparator()) representations of numeric data
types for other uses (e.g. sorting).
Modifier and Type | Class and Description |
---|---|
static class | NumericUtils.IntRangeBuilder |
static class | NumericUtils.LongRangeBuilder |
Modifier and Type | Field and Description |
---|---|
static int | BUF_SIZE_INT The maximum term length (used for byte[] buffer size) for encoding int values. |
static int | BUF_SIZE_LONG The maximum term length (used for byte[] buffer size) for encoding long values. |
static int | PRECISION_STEP_DEFAULT The default precision step used by LongField, DoubleField, NumericTokenStream, NumericRangeQuery, and NumericRangeFilter. |
static int | PRECISION_STEP_DEFAULT_32 The default precision step used by IntField and FloatField. |
static byte | SHIFT_START_INT Integers are stored at lower precision by shifting off lower bits. |
static byte | SHIFT_START_LONG Longs are stored at lower precision by shifting off lower bits. |
Modifier and Type | Method and Description |
---|---|
static long | doubleToSortableLong(double val) Converts a double value to a sortable signed long. |
static TermsEnum | filterPrefixCodedInts(TermsEnum termsEnum) Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0. |
static TermsEnum | filterPrefixCodedLongs(TermsEnum termsEnum) Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0. |
static int | floatToSortableInt(float val) Converts a float value to a sortable signed int. |
static int | getMaxInt(Terms terms) Returns the maximum int value indexed into this numeric field. |
static long | getMaxLong(Terms terms) Returns the maximum long value indexed into this numeric field. |
static int | getMinInt(Terms terms) Returns the minimum int value indexed into this numeric field. |
static long | getMinLong(Terms terms) Returns the minimum long value indexed into this numeric field. |
static int | getPrefixCodedIntShift(BytesRef val) Returns the shift value from a prefix encoded int. |
static int | getPrefixCodedLongShift(BytesRef val) Returns the shift value from a prefix encoded long. |
static void | intToPrefixCoded(int val, int shift, BytesRef bytes) Returns prefix coded bits after reducing the precision by shift bits. |
static void | intToPrefixCodedBytes(int val, int shift, BytesRef bytes) Returns prefix coded bits after reducing the precision by shift bits. |
static void | longToPrefixCoded(long val, int shift, BytesRef bytes) Returns prefix coded bits after reducing the precision by shift bits. |
static void | longToPrefixCodedBytes(long val, int shift, BytesRef bytes) Returns prefix coded bits after reducing the precision by shift bits. |
static int | prefixCodedToInt(BytesRef val) Returns an int from prefixCoded bytes. |
static long | prefixCodedToLong(BytesRef val) Returns a long from prefixCoded bytes. |
static long | sortableDoubleBits(long bits) Converts the IEEE 754 representation of a double to sortable order (or back to the original). |
static int | sortableFloatBits(int bits) Converts the IEEE 754 representation of a float to sortable order (or back to the original). |
static float | sortableIntToFloat(int val) Converts a sortable int back to a float. |
static double | sortableLongToDouble(long val) Converts a sortable long back to a double. |
static void | splitIntRange(NumericUtils.IntRangeBuilder builder, int precisionStep, int minBound, int maxBound) Splits an int range recursively. |
static void | splitLongRange(NumericUtils.LongRangeBuilder builder, int precisionStep, long minBound, long maxBound) Splits a long range recursively. |
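public static final int PRECISION_STEP_DEFAULT
The default precision step used by LongField, DoubleField, NumericTokenStream, NumericRangeQuery, and NumericRangeFilter.
public static final int PRECISION_STEP_DEFAULT_32
The default precision step used by IntField and FloatField.
public static final byte SHIFT_START_LONG
Longs are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_LONG+shift in the first byte.
public static final int BUF_SIZE_LONG
The maximum term length (used for byte[] buffer size) for encoding long values.
public static final byte SHIFT_START_INT
Integers are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_INT+shift in the first byte.
public static final int BUF_SIZE_INT
The maximum term length (used for byte[] buffer size) for encoding int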
values.
public static void longToPrefixCoded(long val, int shift, BytesRef bytes)
Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
public static void intToPrefixCoded(int val, int shift, BytesRef bytes)
Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
public static void longToPrefixCodedBytes(long val, int shift, BytesRef bytes)
Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
public static void intToPrefixCodedBytes(int val, int shift, BytesRef bytes)
Returns prefix coded bits after reducing the precision by shift bits. This method is used by NumericTokenStream. After encoding, bytes.offset will always be 0.
Parameters:
val - the numeric value
shift - how many bits to strip from the right
bytes - will contain the encoded value
public static int getPrefixCodedLongShift(BytesRef val)
Returns the shift value from a prefix encoded long.
Throws:
NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
public static int getPrefixCodedIntShift(BytesRef val)
Returns the shift value from a prefix encoded int.
Throws:
NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
public static long prefixCodedToLong(BytesRef val)
Returns a long from prefixCoded bytes.
Throws:
NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
See Also:
longToPrefixCodedBytes(long, int, org.apache.lucene.util.BytesRef)
public static int prefixCodedToInt(BytesRef val)
Returns an int from prefixCoded bytes.
Throws:
NumberFormatException - if the supplied BytesRef is not correctly prefix encoded.
See Also:
intToPrefixCodedBytes(int, int, org.apache.lucene.util.BytesRef)
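The decoding direction, including the shift check that leads to a NumberFormatException, might look roughly like the following plain-Java sketch for 32-bit values. It works on raw byte[] instead of BytesRef; the class `PrefixCodedIntSketch`, its method names, and the marker value `0x60` are assumptions for this example and not the library code.

```java
/** Illustrative sketch: encoding and decoding a prefix coded int. */
final class PrefixCodedIntSketch {
    // Assumed marker value for the first byte (stands in for SHIFT_START_INT).
    static final int SHIFT_START_INT = 0x60;

    /** Encodes val after removing its lowest `shift` bits, storing 7 bits per byte. */
    static byte[] encode(int val, int shift) {
        if (shift < 0 || shift > 31) {
            throw new IllegalArgumentException("shift must be 0..31");
        }
        int nChars = (31 - shift) / 7 + 1;
        byte[] out = new byte[nChars + 1];
        out[0] = (byte) (SHIFT_START_INT + shift);
        int sortableBits = (val ^ 0x80000000) >>> shift; // flip sign bit, drop low bits
        for (int i = nChars; i > 0; i--) {
            out[i] = (byte) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return out;
    }

    /** Reads the shift from the first byte and rejects values outside 0..31. */
    static int shiftOf(byte[] term) {
        int shift = term[0] - SHIFT_START_INT;
        if (shift < 0 || shift > 31) {
            throw new NumberFormatException("Invalid shift value in prefix coded bytes");
        }
        return shift;
    }

    /** Rebuilds the int; the bits removed by the shift come back as zeros. */
    static int decode(byte[] term) {
        int sortableBits = 0;
        for (int i = 1; i < term.length; i++) {
            if (term[i] < 0) {
                throw new NumberFormatException("Invalid byte in prefix coded term");
            }
            sortableBits = (sortableBits << 7) | term[i];
        }
        return (sortableBits << shiftOf(term)) ^ 0x80000000;
    }
}
```

In this sketch, decoding a term that was encoded with a non-zero shift returns the original value with its lowest shift bits cleared, and a term whose first byte does not carry a valid int shift is rejected.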
public static long doubleToSortableLong(double val)
Converts a double value to a sortable signed long. The value is converted by getting its IEEE 754 floating-point "double format" bit layout and then swapping some bits, so that the result can be compared as a long. This does not reduce the precision, but the value can easily be used as a long. The sort order (including Double.NaN) is defined by Double.compareTo(java.lang.Double); NaN is greater than positive infinity.
See Also:
sortableLongToDouble(long)
public static double sortableLongToDouble(long val)
Converts a sortable long back to a double.
See Also:
doubleToSortableLong(double)
public static int floatToSortableInt(float val)
Converts a float value to a sortable signed int. The value is converted by getting its IEEE 754 floating-point "float format" bit layout and then swapping some bits, so that the result can be compared as an int. This does not reduce the precision, but the value can easily be used as an int. The sort order (including Float.NaN) is defined by Float.compareTo(java.lang.Float); NaN is greater than positive infinity.
See Also:
sortableIntToFloat(int)
public static float sortableIntToFloat(int val)
Converts a sortable int back to a float.
See Also:
floatToSortableInt(float)
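public static long sortableDoubleBits(long bits)
Converts the IEEE 754 representation of a double to sortable order (or back to the original).
public static int sortableFloatBits(int bits)
Converts the IEEE 754 representation of a float to sortable order (or back to the original).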
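The "or back to the original" property means that a single bit transformation can serve both directions. A minimal plain-Java sketch for the float case, assuming the transformation flips the lower 31 bits of negative values, is shown here; the class name `SortableFloatBitsSketch` is hypothetical and the code is not claimed to match the library source.

```java
/** Illustrative sketch: a self-inverse mapping between IEEE 754 float bits and sortable order. */
final class SortableFloatBitsSketch {
    /** Flips the lower 31 bits when the sign bit is set; applying it twice is a no-op. */
    static int sortableFloatBits(int bits) {
        return bits ^ ((bits >> 31) & 0x7fffffff);
    }

    /** float -> sortable int, built on the bit transformation. */
    static int floatToSortableInt(float val) {
        return sortableFloatBits(Float.floatToIntBits(val));
    }

    /** sortable int -> float, using the very same transformation. */
    static float sortableIntToFloat(int val) {
        return Float.intBitsToFloat(sortableFloatBits(val));
    }
}
```

Because the sign bit itself is never changed, the second application sees the same sign and undoes the first flip.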
public static void splitLongRange(NumericUtils.LongRangeBuilder builder, int precisionStep, long minBound, long maxBound)
Splits a long range recursively. You may implement a builder that adds clauses to a BooleanQuery for each call to its NumericUtils.LongRangeBuilder.addRange(BytesRef,BytesRef) method. This method is used by NumericRangeQuery.
public static void splitIntRange(NumericUtils.IntRangeBuilder builder, int precisionStep, int minBound, int maxBound)
Splits an int range recursively. You may implement a builder that adds clauses to a BooleanQuery for each call to its NumericUtils.IntRangeBuilder.addRange(BytesRef,BytesRef) method. This method is used by NumericRangeQuery.
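The splitting idea can be sketched in plain Java without the builder callbacks: the boundaries of the range are cut off at the current precision, and the remaining center is passed on to the next, lower precision. The class `RangeSplitSketch`, the `SubRange` holder, and the `split` method are hypothetical names for this example, which collects sub-ranges in a list instead of calling addRange.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of trie range splitting for 64-bit values. */
final class RangeSplitSketch {
    /** All values in [min, max], to be matched by terms encoded with the given shift. */
    static final class SubRange {
        final long min, max;
        final int shift;
        SubRange(long min, long max, int shift) {
            this.min = min;
            this.max = max;
            this.shift = shift;
        }
    }

    static List<SubRange> split(int precisionStep, long minBound, long maxBound) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        List<SubRange> out = new ArrayList<>();
        if (minBound > maxBound) {
            return out;
        }
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);          // size of one bucket at the next precision
            long mask = ((1L << precisionStep) - 1L) << shift;  // bits that belong to this precision
            boolean hasLower = (minBound & mask) != 0L;         // min is not aligned to the next precision
            boolean hasUpper = (maxBound & mask) != mask;       // max does not end a bucket of the next precision
            long nextMin = (hasLower ? minBound + diff : minBound) & ~mask;
            long nextMax = (hasUpper ? maxBound - diff : maxBound) & ~mask;
            boolean wrapped = nextMin < minBound || nextMax > maxBound;
            if (shift + precisionStep >= 64 || nextMin > nextMax || wrapped) {
                // No lower precision is usable: emit what is left and stop.
                out.add(range(minBound, maxBound, shift));
                return out;
            }
            if (hasLower) {
                out.add(range(minBound, minBound | mask, shift));
            }
            if (hasUpper) {
                out.add(range(maxBound & ~mask, maxBound, shift));
            }
            minBound = nextMin;
            maxBound = nextMax;
        }
    }

    /** Sets the bits removed by the shift on the upper bound so the range is fully described. */
    private static SubRange range(long min, long max, int shift) {
        return new SubRange(min, max | ((1L << shift) - 1L), shift);
    }
}
```

For the bounds 3 and 70 with a precision step of 4, this sketch yields [3, 15] and [64, 70] at shift 0 and [16, 63] at shift 4. The center then needs only 3 lower-precision terms, so at most 23 terms are visited instead of up to 68.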
public static TermsEnum filterPrefixCodedLongs(TermsEnum termsEnum)
Filters the given TermsEnum by accepting only prefix coded 64 bit terms with a shift value of 0.
Parameters:
termsEnum - the terms enum to filter
Returns:
a filtered TermsEnum that only returns prefix coded 64 bit terms with a shift value of 0.
public static TermsEnum filterPrefixCodedInts(TermsEnum termsEnum)
Filters the given TermsEnum by accepting only prefix coded 32 bit terms with a shift value of 0.
Parameters:
termsEnum - the terms enum to filter
Returns:
a filtered TermsEnum that only returns prefix coded 32 bit terms with a shift value of 0.
public static int getMinInt(Terms terms) throws IOException
Returns the minimum int value indexed into this numeric field.
Throws:
IOException
public static int getMaxInt(Terms terms) throws IOException
Returns the maximum int value indexed into this numeric field.
Throws:
IOException
public static long getMinLong(Terms terms) throws IOException
Returns the minimum long value indexed into this numeric field.
Throws:
IOException
public static long getMaxLong(Terms terms) throws IOException
Returns the maximum long value indexed into this numeric field.
Throws:
IOException
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.