org.apache.lucene.util.packed
public class EliasFanoEncoder extends Object
The Elias-Fano encoding is a high bits / low bits representation of
a monotonically increasing sequence of numValues > 0
natural numbers x[i]
0 <= x[0] <= x[1] <= ... <= x[numValues-2] <= x[numValues-1] <= upperBound
where upperBound > 0
is an upper bound on the last value.
The Elias-Fano encoding uses less than half a bit per encoded number more
than the smallest representation
that can encode any monotone sequence with the same bounds.
The lower L
bits of each x[i]
are stored explicitly and contiguously
in the lower-bits array, with L
chosen as (log()
base 2):
L = max(0, floor(log(upperBound/numValues)))
The upper bits are stored in the upper-bits array as a sequence of unary-coded gaps (x[-1] = 0
):
(x[i]/2**L) - (x[i-1]/2**L)
The unary code encodes a natural number n
by n
0 bits followed by a 1 bit:
0...01
.
In the upper bits the total the number of 1 bits is numValues
and the total number of 0 bits is:
floor(x[numValues-1]/2**L) <= upperBound/(2**max(0, floor(log(upperBound/numValues)))) <= 2*numValues
The Elias-Fano encoding uses at most
2 + ceil(log(upperBound/numValues))
bits per encoded number. With upperBound
in these bounds (p
is an integer):
2**p < x[numValues-1] <= upperBound <= 2**(p+1)
the number of bits per encoded number is minimized.
In this implementation the values in the sequence can be given as long
,
numValues = 0
and upperBound = 0
are allowed,
and each of the upper and lower bit arrays should fit in a long[]
.
This implementation is based on this article:
Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, sections 3 and 4.
Retrieved from http://arxiv.org/pdf/1206.4300 .
The articles originally describing the Elias-Fano representation are:
Peter Elias, "Efficient storage and retrieval by content and address of static files",
J. Assoc. Comput. Mach., 21(2):246–260, 1974.
Robert M. Fano, "On the number of bits required to implement an associative memory",
Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., 1971.
Constructor and Description |
---|
EliasFanoEncoder(long numValues,
long upperBound)
Construct an Elias-Fano encoder.
|
Modifier and Type | Method and Description |
---|---|
void |
encodeNext(long x)
Call at most
numValues times to encode a non decreasing sequence of non negative numbers. |
boolean |
equals(Object other) |
EliasFanoDecoder |
getDecoder()
Returns an
EliasFanoDecoder to access the encoded values. |
long[] |
getLowerBits()
Expert.
|
long[] |
getUpperBits()
Expert.
|
int |
hashCode() |
static String |
longHex(long x) |
static boolean |
sufficientlySmallerThanBitSet(long numValues,
long upperBound)
Provide an indication that is better to use an
EliasFanoEncoder than a FixedBitSet
to encode document identifiers. |
String |
toString() |
public EliasFanoEncoder(long numValues, long upperBound)
encodeNext(long)
numValues
times to encode
a non decreasing sequence of non negative numbers.numValues
- The number of values that is to be encoded.upperBound
- At least the highest value that will be encoded.
For space efficiency this should not exceed the power of two that equals
or is the first higher than the actual maximum.
numValues >= (upperBound/3)
a FixedBitSet
will take less space.IllegalArgumentException
- when:
numValues
is negative, or
numValues
is non negative and upperBound
is negative, or
long[]
:
(L * numValues / 64) > Integer.MAX_VALUE
, or
long[]
:
(2 * numValues / 64) > Integer.MAX_VALUE
.
public void encodeNext(long x)
numValues
times to encode a non decreasing sequence of non negative numbers.x
- The next number to be encoded.IllegalArgumentException
- when:
numValues
times, or
x
is smaller than an earlier encoded value, or
x
is larger than upperBound
.
public static boolean sufficientlySmallerThanBitSet(long numValues, long upperBound)
EliasFanoEncoder
than a FixedBitSet
to encode document identifiers.
This indication is not precise and may change in the future.
numValues
- The number of document identifiers that is to be encoded. Should be non negative.upperBound
- The maximum possible value for a document identifier. Should be at least numValues.public EliasFanoDecoder getDecoder()
EliasFanoDecoder
to access the encoded values.
Perform all calls to encodeNext(long)
before calling getDecoder()
.public long[] getLowerBits()
public long[] getUpperBits()
public static String longHex(long x)
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.