org.apache.lucene.util.packed
Class EliasFanoEncoder

java.lang.Object
  extended by org.apache.lucene.util.packed.EliasFanoEncoder

public class EliasFanoEncoder
extends Object

Encode a non decreasing sequence of non negative whole numbers in the Elias-Fano encoding that was introduced in the 1970's by Peter Elias and Robert Fano.

The Elias-Fano encoding is a high bits / low bits representation of a monotonically increasing sequence of numValues > 0 natural numbers x[i]

0 <= x[0] <= x[1] <= ... <= x[numValues-2] <= x[numValues-1] <= upperBound

where upperBound > 0 is an upper bound on the last value.
The Elias-Fano encoding uses less than half a bit per encoded number more than the smallest representation that can encode any monotone sequence with the same bounds.

The lower L bits of each x[i] are stored explicitly and contiguously in the lower-bits array, with L chosen as (log() base 2):

L = max(0, floor(log(upperBound/numValues)))

The upper bits are stored in the upper-bits array as a sequence of unary-coded gaps (x[-1] = 0):

(x[i]/2**L) - (x[i-1]/2**L)

The unary code encodes a natural number n by n 0 bits followed by a 1 bit: 0...01.
In the upper bits the total the number of 1 bits is numValues and the total number of 0 bits is:

floor(x[numValues-1]/2**L) <= upperBound/(2**max(0, floor(log(upperBound/numValues)))) <= 2*numValues

The Elias-Fano encoding uses at most

2 + ceil(log(upperBound/numValues))

bits per encoded number. With upperBound in these bounds (p is an integer):

2**p < x[numValues-1] <= upperBound <= 2**(p+1)

the number of bits per encoded number is minimized.

In this implementation the values in the sequence can be given as long, numValues = 0 and upperBound = 0 are allowed, and each of the upper and lower bit arrays should fit in a long[].

This implementation is based on this article:
Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, sections 3 and 4. Retrieved from http://arxiv.org/pdf/1206.4300 .

The articles originally describing the Elias-Fano representation are:
Peter Elias, "Efficient storage and retrieval by content and address of static files", J. Assoc. Comput. Mach., 21(2):246–260, 1974.
Robert M. Fano, "On the number of bits required to implement an associative memory", Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., 1971.

NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

Constructor Summary
EliasFanoEncoder(long numValues, long upperBound)
          Construct an Elias-Fano encoder.
 
Method Summary
 void encodeNext(long x)
          Call at most numValues times to encode a non decreasing sequence of non negative numbers.
 boolean equals(Object other)
           
 EliasFanoDecoder getDecoder()
          Returns an EliasFanoDecoder to access the encoded values.
 long[] getLowerBits()
          Expert.
 long[] getUpperBits()
          Expert.
 int hashCode()
           
static String longHex(long x)
           
static boolean sufficientlySmallerThanBitSet(long numValues, long upperBound)
          Provide an indication that is better to use an EliasFanoEncoder than a FixedBitSet to encode document identifiers.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

EliasFanoEncoder

public EliasFanoEncoder(long numValues,
                        long upperBound)
Construct an Elias-Fano encoder. After construction, call encodeNext(long) numValues times to encode a non decreasing sequence of non negative numbers.

Parameters:
numValues - The number of values that is to be encoded.
upperBound - At least the highest value that will be encoded. For space efficiency this should not exceed the power of two that equals or is the first higher than the actual maximum.
When numValues >= (upperBound/3) a FixedBitSet will take less space.
Throws:
IllegalArgumentException - when:
  • numValues is negative, or
  • numValues is non negative and upperBound is negative, or
  • the low bits do not fit in a long[]: (L * numValues / 64) > Integer.MAX_VALUE, or
  • the high bits do not fit in a long[]: (2 * numValues / 64) > Integer.MAX_VALUE.
Method Detail

encodeNext

public void encodeNext(long x)
Call at most numValues times to encode a non decreasing sequence of non negative numbers.

Parameters:
x - The next number to be encoded.
Throws:
IllegalArgumentException - when:
  • called more than numValues times, or
  • x is smaller than an earlier encoded value, or
  • x is larger than upperBound.

sufficientlySmallerThanBitSet

public static boolean sufficientlySmallerThanBitSet(long numValues,
                                                    long upperBound)
Provide an indication that is better to use an EliasFanoEncoder than a FixedBitSet to encode document identifiers. This indication is not precise and may change in the future.
An EliasFanoEncoder is favoured when the size of the encoding by the EliasFanoEncoder is at most 5/6 of the size of the FixedBitSet.
This condition is the same as comparing estimates of the number of bits accessed by a pair of FixedBitSets and by a pair of non indexed EliasFanoDocIdSets when determining the intersections of the pairs.

Parameters:
numValues - The number of document identifiers that is to be encoded. Should be non negative.
upperBound - The maximum possible value for a document identifier. Should be at least numValues.

getDecoder

public EliasFanoDecoder getDecoder()
Returns an EliasFanoDecoder to access the encoded values. Perform all calls to encodeNext(long) before calling getDecoder().


getLowerBits

public long[] getLowerBits()
Expert. The low bits.


getUpperBits

public long[] getUpperBits()
Expert. The high bits.


toString

public String toString()
Overrides:
toString in class Object

equals

public boolean equals(Object other)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

longHex

public static String longHex(long x)


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.