Package org.apache.lucene.util.encoding

Offers various encoders and decoders for integers, as well as the mechanisms to create new ones.


Class Summary
ChunksIntEncoder An IntEncoder which encodes values in chunks.
DGapIntDecoder An IntDecoder which wraps another IntDecoder and reverts the d-gap that was encoded by DGapIntEncoder.
DGapIntEncoder An IntEncoderFilter which encodes the gap between the given values, rather than the values themselves.
EightFlagsIntDecoder Decodes data which was encoded by EightFlagsIntEncoder.
EightFlagsIntEncoder A ChunksIntEncoder which encodes data in chunks of 8.
FourFlagsIntDecoder Decodes data which was encoded by FourFlagsIntEncoder.
FourFlagsIntEncoder A ChunksIntEncoder which encodes values in chunks of 4.
IntDecoder Decodes integers from a set InputStream.
IntEncoder Encodes integers to a set OutputStream.
IntEncoderFilter An abstract implementation of IntEncoder which serves as a filter on the values to encode.
NOnesIntDecoder Decodes data which was encoded by NOnesIntEncoder.
NOnesIntEncoder A variation of FourFlagsIntEncoder which translates the data as follows: Values ≥ 2 are translated to value+1 (2 ⇒ 3, 3 ⇒ 4 and so forth).
SimpleIntDecoder A simple stream decoder which can decode values encoded with SimpleIntEncoder.
SimpleIntEncoder A simple IntEncoder, writing an integer as 4 raw bytes.
SortingIntEncoder An IntEncoderFilter which sorts the values to encode in ascending order before encoding them.
UniqueValuesIntEncoder An IntEncoderFilter which ensures only unique values are encoded.
VInt8IntDecoder An IntDecoder which can decode values encoded by VInt8IntEncoder.
VInt8IntEncoder An IntEncoder which implements variable length encoding.

Package org.apache.lucene.util.encoding Description

Offers various encoders and decoders for integers, as well as the mechanisms to create new ones. The superclass of all encoders is IntEncoder, and most encoders have a matching IntDecoder implementation (not all encoders need a decoder).

An encoder encodes the integers that are passed to encode into a set output stream (see reInit). One should always call close once all integers have been encoded, so that the encoder can finish properly. Some encoders buffer values in memory and encode them in batches to optimize the encoding, so not closing them may result in loss of information or a corrupt stream.

Typical usage of an encoder looks like this:

int[] data = <the values to encode>
IntEncoder encoder = new VInt8IntEncoder();
ByteArrayOutputStream out = new ByteArrayOutputStream();
encoder.reInit(out);
for (int val : data) {
  encoder.encode(val);
}
// Always close the encoder so that any buffered values are written.
encoder.close();

// Print the bytes in binary
byte[] bytes = out.toByteArray();
for (byte b : bytes) {
  System.out.println(Integer.toBinaryString(b & 0xFF));
}

Each encoder also implements createMatchingDecoder, which returns the matching decoder for that encoder. As mentioned above, not every encoder has a dedicated decoder of its own (some of the encoder filters explained next do not), but every encoder should still return a usable decoder from that method. To complete the example above, one can iterate over the decoded values like this:

IntDecoder d = encoder.createMatchingDecoder();
d.reInit(new ByteArrayInputStream(bytes));
long val;
while ((val = d.decode()) != IntDecoder.EOS) {
  // Use the decoded value, for example print it.
  System.out.println(val);
}
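
To make the encode/decode loop above more concrete, the following is a minimal stand-alone sketch of a generic 7-bit variable-length encoding with an end-of-stream sentinel. The class VarIntSketch, its methods and its EOS constant are hypothetical names introduced for illustration, and the byte layout is not claimed to match the one VInt8IntEncoder writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-alone sketch; not part of Lucene.
final class VarIntSketch {

  // Sentinel outside the int range, returned when the stream is exhausted.
  static final long EOS = 0x100000000L;

  // Writes value as 1 to 5 bytes, 7 data bits per byte, low-order bits first.
  // The high bit of a byte is set when more bytes follow.
  static void encode(int value, OutputStream out) throws IOException {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  // Reads one value, or returns EOS if no bytes are left.
  static long decode(InputStream in) throws IOException {
    int b = in.read();
    if (b < 0) {
      return EOS;
    }
    int value = b & 0x7F;
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = in.read();
      if (b < 0) {
        throw new IOException("truncated value");
      }
      value |= (b & 0x7F) << shift;
    }
    return value;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int v : new int[] {1, 127, 128, 300}) {
      encode(v, out);
    }
    byte[] bytes = out.toByteArray();
    System.out.println(bytes.length + " bytes");
    InputStream in = new ByteArrayInputStream(bytes);
    long val;
    while ((val = decode(in)) != EOS) {
      System.out.println(val);
    }
  }
}
```

In this sketch, values below 128 take a single byte and larger values take more, which is why variable-length encoding pays off when most values are small.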

Some encoders do not perform any encoding themselves. These are called IntEncoderFilters. A filter is an encoder which delegates the encoding task to a given encoder, but performs additional logic before the values are sent for encoding. An example is DGapIntEncoder, which encodes the gaps between values rather than the values themselves. Another example is SortingIntEncoder, which sorts all the values in ascending order before they are sent for encoding. This encoder aggregates the values in its encode implementation, and the actual encoding happens only upon calling close.
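
The transformations that such filters apply can be sketched in plain Java, independent of the Lucene classes. The class FilterChainSketch and its methods below are hypothetical and only illustrate the idea of sorting, removing duplicates and taking gaps before the values reach an actual encoder; they are not the implementation of SortingIntEncoder, UniqueValuesIntEncoder or DGapIntEncoder.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; not part of Lucene.
final class FilterChainSketch {

  // Sorts a copy of the values in ascending order.
  static int[] sorted(int[] values) {
    int[] copy = values.clone();
    Arrays.sort(copy);
    return copy;
  }

  // Drops consecutive duplicates; the input is assumed to be sorted.
  static int[] unique(int[] sortedValues) {
    int[] result = new int[sortedValues.length];
    int n = 0;
    for (int v : sortedValues) {
      if (n == 0 || result[n - 1] != v) {
        result[n++] = v;
      }
    }
    return Arrays.copyOf(result, n);
  }

  // Replaces each value with its distance from the previous one
  // (the first value is kept as the gap from 0).
  static int[] toGaps(int[] sortedValues) {
    int[] gaps = new int[sortedValues.length];
    int prev = 0;
    for (int i = 0; i < sortedValues.length; i++) {
      gaps[i] = sortedValues[i] - prev;
      prev = sortedValues[i];
    }
    return gaps;
  }

  // Reverts toGaps by accumulating the gaps.
  static int[] fromGaps(int[] gaps) {
    int[] values = new int[gaps.length];
    int prev = 0;
    for (int i = 0; i < gaps.length; i++) {
      prev += gaps[i];
      values[i] = prev;
    }
    return values;
  }

  public static void main(String[] args) {
    int[] data = {9, 3, 7, 3, 12};
    int[] gaps = toGaps(unique(sorted(data)));
    System.out.println(Arrays.toString(gaps));           // [3, 4, 2, 3]
    System.out.println(Arrays.toString(fromGaps(gaps))); // [3, 7, 9, 12]
  }
}
```

Gaps between sorted values are never larger than the values themselves, so an encoder at the end of such a chain that spends fewer bytes on small numbers has less to write.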

Extending IntEncoder

Extending IntEncoder is straightforward. One only needs to implement encode and createMatchingDecoder, as the base implementation takes care of re-initializing the output stream and closing it. The following example illustrates how one can write an encoder (and a matching decoder) which 'tags' the stream with the type/ID of the encoder. Such tagging is useful when an application uses different encoders for different streams and wants to maintain a mapping from an encoder ID to an IntEncoder/IntDecoder implementation, so that the proper decoder can be initialized on the fly:

public class TaggingIntEncoder extends IntEncoderFilter {
  public TaggingIntEncoder(IntEncoder encoder) {
    super(encoder);
  }

  public void encode(int value) throws IOException {
    encoder.encode(value);
  }

  public IntDecoder createMatchingDecoder() {
    return new TaggingIntDecoder();
  }

  public void reInit(OutputStream out) {
    super.reInit(out);
    // Assumes the application has a static EncodersMap class which is able to
    // return a unique ID for a given encoder.
    int encoderID = EncodersMap.getID(encoder);
    // Tag the stream with the ID, written as a single byte (error handling omitted).
    out.write(encoderID);
  }

  public String toString() {
    return "Tagging (" + encoder.toString() + ")";
  }
}

And the matching decoder:

public class TaggingIntDecoder extends IntDecoder {
  // Will be initialized upon calling reInit.
  private IntDecoder decoder;

  public void reInit(InputStream in) {
    super.reInit(in);
    // Read the ID of the encoder that tagged this stream (error handling omitted).
    int encoderID = in.read();
    // Assumes EncodersMap can return the proper IntEncoder given the ID.
    decoder = EncodersMap.getEncoder(encoderID).createMatchingDecoder();
    // Let the matching decoder read the rest of the stream.
    decoder.reInit(in);
  }

  public long decode() throws IOException {
    return decoder.decode();
  }

  public String toString() {
    return "Tagging (" + (decoder == null ? "none" : decoder.toString()) + ")";
  }
}

The example implements TaggingIntEncoder as a filter over another encoder. Even though it does not filter the actual values, presenting it as a filter is a natural fit because it wraps another encoder. This is only example code, and one can implement it in whatever way makes sense to the application. For simplicity, error checking was omitted from the sample code.
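
The tagging idea itself can be tried out without Lucene. The following is a minimal stand-alone sketch in which a one-byte tag at the start of the stream selects how the rest is read. The class TaggedStreamSketch, its Reader interface, the READERS map (standing in for the assumed EncodersMap) and the two formats are all hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of Lucene.
final class TaggedStreamSketch {

  // A decoding strategy that reads one value from the stream.
  interface Reader {
    int read(DataInputStream in) throws IOException;
  }

  static final int RAW_ID = 1;   // 4 raw bytes per value
  static final int SHORT_ID = 2; // 2 bytes per value

  // Plays the role of the assumed EncodersMap: maps a tag to a reader.
  static final Map<Integer, Reader> READERS = new HashMap<>();

  static {
    READERS.put(RAW_ID, in -> in.readInt());
    READERS.put(SHORT_ID, in -> (int) in.readShort());
  }

  // Writes the tag byte first, then the values in the format the tag names.
  static byte[] write(int id, int[] values) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.write(id);
    for (int v : values) {
      if (id == RAW_ID) {
        out.writeInt(v);
      } else if (id == SHORT_ID) {
        out.writeShort(v);
      } else {
        throw new IOException("unknown encoder ID: " + id);
      }
    }
    out.flush();
    return bytes.toByteArray();
  }

  // Reads the tag, picks the matching reader and decodes the remaining bytes.
  static int[] read(byte[] data) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
    int id = in.read();
    Reader reader = READERS.get(id);
    if (reader == null) {
      throw new IOException("unknown encoder ID: " + id);
    }
    int[] values = new int[8];
    int n = 0;
    while (in.available() > 0) {
      if (n == values.length) {
        values = Arrays.copyOf(values, n * 2);
      }
      values[n++] = reader.read(in);
    }
    return Arrays.copyOf(values, n);
  }

  public static void main(String[] args) throws IOException {
    byte[] tagged = write(SHORT_ID, new int[] {5, 300});
    System.out.println(Arrays.toString(read(tagged)));
  }
}
```

The reader never needs to be told which format was used: the tag carries that information, which is the property the TaggingIntEncoder example relies on.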

Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.