Package org.apache.lucene.util.encoding

Offers various encoders and decoders for integers, as well as the mechanisms to create new ones.

See: Description

Package org.apache.lucene.util.encoding Description

Offers various encoders and decoders for integers, as well as the mechanisms to create new ones. The super class for all encoders is IntEncoder and for most of the encoders there is a matching IntDecoder implementation (not all encoders need a decoder).

An encoder encodes the integers that are passed to encode into a set output stream (see reInit). One should always call close when all integers have been encoded, to ensure proper finish by the encoder. Some encoders buffer values in-memory and encode in batches in order to optimize the encoding, and not closing them may result in loss of information or corrupt stream.

A proper and typical usage of an encoder looks like this:


int[] data = <the values to encode>
IntEncoder encoder = new VInt8IntEncoder();
OutputStream out = new ByteArrayOutputStream();
encoder.reInit(out);
for (int val : data) {
  encoder.encode(val);
}
encoder.close();

// Print the bytes in binary
byte[] bytes = out.toByteArray();
for (byte b : bytes) {
  System.out.println(Integer.toBinaryString(b));
}
Each encoder also implements createMatchingDecoder which returns the matching decoder for this encoder. As mentioned above, not all encoders have a matching decoder (like some encoder filters which are explained next), however every encoder should return a decoder following a call to that method. To complete the example above, one can easily iterate over the decoded values like this:

IntDecoder d = e.createMatchingDecoder();
d.reInit(new ByteArrayInputStream(bytes));
long val;
while ((val = d.decode()) != IntDecoder.EOS) {
  System.out.println(val);
}

Some encoders don't perform any encoding at all, or do not include an encoding logic. Those are called IntEncoderFilters. A filter is an encoder which delegates the encoding task to a given encoder, however performs additional logic before the values are sent for encoding. An example is DGapIntEncoder which encodes the gaps between values rather than the values themselves. Another example is SortingIntEncoder which sorts all the values in ascending order before they are sent for encoding. This encoder aggregates the values in its encode implementation and decoding only happens upon calling close.

Extending IntEncoder

Extending IntEncoder is a very easy task. One only needs to implement encode and createMatchingDecoder as the base implementation takes care of re-initializing the output stream and closing it. The following example illustrates how can one write an encoder (and a matching decoder) which 'tags' the stream with type/ID of the encoder. Such tagging is important in scenarios where an application uses different encoders for different streams, and wants to manage some sort of mapping between an encoder ID to an IntEncoder/Decoder implementation, so a proper decoder will be initialized on the fly:

public class TaggingIntEncoder extends IntEncoderFilter {
  
  public TaggingIntEncoder(IntEncoder encoder) {
    super(encoder);
  }
  
  @Override
  public void encode(int value) throws IOException {
    encoder.encode(value);
  }

  @Override
  public IntDecoder createMatchingDecoder() {
    return new TaggingIntDecoder();
  }
        
  @Override
  public void reInit(OutputStream out) {
    super.reInit(os);
    // Assumes the application has a static EncodersMap class which is able to 
    // return a unique ID for a given encoder.
    int encoderID = EncodersMap.getID(encoder);
    this.out.write(encoderID);
  }

  @Override
  public String toString() {
    return "Tagging (" + encoder.toString() + ")";
  }

}
And the matching decoder:

public class TaggingIntDecoder extends IntDecoder {
  
  // Will be initialized upon calling reInit.
  private IntDecoder decoder;
  
  @Override
  public void reInit(InputStream in) {
    super.reInit(in);
    
    // Read the ID of the encoder that tagged this stream.
    int encoderID = in.read();
    
    // Assumes EncodersMap can return the proper IntEncoder given the ID.
    decoder = EncodersMap.getEncoder(encoderID).createMatchingDecoder();
  }
        
  @Override
  public long decode() throws IOException {
    return decoder.decode();
  }

  @Override
  public String toString() {
    return "Tagging (" + decoder == null ? "none" : decoder.toString() + ")";
  }

}
The example implements TaggingIntEncoder as a filter over another encoder. Even though it does not do any filtering on the actual values, it feels right to present it as a filter. Anyway, this is just an example code and one can choose to implement it however it makes sense to the application. For simplicity, error checking was omitted from the sample code.