org.apache.lucene.util.packed
Class PackedInts

java.lang.Object
  extended by org.apache.lucene.util.packed.PackedInts

public class PackedInts
extends Object

Simplistic compression for arrays of unsigned long values. Each value is >= 0 and <= a specified maximum value. The values are stored as packed ints, with each value consuming a fixed number of bits.
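The storage scheme described above can be illustrated with a minimal sketch. It assumes a plain layout in which values are written back to back into a long[], with each value starting at the low-order end of its bits. PackedArraySketch is a hypothetical name, and Lucene's actual formats and bit order may differ.

```java
/**
 * Hypothetical sketch (not Lucene's implementation) of a fixed-width packed
 * array: valueCount values of bitsPerValue bits each, stored back to back in a
 * long[]. Assumed bit order: each value starts at the low-order end.
 */
final class PackedArraySketch {
  private final long[] blocks;
  private final int valueCount;
  private final int bitsPerValue;
  private final long mask; // bitsPerValue low-order one bits

  PackedArraySketch(int valueCount, int bitsPerValue) {
    if (valueCount < 0 || bitsPerValue < 1 || bitsPerValue > 64) {
      throw new IllegalArgumentException("bad valueCount or bitsPerValue");
    }
    this.valueCount = valueCount;
    this.bitsPerValue = bitsPerValue;
    this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
    long totalBits = (long) valueCount * bitsPerValue;
    this.blocks = new long[(int) ((totalBits + 63) >>> 6)]; // round up to whole longs
  }

  long get(int index) {
    checkIndex(index);
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6); // long that holds the value's first bit
    int shift = (int) (bitPos & 63);  // position of that bit inside the long
    long value = blocks[block] >>> shift;
    int bitsInFirst = 64 - shift;
    if (bitsInFirst < bitsPerValue) {
      // The value straddles two longs: its high-order bits start the next long.
      value |= blocks[block + 1] << bitsInFirst;
    }
    return value & mask;
  }

  void set(int index, long value) {
    checkIndex(index);
    if ((value & ~mask) != 0) {
      throw new IllegalArgumentException(value + " does not fit in " + bitsPerValue + " bits");
    }
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    // Clear the old bits in the first long, then write the low-order part of the value.
    blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
    int bitsInFirst = 64 - shift;
    if (bitsInFirst < bitsPerValue) {
      long highMask = mask >>> bitsInFirst; // bits that spill into the next long
      blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> bitsInFirst);
    }
  }

  private void checkIndex(int index) {
    if (index < 0 || index >= valueCount) {
      throw new IndexOutOfBoundsException("index " + index + " out of [0, " + valueCount + ")");
    }
  }
}
```

With 7 bits per value, 10 values need 70 bits (two longs), and index 9 starts at bit 63: its lowest bit sits in the first long and its six high bits in the second, which is the straddling case that get and set split across the boundary.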

NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

Nested Class Summary
static interface PackedInts.Decoder
          A decoder for packed integers.
static interface PackedInts.Encoder
          An encoder for packed integers.
static class PackedInts.Format
          A format to write packed ints.
static class PackedInts.FormatAndBits
          Simple class that holds a format and a number of bits per value.
static class PackedInts.Header
          Header identifying the structure of a packed integer array.
static interface PackedInts.Mutable
          A packed integer array that can be modified.
static class PackedInts.NullReader
          A PackedInts.Reader which has all its values equal to 0 (bitsPerValue = 0).
static interface PackedInts.Reader
          A read-only random-access array of non-negative integers.
static interface PackedInts.ReaderIterator
          Run-once iterator interface, to decode previously saved PackedInts.
static class PackedInts.Writer
          A write-once Writer.
 
Field Summary
static String CODEC_NAME
           
static float COMPACT
          No memory overhead at all, but the returned implementation may be slow.
static float DEFAULT
          At most 20% memory overhead.
static int DEFAULT_BUFFER_SIZE
          Default amount of memory to use for bulk operations.
static float FAST
          At most 50% memory overhead, always select a reasonably fast implementation.
static float FASTEST
          At most 700% memory overhead, always select a direct implementation.
static int VERSION_BYTE_ALIGNED
           
static int VERSION_CURRENT
           
static int VERSION_START
           
 
Constructor Summary
PackedInts()
           
 
Method Summary
static int bitsRequired(long maxValue)
          Returns how many bits are required to hold values up to and including maxValue.
static void checkVersion(int version)
          Check the validity of a version number.
static void copy(PackedInts.Reader src, int srcPos, PackedInts.Mutable dest, int destPos, int len, int mem)
          Copy src[srcPos:srcPos+len] into dest[destPos:destPos+len] using at most mem bytes.
static PackedInts.FormatAndBits fastestFormatAndBits(int valueCount, int bitsPerValue, float acceptableOverheadRatio)
          Try to find the PackedInts.Format and number of bits per value that would restore from disk the fastest reader whose overhead is less than acceptableOverheadRatio.
static PackedInts.Decoder getDecoder(PackedInts.Format format, int version, int bitsPerValue)
          Get a PackedInts.Decoder.
static PackedInts.Reader getDirectReader(IndexInput in)
          Construct a direct PackedInts.Reader from an IndexInput.
static PackedInts.Reader getDirectReaderNoHeader(IndexInput in, PackedInts.Format format, int version, int valueCount, int bitsPerValue)
          Expert: Construct a direct PackedInts.Reader from a stream without reading metadata at the beginning of the stream.
static PackedInts.Reader getDirectReaderNoHeader(IndexInput in, PackedInts.Header header)
          Expert: Construct a direct PackedInts.Reader from an IndexInput without reading metadata at the beginning of the stream.
static PackedInts.Encoder getEncoder(PackedInts.Format format, int version, int bitsPerValue)
          Get a PackedInts.Encoder.
static PackedInts.Mutable getMutable(int valueCount, int bitsPerValue, float acceptableOverheadRatio)
          Create a packed integer array with the given number of values, all initialized to 0.
static PackedInts.Reader getReader(DataInput in)
          Restore a PackedInts.Reader from a stream.
static PackedInts.ReaderIterator getReaderIterator(DataInput in, int mem)
          Retrieve PackedInts as a PackedInts.ReaderIterator.
static PackedInts.ReaderIterator getReaderIteratorNoHeader(DataInput in, PackedInts.Format format, int version, int valueCount, int bitsPerValue, int mem)
          Expert: Restore a PackedInts.ReaderIterator from a stream without reading metadata at the beginning of the stream.
static PackedInts.Reader getReaderNoHeader(DataInput in, PackedInts.Format format, int version, int valueCount, int bitsPerValue)
          Expert: Restore a PackedInts.Reader from a stream without reading metadata at the beginning of the stream.
static PackedInts.Reader getReaderNoHeader(DataInput in, PackedInts.Header header)
          Expert: Restore a PackedInts.Reader from a stream without reading metadata at the beginning of the stream.
static PackedInts.Writer getWriter(DataOutput out, int valueCount, int bitsPerValue, float acceptableOverheadRatio)
          Create a packed integer array writer for the given output, value count, and number of bits per value.
static PackedInts.Writer getWriterNoHeader(DataOutput out, PackedInts.Format format, int valueCount, int bitsPerValue, int mem)
          Expert: Create a packed integer array writer for the given output, format, value count, and number of bits per value.
static long maxValue(int bitsPerValue)
          Calculates the maximum unsigned long that can be expressed with the given number of bits.
static PackedInts.Header readHeader(DataInput in)
          Expert: reads only the metadata from a stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

FASTEST

public static final float FASTEST
At most 700% memory overhead, always select a direct implementation.

See Also:
Constant Field Values

FAST

public static final float FAST
At most 50% memory overhead, always select a reasonably fast implementation.

See Also:
Constant Field Values

DEFAULT

public static final float DEFAULT
At most 20% memory overhead.

See Also:
Constant Field Values

COMPACT

public static final float COMPACT
No memory overhead at all, but the returned implementation may be slow.

See Also:
Constant Field Values
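The overhead ratios above translate into a per-value bit budget. The sketch assumes the constants equal the documented percentages written as fractions (FASTEST = 7, FAST = 0.5, DEFAULT = 0.2, COMPACT = 0) and that an array may then use up to bitsPerValue × (1 + ratio) bits per value, rounded down. Both points, and the OverheadBudget name, are assumptions for illustration.

```java
/** Hypothetical helper: translate an overhead ratio into a per-value bit budget. */
final class OverheadBudget {
  // Assumed to mirror the documented percentages: 700%, 50%, 20% and 0%.
  static final float FASTEST = 7f;
  static final float FAST = 0.5f;
  static final float DEFAULT = 0.2f;
  static final float COMPACT = 0f;

  /** Largest number of bits per value whose overhead stays within the ratio. */
  static int maxBitsPerValue(int bitsPerValue, float acceptableOverheadRatio) {
    // Clamp the ratio to [COMPACT, FASTEST] before applying it.
    float ratio = Math.max(COMPACT, Math.min(FASTEST, acceptableOverheadRatio));
    return bitsPerValue + (int) (ratio * bitsPerValue);
  }
}
```

For 10-bit values this allows 10 bits with COMPACT, 12 with DEFAULT, 15 with FAST and 80 with FASTEST. Under this budget, 700% is exactly the ratio that lets even 1-bit values be widened to a full byte (1 + 7 = 8 bits), which fits the promise that FASTEST always selects a direct implementation.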

DEFAULT_BUFFER_SIZE

public static final int DEFAULT_BUFFER_SIZE
Default amount of memory to use for bulk operations.

See Also:
Constant Field Values

CODEC_NAME

public static final String CODEC_NAME
See Also:
Constant Field Values

VERSION_START

public static final int VERSION_START
See Also:
Constant Field Values

VERSION_BYTE_ALIGNED

public static final int VERSION_BYTE_ALIGNED
See Also:
Constant Field Values

VERSION_CURRENT

public static final int VERSION_CURRENT
See Also:
Constant Field Values

Constructor Detail

PackedInts

public PackedInts()

Method Detail

checkVersion

public static void checkVersion(int version)
Check the validity of a version number.


fastestFormatAndBits

public static PackedInts.FormatAndBits fastestFormatAndBits(int valueCount,
                                                            int bitsPerValue,
                                                            float acceptableOverheadRatio)
Try to find the PackedInts.Format and number of bits per value that would restore from disk the fastest reader whose overhead is less than acceptableOverheadRatio.

The acceptableOverheadRatio parameter only matters for random-access PackedInts.Readers. If you only plan to access this stream sequentially later on, you should probably use COMPACT.

If you don't know how many values you are going to write, use valueCount = -1.
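The trade-off behind this method can be sketched roughly as follows, under stated assumptions: the overhead budget is taken to be bitsPerValue × (1 + ratio) bits with the ratio clamped to [0, 7], and the only "fast" choices considered are widths of 8, 16, 32 or 64 bits. FastestWidthSketch is hypothetical; it ignores valueCount and the other formats the real method may consider, so it only shows why a larger acceptableOverheadRatio tends to yield wider, cheaper-to-decode values.

```java
/**
 * Hypothetical, simplified take on the idea behind fastestFormatAndBits: if the
 * overhead budget allows it, round bitsPerValue up to 8, 16, 32 or 64 bits,
 * which are cheap to read; otherwise keep the compact width.
 */
final class FastestWidthSketch {
  static int chooseBitsPerValue(int bitsPerValue, float acceptableOverheadRatio) {
    // Assumed budget: bitsPerValue * (1 + ratio) bits, ratio clamped to [0, 7].
    float ratio = Math.max(0f, Math.min(7f, acceptableOverheadRatio));
    int maxBitsPerValue = bitsPerValue + (int) (ratio * bitsPerValue);
    for (int aligned : new int[] {8, 16, 32, 64}) {
      // Smallest aligned width that still holds the values and fits the budget.
      if (bitsPerValue <= aligned && maxBitsPerValue >= aligned) {
        return aligned;
      }
    }
    return bitsPerValue; // no aligned width fits the budget: stay bit-packed
  }
}
```

For example, 7-bit values become bytes under a 20% budget (7 + 1 = 8 bits), while 10-bit values stay at 10 bits even at 50%, because 15 bits is not enough to reach 16.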


getDecoder

public static PackedInts.Decoder getDecoder(PackedInts.Format format,
                                            int version,
                                            int bitsPerValue)
Get a PackedInts.Decoder.

Parameters:
format - the format used to store packed ints
version - the compatibility version
bitsPerValue - the number of bits per value
Returns:
a decoder

getEncoder

public static PackedInts.Encoder getEncoder(PackedInts.Format format,
                                            int version,
                                            int bitsPerValue)
Get a PackedInts.Encoder.

Parameters:
format - the format used to store packed ints
version - the compatibility version
bitsPerValue - the number of bits per value
Returns:
an encoder

getReaderNoHeader

public static PackedInts.Reader getReaderNoHeader(DataInput in,
                                                  PackedInts.Format format,
                                                  int version,
                                                  int valueCount,
                                                  int bitsPerValue)
                                           throws IOException
Expert: Restore a PackedInts.Reader from a stream without reading metadata at the beginning of the stream. This method is useful to restore data from streams which have been created using getWriterNoHeader(DataOutput, Format, int, int, int).

Parameters:
in - the stream to read data from, positioned at the beginning of the packed values
format - the format used to serialize
version - the version used to serialize the data
valueCount - how many values the stream holds
bitsPerValue - the number of bits per value
Returns:
a Reader
Throws:
IOException - If there is a low-level I/O error
See Also:
getWriterNoHeader(DataOutput, Format, int, int, int)
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

getReaderNoHeader

public static PackedInts.Reader getReaderNoHeader(DataInput in,
                                                  PackedInts.Header header)
                                           throws IOException
Expert: Restore a PackedInts.Reader from a stream without reading metadata at the beginning of the stream. This method is useful to restore data when metadata has been previously read using readHeader(DataInput).

Parameters:
in - the stream to read data from, positioned at the beginning of the packed values
header - metadata result from readHeader()
Returns:
a Reader
Throws:
IOException - If there is a low-level I/O error
See Also:
readHeader(DataInput)
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

getReader

public static PackedInts.Reader getReader(DataInput in)
                                   throws IOException
Restore a PackedInts.Reader from a stream.

Parameters:
in - the stream to read data from
Returns:
a Reader
Throws:
IOException - If there is a low-level I/O error
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

getReaderIteratorNoHeader

public static PackedInts.ReaderIterator getReaderIteratorNoHeader(DataInput in,
                                                                  PackedInts.Format format,
                                                                  int version,
                                                                  int valueCount,
                                                                  int bitsPerValue,
                                                                  int mem)
Expert: Restore a PackedInts.ReaderIterator from a stream without reading metadata at the beginning of the stream. This method is useful to restore data from streams which have been created using getWriterNoHeader(DataOutput, Format, int, int, int).

Parameters:
in - the stream to read data from, positioned at the beginning of the packed values
format - the format used to serialize
version - the version used to serialize the data
valueCount - how many values the stream holds
bitsPerValue - the number of bits per value
mem - how much memory the iterator is allowed to use to read ahead (likely to speed up iteration)
Returns:
a ReaderIterator
See Also:
getWriterNoHeader(DataOutput, Format, int, int, int)
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

getReaderIterator

public static PackedInts.ReaderIterator getReaderIterator(DataInput in,
                                                          int mem)
                                                   throws IOException
Retrieve PackedInts as a PackedInts.ReaderIterator.

Parameters:
in - the stream to read data from, positioned at the beginning of a stored packed int structure.
mem - how much memory the iterator is allowed to use to read ahead (likely to speed up iteration)
Returns:
an iterator to access the values
Throws:
IOException - if the structure could not be retrieved.
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

getDirectReaderNoHeader

public static PackedInts.Reader getDirectReaderNoHeader(IndexInput in,
                                                        PackedInts.Format format,
                                                        int version,
                                                        int valueCount,
                                                        int bitsPerValue)
Expert: Construct a direct PackedInts.Reader from a stream without reading metadata at the beginning of the stream. This method is useful to restore data from streams which have been created using getWriterNoHeader(DataOutput, Format, int, int, int).

The returned reader will have very little memory overhead, but every call to PackedInts.Reader.get(int) is likely to perform a disk seek.

Parameters:
in - the stream to read data from
format - the format used to serialize
version - the version used to serialize the data
valueCount - how many values the stream holds
bitsPerValue - the number of bits per value
Returns:
a direct Reader
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

getDirectReaderNoHeader

public static PackedInts.Reader getDirectReaderNoHeader(IndexInput in,
                                                        PackedInts.Header header)
                                                 throws IOException
Expert: Construct a direct PackedInts.Reader from an IndexInput without reading metadata at the beginning of the stream. This method is useful to restore data when metadata has been previously read using readHeader(DataInput).

Parameters:
in - the stream to read data from, positioned at the beginning of the packed values
header - metadata result from readHeader()
Returns:
a Reader
Throws:
IOException - If there is a low-level I/O error
See Also:
readHeader(DataInput)
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

getDirectReader

public static PackedInts.Reader getDirectReader(IndexInput in)
                                         throws IOException
Construct a direct PackedInts.Reader from an IndexInput. This method is useful to restore data from streams which have been created using getWriter(DataOutput, int, int, float).

The returned reader will have very little memory overhead, but every call to PackedInts.Reader.get(int) is likely to perform a disk seek.

Parameters:
in - the stream to read data from
Returns:
a direct Reader
Throws:
IOException - If there is a low-level I/O error
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
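The memory/latency trade-off of a direct reader comes from doing all the work in get(int): nothing is decoded up front, so each call computes where the value lives and reads just those bytes. A hypothetical sketch (DirectReaderSketch), using a java.nio.ByteBuffer as a stand-in for IndexInput so that each absolute read plays the role of a seek. It assumes values are packed back to back, most significant bit first, and it is limited to 56 bits per value to keep the arithmetic in a single long; none of this is claimed to match Lucene's on-disk layout.

```java
import java.nio.ByteBuffer;

/** Hypothetical direct reader: decodes one value per call straight from storage. */
final class DirectReaderSketch {
  private final ByteBuffer data;
  private final int offset;       // where the packed values start (e.g. after a header)
  private final int bitsPerValue;
  private final long mask;

  DirectReaderSketch(ByteBuffer data, int offset, int bitsPerValue) {
    if (bitsPerValue < 1 || bitsPerValue > 56) {
      // Keeps every value within 8 consecutive bytes, so one long accumulator suffices.
      throw new IllegalArgumentException("this sketch supports 1..56 bits per value");
    }
    this.data = data;
    this.offset = offset;
    this.bitsPerValue = bitsPerValue;
    this.mask = (1L << bitsPerValue) - 1;
  }

  long get(int index) {
    long bitPos = (long) index * bitsPerValue;
    int firstByte = offset + (int) (bitPos >>> 3);
    int skipBits = (int) (bitPos & 7);         // leading bits of firstByte that belong to earlier values
    int endBit = skipBits + bitsPerValue;      // bit position just past the value, from firstByte's MSB
    int numBytes = (endBit + 7) >>> 3;
    long acc = 0;
    for (int i = 0; i < numBytes; i++) {
      acc = (acc << 8) | (data.get(firstByte + i) & 0xFFL); // absolute read = "seek + read"
    }
    int trailingBits = numBytes * 8 - endBit;  // bits belonging to the next value
    return (acc >>> trailingBits) & mask;
  }

  /** Helper for trying the reader: packs values MSB-first, back to back. */
  static byte[] pack(long[] values, int bitsPerValue) {
    byte[] out = new byte[(int) (((long) values.length * bitsPerValue + 7) >>> 3)];
    long bitPos = 0;
    for (long v : values) {
      for (int bit = bitsPerValue - 1; bit >= 0; bit--, bitPos++) {
        if (((v >>> bit) & 1L) != 0) {
          out[(int) (bitPos >>> 3)] |= (byte) (0x80 >>> (int) (bitPos & 7));
        }
      }
    }
    return out;
  }
}
```

Only the bytes covering one value are touched per call, so memory use stays at a few fields regardless of valueCount; the price is one storage access per get, which on a real IndexInput may mean a disk seek.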

getMutable

public static PackedInts.Mutable getMutable(int valueCount,
                                            int bitsPerValue,
                                            float acceptableOverheadRatio)
Create a packed integer array with the given number of values, all initialized to 0. The valueCount and the bitsPerValue cannot be changed after creation. All Mutables that this factory can return are kept fully in RAM.

Positive values of acceptableOverheadRatio will trade space for speed by selecting a faster but potentially less memory-efficient implementation. An acceptableOverheadRatio of COMPACT will make sure that the most memory-efficient implementation is selected whereas FASTEST will make sure that the fastest implementation is selected.

Parameters:
valueCount - the number of elements
bitsPerValue - the number of bits available for any given value
acceptableOverheadRatio - an acceptable overhead ratio per value
Returns:
a mutable packed integer array
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

getWriterNoHeader

public static PackedInts.Writer getWriterNoHeader(DataOutput out,
                                                  PackedInts.Format format,
                                                  int valueCount,
                                                  int bitsPerValue,
                                                  int mem)
Expert: Create a packed integer array writer for the given output, format, value count, and number of bits per value.

The resulting stream will be long-aligned. This means that depending on the format which is used, up to 63 bits will be wasted. An easy way to make sure that no space is lost is to always use a valueCount that is a multiple of 64.
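The alignment arithmetic can be checked with a small hypothetical helper, assuming values are stored back to back and only the end of the stream is padded to a whole long:

```java
/** Hypothetical helper for the long-alignment arithmetic (values stored back to back). */
final class AlignmentMath {
  /** 64-bit blocks needed to hold valueCount values of bitsPerValue bits. */
  static long blocksNeeded(long valueCount, int bitsPerValue) {
    return (valueCount * bitsPerValue + 63) / 64; // round up to whole longs
  }

  /** Padding bits left unused at the end of the last block (0 to 63). */
  static long wastedBits(long valueCount, int bitsPerValue) {
    return blocksNeeded(valueCount, bitsPerValue) * 64 - valueCount * bitsPerValue;
  }
}
```

For instance, 100 values of 7 bits take 700 bits, rounded up to 11 longs (704 bits), so 4 bits are wasted. With a valueCount of 64·k the total is 64·k·bitsPerValue bits, always a whole number of longs, so nothing is wasted whatever bitsPerValue is.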

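This method does not write any metadata to the stream, meaning that it is your responsibility to store it somewhere else in order to be able to recover data from the stream later on: the format, the version (VERSION_CURRENT), valueCount and bitsPerValue, matching the arguments of getReaderNoHeader(DataInput, Format, int, int, int).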

It is possible to start writing values without knowing how many of them you are actually going to write: just pass -1 as valueCount. For any positive valueCount, on the other hand, the returned writer makes sure that you don't write more values than expected, and it pads the end of the stream with zeros if you have written fewer than valueCount values when calling PackedInts.Writer.finish().
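The valueCount contract can be sketched as follows. WriterContractSketch is hypothetical, a List<Long> stands in for the packed stream, and the exception types are assumptions rather than what PackedInts.Writer actually throws.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the write-once contract; not Lucene's Writer. */
final class WriterContractSketch {
  private final int valueCount;   // -1 means "unknown in advance"
  private final int bitsPerValue;
  private boolean finished;
  final List<Long> stream = new ArrayList<>(); // stand-in for the output stream

  WriterContractSketch(int valueCount, int bitsPerValue) {
    this.valueCount = valueCount;
    this.bitsPerValue = bitsPerValue;
  }

  void add(long value) {
    if (finished) {
      throw new IllegalStateException("writer already finished");
    }
    if (bitsPerValue < 64 && (value >>> bitsPerValue) != 0) {
      throw new IllegalArgumentException(value + " needs more than " + bitsPerValue + " bits");
    }
    if (valueCount != -1 && stream.size() >= valueCount) {
      throw new IllegalStateException("already wrote " + valueCount + " values");
    }
    stream.add(value);
  }

  void finish() {
    // With a known valueCount, pad the end of the stream with zeros.
    if (valueCount != -1) {
      while (stream.size() < valueCount) {
        stream.add(0L);
      }
    }
    finished = true;
  }
}
```

With valueCount = 5, writing 3 and 15 and then calling finish() leaves five entries (3, 15, 0, 0, 0); with valueCount = -1 the stream simply ends after the last value added.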

The mem parameter lets you control how much memory can be used to buffer values before they are flushed to disk. Higher values of mem are likely to improve throughput. If speed is less important to you, a value of 0 uses as little memory as possible and should still offer reasonable throughput.

Parameters:
out - the data output
format - the format to use to serialize the values
valueCount - the number of values
bitsPerValue - the number of bits per value
mem - how much memory (in bytes) can be used to speed up serialization
Returns:
a Writer
See Also:
getReaderIteratorNoHeader(DataInput, Format, int, int, int, int), getReaderNoHeader(DataInput, Format, int, int, int)
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

getWriter

public static PackedInts.Writer getWriter(DataOutput out,
                                          int valueCount,
                                          int bitsPerValue,
                                          float acceptableOverheadRatio)
                                   throws IOException
Create a packed integer array writer for the given output, value count, and number of bits per value.

The resulting stream will be long-aligned. This means that, depending on the format used under the hood, up to 63 bits may be wasted. An easy way to make sure that no space is lost is to always use a valueCount that is a multiple of 64.

This method writes metadata to the stream, so that the resulting stream is sufficient to restore a PackedInts.Reader from it. You don't need to track valueCount or bitsPerValue yourself. If storing this metadata in the stream is a problem, you should probably look at getWriterNoHeader(DataOutput, Format, int, int, int).

The acceptableOverheadRatio parameter controls how readers that will be restored from this stream trade space for speed by selecting a faster but potentially less memory-efficient implementation. An acceptableOverheadRatio of COMPACT will make sure that the most memory-efficient implementation is selected whereas FASTEST will make sure that the fastest implementation is selected. If you only intend to read this stream sequentially, you should probably use COMPACT.

Parameters:
out - the data output
valueCount - the number of values
bitsPerValue - the number of bits per value
acceptableOverheadRatio - an acceptable overhead ratio per value
Returns:
a Writer
Throws:
IOException - If there is a low-level I/O error
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

bitsRequired

public static int bitsRequired(long maxValue)
Returns how many bits are required to hold values up to and including maxValue.

Parameters:
maxValue - the maximum value that should be representable.
Returns:
the number of bits needed to represent values from 0 to maxValue.
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
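The bit count follows from the position of the highest set bit of maxValue. A minimal hypothetical sketch (BitsRequiredSketch), assuming that a maxValue of 0 still needs one bit and that negative inputs are rejected; the real method's edge-case handling may differ.

```java
/** Hypothetical sketch: bits needed to store every value in [0, maxValue]. */
final class BitsRequiredSketch {
  static int bitsRequired(long maxValue) {
    if (maxValue < 0) {
      throw new IllegalArgumentException("maxValue must be non-negative, got " + maxValue);
    }
    // Index of the highest set bit plus one; at least one bit even for maxValue == 0.
    return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
  }
}
```

For example, 255 needs 8 bits while 256 needs 9, and Long.MAX_VALUE needs 63.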

maxValue

public static long maxValue(int bitsPerValue)
Calculates the maximum unsigned long that can be expressed with the given number of bits.

Parameters:
bitsPerValue - the number of bits available for any given value.
Returns:
the maximum value for the given bits.
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
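The maximum value for b bits is 2^b − 1, i.e. b low-order one bits. A signed Java long cannot hold 2^64 − 1 as a positive number, so this hypothetical sketch (MaxValueSketch) returns Long.MAX_VALUE for 64 bits; treat that choice as an assumption about the edge case.

```java
/** Hypothetical sketch: largest value representable with bitsPerValue bits. */
final class MaxValueSketch {
  static long maxValue(int bitsPerValue) {
    if (bitsPerValue < 0 || bitsPerValue > 64) {
      throw new IllegalArgumentException("bitsPerValue must be in [0, 64], got " + bitsPerValue);
    }
    // ~(~0L << b) sets exactly the b low-order bits; 64 bits is capped at Long.MAX_VALUE.
    return bitsPerValue == 64 ? Long.MAX_VALUE : ~(~0L << bitsPerValue);
  }
}
```

For example, 8 bits give 255, and 0 bits give 0, which matches an array whose values are all 0.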

copy

public static void copy(PackedInts.Reader src,
                        int srcPos,
                        PackedInts.Mutable dest,
                        int destPos,
                        int len,
                        int mem)
Copy src[srcPos:srcPos+len] into dest[destPos:destPos+len] using at most mem bytes.
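One way to honour the mem bound is to move values through a temporary buffer of at most mem / 8 longs. The sketch below uses hypothetical Source and Sink callbacks in place of the bulk operations of PackedInts.Reader and PackedInts.Mutable, and it always keeps at least one buffer slot so that it can make progress; it is not Lucene's implementation.

```java
/** Hypothetical sketch of a copy whose buffer is bounded by a memory budget. */
final class CopySketch {
  /** Stand-in for a bulk read: fill buf[off, off + len) starting at index. */
  interface Source { void read(int index, long[] buf, int off, int len); }

  /** Stand-in for a bulk write: store buf[off, off + len) starting at index. */
  interface Sink { void write(int index, long[] buf, int off, int len); }

  static void copy(Source src, int srcPos, Sink dest, int destPos, int len, int mem) {
    if (len < 0) {
      throw new IllegalArgumentException("len must be non-negative, got " + len);
    }
    int capacity = Math.max(1, Math.max(0, mem) / 8); // longs allowed by the budget (8 bytes each)
    long[] buffer = new long[Math.min(capacity, len)];
    int done = 0;
    while (done < len) {
      int chunk = Math.min(buffer.length, len - done);
      src.read(srcPos + done, buffer, 0, chunk);    // bulk-read a chunk into the buffer
      dest.write(destPos + done, buffer, 0, chunk); // bulk-write it to the destination
      done += chunk;
    }
  }
}
```

With mem = 16 the buffer holds two longs, so copying five values takes chunks of 2, 2 and 1; a larger mem means fewer, larger bulk operations.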


readHeader

public static PackedInts.Header readHeader(DataInput in)
                                    throws IOException
Expert: reads only the metadata from a stream. This is useful to later restore a stream or open a direct reader via getReaderNoHeader(DataInput, Header) or getDirectReaderNoHeader(IndexInput, Header).

Parameters:
in - the stream to read data from
Returns:
packed integer metadata.
Throws:
IOException - If there is a low-level I/O error
See Also:
getReaderNoHeader(DataInput, Header), getDirectReaderNoHeader(IndexInput, Header)
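How readHeader and the NoHeader methods fit together can be shown with a hypothetical header round trip: metadata (version, format id, valueCount, bitsPerValue) is written before the packed values, and reading it back leaves the stream positioned at the values, which is where the NoHeader methods expect it to be. The layout and the HeaderSketch name are assumptions; this is not Lucene's on-disk format, and the code uses java.io.DataInput/DataOutput rather than Lucene's classes of the same names.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/** Hypothetical header layout for illustration only. */
final class HeaderSketch {
  static final String CODEC = "PackedIntsSketch"; // hypothetical marker string
  final int version;
  final int formatId;
  final int valueCount;
  final int bitsPerValue;

  HeaderSketch(int version, int formatId, int valueCount, int bitsPerValue) {
    this.version = version;
    this.formatId = formatId;
    this.valueCount = valueCount;
    this.bitsPerValue = bitsPerValue;
  }

  /** Writes the metadata; the packed values would follow immediately after. */
  void write(DataOutput out) throws IOException {
    out.writeUTF(CODEC);
    out.writeInt(version);
    out.writeInt(formatId);
    out.writeInt(valueCount);
    out.writeInt(bitsPerValue);
  }

  /** Reads only the metadata, leaving the input positioned at the first packed value. */
  static HeaderSketch read(DataInput in) throws IOException {
    String codec = in.readUTF();
    if (!CODEC.equals(codec)) {
      throw new IOException("unexpected codec name: " + codec);
    }
    int version = in.readInt();
    int formatId = in.readInt();
    int valueCount = in.readInt();
    int bitsPerValue = in.readInt();
    return new HeaderSketch(version, formatId, valueCount, bitsPerValue);
  }
}
```

Splitting the header from the data this way lets a caller inspect valueCount and bitsPerValue first and then decide how to open the values, for example fully in memory or through a direct reader.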


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.