Class PackedInts


  • public class PackedInts
    extends Object
    Simplistic compression for arrays of unsigned long values. Each value is >= 0 and <= a specified maximum value. The values are stored as packed ints, with each value consuming a fixed number of bits.
    NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
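
To make the packing idea in the class description concrete, here is a minimal, self-contained sketch of storing fixed-width values back to back in a long[]. The class PackedSketch and its methods are hypothetical names used only for illustration; this is not the library's implementation, and the contiguous layout it uses is an assumption.

```java
/** Hypothetical illustration of fixed-width bit packing; not the Lucene implementation. */
final class PackedSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedSketch(int valueCount, int bitsPerValue) {
        if (valueCount < 0 || bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("invalid valueCount or bitsPerValue");
        }
        this.bitsPerValue = bitsPerValue;
        // Mask with the low bitsPerValue bits set.
        this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
        long totalBits = (long) valueCount * bitsPerValue;
        // Round up to a whole number of 64-bit blocks.
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    /** Stores the low bitsPerValue bits of value at the given index. */
    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        value &= mask;
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64; // bits that do not fit in this block
        if (spill > 0) {
            int fit = bitsPerValue - spill;    // bits that did fit
            long highMask = mask >>> fit;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> fit);
        }
    }

    /** Reads the value stored at the given index. */
    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Pull the remaining high bits from the next block.
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    int blockCount() {
        return blocks.length;
    }
}
```

With 100 values at 5 bits each, this sketch needs 500 bits, which rounds up to 8 longs instead of the 100 longs a plain long[] would use.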
    • Field Detail

      • FASTEST

        public static final float FASTEST
        At most 700% memory overhead, always select a direct implementation.
        See Also:
        Constant Field Values
      • FAST

        public static final float FAST
        At most 50% memory overhead, always select a reasonably fast implementation.
        See Also:
        Constant Field Values
      • DEFAULT

        public static final float DEFAULT
        At most 25% memory overhead.
        See Also:
        Constant Field Values
      • COMPACT

        public static final float COMPACT
        No memory overhead at all, but the returned implementation may be slow.
        See Also:
        Constant Field Values
      • DEFAULT_BUFFER_SIZE

        public static final int DEFAULT_BUFFER_SIZE
        Default amount of memory to use for bulk operations.
        See Also:
        Constant Field Values
      • VERSION_MONOTONIC_WITHOUT_ZIGZAG

        public static final int VERSION_MONOTONIC_WITHOUT_ZIGZAG
        See Also:
        Constant Field Values
    • Constructor Detail

      • PackedInts

        public PackedInts()
    • Method Detail

      • checkVersion

        public static void checkVersion​(int version)
        Check the validity of a version number.
      • fastestFormatAndBits

        public static PackedInts.FormatAndBits fastestFormatAndBits​(int valueCount,
                                                                    int bitsPerValue,
                                                                    float acceptableOverheadRatio)
        Try to find the PackedInts.Format and number of bits per value that would restore the fastest reader from disk while keeping its overhead below acceptableOverheadRatio.

        The acceptableOverheadRatio parameter is only meaningful for random-access PackedInts.Readers. If you only plan to access this stream sequentially later on, you should probably use COMPACT.

        If you don't know how many values you are going to write, use valueCount = -1.
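
The following sketch shows one way an acceptable overhead ratio could be turned into a bit budget. The constant values come from the field descriptions (700%, 50%, 25%, and no overhead). The class OverheadSketch, its method names, and the rule of rounding up to 8, 16, 32, or 64 bits when the budget allows are assumptions made for illustration, not a statement of how the library chooses a format.

```java
/** Hypothetical illustration of an overhead budget; not the Lucene selection algorithm. */
final class OverheadSketch {
    // Ratios as described in the field documentation.
    static final float FASTEST = 7f;
    static final float FAST = 0.5f;
    static final float DEFAULT = 0.25f;
    static final float COMPACT = 0f;

    /** Largest number of bits per value that stays within the given overhead ratio. */
    static int maxBitsPerValue(int bitsPerValue, float acceptableOverheadRatio) {
        // Clamp the ratio to the documented range [COMPACT, FASTEST].
        float ratio = Math.max(COMPACT, Math.min(FASTEST, acceptableOverheadRatio));
        return bitsPerValue + (int) (ratio * bitsPerValue);
    }

    /**
     * Assumed rule: prefer a width of 8, 16, 32, or 64 bits (directly addressable)
     * if it fits the budget, otherwise keep the exact bit width.
     */
    static int chooseBits(int bitsPerValue, float acceptableOverheadRatio) {
        int budget = maxBitsPerValue(bitsPerValue, acceptableOverheadRatio);
        for (int aligned : new int[] {8, 16, 32, 64}) {
            if (bitsPerValue <= aligned && aligned <= budget) {
                return aligned;
            }
        }
        return bitsPerValue;
    }
}
```

Under these assumptions, a 1-bit value with FASTEST may grow to 8 bits, which is exactly the 700% overhead bound, while COMPACT always keeps the requested width.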

      • getDecoder

        public static PackedInts.Decoder getDecoder​(PackedInts.Format format,
                                                    int version,
                                                    int bitsPerValue)
        Parameters:
        format - the format used to store packed ints
        version - the compatibility version
        bitsPerValue - the number of bits per value
        Returns:
        a decoder
      • getEncoder

        public static PackedInts.Encoder getEncoder​(PackedInts.Format format,
                                                    int version,
                                                    int bitsPerValue)
        Parameters:
        format - the format used to store packed ints
        version - the compatibility version
        bitsPerValue - the number of bits per value
        Returns:
        an encoder
      • getReaderIteratorNoHeader

        public static PackedInts.ReaderIterator getReaderIteratorNoHeader​(DataInput in,
                                                                          PackedInts.Format format,
                                                                          int version,
                                                                          int valueCount,
                                                                          int bitsPerValue,
                                                                          int mem)
        Expert: Restore a PackedInts.ReaderIterator from a stream without reading metadata at the beginning of the stream. This method is useful to restore data from streams which have been created using getWriterNoHeader(DataOutput, Format, int, int, int).
        Parameters:
        in - the stream to read data from, positioned at the beginning of the packed values
        format - the format used to serialize
        version - the version used to serialize the data
        valueCount - how many values the stream holds
        bitsPerValue - the number of bits per value
        mem - how much memory the iterator is allowed to use to read-ahead (likely to speed up iteration)
        Returns:
        a ReaderIterator
        See Also:
        getWriterNoHeader(DataOutput, Format, int, int, int)
        NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
      • getMutable

        public static PackedInts.Mutable getMutable​(int valueCount,
                                                    int bitsPerValue,
                                                    float acceptableOverheadRatio)
        Create a packed integer array with the given number of values, all initialized to 0. The valueCount and the bitsPerValue cannot be changed after creation. All Mutables known by this factory are kept fully in RAM.

        Positive values of acceptableOverheadRatio will trade space for speed by selecting a faster but potentially less memory-efficient implementation. An acceptableOverheadRatio of COMPACT will make sure that the most memory-efficient implementation is selected whereas FASTEST will make sure that the fastest implementation is selected.

        Parameters:
        valueCount - the number of elements
        bitsPerValue - the number of bits available for any given value
        acceptableOverheadRatio - an acceptable overhead ratio per value
        Returns:
        a mutable packed integer array
        NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
      • getMutable

        public static PackedInts.Mutable getMutable​(int valueCount,
                                                    int bitsPerValue,
                                                    PackedInts.Format format)
        Same as getMutable(int, int, float) with a pre-computed number of bits per value and format.
        NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
      • getWriterNoHeader

        public static PackedInts.Writer getWriterNoHeader​(DataOutput out,
                                                          PackedInts.Format format,
                                                          int valueCount,
                                                          int bitsPerValue,
                                                          int mem)
        Expert: Create a packed integer array writer for the given output, format, value count, and number of bits per value.

        The resulting stream will be long-aligned. This means that depending on the format which is used, up to 63 bits will be wasted. An easy way to make sure that no space is lost is to always use a valueCount that is a multiple of 64.

        This method does not write any metadata to the stream, so it is your responsibility to store it somewhere else in order to be able to recover data from the stream later on. As the parameters of getReaderIteratorNoHeader(DataInput, Format, int, int, int, int) show, this means the format, the version, valueCount, and bitsPerValue.

        It is possible to start writing values without knowing how many of them you are actually going to write. To do this, just pass -1 as valueCount. On the other hand, for any positive value of valueCount, the returned writer will make sure that you don't write more values than expected, and will pad the end of the stream with zeros if you have written fewer than valueCount values when calling PackedInts.Writer.finish().

        The mem parameter lets you control how much memory can be used to buffer changes in memory before flushing to disk. High values of mem are likely to improve throughput. On the other hand, if speed is not that important to you, a value of 0 will use as little memory as possible and should already offer reasonable throughput.

        Parameters:
        out - the data output
        format - the format to use to serialize the values
        valueCount - the number of values
        bitsPerValue - the number of bits per value
        mem - how much memory (in bytes) can be used to speed up serialization
        Returns:
        a Writer
        See Also:
        getReaderIteratorNoHeader(DataInput, Format, int, int, int, int)
        NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
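
The long-alignment note for getWriterNoHeader can be checked with a little arithmetic. The sketch below assumes a contiguous layout in which values are written back to back and the stream is padded to the next multiple of 64 bits; other formats may waste space differently. The class AlignmentSketch and its methods are hypothetical helpers, not part of the API.

```java
/** Hypothetical helper showing padding in a long-aligned, contiguously packed stream. */
final class AlignmentSketch {
    /** Number of unused bits after padding valueCount values to a 64-bit boundary. */
    static long paddingBits(long valueCount, int bitsPerValue) {
        long usedBits = valueCount * bitsPerValue;
        // Distance to the next multiple of 64, or 0 if already aligned.
        return (64 - (usedBits & 63)) & 63;
    }

    /** Number of 64-bit longs the padded stream occupies. */
    static long longsNeeded(long valueCount, int bitsPerValue) {
        return (valueCount * bitsPerValue + 63) >>> 6;
    }
}
```

For example, 10 values at 5 bits use 50 bits and leave 14 bits of padding, while any valueCount that is a multiple of 64 leaves none.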
      • bitsRequired

        public static int bitsRequired​(long maxValue)
        Returns how many bits are required to hold values up to and including maxValue. NOTE: This method returns at least 1.
        Parameters:
        maxValue - the maximum value that should be representable.
        Returns:
        the number of bits needed to represent values from 0 to maxValue.
        NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
      • unsignedBitsRequired

        public static int unsignedBitsRequired​(long bits)
        Returns how many bits are required to store bits, interpreted as an unsigned value. NOTE: This method returns at least 1.
        NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
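
As a sketch of the behaviour described for bitsRequired and unsignedBitsRequired, the bit count can be derived from the position of the highest set bit, with a floor of 1. The class BitsSketch is hypothetical, and rejecting negative input in bitsRequired is an assumption; the documentation above only states that both methods return at least 1.

```java
/** Hypothetical re-implementation for illustration; behaviour on negative input is assumed. */
final class BitsSketch {
    /** Bits needed to store the argument when it is read as an unsigned 64-bit value. */
    static int unsignedBitsRequired(long bits) {
        // 64 minus the leading zeros gives the index of the highest set bit plus one.
        return Math.max(1, 64 - Long.numberOfLeadingZeros(bits));
    }

    /** Bits needed to represent every value from 0 to maxValue inclusive. */
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            // Assumption: a negative maximum is treated as a caller error.
            throw new IllegalArgumentException("maxValue must be non-negative (got: " + maxValue + ")");
        }
        return unsignedBitsRequired(maxValue);
    }
}
```

The difference between the two shows up for negative arguments: read as unsigned, -1L has all 64 bits set, so unsignedBitsRequired(-1L) is 64.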
      • maxValue

        public static long maxValue​(int bitsPerValue)
        Calculates the maximum unsigned long that can be expressed with the given number of bits.
        Parameters:
        bitsPerValue - the number of bits available for any given value.
        Returns:
        the maximum value for the given bits.
        NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
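
The maximum value for a given width is the number with the low bitsPerValue bits set. The sketch below is a hypothetical stand-in (MaxValueSketch is not part of the API); capping the 64-bit case at Long.MAX_VALUE, because Java's long is signed, is an assumption.

```java
/** Hypothetical illustration of the largest value that fits in a given number of bits. */
final class MaxValueSketch {
    static long maxValue(int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 64) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 64]");
        }
        // Assumption: 64 bits is capped at Long.MAX_VALUE since long is signed.
        // Otherwise: shift all-ones left, then invert to keep only the low bits.
        return bitsPerValue == 64 ? Long.MAX_VALUE : ~(~0L << bitsPerValue);
    }
}
```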
      • copy

        public static void copy​(PackedInts.Reader src,
                                int srcPos,
                                PackedInts.Mutable dest,
                                int destPos,
                                int len,
                                int mem)
        Copy src[srcPos:srcPos+len] into dest[destPos:destPos+len] using at most mem bytes.
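
A memory-bounded copy of this kind can be sketched as a chunked loop. In the sketch below, plain long[] arrays stand in for PackedInts.Reader and PackedInts.Mutable, and the two System.arraycopy calls stand in for bulk read and bulk write operations; the class CopySketch, the 8-bytes-per-value buffer sizing, and the fallback to value-by-value copying when mem is too small are all assumptions for illustration.

```java
/** Hypothetical illustration of a chunked copy bounded by a memory budget. */
final class CopySketch {
    static void copy(long[] src, int srcPos, long[] dest, int destPos, int len, int mem) {
        if (len < 0 || mem < 0) {
            throw new IllegalArgumentException("len and mem must be non-negative");
        }
        int capacity = mem >>> 3; // number of longs that fit in mem bytes
        if (capacity == 0) {
            // No room for a buffer: copy one value at a time.
            for (int i = 0; i < len; i++) {
                dest[destPos + i] = src[srcPos + i];
            }
            return;
        }
        long[] buf = new long[Math.min(capacity, len)];
        int copied = 0;
        while (copied < len) {
            int chunk = Math.min(buf.length, len - copied);
            // Stand-in for a bulk read from the source into the buffer.
            System.arraycopy(src, srcPos + copied, buf, 0, chunk);
            // Stand-in for a bulk write from the buffer into the destination.
            System.arraycopy(buf, 0, dest, destPos + copied, chunk);
            copied += chunk;
        }
    }
}
```

A larger mem means fewer, larger chunks, which is the usual reason bulk operations outperform per-value access.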