Class BytesRefHash

  • All Implemented Interfaces:
    Accountable

    public final class BytesRefHash
    extends Object
    implements Accountable
    BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. It maps byte arrays to int ids (conceptually Map<BytesRef,int>) and stores the hashed bytes efficiently in contiguous storage. The mapping to the id is encapsulated inside BytesRefHash, and the id is guaranteed to increase for each added BytesRef.

    Note: A BytesRef instance passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE-2 bytes. The internal storage is limited to 2GB of total byte storage.

    NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
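
    The mapping described above can be pictured with a small, self-contained model. This is only a sketch in plain Java and not the Lucene implementation: the class name ByteIdTableSketch, its fields, and the choice to return the existing id when a value is added twice are all assumptions made for illustration. It shows the two properties the description names: values are appended to one contiguous byte array, and each new value receives the next id.

```java
import java.util.Arrays;

/** Toy model only (hypothetical, not Lucene code): byte sequences -> dense ids 0, 1, 2, ... */
final class ByteIdTableSketch {
    private byte[] pool = new byte[64];  // all values stored back to back
    private int poolUsed = 0;
    private int[] starts = new int[8];   // id -> start offset in pool
    private int[] lengths = new int[8];  // id -> length in bytes
    private int[] slots = new int[16];   // hash slot -> id, or -1 if empty
    private int count = 0;

    ByteIdTableSketch() {
        Arrays.fill(slots, -1);
    }

    int size() {
        return count;
    }

    /** Returns the id for value; a new value gets the next id (assumption of this sketch). */
    int add(byte[] value) {
        int slot = findSlot(value);
        if (slots[slot] != -1) {
            return slots[slot];          // already present: reuse its id
        }
        if (count == starts.length) {
            grow();                      // keeps the slot table at most half full
            slot = findSlot(value);
        }
        if (poolUsed + value.length > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, poolUsed + value.length));
        }
        System.arraycopy(value, 0, pool, poolUsed, value.length);
        starts[count] = poolUsed;
        lengths[count] = value.length;
        poolUsed += value.length;
        slots[slot] = count;
        return count++;
    }

    /** Returns the id of value, or -1 if there is no mapping for it. */
    int find(byte[] value) {
        return slots[findSlot(value)];
    }

    /** Returns a copy of the bytes stored for id. */
    byte[] get(int id) {
        return Arrays.copyOfRange(pool, starts[id], starts[id] + lengths[id]);
    }

    // Linear probing: stops at the slot holding value, or at the first empty slot.
    private int findSlot(byte[] value) {
        int mask = slots.length - 1;
        int slot = Arrays.hashCode(value) & mask;
        while (slots[slot] != -1 && !matches(slots[slot], value)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    private boolean matches(int id, byte[] value) {
        if (lengths[id] != value.length) {
            return false;
        }
        for (int i = 0; i < value.length; i++) {
            if (pool[starts[id] + i] != value[i]) {
                return false;
            }
        }
        return true;
    }

    private void grow() {
        starts = Arrays.copyOf(starts, starts.length * 2);
        lengths = Arrays.copyOf(lengths, lengths.length * 2);
        slots = new int[slots.length * 2];
        Arrays.fill(slots, -1);
        for (int id = 0; id < count; id++) {
            slots[findSlot(get(id))] = id;  // re-insert every existing id
        }
    }
}
```

    In this model, adding "apple" and then "fig" yields ids 0 and 1, adding "apple" again yields 0, and find on a value that was never added yields -1.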
    • Method Detail

      • get

        public BytesRef get(int bytesID,
                            BytesRef ref)
        Populates and returns a BytesRef with the bytes for the given bytesID.

        Note: the given bytesID must be a non-negative integer less than the current size (size()).

        Parameters:
        bytesID - the id
        ref - the BytesRef to populate
        Returns:
        the given BytesRef instance populated with the bytes for the given bytesID
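
        The populate-and-return contract can be sketched without Lucene as follows. Everything here is hypothetical: ReusableRefSketch and its nested Ref class stand in for BytesRefHash and BytesRef, and the explicit bounds check is a choice of this sketch rather than documented behavior. The sketch assumes the caller's holder is pointed at the shared storage instead of receiving a copy, so one holder can be reused across many lookups.

```java
import java.nio.charset.StandardCharsets;

/** Toy model only (hypothetical, not Lucene code). */
final class ReusableRefSketch {
    /** Stand-in for BytesRef: a view described by array, offset, and length. */
    static final class Ref {
        byte[] bytes = new byte[0];
        int offset;
        int length;

        String utf8() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    private final byte[] pool;   // all values stored back to back
    private final int[] starts;  // id -> start offset; the last entry marks the end

    ReusableRefSketch(String... values) {
        byte[][] encoded = new byte[values.length][];
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            total += encoded[i].length;
        }
        pool = new byte[total];
        starts = new int[values.length + 1];
        int pos = 0;
        for (int i = 0; i < values.length; i++) {
            starts[i] = pos;
            System.arraycopy(encoded[i], 0, pool, pos, encoded[i].length);
            pos += encoded[i].length;
        }
        starts[values.length] = pos;
    }

    int size() {
        return starts.length - 1;
    }

    /** Points ref at the bytes for id and returns the same ref instance. */
    Ref get(int id, Ref ref) {
        if (id < 0 || id >= size()) {
            throw new IndexOutOfBoundsException("id " + id + " is outside [0, " + size() + ")");
        }
        ref.bytes = pool;
        ref.offset = starts[id];
        ref.length = starts[id + 1] - starts[id];
        return ref;
    }
}
```

        A caller would allocate one Ref, then call get(0, ref), get(1, ref), and so on, reading the holder after each call. No new holder is allocated per lookup, which is the usual reason for an API that takes the result object as a parameter.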
      • compact

        public int[] compact()
        Returns the ids array in arbitrary order. Valid ids occupy offsets 0 through size() - 1 of the returned array.

        Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.

        NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
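
        One way to picture compaction is an in-place pass over a sparse slot table that moves every occupied slot to the front. The sketch below is an illustration in plain Java, not the Lucene source; the class name CompactSketch and the convention that -1 marks an empty slot are assumptions.

```java
/** Toy model only (hypothetical, not Lucene code). */
final class CompactSketch {
    /**
     * Moves every non-empty slot to the front of the array, in place, and
     * returns the number of live ids. Empty slots are assumed to hold -1.
     */
    static int compact(int[] slots) {
        int upto = 0;  // next free position at the front
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] != -1) {
                if (upto < i) {
                    slots[upto] = slots[i];
                    slots[i] = -1;
                }
                upto++;
            }
        }
        return upto;
    }
}
```

        For the input {-1, 2, -1, 0, 1, -1}, the array becomes {2, 0, 1, -1, -1, -1} and the method returns 3. In this model the entries no longer sit in the slots their hash values select, so further hash lookups would be unreliable, which is consistent with the operation being described as destructive.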
      • sort

        public int[] sort()
        Returns the ids array sorted by the byte values that each id refers to.

        Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
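
        Sorting ids by the bytes they refer to, rather than sorting the bytes themselves, can be sketched as below. This is plain Java for illustration only: SortIdsSketch is a hypothetical name, the values are passed in as a byte[][] indexed by id, and unsigned lexicographic byte order is an assumption of the sketch.

```java
import java.util.Arrays;

/** Toy model only (hypothetical, not Lucene code). */
final class SortIdsSketch {
    /** Returns the ids 0..values.length-1 ordered by the bytes each id refers to. */
    static int[] sortedIds(byte[][] values) {
        Integer[] ids = new Integer[values.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Compare the referenced bytes, treating each byte as unsigned (assumption).
        Arrays.sort(ids, (x, y) -> Arrays.compareUnsigned(values[x], values[y]));
        int[] out = new int[ids.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = ids[i];
        }
        return out;
    }
}
```

        With values "pear", "apple", and "fig" stored under ids 0, 1, and 2, the result is {1, 2, 0}. The values themselves stay where they are; only the order of the ids changes.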

      • clear

        public void clear(boolean resetPool)
        Clears all entries of this BytesRefHash. If resetPool is true, the internally used ByteBlockPool is reset as well.
      • clear

        public void clear()
        Equivalent to clear(true).
      • close

        public void close()
        Closes the BytesRefHash and releases all internally used memory.
      • find

        public int find(BytesRef bytes)
        Returns the id of the given BytesRef.
        Parameters:
        bytes - the bytes to look for
        Returns:
        the id of the given bytes, or -1 if there is no mapping for the given bytes.
      • addByPoolOffset

        public int addByPoolOffset(int offset)
        Adds an "arbitrary" int offset instead of a BytesRef term. This is used in the indexer to hold the hash for term vectors, because they do not redundantly store the byte[] term directly and instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.
      • reinit

        public void reinit()
        Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.
      • byteStart

        public int byteStart(int bytesID)
        Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
        Parameters:
        bytesID - the id to look up
        Returns:
        the bytesStart offset into the internally used ByteBlockPool for the given id
      • ramBytesUsed

        public long ramBytesUsed()
        Description copied from interface: Accountable
        Return the memory usage of this object in bytes. Negative values are illegal.
        Specified by:
        ramBytesUsed in interface Accountable