Class BytesRefHash

java.lang.Object
org.apache.lucene.util.BytesRefHash
All Implemented Interfaces:
Accountable

public final class BytesRefHash extends Object implements Accountable
BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. BytesRefHash maintains mappings of byte arrays to ids (Map<BytesRef,int>), storing the hashed bytes efficiently in contiguous storage. The mapping to the id is encapsulated inside BytesRefHash, and the id is guaranteed to increase for each added BytesRef.

Note: A BytesRef instance passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE-2 bytes. The internal storage is limited to 2GB of total byte storage.

NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
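
The id-assignment behaviour described above can be illustrated with a small self-contained toy model. This is a sketch only and is not Lucene's implementation: the class name ToyBytesHash, its fields, and the use of a java.util.HashMap for lookups are all assumptions made for illustration. It keeps every value back to back in one byte array and hands out ids 0, 1, 2, ... in insertion order.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model only: NOT Lucene's implementation, just the same id convention. */
final class ToyBytesHash {
    private byte[] storage = new byte[16]; // all values stored back to back
    private int used = 0;                  // number of bytes of storage in use
    private int[] starts = new int[4];     // starts[id] = offset of value id in storage
    private int[] lengths = new int[4];    // lengths[id] = length of value id
    private int count = 0;                 // number of distinct values added
    private final Map<ByteBuffer, Integer> ids = new HashMap<>();

    /** Returns a new id (>= 0), or -(id)-1 if the bytes were already present. */
    int add(byte[] bytes) {
        // ByteBuffer has content-based equals/hashCode, so it works as a map key.
        ByteBuffer key = ByteBuffer.wrap(bytes.clone());
        Integer existing = ids.get(key);
        if (existing != null) {
            return -existing - 1;
        }
        if (used + bytes.length > storage.length) {
            storage = Arrays.copyOf(storage, Math.max(storage.length * 2, used + bytes.length));
        }
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, count * 2);
            lengths = Arrays.copyOf(lengths, count * 2);
        }
        System.arraycopy(bytes, 0, storage, used, bytes.length);
        starts[count] = used;
        lengths[count] = bytes.length;
        used += bytes.length;
        ids.put(key, count);
        return count++;
    }

    /** Copies the bytes for an id in [0, size()) out of the shared storage. */
    byte[] get(int id) {
        return Arrays.copyOfRange(storage, starts[id], starts[id] + lengths[id]);
    }

    /** Number of distinct values added so far. */
    int size() {
        return count;
    }
}
```

In this model, adding "a", then "bb", then "a" again yields 0, 1 and -1: the third call finds the existing id 0 and reports it as -(0)-1.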

  • Method Details

    • size

      public int size()
      Returns the number of BytesRef values in this BytesRefHash.
      Returns:
      the number of BytesRef values in this BytesRefHash.
    • get

      public BytesRef get(int bytesID, BytesRef ref)
      Populates and returns a BytesRef with the bytes for the given bytesID.

      Note: the given bytesID must be a non-negative integer less than the current size (size()).

      Parameters:
      bytesID - the id
      ref - the BytesRef to populate
      Returns:
      the given BytesRef instance populated with the bytes for the given bytesID
    • compact

      public int[] compact()
      Returns the ids array in arbitrary order. Valid ids start at offset 0 and end at offset size() - 1.

      Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.

      NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
    • sort

      public int[] sort()
      Returns the values array sorted by the referenced byte values.

      Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
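
      As a rough illustration of "sorted by the referenced byte values", the sketch below orders ids by the bytes they refer to. It is a standalone model, not Lucene's sort: the class SortIdsSketch and the method sortedIds are hypothetical names, the input is a plain byte[][] where values[id] holds the bytes for that id, and the comparison is assumed to be lexicographic over unsigned bytes. It requires Java 9 or later for Arrays.compareUnsigned.

```java
import java.util.Arrays;

/** Hypothetical helper: models the ordering only, not Lucene's sort(). */
final class SortIdsSketch {
    /** values[id] holds the bytes for that id; returns the ids ordered by those bytes. */
    static int[] sortedIds(byte[][] values) {
        Integer[] boxed = new Integer[values.length];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        // Assumption: bytes compare lexicographically as unsigned values (0x00..0xFF).
        Arrays.sort(boxed, (a, b) -> Arrays.compareUnsigned(values[a], values[b]));
        int[] ids = new int[boxed.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = boxed[i];
        }
        return ids;
    }
}
```

      Unlike this model, which leaves its input untouched, sort() is destructive, as noted above.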

    • clear

      public void clear(boolean resetPool)
      Clears this BytesRefHash, removing all of its entries. If resetPool is true, the internally used ByteBlockPool is reset as well.
    • clear

      public void clear()
      Clears this BytesRefHash and resets the internally used ByteBlockPool; equivalent to clear(true).
    • close

      public void close()
      Closes the BytesRefHash and releases all internally used memory
    • add

      public int add(BytesRef bytes)
      Adds a new BytesRef
      Parameters:
      bytes - the bytes to hash
      Returns:
      the id the given bytes are hashed to if there was no mapping for the given bytes, otherwise (-(id)-1). This guarantees that the return value is >= 0 if and only if the given bytes had not been hashed before.
      Throws:
      BytesRefHash.MaxBytesLengthExceededException - if the given bytes are longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2
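
      The return-value encoding of add(BytesRef) can be decoded on the caller side with a little arithmetic. The helper below is a minimal sketch: AddResult, isNew and idOf are hypothetical names and are not part of Lucene. It only restates the rule above, namely that a non-negative value is a fresh id and a negative value encodes an existing id as -(id)-1.

```java
/** Hypothetical helper for interpreting the int returned by add(BytesRef). */
final class AddResult {
    /** True if the bytes were not present before the call to add. */
    static boolean isNew(int ret) {
        return ret >= 0;
    }

    /** Recovers the id in both cases: ret itself, or id from -(id)-1. */
    static int idOf(int ret) {
        return ret >= 0 ? ret : -(ret + 1);
    }
}
```

      For example, a return value of -6 means the bytes were already present under id 5.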
    • find

      public int find(BytesRef bytes)
      Returns the id of the given BytesRef.
      Parameters:
      bytes - the bytes to look for
      Returns:
      the id of the given bytes, or -1 if there is no mapping for the given bytes.
    • addByPoolOffset

      public int addByPoolOffset(int offset)
      Adds an "arbitrary" int offset instead of a BytesRef term. This is used in the indexer to hold the hash for term vectors, because they do not redundantly store the byte[] term directly and instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.
    • reinit

      public void reinit()
      Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.
    • byteStart

      public int byteStart(int bytesID)
      Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID
      Parameters:
      bytesID - the id to look up
      Returns:
      the bytesStart offset into the internally used ByteBlockPool for the given id
    • ramBytesUsed

      public long ramBytesUsed()
      Description copied from interface: Accountable
      Return the memory usage of this object in bytes. Negative values are illegal.
      Specified by:
      ramBytesUsed in interface Accountable