org.apache.lucene.util
Class BytesRefHash

java.lang.Object
  extended by org.apache.lucene.util.BytesRefHash

public final class BytesRefHash
extends Object

BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. BytesRefHash maintains mappings of byte arrays to ids (Map<BytesRef,int>), storing the hashed bytes efficiently in contiguous storage. The mapping to the id is encapsulated inside BytesRefHash, and ids are guaranteed to increase for each newly added BytesRef.
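The following is a minimal, self-contained sketch of the idea described above, not Lucene's implementation. MiniBytesRefHash is a hypothetical class that copies each distinct byte[] key once into a single growing byte[] pool, records each key's start offset and length by id, and maps keys to ids with an open-addressing table using linear probing. It takes plain byte[] keys instead of BytesRef and uses its own growth policy; both are simplifying assumptions.

```java
import java.util.Arrays;

// Hypothetical, simplified sketch of the idea behind BytesRefHash (not Lucene's code):
// distinct keys are copied once into a single contiguous byte[] and an
// open-addressing table maps each key to a dense id assigned in insertion order.
class MiniBytesRefHash {
    private byte[] pool = new byte[32];  // contiguous storage for all key bytes
    private int poolUpto = 0;            // next free offset in pool
    private int[] starts = new int[4];   // id -> start offset of its bytes in pool
    private int[] lengths = new int[4];  // id -> length of its bytes
    private int[] table = new int[8];    // hash slot -> id, or -1 if the slot is empty
    private int count = 0;               // number of distinct keys, also the next id

    MiniBytesRefHash() {
        Arrays.fill(table, -1);
    }

    int size() {
        return count;
    }

    // Returns a new id (>= 0), or -(existingId) - 1 if the key was already added.
    int add(byte[] key) {
        int slot = findSlot(table, key, 0, key.length);
        if (table[slot] != -1) {
            return -table[slot] - 1;
        }
        if (poolUpto + key.length > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, poolUpto + key.length));
        }
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, count * 2);
            lengths = Arrays.copyOf(lengths, count * 2);
        }
        int id = count++;
        System.arraycopy(key, 0, pool, poolUpto, key.length);
        starts[id] = poolUpto;
        lengths[id] = key.length;
        poolUpto += key.length;
        table[slot] = id;
        if (count * 2 > table.length) {
            rehash(); // keep the table at most about half full
        }
        return id;
    }

    // Returns the id for key, or -1 if it has never been added.
    int find(byte[] key) {
        return table[findSlot(table, key, 0, key.length)];
    }

    // Copies the bytes for id out of the pool; id must satisfy 0 <= id < size().
    byte[] get(int id) {
        return Arrays.copyOfRange(pool, starts[id], starts[id] + lengths[id]);
    }

    // Linear probing: returns the slot holding an equal key, or the first empty slot.
    private int findSlot(int[] tab, byte[] b, int off, int len) {
        int mask = tab.length - 1;
        int slot = hash(b, off, len) & mask;
        while (tab[slot] != -1 && !sameBytes(tab[slot], b, off, len)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    // Compares the stored bytes of id with b[off, off + len).
    private boolean sameBytes(int id, byte[] b, int off, int len) {
        if (lengths[id] != len) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (pool[starts[id] + i] != b[off + i]) {
                return false;
            }
        }
        return true;
    }

    private static int hash(byte[] b, int off, int len) {
        int h = 0;
        for (int i = off; i < off + len; i++) {
            h = 31 * h + b[i];
        }
        return h;
    }

    // Doubles the table and re-inserts every id using the bytes stored in the pool.
    private void rehash() {
        int[] newTable = new int[table.length * 2];
        Arrays.fill(newTable, -1);
        for (int id = 0; id < count; id++) {
            newTable[findSlot(newTable, pool, starts[id], lengths[id])] = id;
        }
        table = newTable;
    }
}
```

For example, adding "apple", then "pear", then "apple" again yields 0, 1 and -1, and find on a key that was never added yields -1. Keeping all key bytes in one array costs two ints of bookkeeping per entry instead of one object per key, which is one reason a contiguous layout is memory-efficient.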

Note: The length of a BytesRef passed to add(BytesRef) must not exceed ByteBlockPool.BYTE_BLOCK_SIZE-2 bytes. The internal storage is limited to 2GB of total byte storage.
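The note does not spell out where the -2 comes from. One plausible explanation, sketched here with hypothetical names, is a variable-length length prefix: an entry shorter than 128 bytes is preceded by one length byte, a longer entry by two, and the entry plus its prefix must fit within a single block. The sketch assumes a block size of 1 << 15 (32768) bytes; treat both the prefix format and the block size as assumptions rather than a description of ByteBlockPool.

```java
// Hypothetical sketch of a 1-or-2-byte length prefix, illustrating why an entry
// may be at most BLOCK_SIZE - 2 bytes long if entry plus prefix must fit in one block.
final class LengthPrefix {
    static final int BLOCK_SIZE = 1 << 15;        // assumed block size (32768 bytes)
    static final int MAX_LENGTH = BLOCK_SIZE - 2; // leaves room for a 2-byte prefix

    // Writes the length prefix at buf[pos] and returns the number of prefix bytes used.
    static int writeLength(byte[] buf, int pos, int length) {
        if (length < 0 || length > MAX_LENGTH) {
            throw new IllegalArgumentException("length " + length + " exceeds " + MAX_LENGTH);
        }
        if (length < 128) {
            buf[pos] = (byte) length;               // high bit clear: 1-byte form
            return 1;
        }
        buf[pos] = (byte) (0x80 | (length & 0x7f)); // high bit set: low 7 bits here
        buf[pos + 1] = (byte) (length >>> 7);       // remaining 8 bits in the next byte
        return 2;
    }

    // Reads a prefix written by writeLength.
    static int readLength(byte[] buf, int pos) {
        int b = buf[pos] & 0xff;
        if ((b & 0x80) == 0) {
            return b;
        }
        return (b & 0x7f) | ((buf[pos + 1] & 0xff) << 7);
    }
}
```

Under these assumptions the largest allowed entry, 32766 bytes, needs a 2-byte prefix and occupies exactly 32768 bytes, one full block.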

NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

Nested Class Summary
static class BytesRefHash.BytesStartArray
          Manages allocation of the per-term addresses.
static class BytesRefHash.DirectBytesStartArray
          A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.
static class BytesRefHash.MaxBytesLengthExceededException
          Thrown if a BytesRef exceeds the BytesRefHash limit of ByteBlockPool.BYTE_BLOCK_SIZE-2.
 
Field Summary
static int DEFAULT_CAPACITY
           
 
Constructor Summary
BytesRefHash()
          Creates a new BytesRefHash with a ByteBlockPool using a ByteBlockPool.DirectAllocator.
BytesRefHash(ByteBlockPool pool)
          Creates a new BytesRefHash.
BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
          Creates a new BytesRefHash.
 
Method Summary
 int add(BytesRef bytes)
          Adds a new BytesRef.
 int add(BytesRef bytes, int code)
          Adds a new BytesRef with a pre-calculated hash code.
 int addByPoolOffset(int offset)
          Adds an arbitrary int offset instead of a BytesRef term.
 int byteStart(int bytesID)
          Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
 void clear()
           
 void clear(boolean resetPool)
          Clears all BytesRef entries from this BytesRefHash, optionally resetting the internally used ByteBlockPool.
 void close()
          Closes the BytesRefHash and releases all internally used memory.
 int find(BytesRef bytes)
          Returns the id of the given BytesRef.
 int find(BytesRef bytes, int code)
          Returns the id of the given BytesRef with a pre-calculated hash code.
 BytesRef get(int bytesID, BytesRef ref)
          Populates and returns a BytesRef with the bytes for the given bytesID.
 void reinit()
          Reinitializes the BytesRefHash after a previous clear() call.
 int size()
          Returns the number of BytesRef values in this BytesRefHash.
 int[] sort(Comparator<BytesRef> comp)
          Returns the ids, sorted by their referenced byte values.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_CAPACITY

public static final int DEFAULT_CAPACITY
See Also:
Constant Field Values

Constructor Detail

BytesRefHash

public BytesRefHash()
Creates a new BytesRefHash with a ByteBlockPool using a ByteBlockPool.DirectAllocator.


BytesRefHash

public BytesRefHash(ByteBlockPool pool)
Creates a new BytesRefHash.


BytesRefHash

public BytesRefHash(ByteBlockPool pool,
                    int capacity,
                    BytesRefHash.BytesStartArray bytesStartArray)
Creates a new BytesRefHash.

Method Detail

size

public int size()
Returns the number of BytesRef values in this BytesRefHash.

Returns:
the number of BytesRef values in this BytesRefHash.

get

public BytesRef get(int bytesID,
                    BytesRef ref)
Populates and returns a BytesRef with the bytes for the given bytesID.

Note: the given bytesID must be a non-negative integer less than the current size (size()).

Parameters:
bytesID - the id
ref - the BytesRef to populate
Returns:
the given BytesRef instance populated with the bytes for the given bytesID

sort

public int[] sort(Comparator<BytesRef> comp)
Returns the ids, sorted by their referenced byte values.

Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.

Parameters:
comp - the Comparator used for sorting
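To illustrate what sorting ids by their referenced bytes means, here is a hypothetical helper, not the sort(Comparator) implementation. It takes the bytes for ids 0..n-1 as a byte[][] together with an example comparator that orders bytes as unsigned values, which for UTF-8 text matches Unicode code point order. Unlike the documented method, which is destructive, this sketch writes into a new array and leaves its input untouched.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: given the bytes for ids 0..n-1, return the ids ordered by
// their bytes. Unsigned lexicographic order is used as an example comparator.
final class SortIdsByBytes {
    // Compares byte arrays as unsigned values; on a common prefix the shorter one sorts first.
    static final Comparator<byte[]> UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Returns a new array of ids sorted by comp applied to the bytes each id references.
    static int[] sortedIds(byte[][] valuesById, Comparator<byte[]> comp) {
        Integer[] ids = new Integer[valuesById.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, (x, y) -> comp.compare(valuesById[x], valuesById[y]));
        int[] result = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            result[i] = ids[i];
        }
        return result;
    }
}
```

With the values "pear", "apple", "fig" and "é" (UTF-8 bytes 0xC3 0xA9) at ids 0 to 3, sortedIds returns {1, 2, 0, 3}. A signed byte comparison would instead place "é" first, because Java bytes are signed.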

clear

public void clear(boolean resetPool)
Clears all BytesRef entries from this BytesRefHash. If resetPool is true, the internally used ByteBlockPool is reset as well.


clear

public void clear()

close

public void close()
Closes the BytesRefHash and releases all internally used memory.


add

public int add(BytesRef bytes)
Adds a new BytesRef.

Parameters:
bytes - the bytes to hash
Returns:
the id assigned to the given bytes if there was no previous mapping for them, otherwise (-(id)-1), where id is the id of the existing mapping. The return value is therefore >= 0 if and only if the given bytes had not been added before.
Throws:
BytesRefHash.MaxBytesLengthExceededException - if the length of the given bytes is > ByteBlockPool.BYTE_BLOCK_SIZE - 2
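Because a repeated add returns -(id)-1 instead of failing, callers typically decode the result before using it. AddResult is a hypothetical helper sketching that decoding. The -1 offset is what lets an existing id 0 (encoded as -1) be told apart from a newly assigned id 0.

```java
// Hypothetical helpers for the add() return convention documented above:
// r >= 0 means a new id was assigned; r < 0 encodes an existing id as -(id)-1.
final class AddResult {
    // True if the bytes were not present before the call.
    static boolean isNew(int r) {
        return r >= 0;
    }

    // Recovers the id in both cases.
    static int id(int r) {
        return r >= 0 ? r : -r - 1;
    }
}
```

In two's complement, -(id)-1 equals ~id, so ~r is an equivalent way to recover the id when r < 0.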

add

public int add(BytesRef bytes,
               int code)
Adds a new BytesRef with a pre-calculated hash code.

Parameters:
bytes - the bytes to hash
code - the bytes hash code

The hash code is defined as:

 int hash = 0;
 for (int i = bytes.offset; i < bytes.offset + bytes.length; i++) {
   hash = 31 * hash + bytes.bytes[i];
 }
 
Returns:
the id assigned to the given bytes if there was no previous mapping for them, otherwise (-(id)-1), where id is the id of the existing mapping. The return value is therefore >= 0 if and only if the given bytes had not been added before.
Throws:
BytesRefHash.MaxBytesLengthExceededException - if the length of the given bytes is > ByteBlockPool.BYTE_BLOCK_SIZE - 2
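The hash code formula can be written as a standalone method over a byte slice, which is how a caller might pre-compute the code argument. BytesHash is a hypothetical name. The sketch assumes that the code passed to add(BytesRef, int) and find(BytesRef, int) must be computed consistently with this formula, since a mismatched code would presumably send lookups to the wrong hash slots.

```java
// The documented hash code formula as a standalone method over bytes[offset, offset + length).
// Java bytes are signed, so byte values >= 0x80 contribute negative terms.
final class BytesHash {
    static int hash(byte[] bytes, int offset, int length) {
        int hash = 0;
        for (int i = offset; i < offset + length; i++) {
            hash = 31 * hash + bytes[i];
        }
        return hash;
    }
}
```

Because Java bytes are signed, a byte such as 0x80 contributes -128. For pure ASCII input the result matches String.hashCode() of the same text, since both use the multiplier 31 over the same values; for non-ASCII text the two differ.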

find

public int find(BytesRef bytes)
Returns the id of the given BytesRef.

See Also:
find(BytesRef, int)

find

public int find(BytesRef bytes,
                int code)
Returns the id of the given BytesRef with a pre-calculated hash code.

Parameters:
bytes - the bytes to look for
code - the bytes hash code
Returns:
the id of the given bytes, or -1 if there is no mapping for the given bytes.

addByPoolOffset

public int addByPoolOffset(int offset)
Adds an arbitrary int offset instead of a BytesRef term. This is used in the indexer to hold the hash for term vectors: term vectors do not redundantly store the byte[] term directly, but instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.


reinit

public void reinit()
Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.


byteStart

public int byteStart(int bytesID)
Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.

Parameters:
bytesID - the id to look up
Returns:
the bytesStart offset into the internally used ByteBlockPool for the given id


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.