Package org.apache.lucene.util
Class BytesRefHash
- java.lang.Object
  - org.apache.lucene.util.BytesRefHash
public final class BytesRefHash extends Object
BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. BytesRefHash maintains mappings of byte arrays to ids (Map<BytesRef,int>), storing the hashed bytes efficiently in continuous storage. The mapping to the id is encapsulated inside BytesRefHash, and ids are guaranteed to increase with each newly added BytesRef.
Note: A BytesRef instance passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE-2 bytes. The internal storage is limited to 2GB of total byte storage.
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
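A minimal, self-contained sketch of the idea described above, written in plain Java rather than against Lucene: keys are appended back to back into one growable byte[], each new key receives the next dense id (0, 1, 2, ...), and an open-addressing table with linear probing maps hash slots to ids. The class MiniBytesRefHash and all of its members are hypothetical; Lucene's real implementation (ByteBlockPool blocks, its hash function, BytesStartArray) differs in detail.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified stand-in for the idea behind BytesRefHash.
 * Not Lucene's code: keys share one contiguous byte[], ids are dense and
 * assigned in insertion order, and a linear-probing table maps slots to ids.
 */
final class MiniBytesRefHash {
  private byte[] pool = new byte[64];  // concatenated key bytes ("continuous storage")
  private int poolUpto = 0;            // next free position in pool
  private int[] starts = new int[8];   // id -> start offset in pool
  private int[] lengths = new int[8];  // id -> key length
  private int[] slots = new int[16];   // hash slot -> id, or -1 if empty (size is a power of two)
  private int count = 0;               // number of distinct keys

  MiniBytesRefHash() {
    Arrays.fill(slots, -1);
  }

  int size() {
    return count;
  }

  /** Returns the new id, or -(id)-1 if the bytes were already present. */
  int add(byte[] key) {
    int slot = findSlot(key);
    if (slots[slot] != -1) {
      return -(slots[slot]) - 1;
    }
    int id = count++;
    if (id == starts.length) {
      starts = Arrays.copyOf(starts, id * 2);
      lengths = Arrays.copyOf(lengths, id * 2);
    }
    while (poolUpto + key.length > pool.length) {
      pool = Arrays.copyOf(pool, pool.length * 2);
    }
    System.arraycopy(key, 0, pool, poolUpto, key.length);
    starts[id] = poolUpto;
    lengths[id] = key.length;
    poolUpto += key.length;
    slots[slot] = id;
    if (count * 2 > slots.length) {  // keep the table at most half full
      rehash(slots.length * 2);
    }
    return id;
  }

  /** Returns the id of the given bytes, or -1 if they were never added. */
  int find(byte[] key) {
    return slots[findSlot(key)];
  }

  /** Returns a copy of the bytes stored for id (0 <= id < size()). */
  byte[] get(int id) {
    return Arrays.copyOfRange(pool, starts[id], starts[id] + lengths[id]);
  }

  // Linear probing: returns the slot holding key, or the first empty slot.
  private int findSlot(byte[] key) {
    int mask = slots.length - 1;
    int slot = Arrays.hashCode(key) & mask;
    while (slots[slot] != -1 && !equalsAt(slots[slot], key)) {
      slot = (slot + 1) & mask;
    }
    return slot;
  }

  // Compares key against the bytes stored for id, without copying them.
  private boolean equalsAt(int id, byte[] key) {
    return Arrays.equals(pool, starts[id], starts[id] + lengths[id], key, 0, key.length);
  }

  // Rebuilds the slot table at a larger size; ids and stored bytes are unchanged.
  private void rehash(int newSize) {
    slots = new int[newSize];
    Arrays.fill(slots, -1);
    for (int id = 0; id < count; id++) {
      slots[findSlot(get(id))] = id;
    }
  }
}
```

For example, new MiniBytesRefHash().add(new byte[]{1, 2}) returns 0 the first time the key is added; adding the same bytes again returns -1, that is, -(0)-1.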
Nested Class Summary
- static class BytesRefHash.BytesStartArray: Manages allocation of the per-term addresses.
- static class BytesRefHash.DirectBytesStartArray: A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.
- static class BytesRefHash.MaxBytesLengthExceededException
Field Summary
- static int DEFAULT_CAPACITY
Constructor Summary
- BytesRefHash()
- BytesRefHash(ByteBlockPool pool): Creates a new BytesRefHash.
- BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray): Creates a new BytesRefHash.
Method Summary
- int add(BytesRef bytes): Adds a new BytesRef.
- int addByPoolOffset(int offset): Adds an "arbitrary" int offset instead of a BytesRef term.
- int byteStart(int bytesID): Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
- void clear()
- void clear(boolean resetPool)
- void close(): Closes the BytesRefHash and releases all internally used memory.
- int[] compact(): Returns the ids array in arbitrary order.
- int find(BytesRef bytes): Returns the id of the given BytesRef.
- BytesRef get(int bytesID, BytesRef ref): Populates and returns a BytesRef with the bytes for the given bytesID.
- void reinit(): Reinitializes the BytesRefHash after a previous clear() call.
- int size(): Returns the number of BytesRef values in this BytesRefHash.
- int[] sort(): Returns the values array sorted by the referenced byte values.
Field Detail
DEFAULT_CAPACITY
public static final int DEFAULT_CAPACITY
- See Also:
- Constant Field Values
Constructor Detail
BytesRefHash
public BytesRefHash()
BytesRefHash
public BytesRefHash(ByteBlockPool pool)
Creates a new BytesRefHash.
BytesRefHash
public BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
Creates a new BytesRefHash.
Method Detail
size
public int size()
Returns the number of BytesRef values in this BytesRefHash.
- Returns:
- the number of BytesRef values in this BytesRefHash.
get
public BytesRef get(int bytesID, BytesRef ref)
Populates and returns a BytesRef with the bytes for the given bytesID.
Note: the given bytesID must be a non-negative integer less than the current size (size()).
- Parameters:
- bytesID - the id
- ref - the BytesRef to populate
- Returns:
- the given BytesRef instance populated with the bytes for the given bytesID
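get fills in the BytesRef the caller passes and returns that same instance, so one object can be reused across many lookups without allocating. The sketch below shows that pattern with hypothetical ByteSlice and SliceStore classes (ByteSlice loosely mirrors the bytes/offset/length fields of a BytesRef). Pointing the slice at the shared storage instead of copying is an assumption made for the sketch; under that assumption the caller must treat the returned bytes as read-only.

```java
/** Hypothetical mutable view: a window of length bytes starting at offset. */
final class ByteSlice {
  byte[] bytes;
  int offset;
  int length;
}

/** Hypothetical store illustrating the "populate and return the caller's instance" pattern. */
final class SliceStore {
  private final byte[] pool;     // shared backing storage for all entries
  private final int[] starts;    // id -> start offset in pool
  private final int[] lengths;   // id -> entry length

  SliceStore(byte[] pool, int[] starts, int[] lengths) {
    this.pool = pool;
    this.starts = starts;
    this.lengths = lengths;
  }

  int size() {
    return starts.length;
  }

  /** Points ref at the stored bytes for id (no copy) and returns ref itself. */
  ByteSlice get(int id, ByteSlice ref) {
    // Enforce the documented precondition: 0 <= id < size().
    if (id < 0 || id >= size()) {
      throw new IndexOutOfBoundsException("id must be in [0, size()): " + id);
    }
    ref.bytes = pool;
    ref.offset = starts[id];
    ref.length = lengths[id];
    return ref;
  }
}
```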
compact
public int[] compact()
Returns the ids array in arbitrary order. Valid ids start at offset 0 and end at offset size() - 1.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
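One way to picture why compact() yields ids in arbitrary order and is destructive: if the hash table is an int[] of slots, each holding an id or a marker for "empty", compaction can slide the live ids to the front in slot order. The slot order follows the hash function, not insertion order, and once ids have moved the array no longer works as a hash table. The CompactSketch class is hypothetical and assumes -1 marks an empty slot.

```java
import java.util.Arrays;

final class CompactSketch {
  /**
   * Hypothetical compaction: moves every id that is not -1 to the front of
   * slots, in place and in slot order, then marks the tail as -1. Afterwards
   * slots[0 .. n-1] holds all n ids; the hash layout is destroyed.
   */
  static int[] compact(int[] slots) {
    int upto = 0;
    for (int i = 0; i < slots.length; i++) {
      if (slots[i] != -1) {
        slots[upto++] = slots[i];  // upto <= i, so nothing unread is overwritten
      }
    }
    Arrays.fill(slots, upto, slots.length, -1);
    return slots;
  }
}
```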
sort
public int[] sort()
Returns the values array sorted by the referenced byte values.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
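A sketch of sorting ids by the bytes they reference, assuming unsigned lexicographic byte order (the order BytesRef uses when comparing). SortSketch and sortIds are hypothetical; keysById stands in for the bytes the hash would look up for each id. Boxing to Integer keeps the sketch short, whereas an in-place sort of a primitive int[] would avoid that allocation.

```java
import java.util.Arrays;

final class SortSketch {
  /**
   * Hypothetical: returns ids 0..n-1 ordered by keysById[id], comparing
   * bytes as unsigned values; a shorter key sorts before any key it prefixes.
   */
  static int[] sortIds(byte[][] keysById) {
    Integer[] ids = new Integer[keysById.length];
    for (int i = 0; i < ids.length; i++) {
      ids[i] = i;
    }
    Arrays.sort(ids, (x, y) -> Arrays.compareUnsigned(keysById[x], keysById[y]));
    int[] sorted = new int[ids.length];
    for (int i = 0; i < ids.length; i++) {
      sorted[i] = ids[i];
    }
    return sorted;
  }
}
```

Treating bytes as unsigned matters here: (byte) 0x80 is negative as a Java byte, but it sorts after 5 in this order.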
clear
public void clear(boolean resetPool)
clear
public void clear()
close
public void close()
Closes the BytesRefHash and releases all internally used memory.
add
public int add(BytesRef bytes)
Adds a new BytesRef.
- Parameters:
- bytes - the bytes to hash
- Returns:
- the id the given bytes are hashed to if there was no mapping for the given bytes, otherwise (-(id)-1). This guarantees that the return value will always be >= 0 if the given bytes haven't been hashed before.
- Throws:
- BytesRefHash.MaxBytesLengthExceededException - if the given bytes are longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2
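Because a new id is returned as-is and an existing one as -(id)-1, a single add call both inserts and tells the caller which case occurred. The hypothetical AddResult helper below decodes either form. The extra -1 is what keeps id 0 distinguishable: -(0)-1 is -1, whereas plain negation would map id 0 to 0.

```java
/** Hypothetical helpers for interpreting the int returned by add(BytesRef). */
final class AddResult {
  /** True if the bytes had not been hashed before (return value >= 0). */
  static boolean isNew(int addResult) {
    return addResult >= 0;
  }

  /** Recovers the id from either form of the return value: id or -(id)-1. */
  static int id(int addResult) {
    return addResult >= 0 ? addResult : -addResult - 1;
  }
}
```

A typical caller pattern is counting occurrences, e.g. `int id = AddResult.id(hash.add(bytes)); counts[id]++;`, where hash and counts are hypothetical caller-side variables. In two's-complement arithmetic, -(id)-1 is the same value as ~id.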
find
public int find(BytesRef bytes)
Returns the id of the given BytesRef.
- Parameters:
- bytes - the bytes to look for
- Returns:
- the id of the given bytes, or -1 if there is no mapping for the given bytes.
addByPoolOffset
public int addByPoolOffset(int offset)
Adds an "arbitrary" int offset instead of a BytesRef term. This is used in the indexer to hold the hash for term vectors, because they do not redundantly store the byte[] term directly and instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.
reinit
public void reinit()
Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.
byteStart
public int byteStart(int bytesID)
Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
- Parameters:
- bytesID - the id to look up
- Returns:
- the bytesStart offset into the internally used ByteBlockPool for the given id
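This page does not describe how an entry is laid out at its bytesStart offset. Assuming each entry is a 1- or 2-byte length prefix followed by the raw bytes (an assumption that would be consistent with the BYTE_BLOCK_SIZE-2 limit in the class description), writing and decoding such an entry could look like the hypothetical LengthPrefix class below.

```java
/**
 * Hypothetical entry layout: lengths below 128 use a 1-byte prefix; longer
 * lengths (up to 32767) use 2 bytes, with the high bit of the first byte set.
 */
final class LengthPrefix {
  /** Writes prefix + key bytes at pos and returns the next free position. */
  static int write(byte[] pool, int pos, byte[] key) {
    int len = key.length;
    if (len < 128) {
      pool[pos++] = (byte) len;                   // high bit clear: 1-byte prefix
    } else if (len < (1 << 15)) {
      pool[pos++] = (byte) (0x80 | (len & 0x7f)); // high bit set + low 7 bits
      pool[pos++] = (byte) (len >>> 7);           // remaining 8 bits
    } else {
      throw new IllegalArgumentException("length too large for a 2-byte prefix: " + len);
    }
    System.arraycopy(key, 0, pool, pos, len);
    return pos + len;
  }

  /** Reads the length stored at byteStart. */
  static int length(byte[] pool, int byteStart) {
    int b0 = pool[byteStart] & 0xff;
    if ((b0 & 0x80) == 0) {
      return b0;
    }
    return (b0 & 0x7f) | ((pool[byteStart + 1] & 0xff) << 7);
  }

  /** Offset of the first payload byte for the entry starting at byteStart. */
  static int dataOffset(byte[] pool, int byteStart) {
    return byteStart + ((pool[byteStart] & 0x80) == 0 ? 1 : 2);
  }
}
```

Under this assumed layout, an entry occupies at most its length plus 2 bytes, so a key of BYTE_BLOCK_SIZE - 2 bytes is the largest that fits in a single block.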