Package org.apache.lucene.util
Class BytesRefHash
java.lang.Object
org.apache.lucene.util.BytesRefHash
- All Implemented Interfaces:
Accountable
BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef
instances. BytesRefHash maintains mappings of byte arrays to ids
(Map<BytesRef,int>), storing the hashed bytes efficiently in contiguous storage. The mapping
to the id is encapsulated inside BytesRefHash, and the id is guaranteed to increase for each
added BytesRef.
Note: A BytesRef instance passed to add(BytesRef) must not be longer than
ByteBlockPool.BYTE_BLOCK_SIZE - 2 bytes. The internal storage is limited to 2GB
of total byte storage.
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
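The description above (byte arrays mapped to increasing ids, with the bytes kept in contiguous storage) can be illustrated with a small toy model. This is only a sketch under stated assumptions and not Lucene's implementation: the class ToyBytesIdMap, its fields, and its growth policy are hypothetical, and it uses a single growable byte[] where BytesRefHash uses a ByteBlockPool.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy model only (hypothetical names): ids index into one contiguous byte buffer. */
final class ToyBytesIdMap {
    private byte[] pool = new byte[64];   // all values, stored back to back
    private int poolUsed = 0;             // number of bytes of pool in use
    private int[] starts = new int[8];    // starts[id]  = offset of value id in pool
    private int[] lengths = new int[8];   // lengths[id] = length of value id
    private int[] table = new int[16];    // open-addressing table of ids, -1 = empty
    private int count = 0;                // number of distinct values; also the next id

    ToyBytesIdMap() {
        Arrays.fill(table, -1);
    }

    int size() {
        return count;
    }

    /** Returns the new id, or -(existingId) - 1 if the bytes are already mapped. */
    int add(byte[] bytes) {
        int slot = findSlot(table, bytes);
        if (table[slot] != -1) {
            return -table[slot] - 1;
        }
        if (poolUsed + bytes.length > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, poolUsed + bytes.length));
        }
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, count * 2);
            lengths = Arrays.copyOf(lengths, count * 2);
        }
        System.arraycopy(bytes, 0, pool, poolUsed, bytes.length);
        int id = count++;                 // ids are handed out in increasing order
        starts[id] = poolUsed;
        lengths[id] = bytes.length;
        poolUsed += bytes.length;
        table[slot] = id;
        if (count * 2 > table.length) {
            rehash();                     // keep the table at most half full
        }
        return id;
    }

    /** Returns the id of the given bytes, or -1 if there is no mapping. */
    int find(byte[] bytes) {
        return table[findSlot(table, bytes)];
    }

    /** Returns a copy of the bytes stored for the given id (0 <= id < size()). */
    byte[] get(int id) {
        return Arrays.copyOfRange(pool, starts[id], starts[id] + lengths[id]);
    }

    // Linear probing: stops at the slot holding equal bytes or at the first empty slot.
    private int findSlot(int[] t, byte[] bytes) {
        int mask = t.length - 1;          // table length is always a power of two
        int slot = Arrays.hashCode(bytes) & mask;
        while (t[slot] != -1 && !matches(t[slot], bytes)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    private boolean matches(int id, byte[] bytes) {
        return Arrays.equals(pool, starts[id], starts[id] + lengths[id], bytes, 0, bytes.length);
    }

    private void rehash() {
        int[] bigger = new int[table.length * 2];
        Arrays.fill(bigger, -1);
        for (int id = 0; id < count; id++) {
            bigger[findSlot(bigger, get(id))] = id;
        }
        table = bigger;
    }

    public static void main(String[] args) {
        ToyBytesIdMap map = new ToyBytesIdMap();
        byte[] apple = "apple".getBytes(StandardCharsets.UTF_8);
        System.out.println(map.add(apple));   // first insertion of these bytes
        System.out.println(map.add(apple));   // same bytes again: encoded existing id
        System.out.println(map.find(apple));
    }
}
```

The point of the layout is that each value costs only its bytes plus two ints, and an id is enough to locate the value again without a per-entry object.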
-
Nested Class Summary
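Modifier and Type, Class, Description
static class BytesRefHash.BytesStartArray
  Manages allocation of the per-term addresses.
static class BytesRefHash.DirectBytesStartArray
  A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.
static class BytesRefHash.MaxBytesLengthExceededException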
-
Field Summary
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
Constructor, Description
BytesRefHash(ByteBlockPool pool)
  Creates a new BytesRefHash.
BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
  Creates a new BytesRefHash.
-
Method Summary
Modifier and Type, Method, Description
int add(BytesRef bytes)
  Adds a new BytesRef.
int addByPoolOffset(int offset)
  Adds an "arbitrary" int offset instead of a BytesRef term.
int byteStart(int bytesID)
  Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
void clear()
void clear(boolean resetPool)
void close()
  Closes the BytesRefHash and releases all internally used memory.
int[] compact()
  Returns the ids array in arbitrary order.
int find(BytesRef bytes)
  Returns the id of the given BytesRef.
BytesRef get(int bytesID, BytesRef ref)
  Populates and returns a BytesRef with the bytes for the given bytesID.
long ramBytesUsed()
  Return the memory usage of this object in bytes.
void reinit()
  Reinitializes the BytesRefHash after a previous clear() call.
int size()
  Returns the number of BytesRef values in this BytesRefHash.
int[] sort()
  Returns the values array sorted by the referenced byte values.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
Field Details
-
DEFAULT_CAPACITY
public static final int DEFAULT_CAPACITY
-
Constructor Details
-
BytesRefHash
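public BytesRefHash()
-
BytesRefHash
public BytesRefHash(ByteBlockPool pool)
Creates a new BytesRefHash.
-
BytesRefHash
public BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
Creates a new BytesRefHash.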
-
-
Method Details
-
size
public int size()
Returns the number of BytesRef values in this BytesRefHash.
- Returns:
the number of BytesRef values in this BytesRefHash
-
get
public BytesRef get(int bytesID, BytesRef ref)
Populates and returns a BytesRef with the bytes for the given bytesID.
Note: the given bytesID must be a non-negative integer less than the current size (size()).
- Parameters:
bytesID - the id
ref - the BytesRef to populate
- Returns:
the given BytesRef instance populated with the bytes for the given bytesID
-
compact
public int[] compact()
Returns the ids array in arbitrary order. Valid ids start at an offset of 0 and end at a limit of size() - 1.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
-
sort
public int[] sort()
Returns the values array sorted by the referenced byte values.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
-
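To make "sorted by the referenced byte values" concrete for sort(), here is a minimal sketch that orders ids by the bytes they refer to. It assumes unsigned lexicographic byte order, which is how BytesRef instances compare. The class SortIdsByBytes and the plain byte[][] standing in for the hash's storage are hypothetical; this is not the sorting algorithm Lucene uses internally.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration: order ids by the unsigned byte values they refer to. */
final class SortIdsByBytes {
    private SortIdsByBytes() {}

    /** values[id] holds the bytes for id; returns all ids sorted by those bytes. */
    static int[] sortedIds(byte[][] values) {
        Integer[] boxed = new Integer[values.length];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        // Compare bytes as unsigned values, so 0xFF sorts after 0x7F.
        Arrays.sort(boxed, (x, y) -> Arrays.compareUnsigned(values[x], values[y]));
        int[] ids = new int[boxed.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = boxed[i];
        }
        return ids;
    }

    public static void main(String[] args) {
        byte[][] values = {
            "pear".getBytes(StandardCharsets.UTF_8),   // id 0
            "apple".getBytes(StandardCharsets.UTF_8),  // id 1
            "fig".getBytes(StandardCharsets.UTF_8),    // id 2
        };
        System.out.println(Arrays.toString(sortedIds(values)));
    }
}
```

The result is an array of ids rather than of values, which matches the int[] return type of sort(): the caller resolves each id to its bytes with get when needed.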
clear
public void clear(boolean resetPool)
-
clear
public void clear()
-
close
public void close()
Closes the BytesRefHash and releases all internally used memory.
-
add
public int add(BytesRef bytes)
Adds a new BytesRef.
- Parameters:
bytes - the bytes to hash
- Returns:
the id the given bytes are hashed to if there was no mapping for the given bytes, otherwise (-(id)-1). This guarantees that the return value will always be >= 0 if the given bytes haven't been hashed before.
- Throws:
BytesRefHash.MaxBytesLengthExceededException - if the given bytes are longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2
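The return-value convention of add (a non-negative id for new bytes, otherwise -(id)-1) can be decoded with two one-line helpers. The sketch below only illustrates that arithmetic: the class AddResult and its method names are hypothetical and not part of Lucene.

```java
/** Hypothetical helper, not part of Lucene: interprets the int returned by add. */
final class AddResult {
    private AddResult() {}

    /** True if the bytes had no mapping before the call (return value >= 0). */
    static boolean wasNew(int addResult) {
        return addResult >= 0;
    }

    /** Recovers the id in both cases: the value itself if new, otherwise -(value) - 1. */
    static int toId(int addResult) {
        return addResult >= 0 ? addResult : -addResult - 1;
    }

    public static void main(String[] args) {
        int first = 5;    // pretend add() assigned id 5 to previously unseen bytes
        int repeat = -6;  // pretend add() saw the same bytes again: -(5) - 1
        System.out.println(wasNew(first) + " " + toId(first));
        System.out.println(wasNew(repeat) + " " + toId(repeat));
    }
}
```

Because id 0 is valid, the encoding subtracts 1 so that an existing id 0 is reported as -1 and stays distinguishable from a newly assigned 0.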
-
find
public int find(BytesRef bytes)
Returns the id of the given BytesRef.
- Parameters:
bytes - the bytes to look for
- Returns:
the id of the given bytes, or -1 if there is no mapping for the given bytes.
-
addByPoolOffset
public int addByPoolOffset(int offset)
Adds an "arbitrary" int offset instead of a BytesRef term. This is used in the indexer to hold the hash for term vectors, because they do not redundantly store the byte[] term directly and instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.
-
reinit
public void reinit()
Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.
-
byteStart
public int byteStart(int bytesID)
Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
- Parameters:
bytesID - the id to look up
- Returns:
the bytesStart offset into the internally used ByteBlockPool for the given id
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by:
ramBytesUsed in interface Accountable
-