Class KnnVectorsReader
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
FlatVectorsReader, Lucene99HnswVectorsReader, PerFieldKnnVectorsFormat.FieldsReader
-
Constructor Summary
Modifier | Constructor | Description
protected | KnnVectorsReader() | Sole constructor.
-
Method Summary
Modifier and Type | Method | Description
abstract void | checkIntegrity() | Checks consistency of this reader.
void | finishMerge() | Optional: reset or close merge resources used in the reader.
abstract ByteVectorValues | getByteVectorValues(String field) | Returns the ByteVectorValues for the given field.
abstract FloatVectorValues | getFloatVectorValues(String field) | Returns the FloatVectorValues for the given field.
KnnVectorsReader | getMergeInstance() | Returns an instance optimized for merging.
Map<String,Long> | getOffHeapByteSize(FieldInfo fieldInfo) | Returns the desired size of off-heap memory for the given field.
static Map<String,Long> | mergeOffHeapByteSizeMaps(Map<String,Long> map1, Map<String,Long> map2) | Merges the maps returned by getOffHeapByteSize(FieldInfo).
abstract void | search(String field, byte[] target, KnnCollector knnCollector, AcceptDocs acceptDocs) | Returns the k nearest neighbor documents, as determined by comparing their vector values for this field to the given vector using the field's similarity function.
abstract void | search(String field, float[] target, KnnCollector knnCollector, AcceptDocs acceptDocs) | Returns the k nearest neighbor documents, as determined by comparing their vector values for this field to the given vector using the field's similarity function.
-
Constructor Details
-
KnnVectorsReader
protected KnnVectorsReader()
Sole constructor.
-
-
Method Details
-
checkIntegrity
Checks consistency of this reader.
Note that this may be costly in terms of I/O, e.g. it may involve computing a checksum value against large data files.
- Throws:
IOException
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
-
getFloatVectorValues
Returns the FloatVectorValues for the given field. The behavior is undefined if the given field doesn't have KNN vectors enabled on its FieldInfo. The return value is never null.
- Throws:
IOException
-
getByteVectorValues
Returns the ByteVectorValues for the given field. The behavior is undefined if the given field doesn't have KNN vectors enabled on its FieldInfo. The return value is never null.
- Throws:
IOException
-
search
public abstract void search(String field, float[] target, KnnCollector knnCollector, AcceptDocs acceptDocs) throws IOException
Returns the k nearest neighbor documents, as determined by comparing their vector values for this field to the given vector using the field's similarity function. The score of each document is derived from the vector similarity in a way that ensures scores are positive and that a larger score corresponds to a higher ranking.
The search is allowed to be approximate, meaning the results are not guaranteed to be the true k closest neighbors. For large values of k (for example when k is close to the total number of documents), the search may also retrieve fewer than k documents.
This method returns void; results are gathered in knnCollector. The TopDocs built from the collected results contain a ScoreDoc for each nearest neighbor, in order of similarity to the query vector (decreasing scores). The TotalHits contains the number of documents visited during the search. If the search stopped early because it hit visitedLimit, this is indicated through the relation TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO.
The behavior is undefined if the given field doesn't have KNN vectors enabled on its FieldInfo.
- Parameters:
field - the vector field to search
target - the vector-valued query
knnCollector - a KnnResults collector and relevant settings for gathering vector results
acceptDocs - the documents allowed to match, or null if all documents are allowed to match
- Throws:
IOException
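To make the scoring and filtering contract described above more concrete, here is a minimal, self-contained sketch of an exact (brute-force) nearest-neighbor search. It does not use Lucene at all: the class `ExactKnnSketch`, its `Hit` type, and the `IntPredicate` standing in for `acceptDocs` are hypothetical names invented for illustration, and the score mapping `1 / (1 + squaredDistance)` is an assumed example of a function that keeps scores positive with larger meaning closer. Real implementations are allowed to be approximate, which this sketch is not.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

final class ExactKnnSketch {
  /** A (doc, score) pair, loosely playing the role of a ScoreDoc. Hypothetical type. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Assumed score mapping: always positive, and closer vectors score higher. */
  static float score(float[] a, float[] b) {
    float squaredDistance = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      squaredDistance += diff * diff;
    }
    return 1f / (1f + squaredDistance);
  }

  /** Scores every accepted document and returns at most k hits by decreasing score. */
  static List<Hit> search(float[][] vectors, float[] target, int k, IntPredicate accept) {
    List<Hit> hits = new ArrayList<>();
    for (int doc = 0; doc < vectors.length; doc++) {
      if (accept.test(doc)) { // stand-in for the acceptDocs filter
        hits.add(new Hit(doc, score(vectors[doc], target)));
      }
    }
    hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
    // May return fewer than k hits when few documents are accepted.
    return new ArrayList<>(hits.subList(0, Math.min(k, hits.size())));
  }

  public static void main(String[] args) {
    float[][] vectors = {{0f, 0f}, {1f, 0f}, {3f, 4f}};
    for (Hit hit : search(vectors, new float[] {0f, 0f}, 2, doc -> true)) {
      System.out.println(hit.doc + " " + hit.score);
    }
  }
}
```

Asking for more hits than there are accepted documents simply yields a shorter list, which mirrors the note above that a search may retrieve fewer than k documents.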
-
search
public abstract void search(String field, byte[] target, KnnCollector knnCollector, AcceptDocs acceptDocs) throws IOException
Returns the k nearest neighbor documents, as determined by comparing their vector values for this field to the given vector using the field's similarity function. The score of each document is derived from the vector similarity in a way that ensures scores are positive and that a larger score corresponds to a higher ranking.
The search is allowed to be approximate, meaning the results are not guaranteed to be the true k closest neighbors. For large values of k (for example when k is close to the total number of documents), the search may also retrieve fewer than k documents.
This method returns void; results are gathered in knnCollector. The TopDocs built from the collected results contain a ScoreDoc for each nearest neighbor, in order of similarity to the query vector (decreasing scores). The TotalHits contains the number of documents visited during the search. If the search stopped early because it hit visitedLimit, this is indicated through the relation TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO.
The behavior is undefined if the given field doesn't have KNN vectors enabled on its FieldInfo.
- Parameters:
field - the vector field to search
target - the vector-valued query
knnCollector - a KnnResults collector and relevant settings for gathering vector results
acceptDocs - the documents allowed to match, or null if all documents are allowed to match
- Throws:
IOException
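The early-termination behavior described above (a visited count plus a relation that signals whether the search stopped at visitedLimit) can be sketched without Lucene as follows. Everything here is an assumption made for illustration: `VisitLimitSketch`, `Result`, and the local `Relation` enum are hypothetical stand-ins that only borrow the constant names of `TotalHits.Relation`, and the linear scan for a single nearest neighbor over byte vectors is far simpler than what a real reader does.

```java
final class VisitLimitSketch {
  /** Local stand-in that mirrors the constant names of TotalHits.Relation. */
  enum Relation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

  /** Outcome of one scan: best document so far, how many were visited, and the relation. */
  static final class Result {
    final int bestDoc;
    final long visited;
    final Relation relation;

    Result(int bestDoc, long visited, Relation relation) {
      this.bestDoc = bestDoc;
      this.visited = visited;
      this.relation = relation;
    }
  }

  static int squaredDistance(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      int diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Scans documents in order, giving up once visitedLimit documents have been examined. */
  static Result nearest(byte[][] vectors, byte[] target, long visitedLimit) {
    int bestDoc = -1;
    int bestDistance = Integer.MAX_VALUE;
    long visited = 0;
    for (int doc = 0; doc < vectors.length; doc++) {
      if (visited >= visitedLimit) {
        // Stopped early: unvisited documents remain, so the result is only a lower bound.
        return new Result(bestDoc, visited, Relation.GREATER_THAN_OR_EQUAL_TO);
      }
      visited++;
      int distance = squaredDistance(vectors[doc], target);
      if (distance < bestDistance) {
        bestDistance = distance;
        bestDoc = doc;
      }
    }
    return new Result(bestDoc, visited, Relation.EQUAL_TO);
  }

  public static void main(String[] args) {
    byte[][] vectors = {{0, 0}, {5, 5}, {1, 1}};
    Result result = nearest(vectors, new byte[] {1, 1}, 2);
    System.out.println(result.bestDoc + " " + result.visited + " " + result.relation);
  }
}
```

With a limit of 2, the scan in `main` never reaches the third vector, so it reports the best candidate among the first two and flags the result as a lower bound.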
-
getMergeInstance
Returns an instance optimized for merging. This instance may only be consumed in the thread that called getMergeInstance().
The default implementation returns this.
- Throws:
IOException
-
finishMerge
Optional: reset or close merge resources used in the reader.
The default implementation is empty.
- Throws:
IOException
-
getOffHeapByteSize
Returns the desired size of off-heap memory for the given field. This size can be used to help determine the memory requirements for optimal search performance, which can be greatly affected by page faults when not enough memory is available.
For reporting purposes, the size of the off-heap index structures is broken down by their file extension, which provides a logical categorization of their purpose, e.g. the Lucene99HnswVectorsFormat stores the HNSW graph neighbour lists in a file with the "vex" extension.
The long value is the size in bytes of the off-heap space needed if the associated index structure were to be fully loaded in memory. While somewhat analogous to Accountable.ramBytesUsed() (which reports actual on-heap memory usage), the sizes reported by this method are not actual usage; they are the amount of available memory needed to fully load the index structures into memory.
To determine the total desired off-heap memory size for the given field:
getOffHeapByteSize(field).values().stream().mapToLong(Long::longValue).sum();
The default implementation returns an empty map.
- Parameters:
fieldInfo - the fieldInfo
- Returns:
a map of the desired off-heap memory requirements by category
- WARNING: This API is experimental and might change in incompatible ways in the next release.
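As a small illustration of the summation shown above, the following sketch totals a per-extension size map. The map contents are made-up values standing in for whatever getOffHeapByteSize would return for a real field: only the "vex" extension is mentioned in this documentation, and the "vec" key, the byte counts, and the class name `OffHeapSizeTotal` are assumptions used purely for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OffHeapSizeTotal {
  /** Sums the per-category byte sizes, using the same stream expression as above. */
  static long totalBytes(Map<String, Long> sizesByExtension) {
    return sizesByExtension.values().stream().mapToLong(Long::longValue).sum();
  }

  public static void main(String[] args) {
    // Hypothetical sizes; a real map would come from getOffHeapByteSize(fieldInfo).
    Map<String, Long> sizes = new LinkedHashMap<>();
    sizes.put("vex", 4_096L);
    sizes.put("vec", 1_048_576L);
    System.out.println(totalBytes(sizes));
  }
}
```

An empty map, which is what the default implementation returns, sums to zero.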
-
mergeOffHeapByteSizeMaps
public static Map<String,Long> mergeOffHeapByteSizeMaps(Map<String, Long> map1, Map<String, Long> map2)
Merges the maps returned by getOffHeapByteSize(FieldInfo).
This method is a convenience for aggregating the desired off-heap memory requirements for several fields. The keys in the returned map are the union of the keys in the given maps. Entries with the same key are summed.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
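The merge semantics described above (union of keys, values under the same key summed) can be sketched with the standard library alone. This is not Lucene's implementation, only one plausible way to get the documented behavior; the class name `OffHeapSizeMerge`, the method name `mergeSummingValues`, and the sample keys and sizes are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class OffHeapSizeMerge {
  /** Returns a new map holding the union of keys; values sharing a key are added together. */
  static Map<String, Long> mergeSummingValues(Map<String, Long> map1, Map<String, Long> map2) {
    Map<String, Long> merged = new HashMap<>(map1);
    map2.forEach((key, value) -> merged.merge(key, value, Long::sum));
    return merged;
  }

  public static void main(String[] args) {
    // Hypothetical per-field maps, as if returned by getOffHeapByteSize for two fields.
    Map<String, Long> fieldA = Map.of("vex", 100L, "vec", 1_000L);
    Map<String, Long> fieldB = Map.of("vec", 2_000L);
    System.out.println(mergeSummingValues(fieldA, fieldB));
  }
}
```

Copying the first map before merging leaves both inputs unmodified, which matters here because the sample inputs are immutable.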
-