Class KnnVectorsReader

java.lang.Object
org.apache.lucene.codecs.KnnVectorsReader
All Implemented Interfaces:
Closeable, AutoCloseable
Direct Known Subclasses:
FlatVectorsReader, Lucene99HnswVectorsReader, PerFieldKnnVectorsFormat.FieldsReader

public abstract class KnnVectorsReader extends Object implements Closeable
Reads vectors from an index.
  • Constructor Details

    • KnnVectorsReader

      protected KnnVectorsReader()
      Sole constructor.
  • Method Details

    • checkIntegrity

      public abstract void checkIntegrity() throws IOException
      Checks consistency of this reader.

      Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files.

      Throws:
      IOException
      NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
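
      To illustrate why an integrity check can be I/O-heavy, the following minimal sketch recomputes a CRC32 checksum over a whole byte array and compares it with a stored value. The class ChecksumSketch and its methods are hypothetical and are not part of Lucene; using CRC32 and an in-memory array in place of a file are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not a Lucene class: verifies data against a stored CRC32.
final class ChecksumSketch {

  /** Reads every byte of data once, so the cost grows linearly with its size. */
  static long checksum(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  /** Throws IOException if the recomputed checksum differs from the expected one. */
  static void verify(byte[] data, long expected) throws IOException {
    long actual = checksum(data);
    if (actual != expected) {
      throw new IOException("checksum mismatch: expected=" + expected + " actual=" + actual);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "vector data".getBytes(StandardCharsets.UTF_8);
    long stored = checksum(data);
    verify(data, stored); // intact data passes silently
    data[0] ^= 1; // simulate a single flipped bit
    try {
      verify(data, stored);
    } catch (IOException e) {
      System.out.println("corruption detected");
    }
  }
}
```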
    • getFloatVectorValues

      public abstract FloatVectorValues getFloatVectorValues(String field) throws IOException
      Returns the FloatVectorValues for the given field. The behavior is undefined if the given field doesn't have KNN vectors enabled on its FieldInfo. The return value is never null.
      Throws:
      IOException
    • getByteVectorValues

      public abstract ByteVectorValues getByteVectorValues(String field) throws IOException
      Returns the ByteVectorValues for the given field. The behavior is undefined if the given field doesn't have KNN vectors enabled on its FieldInfo. The return value is never null.
      Throws:
      IOException
    • search

      public abstract void search(String field, float[] target, KnnCollector knnCollector, AcceptDocs acceptDocs) throws IOException
      Collects into knnCollector the k nearest neighbor documents for the given vector, as determined by comparing their vector values for this field using the field's similarity function. The score of each document is derived from the vector similarity in a way that ensures scores are positive and that a larger score corresponds to a higher ranking.

      The search is allowed to be approximate, meaning the results are not guaranteed to be the true k closest neighbors. For large values of k (for example when k is close to the total number of documents), the search may also retrieve fewer than k documents.

      The TopDocs obtained from the collector (KnnCollector.topDocs()) will contain a ScoreDoc for each nearest neighbor, in order of their similarity to the query vector (decreasing scores). The TotalHits contains the number of documents visited during the search. If the search stopped early because it reached the collector's visit limit (KnnCollector.visitLimit()), this is indicated through the relation TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO.

      The behavior is undefined if the given field doesn't have KNN vectors enabled on its FieldInfo.

      Parameters:
      field - the vector field to search
      target - the vector-valued query
      knnCollector - a KnnCollector that gathers the vector results and carries the relevant search settings
      acceptDocs - the documents that are allowed to match, or null if all documents are allowed to match.
      Throws:
      IOException
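
      The contract described above (positive scores, decreasing order, an accept filter, a visit budget, possibly fewer than k results) can be sketched with an exact brute-force scan. This is only an illustration under stated assumptions: BruteForceKnnSketch, Hit, and Result are hypothetical names, the score formula is assumed, and real implementations such as HNSW-based readers search approximately rather than scanning every vector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

// Hypothetical, self-contained sketch; none of these types are Lucene classes.
final class BruteForceKnnSketch {

  /** One result: a document id and a positive score where larger means more similar. */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Hits in decreasing score order, plus how many vectors were scored. */
  static final class Result {
    final List<Hit> hits;
    final int visited;
    final boolean earlyTerminated;

    Result(List<Hit> hits, int visited, boolean earlyTerminated) {
      this.hits = hits;
      this.visited = visited;
      this.earlyTerminated = earlyTerminated;
    }
  }

  /** Assumed scoring: 1 / (1 + squared Euclidean distance), which lies in (0, 1]. */
  static float score(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return 1f / (1f + sum);
  }

  /** accept may be null, meaning every document is allowed to match. */
  static Result search(
      float[][] vectors, float[] target, int k, IntPredicate accept, int visitLimit) {
    // Min-heap on score: the weakest of the current candidates is evicted first.
    PriorityQueue<Hit> heap = new PriorityQueue<>((x, y) -> Float.compare(x.score, y.score));
    int visited = 0;
    boolean earlyTerminated = false;
    for (int doc = 0; doc < vectors.length; doc++) {
      if (accept != null && !accept.test(doc)) {
        continue; // filtered out, not counted as visited
      }
      if (visited >= visitLimit) {
        earlyTerminated = true; // more candidates remain, but the budget is spent
        break;
      }
      visited++;
      heap.add(new Hit(doc, score(vectors[doc], target)));
      if (heap.size() > k) {
        heap.poll();
      }
    }
    List<Hit> hits = new ArrayList<>(heap);
    hits.sort((x, y) -> Float.compare(y.score, x.score)); // decreasing score
    return new Result(hits, visited, earlyTerminated);
  }
}
```

      In this sketch, asking for more neighbors than there are accepted documents simply yields fewer than k hits, and a caller would map earlyTerminated to a "greater than or equal to" hit-count relation.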
    • search

      public abstract void search(String field, byte[] target, KnnCollector knnCollector, AcceptDocs acceptDocs) throws IOException
      Collects into knnCollector the k nearest neighbor documents for the given vector, as determined by comparing their vector values for this field using the field's similarity function. The score of each document is derived from the vector similarity in a way that ensures scores are positive and that a larger score corresponds to a higher ranking.

      The search is allowed to be approximate, meaning the results are not guaranteed to be the true k closest neighbors. For large values of k (for example when k is close to the total number of documents), the search may also retrieve fewer than k documents.

      The TopDocs obtained from the collector (KnnCollector.topDocs()) will contain a ScoreDoc for each nearest neighbor, in order of their similarity to the query vector (decreasing scores). The TotalHits contains the number of documents visited during the search. If the search stopped early because it reached the collector's visit limit (KnnCollector.visitLimit()), this is indicated through the relation TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO.

      The behavior is undefined if the given field doesn't have KNN vectors enabled on its FieldInfo.

      Parameters:
      field - the vector field to search
      target - the vector-valued query
      knnCollector - a KnnCollector that gathers the vector results and carries the relevant search settings
      acceptDocs - the documents that are allowed to match, or null if all documents are allowed to match.
      Throws:
      IOException
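
      For byte vectors, one way to obtain scores that are positive and grow with similarity is to invert a squared distance. The sketch below assumes a Euclidean-style similarity; the class ByteVectorScoreSketch and the formula 1 / (1 + d) are illustrative assumptions, and the actual mapping depends on the field's similarity function.

```java
// Hypothetical helper, not a Lucene class: scores byte vectors by inverted squared distance.
final class ByteVectorScoreSketch {

  /** Squared Euclidean distance; each byte is widened to int before subtracting. */
  static int squareDistance(byte[] a, byte[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch: " + a.length + " != " + b.length);
    }
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      int d = a[i] - b[i]; // ranges over [-255, 255], so d * d fits easily in an int
      sum += d * d;
    }
    return sum;
  }

  /** Maps a distance in [0, inf) to a score in (0, 1]; closer vectors score higher. */
  static float score(byte[] a, byte[] b) {
    return 1f / (1f + squareDistance(a, b));
  }

  public static void main(String[] args) {
    byte[] query = {0, 0};
    System.out.println(score(query, new byte[] {1, 0})); // nearer vector, higher score
    System.out.println(score(query, new byte[] {3, 4})); // farther vector, lower score
  }
}
```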
    • getMergeInstance

      public KnnVectorsReader getMergeInstance() throws IOException
      Returns an instance optimized for merging. This instance may only be consumed in the thread that called getMergeInstance().

      The default implementation returns this.

      Throws:
      IOException
    • finishMerge

      public void finishMerge() throws IOException
      Optional: resets or closes merge resources used by the reader.

      The default implementation is empty.

      Throws:
      IOException
    • getOffHeapByteSize

      public Map<String,Long> getOffHeapByteSize(FieldInfo fieldInfo)
      Returns the desired size of off-heap memory for the given field. This size can be used to help determine the memory requirements for optimal search performance, which can be greatly affected by page faults when not enough memory is available.

      For reporting purposes, the size of the off-heap index structures is broken down by their file extension, which provides a logical categorization of their purpose. For example, Lucene99HnswVectorsFormat stores the HNSW graph neighbour lists in a file with the "vex" extension.

      The long value is the size in bytes of the off-heap space needed if the associated index structure were to be fully loaded in memory. While somewhat analogous to Accountable.ramBytesUsed() (which reports actual on-heap memory usage), the sizes reported by this method are not actual usage; they are the amount of available memory needed to fully load the index structures into memory.

      To determine the total desired off-heap memory size for the given field:

      
       getOffHeapByteSize(fieldInfo).values().stream().mapToLong(Long::longValue).sum();
       

      The default implementation returns an empty map.

      Parameters:
      fieldInfo - the FieldInfo of the field whose desired off-heap size is requested
      Returns:
      a map of the desired off-heap memory requirements by category
      WARNING: This API is experimental and might change in incompatible ways in the next release.
    • mergeOffHeapByteSizeMaps

      public static Map<String,Long> mergeOffHeapByteSizeMaps(Map<String,Long> map1, Map<String,Long> map2)
      Merges the maps returned by getOffHeapByteSize(FieldInfo).

      This method is a convenience for aggregating the desired off-heap memory requirements for several fields. The keys in the returned map are a union of the keys in the given maps. Entries with the same key are summed.

      WARNING: This API is experimental and might change in incompatible ways in the next release.
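
      The union-and-sum semantics described above can be sketched with plain java.util collections. This is a minimal illustration and not the Lucene implementation: the class OffHeapSizeMergeSketch is hypothetical, and the extensions and byte counts in main are made-up sample values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Lucene class: merges per-extension size maps by summing.
final class OffHeapSizeMergeSketch {

  /** Returns a new map whose keys are the union of both inputs; shared keys are summed. */
  static Map<String, Long> merge(Map<String, Long> map1, Map<String, Long> map2) {
    Map<String, Long> merged = new HashMap<>(map1);
    map2.forEach((extension, bytes) -> merged.merge(extension, bytes, Long::sum));
    return merged;
  }

  /** Total desired off-heap bytes across all categories. */
  static long total(Map<String, Long> sizes) {
    return sizes.values().stream().mapToLong(Long::longValue).sum();
  }

  public static void main(String[] args) {
    // Sample values for two fields; real maps would come from getOffHeapByteSize(FieldInfo).
    Map<String, Long> fieldA = Map.of("vec", 1000L, "vex", 200L);
    Map<String, Long> fieldB = Map.of("vec", 500L, "veq", 50L);
    Map<String, Long> merged = merge(fieldA, fieldB);
    System.out.println(merged.get("vec")); // 1500
    System.out.println(total(merged)); // 1750
  }
}
```

      Because the result is a fresh map, neither input is modified, which makes it safe to fold the merge over any number of per-field maps.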