org.apache.lucene.codecs.KnnVectorsFormat

org.apache.lucene.codecs.lucene99.Lucene99HnswScalarQuantizedVectorsFormat

All Implemented Interfaces: NamedSPILoader.NamedSPI

public class Lucene99HnswScalarQuantizedVectorsFormat extends KnnVectorsFormat

Lucene 9.9 vector format, which encodes numeric vector values into an associated graph connecting the documents that have values. The graph is used to power HNSW search. The format consists of two files and uses Lucene99ScalarQuantizedVectorsFormat to store the actual vectors. For details on graph storage and file extensions, see Lucene99HnswVectorsFormat.

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final String | NAME | |

Fields inherited from class org.apache.lucene.codecs.KnnVectorsFormat
DEFAULT_MAX_DIMENSIONS, EMPTY

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| Lucene99HnswScalarQuantizedVectorsFormat() | Constructs a format using default graph construction parameters with 7-bit quantization. |
| Lucene99HnswScalarQuantizedVectorsFormat(int maxConn, int beamWidth) | Constructs a format using the given graph construction parameters with 7-bit quantization. |
| Lucene99HnswScalarQuantizedVectorsFormat(int maxConn, int beamWidth, int numMergeWorkers, int bits, boolean compress, Float confidenceInterval, ExecutorService mergeExec) | Constructs a format using the given graph construction parameters and scalar quantization. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| KnnVectorsReader | fieldsReader(SegmentReadState state) | Returns a KnnVectorsReader to read the vectors from the index. |
| KnnVectorsWriter | fieldsWriter(SegmentWriteState state) | Returns a KnnVectorsWriter to write the vectors to the index. |
| int | getMaxDimensions(String fieldName) | Returns the maximum number of vector dimensions supported by this codec for the given field name. |
| String | toString() | |

Methods inherited from class org.apache.lucene.codecs.KnnVectorsFormat
availableKnnVectorsFormats, forName, getName, reloadKnnVectorsFormat

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- NAME
  
  public static final String NAME
  See Also:
  
  Constant Field Values
Constructor Details
- Lucene99HnswScalarQuantizedVectorsFormat
  
  public Lucene99HnswScalarQuantizedVectorsFormat()
  
  Constructs a format using default graph construction parameters with 7-bit quantization.
- Lucene99HnswScalarQuantizedVectorsFormat
  
  public Lucene99HnswScalarQuantizedVectorsFormat(int maxConn, int beamWidth)
  
  Constructs a format using the given graph construction parameters with 7-bit quantization.
  
  Parameters:
  
  maxConn - the maximum number of connections to a node in the HNSW graph
  
  beamWidth - the size of the queue maintained during graph construction.
- Lucene99HnswScalarQuantizedVectorsFormat
  
  public Lucene99HnswScalarQuantizedVectorsFormat(int maxConn, int beamWidth, int numMergeWorkers, int bits, boolean compress, Float confidenceInterval, ExecutorService mergeExec)
  
  Constructs a format using the given graph construction parameters and scalar quantization.
  
  Parameters:
  
  maxConn - the maximum number of connections to a node in the HNSW graph
  
  beamWidth - the size of the queue maintained during graph construction.
  
  numMergeWorkers - the number of workers (threads) used during a merge. If larger than 1, a non-null ExecutorService must be passed as mergeExec.
  
  bits - the number of bits to use for scalar quantization (must be 4 or 7)
  
  compress - whether to compress the quantized vectors by another 50% when bits=4. If `true`, pairs of 4-bit quantized dimensions are packed into a single byte. This must be `false` when bits=7. Compression halves hot vector memory usage during search, at the cost of some decode speed.
  
  confidenceInterval - the confidence interval used for scalar quantizing the vectors. When `null`, it is calculated based on the vector field dimensions. When `0`, the quantiles are determined dynamically by sampling many confidence intervals and selecting the most accurate pair.
  
  mergeExec - the ExecutorService that will be used by ALL vector writers that are generated by this format to do the merge
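
The constraints in the parameter list above, and the idea behind `compress`, can be sketched in plain Java. This is an illustration only and does not use or reproduce Lucene code: the class `QuantizedFormatSketch` and its methods are hypothetical, the choice of `IllegalArgumentException` is an assumption, and the nibble order (even-indexed dimension in the high four bits) is an assumed layout that may differ from what Lucene actually writes.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;

/** Illustrative sketch only; hypothetical names, not part of Lucene. */
public class QuantizedFormatSketch {

  /** Checks the argument constraints stated in the parameter descriptions. */
  public static void validate(
      int numMergeWorkers, int bits, boolean compress, ExecutorService mergeExec) {
    if (bits != 4 && bits != 7) {
      throw new IllegalArgumentException("bits must be 4 or 7, got " + bits);
    }
    if (compress && bits != 4) {
      throw new IllegalArgumentException("compress must be false unless bits=4");
    }
    if (numMergeWorkers > 1 && mergeExec == null) {
      throw new IllegalArgumentException("numMergeWorkers > 1 requires a non-null mergeExec");
    }
  }

  /**
   * Packs 4-bit values (each in 0..15) two per byte.
   * Assumed layout: even index in the high nibble, odd index in the low nibble.
   */
  public static byte[] packNibbles(byte[] values) {
    byte[] packed = new byte[(values.length + 1) / 2];
    for (int i = 0; i < values.length; i++) {
      int v = values[i] & 0x0F;
      if ((i & 1) == 0) {
        packed[i / 2] |= (byte) (v << 4);
      } else {
        packed[i / 2] |= (byte) v;
      }
    }
    return packed;
  }

  /** Reverses packNibbles; dims is the original number of dimensions. */
  public static byte[] unpackNibbles(byte[] packed, int dims) {
    byte[] values = new byte[dims];
    for (int i = 0; i < dims; i++) {
      int b = packed[i / 2] & 0xFF;
      values[i] = (byte) ((i & 1) == 0 ? b >>> 4 : b & 0x0F);
    }
    return values;
  }

  public static void main(String[] args) {
    validate(1, 4, true, null); // single merge worker, so no executor is needed

    byte[] quantized = {3, 15, 0, 9, 7}; // five 4-bit quantized dimensions
    byte[] packed = packNibbles(quantized);
    System.out.println("unpacked bytes: " + quantized.length + ", packed bytes: " + packed.length);
    System.out.println("round trip ok: " + Arrays.equals(quantized, unpackNibbles(packed, quantized.length)));
  }
}
```

Packing two dimensions per byte is where the roughly 50% memory reduction comes from; the extra shift and mask on every read is the decode cost the `compress` description refers to.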
Method Details
- fieldsWriter
  
  public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException
  
  Description copied from class: KnnVectorsFormat
  
  Returns a KnnVectorsWriter to write the vectors to the index.
  
  Specified by:
  
  fieldsWriter in class KnnVectorsFormat
  
  Throws:
  
  IOException
- fieldsReader
  
  public KnnVectorsReader fieldsReader(SegmentReadState state) throws IOException
  
  Description copied from class: KnnVectorsFormat
  
  Returns a KnnVectorsReader to read the vectors from the index.
  
  Specified by:
  
  fieldsReader in class KnnVectorsFormat
  
  Throws:
  
  IOException
- getMaxDimensions
  
  public int getMaxDimensions(String fieldName)
  
  Description copied from class: KnnVectorsFormat
  
  Returns the maximum number of vector dimensions supported by this codec for the given field name. Codecs implement this method to specify the maximum number of dimensions they support.
  
  Specified by:
  
  getMaxDimensions in class KnnVectorsFormat
  
  Parameters:
  
  fieldName - the field name
  
  Returns:
  
  the maximum number of vector dimensions.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
