org.apache.lucene.util.VectorUtil

public final class VectorUtil extends Object

Utilities for computations with numeric arrays, especially algebraic operations like vector dot products. This class uses SIMD vectorization if the corresponding Java module is available and enabled. To enable vectorized code, pass --add-modules jdk.incubator.vector to Java's command line.

It uses the CPU's FMA (fused multiply-add) instructions when they are known to be faster than a separate multiply and add. This requires the HotSpot C2 compiler to be enabled, which is the default for OpenJDK-based JVMs.

To explicitly disable or enable FMA usage, pass the following system properties:

-Dlucene.useScalarFMA=(auto|true|false) for scalar operations
-Dlucene.useVectorFMA=(auto|true|false) for vectorized operations (with vector incubator module)

The default is auto, which enables this for known CPU types and JVM settings. If Hotspot C2 is disabled, FMA and vectorization are not used.

Vectorization and FMA are only supported on HotSpot-based JVMs; they won't work on OpenJ9-based JVMs unless those provide HotSpotDiagnosticMXBean. Also make sure that the jdk.management module is enabled in modularized applications.
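
As a rough way to see which of these preconditions hold in a running JVM, the following sketch checks whether the incubator vector module was resolved and reads an FMA system property. The class VectorConfigProbe and its method names are hypothetical and not part of Lucene; the sketch only reports JVM state and does not reveal which implementation Lucene actually selected.

```java
/** Hypothetical diagnostic helper; not part of Lucene. */
final class VectorConfigProbe {

  /** Returns true if jdk.incubator.vector was resolved into the boot module layer. */
  static boolean incubatorVectorModulePresent() {
    return ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
  }

  /**
   * Returns the raw value of an FMA property such as "lucene.useScalarFMA"
   * or "lucene.useVectorFMA", falling back to the documented default "auto".
   */
  static String fmaSetting(String propertyName) {
    return System.getProperty(propertyName, "auto");
  }
}
```

Calling VectorConfigProbe.fmaSetting("lucene.useVectorFMA") at startup and logging the result is one way to confirm that a -D flag actually reached the JVM.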

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static void | add(float[] u, float[] v) | Adds the second argument to the first |
| static float[] | checkFinite(float[] v) | Checks if a float vector only has finite components. |
| static float | cosine(byte[] a, byte[] b) | Returns the cosine similarity between the two vectors. |
| static float | cosine(float[] a, float[] b) | Returns the cosine similarity between the two vectors. |
| static int | dotProduct(byte[] a, byte[] b) | Dot product computed over signed bytes. |
| static float | dotProduct(float[] a, float[] b) | Returns the vector dot product of the two vectors. |
| static float | dotProductScore(byte[] a, byte[] b) | Dot product score computed over signed bytes, scaled to be in [0, 1]. |
| static int | findNextGEQ(int[] buffer, int target, int from, int to) | Given an array buffer that is sorted between indexes 0 inclusive and to exclusive, find the first array index whose value is greater than or equal to target. |
| static long | int4BitDotProduct(byte[] q, byte[] d) | Dot product computed over int4 (values between [0,15]) bytes and a binary vector. |
| static int | int4DotProduct(byte[] a, byte[] b) | |
| static int | int4DotProductPacked(byte[] unpacked, byte[] packed) | Dot product computed over int4 (values between [0,15]) bytes. |
| static boolean | isUnitVector(float[] v) | |
| static float[] | l2normalize(float[] v) | Modifies the argument to be unit length, dividing by its l2-norm. |
| static float[] | l2normalize(float[] v, boolean throwOnZero) | Modifies the argument to be unit length, dividing by its l2-norm. |
| static float | minMaxScalarQuantize(float[] vector, byte[] dest, float scale, float alpha, float minQuantile, float maxQuantile) | Scalar quantizes vector, putting the result into dest. |
| static float | recalculateOffset(byte[] vector, float oldAlpha, float oldMinQuantile, float scale, float alpha, float minQuantile, float maxQuantile) | Recalculates the offset for vector. |
| static float | scaleMaxInnerProductScore(float vectorDotProductSimilarity) | |
| static int | squareDistance(byte[] a, byte[] b) | Returns the sum of squared differences of the two vectors. |
| static float | squareDistance(float[] a, float[] b) | Returns the sum of squared differences of the two vectors. |
| static int | xorBitCount(byte[] a, byte[] b) | XOR bit count computed over signed bytes. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- dotProduct
  
  public static float dotProduct(float[] a, float[] b)
  
  Returns the vector dot product of the two vectors.
  
  Throws:
  
  IllegalArgumentException - if the vectors' dimensions differ.
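  
  For reference, the contract above can be sketched as a plain scalar loop. The class DotProductSketch is hypothetical and not part of Lucene; the library's own implementation may be vectorized or use FMA, so its results can differ in the last bits of rounding.

  ```java
  /** Hypothetical scalar reference; not Lucene's implementation. */
  final class DotProductSketch {

    /** Sums a[i] * b[i] over all dimensions, rejecting mismatched lengths. */
    static float dotProduct(float[] a, float[] b) {
      if (a.length != b.length) {
        throw new IllegalArgumentException(
            "vector dimensions differ: " + a.length + " != " + b.length);
      }
      float sum = 0f;
      for (int i = 0; i < a.length; i++) {
        sum += a[i] * b[i];
      }
      return sum;
    }
  }
  ```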
- cosine
  
  public static float cosine(float[] a, float[] b)
  
  Returns the cosine similarity between the two vectors.
  
  Throws:
  
  IllegalArgumentException - if the vectors' dimensions differ.
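  
  Cosine similarity is the dot product divided by the product of the two l2-norms. A minimal scalar sketch of that definition follows; the class CosineSketch is hypothetical, and the handling of zero vectors (which yields NaN here) is an assumption of this sketch, not something the documentation above specifies.

  ```java
  /** Hypothetical scalar reference; not Lucene's implementation. */
  final class CosineSketch {

    /** Computes dot(a, b) / (|a| * |b|) in a single pass. */
    static float cosine(float[] a, float[] b) {
      if (a.length != b.length) {
        throw new IllegalArgumentException(
            "vector dimensions differ: " + a.length + " != " + b.length);
      }
      float dot = 0f;
      float normA = 0f;
      float normB = 0f;
      for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
      }
      // Widen to double before the square root to limit rounding error.
      return (float) (dot / Math.sqrt((double) normA * (double) normB));
    }
  }
  ```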
- cosine
  
  public static float cosine(byte[] a, byte[] b)
  
  Returns the cosine similarity between the two vectors.
- squareDistance
  
  public static float squareDistance(float[] a, float[] b)
  
  Returns the sum of squared differences of the two vectors.
  
  Throws:
  
  IllegalArgumentException - if the vectors' dimensions differ.
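  
  The quantity returned is the squared Euclidean distance, with no square root taken. A scalar sketch under that reading, using the hypothetical class SquareDistanceSketch (not part of Lucene):

  ```java
  /** Hypothetical scalar reference; not Lucene's implementation. */
  final class SquareDistanceSketch {

    /** Sums (a[i] - b[i])^2 over all dimensions. */
    static float squareDistance(float[] a, float[] b) {
      if (a.length != b.length) {
        throw new IllegalArgumentException(
            "vector dimensions differ: " + a.length + " != " + b.length);
      }
      float sum = 0f;
      for (int i = 0; i < a.length; i++) {
        float diff = a[i] - b[i];
        sum += diff * diff;
      }
      return sum;
    }
  }
  ```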
- squareDistance
  
  public static int squareDistance(byte[] a, byte[] b)
  
  Returns the sum of squared differences of the two vectors.
- l2normalize
  
  public static float[] l2normalize(float[] v)
  
  Modifies the argument to be unit length, dividing by its l2-norm. IllegalArgumentException is thrown for zero vectors.
  
  Returns:
  
  the input array after normalization
- isUnitVector
  
  public static boolean isUnitVector(float[] v)
- l2normalize
  
  public static float[] l2normalize(float[] v, boolean throwOnZero)
  
  Modifies the argument to be unit length, dividing by its l2-norm.
  
  Parameters:
  
  v - the vector to normalize
  
  throwOnZero - whether to throw an exception when v has all zeros
  
  Returns:
  
  the input array after normalization
  
  Throws:
  
  IllegalArgumentException - when the vector is all zero and throwOnZero is true
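  
  The in-place behavior and the throwOnZero switch can be illustrated with a scalar sketch. The class L2NormalizeSketch is hypothetical; returning an all-zero vector unchanged when throwOnZero is false is an assumption of this sketch, since the documentation above only describes the throwing case.

  ```java
  /** Hypothetical scalar reference; not Lucene's implementation. */
  final class L2NormalizeSketch {

    /** Divides every component of v by the l2-norm of v and returns the same array. */
    static float[] l2normalize(float[] v, boolean throwOnZero) {
      double sumOfSquares = 0.0;
      for (float x : v) {
        sumOfSquares += (double) x * x;
      }
      if (sumOfSquares == 0.0) {
        if (throwOnZero) {
          throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        return v; // assumption: leave the zero vector untouched
      }
      double norm = Math.sqrt(sumOfSquares);
      for (int i = 0; i < v.length; i++) {
        v[i] = (float) (v[i] / norm);
      }
      return v;
    }
  }
  ```

  Because the input array is modified and returned, callers that still need the original values should pass a copy, for example v.clone().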
- add
  
  public static void add(float[] u, float[] v)
  
  Adds the second argument to the first, storing the result in the first.
  
  Parameters:
  
  u - the destination
  
  v - the vector to add to the destination
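  
  A scalar sketch of the in-place update follows. The class AddSketch is hypothetical, and the length check is an assumption of this sketch; the documentation above does not say how mismatched dimensions are handled.

  ```java
  /** Hypothetical scalar reference; not Lucene's implementation. */
  final class AddSketch {

    /** Adds v to u component-wise; u is modified, v is left untouched. */
    static void add(float[] u, float[] v) {
      if (u.length != v.length) { // assumption: reject mismatched dimensions
        throw new IllegalArgumentException(
            "vector dimensions differ: " + u.length + " != " + v.length);
      }
      for (int i = 0; i < u.length; i++) {
        u[i] += v[i];
      }
    }
  }
  ```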
- dotProduct
  
  public static int dotProduct(byte[] a, byte[] b)
  
  Dot product computed over signed bytes.
  
  Parameters:
  
  a - bytes containing a vector
  
  b - bytes containing another vector, of the same dimension
  
  Returns:
  
  the value of the dot product of the two vectors
- int4DotProduct
  
  public static int int4DotProduct(byte[] a, byte[] b)
- int4DotProductPacked
  
  public static int int4DotProductPacked(byte[] unpacked, byte[] packed)
  
  Dot product computed over int4 (values between [0,15]) bytes. The second vector is considered "packed" (i.e. every byte representing two values). The following packing is assumed:
  
    packed[0] = (raw[0] * 16) | raw[packed.length];
    packed[1] = (raw[1] * 16) | raw[packed.length + 1];
    ...
    packed[packed.length - 1] = (raw[packed.length - 1] * 16) | raw[2 * packed.length - 1];
  
  Parameters:
  
  unpacked - the unpacked vector, of even length
  
  packed - the packed vector, of length (unpacked.length + 1) / 2
  
  Returns:
  
  the value of the dot product of the two vectors
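  
  The packing layout described above puts the first half of the raw vector in the high nibbles and the second half in the low nibbles. The sketch below packs a raw vector that way and computes the dot product against an unpacked vector by splitting each byte again. The class Int4PackingSketch and the helper pack are hypothetical and not part of Lucene.

  ```java
  /** Hypothetical scalar reference; not Lucene's implementation. */
  final class Int4PackingSketch {

    /** Packs raw int4 values (each in [0,15], even length) two per byte. */
    static byte[] pack(byte[] raw) {
      int half = raw.length / 2;
      byte[] packed = new byte[half];
      for (int i = 0; i < half; i++) {
        // High nibble: raw[i]; low nibble: raw[half + i].
        packed[i] = (byte) ((raw[i] << 4) | raw[half + i]);
      }
      return packed;
    }

    /** Dot product of an unpacked int4 vector with a packed one. */
    static int dotProductPacked(byte[] unpacked, byte[] packed) {
      int sum = 0;
      for (int i = 0; i < packed.length; i++) {
        int high = (packed[i] & 0xFF) >> 4; // mask first: bytes are signed
        int low = packed[i] & 0x0F;
        sum += high * unpacked[i] + low * unpacked[i + packed.length];
      }
      return sum;
    }
  }
  ```

  Masking with 0xFF before shifting matters because a packed byte such as (15 * 16) | 15 is negative when read as a signed Java byte.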
- int4BitDotProduct
  
  public static long int4BitDotProduct(byte[] q, byte[] d)
  
  Dot product computed over int4 (values between [0,15]) bytes and a binary vector.
  
  Parameters:
  
  q - the int4 query vector
  
  d - the binary document vector
  
  Returns:
  
  the dot product
- xorBitCount
  
  public static int xorBitCount(byte[] a, byte[] b)
  
  XOR bit count computed over signed bytes.
  
  Parameters:
  
  a - bytes containing a vector
  
  b - bytes containing another vector, of the same dimension
  
  Returns:
  
  the value of the XOR bit count of the two vectors
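  
  The XOR bit count is the Hamming distance between the two byte arrays viewed as bit strings. A scalar sketch using Integer.bitCount follows; the class XorBitCountSketch is hypothetical and not part of Lucene.

  ```java
  /** Hypothetical scalar reference; not Lucene's implementation. */
  final class XorBitCountSketch {

    /** Counts the bit positions in which a and b differ. */
    static int xorBitCount(byte[] a, byte[] b) {
      if (a.length != b.length) {
        throw new IllegalArgumentException(
            "vector dimensions differ: " + a.length + " != " + b.length);
      }
      int count = 0;
      for (int i = 0; i < a.length; i++) {
        // Mask to 8 bits so sign extension does not add spurious set bits.
        count += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
      }
      return count;
    }
  }
  ```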
- dotProductScore
  
  public static float dotProductScore(byte[] a, byte[] b)
  
  Dot product score computed over signed bytes, scaled to be in [0, 1].
  
  Parameters:
  
  a - bytes containing a vector
  
  b - bytes containing another vector, of the same dimension
  
  Returns:
  
  the value of the similarity function applied to the two vectors
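  
  The documentation above states only that the score lies in [0, 1]. One scaling with that property, shown here as an assumption rather than as Lucene's definition, divides the dot product by dimension * 2^15 and adds 0.5: each signed-byte product has magnitude at most 2^14, so the quotient lies in [-0.5, 0.5]. The class DotProductScoreSketch is hypothetical.

  ```java
  /** Hypothetical sketch of one possible [0, 1] scaling; an assumption, not Lucene's code. */
  final class DotProductScoreSketch {

    /** Maps the signed-byte dot product into [0, 1]. */
    static float dotProductScore(byte[] a, byte[] b) {
      if (a.length != b.length) {
        throw new IllegalArgumentException(
            "vector dimensions differ: " + a.length + " != " + b.length);
      }
      int dot = 0;
      for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
      }
      // |a[i] * b[i]| <= 128 * 128 = 2^14, so |dot| / (n * 2^15) <= 0.5.
      float denominator = (float) (a.length * (1 << 15));
      return 0.5f + dot / denominator;
    }
  }
  ```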
- scaleMaxInnerProductScore
  
  public static float scaleMaxInnerProductScore(float vectorDotProductSimilarity)
  
  Parameters:
  
  vectorDotProductSimilarity - the raw similarity between two vectors
  
  Returns:
  
  A scaled score preventing negative scores for maximum-inner-product
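  
  The documentation above does not give the mapping itself. One monotonic mapping that keeps every score positive, shown purely as an assumption, sends negative similarities to 1 / (1 - s) and non-negative ones to s + 1. The class MaxInnerProductScaleSketch is hypothetical and not part of Lucene.

  ```java
  /** Hypothetical sketch of one possible mapping; an assumption, not Lucene's code. */
  final class MaxInnerProductScaleSketch {

    /** Maps any raw inner product to a positive, order-preserving score. */
    static float scale(float rawSimilarity) {
      if (rawSimilarity < 0f) {
        // Negative inputs land in (0, 1).
        return 1f / (1f - rawSimilarity);
      }
      // Non-negative inputs land in [1, +inf).
      return rawSimilarity + 1f;
    }
  }
  ```

  Both branches meet at a score of 1 for a raw similarity of 0, so the ordering of documents by raw inner product is preserved.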
- checkFinite
  
  public static float[] checkFinite(float[] v)
  
  Checks if a float vector only has finite components.
  
  Parameters:
  
  v - the float vector to check
  
  Returns:
  
  the vector for call-chaining
  
  Throws:
  
  IllegalArgumentException - if any component of vector is not finite
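  
  A scalar sketch of this check, using Float.isFinite to reject NaN and both infinities. The class CheckFiniteSketch and the exception message are hypothetical and not part of Lucene.

  ```java
  /** Hypothetical scalar reference; not Lucene's implementation. */
  final class CheckFiniteSketch {

    /** Returns v unchanged if every component is finite, otherwise throws. */
    static float[] checkFinite(float[] v) {
      for (int i = 0; i < v.length; i++) {
        if (!Float.isFinite(v[i])) {
          throw new IllegalArgumentException(
              "non-finite value at index " + i + ": " + v[i]);
        }
      }
      return v;
    }
  }
  ```

  Returning the argument allows the check to be chained inline, for example as the argument of another call that consumes the vector.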
- findNextGEQ
  
  public static int findNextGEQ(int[] buffer, int target, int from, int to)
  
  Given an array buffer that is sorted between indexes 0 inclusive and to exclusive, find the first array index whose value is greater than or equal to target. This index is guaranteed to be at least from. If there is no such array index, to is returned.
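  
  The return contract can be illustrated with a simple linear scan starting at from. The class FindNextGeqSketch is hypothetical and not part of Lucene, and a linear scan is used here only for clarity; it says nothing about the search strategy the library uses.

  ```java
  /** Hypothetical scalar reference; not Lucene's implementation. */
  final class FindNextGeqSketch {

    /**
     * Returns the first index i in [from, to) with buffer[i] >= target,
     * or to if no such index exists.
     */
    static int findNextGEQ(int[] buffer, int target, int from, int to) {
      for (int i = from; i < to; i++) {
        if (buffer[i] >= target) {
          return i;
        }
      }
      return to;
    }
  }
  ```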
- minMaxScalarQuantize
  
  public static float minMaxScalarQuantize(float[] vector, byte[] dest, float scale, float alpha, float minQuantile, float maxQuantile)
  
  Scalar quantizes vector, putting the result into dest.
  
  Parameters:
  
  vector - the vector to quantize
  
  dest - the destination vector
  
  scale - the scaling factor
  
  alpha - the alpha value
  
  minQuantile - the lower quantile of the distribution
  
  maxQuantile - the upper quantile of the distribution
  
  Returns:
  
  the corrective offset that needs to be applied to the score
- recalculateOffset
  
  public static float recalculateOffset(byte[] vector, float oldAlpha, float oldMinQuantile, float scale, float alpha, float minQuantile, float maxQuantile)
  
  Recalculates the offset for vector.
  
  Parameters:
  
  vector - the vector to quantize
  
  oldAlpha - the previous alpha value
  
  oldMinQuantile - the previous lower quantile
  
  scale - the scaling factor
  
  alpha - the alpha value
  
  minQuantile - the lower quantile of the distribution
  
  maxQuantile - the upper quantile of the distribution
  
  Returns:
  
  the new corrective offset
