Class VectorUtil

java.lang.Object
org.apache.lucene.util.VectorUtil

public final class VectorUtil extends Object
Utilities for computations with numeric arrays, especially algebraic operations like vector dot products. This class uses SIMD vectorization if the corresponding Java module is available and enabled. To enable vectorized code, pass --add-modules jdk.incubator.vector to Java's command line.

It uses the CPU's FMA instructions when they are known to perform faster than a separate multiply followed by an add. This requires at least Hotspot C2 to be enabled, which is the default for OpenJDK-based JVMs.

To explicitly disable or enable FMA usage, pass the following system properties:

  • -Dlucene.useScalarFMA=(auto|true|false) for scalar operations
  • -Dlucene.useVectorFMA=(auto|true|false) for vectorized operations (with vector incubator module)

The default is auto, which enables FMA for known CPU types and JVM settings. If Hotspot C2 is disabled, neither FMA nor vectorization is used.

Vectorization and FMA are only supported on Hotspot-based JVMs; they won't work on OpenJ9-based JVMs unless those provide HotSpotDiagnosticMXBean. Also make sure that the jdk.management module is enabled in modularized applications.
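
As a rough way to see which of these settings apply in a running JVM, the sketch below checks whether the jdk.incubator.vector module was resolved at startup and reads the two system properties named above. The class and method names are hypothetical, and the fallback to auto only mirrors the documented default; Lucene's own detection logic is not reproduced here.

```java
import java.util.Optional;

final class VectorSettingsSketch {
  /** True if the incubating Vector API module was resolved in the boot layer. */
  static boolean vectorModulePresent() {
    Optional<Module> module = ModuleLayer.boot().findModule("jdk.incubator.vector");
    return module.isPresent();
  }

  /** Reads a system property, falling back to the documented default "auto". */
  static String fmaSetting(String property) {
    return System.getProperty(property, "auto");
  }

  public static void main(String[] args) {
    System.out.println("jdk.incubator.vector present: " + vectorModulePresent());
    System.out.println("lucene.useScalarFMA = " + fmaSetting("lucene.useScalarFMA"));
    System.out.println("lucene.useVectorFMA = " + fmaSetting("lucene.useVectorFMA"));
  }
}
```

Running it once with and once without --add-modules jdk.incubator.vector should show the first line change.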

  • Method Summary

    Modifier and Type | Method | Description
    static void | add(float[] u, float[] v) | Adds the second argument to the first.
    static float[] | checkFinite(float[] v) | Checks if a float vector only has finite components.
    static float | cosine(byte[] a, byte[] b) | Returns the cosine similarity between the two vectors.
    static float | cosine(float[] a, float[] b) | Returns the cosine similarity between the two vectors.
    static int | dotProduct(byte[] a, byte[] b) | Dot product computed over signed bytes.
    static float | dotProduct(float[] a, float[] b) | Returns the vector dot product of the two vectors.
    static float | dotProductScore(byte[] a, byte[] b) | Dot product score computed over signed bytes, scaled to be in [0, 1].
    static int | findNextGEQ(int[] buffer, int target, int from, int to) | Given an array buffer that is sorted between indexes 0 inclusive and to exclusive, find the first array index whose value is greater than or equal to target.
    static long | int4BitDotProduct(byte[] q, byte[] d) | Dot product computed over int4 (values between [0,15]) bytes and a binary vector.
    static int | int4DotProduct(byte[] a, byte[] b) |
    static int | int4DotProductPacked(byte[] unpacked, byte[] packed) | Dot product computed over int4 (values between [0,15]) bytes.
    static boolean | isUnitVector(float[] v) |
    static float[] | l2normalize(float[] v) | Modifies the argument to be unit length, dividing by its l2-norm.
    static float[] | l2normalize(float[] v, boolean throwOnZero) | Modifies the argument to be unit length, dividing by its l2-norm.
    static float | minMaxScalarQuantize(float[] vector, byte[] dest, float scale, float alpha, float minQuantile, float maxQuantile) | Scalar quantizes vector, putting the result into dest.
    static float | recalculateOffset(byte[] vector, float oldAlpha, float oldMinQuantile, float scale, float alpha, float minQuantile, float maxQuantile) | Recalculates the offset for vector.
    static float | scaleMaxInnerProductScore(float vectorDotProductSimilarity) | Scales a raw dot-product similarity so that maximum-inner-product scores are never negative.
    static int | squareDistance(byte[] a, byte[] b) | Returns the sum of squared differences of the two vectors.
    static float | squareDistance(float[] a, float[] b) | Returns the sum of squared differences of the two vectors.
    static int | xorBitCount(byte[] a, byte[] b) | XOR bit count computed over signed bytes.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • dotProduct

      public static float dotProduct(float[] a, float[] b)
      Returns the vector dot product of the two vectors.
      Throws:
      IllegalArgumentException - if the vectors' dimensions differ.
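
For reference, the documented contract (sum of element-wise products, with an IllegalArgumentException on mismatched dimensions) can be sketched in plain scalar Java. This is an illustration under a hypothetical class name, not Lucene's implementation, which may use SIMD and FMA and can therefore round slightly differently.

```java
final class DotProductSketch {
  /** Scalar dot product; rejects vectors of different dimensions. */
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i]; // accumulate one product per dimension
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f};
    float[] b = {4f, 5f, 6f};
    System.out.println(dotProduct(a, b)); // 1*4 + 2*5 + 3*6
  }
}
```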
    • cosine

      public static float cosine(float[] a, float[] b)
      Returns the cosine similarity between the two vectors.
      Throws:
      IllegalArgumentException - if the vectors' dimensions differ.
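
Cosine similarity is the dot product divided by the product of the two l2-norms. The following is a minimal scalar sketch of that formula under a hypothetical class name; how the library accumulates or vectorizes the three sums is not specified on this page.

```java
final class CosineSketch {
  /** Cosine similarity: dot(a, b) / (|a| * |b|). */
  static float cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    float dot = 0f;
    float normA = 0f;
    float normB = 0f;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    // In this sketch a zero vector yields NaN (0 divided by 0).
    return (float) (dot / Math.sqrt((double) normA * (double) normB));
  }

  public static void main(String[] args) {
    System.out.println(cosine(new float[] {1f, 0f}, new float[] {0f, 1f})); // orthogonal
    System.out.println(cosine(new float[] {1f, 2f}, new float[] {2f, 4f})); // parallel
  }
}
```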
    • cosine

      public static float cosine(byte[] a, byte[] b)
      Returns the cosine similarity between the two vectors.
    • squareDistance

      public static float squareDistance(float[] a, float[] b)
      Returns the sum of squared differences of the two vectors.
      Throws:
      IllegalArgumentException - if the vectors' dimensions differ.
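
The sum of squared differences is the squared Euclidean distance, so no square root is taken. A scalar sketch under a hypothetical class name, shown only to make the definition concrete:

```java
final class SquareDistanceSketch {
  /** Sum over i of (a[i] - b[i])^2. */
  static float squareDistance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  public static void main(String[] args) {
    // Differences are 3 and 4, so the result is 9 + 16.
    System.out.println(squareDistance(new float[] {1f, 2f}, new float[] {4f, 6f}));
  }
}
```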
    • squareDistance

      public static int squareDistance(byte[] a, byte[] b)
      Returns the sum of squared differences of the two vectors.
    • l2normalize

      public static float[] l2normalize(float[] v)
      Modifies the argument to be unit length, dividing by its l2-norm. IllegalArgumentException is thrown for zero vectors.
      Returns:
      the input array after normalization
    • isUnitVector

      public static boolean isUnitVector(float[] v)
    • l2normalize

      public static float[] l2normalize(float[] v, boolean throwOnZero)
      Modifies the argument to be unit length, dividing by its l2-norm.
      Parameters:
      v - the vector to normalize
      throwOnZero - whether to throw an exception when v has all zeros
      Returns:
      the input array after normalization
      Throws:
      IllegalArgumentException - when the vector is all zero and throwOnZero is true
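
The in-place behavior and the throwOnZero switch can be illustrated with a scalar sketch. The class name is hypothetical, and returning the zero vector unchanged when throwOnZero is false is an assumption of this sketch rather than something this page states.

```java
final class L2NormalizeSketch {
  /** Divides v by its l2-norm in place and returns the same array. */
  static float[] l2normalize(float[] v, boolean throwOnZero) {
    double sumOfSquares = 0d;
    for (float x : v) {
      sumOfSquares += (double) x * x;
    }
    if (sumOfSquares == 0d) {
      if (throwOnZero) {
        throw new IllegalArgumentException("cannot normalize a zero vector");
      }
      return v; // assumption: leave an all-zero vector untouched
    }
    double norm = Math.sqrt(sumOfSquares);
    for (int i = 0; i < v.length; i++) {
      v[i] = (float) (v[i] / norm);
    }
    return v;
  }

  public static void main(String[] args) {
    float[] v = {3f, 4f};
    float[] result = l2normalize(v, true);
    System.out.println(result == v); // same array instance is returned
    System.out.println(v[0] + ", " + v[1]);
  }
}
```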
    • add

      public static void add(float[] u, float[] v)
      Adds the second argument to the first.
      Parameters:
      u - the destination
      v - the vector to add to the destination
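
Because the method returns void, the result is visible only through the destination array u. A scalar sketch of that in-place update, under a hypothetical class name and assuming both arrays have the same length:

```java
final class AddSketch {
  /** Adds v to u element by element; u is modified, v is not. */
  static void add(float[] u, float[] v) {
    for (int i = 0; i < u.length; i++) {
      u[i] += v[i];
    }
  }

  public static void main(String[] args) {
    float[] u = {1f, 2f};
    float[] v = {10f, 20f};
    add(u, v);
    System.out.println(u[0] + ", " + u[1]);
  }
}
```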
    • dotProduct

      public static int dotProduct(byte[] a, byte[] b)
      Dot product computed over signed bytes.
      Parameters:
      a - bytes containing a vector
      b - bytes containing another vector, of the same dimension
      Returns:
      the value of the dot product of the two vectors
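
Each component is treated as a signed byte in [-128, 127] and the products are accumulated in an int. A scalar sketch under a hypothetical class name, assuming both arrays have the same dimension:

```java
final class ByteDotProductSketch {
  /** Dot product over signed bytes, accumulated as int. */
  static int dotProduct(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i]; // bytes are promoted to int before multiplying
    }
    return total;
  }

  public static void main(String[] args) {
    byte[] a = {1, -2, 3};
    byte[] b = {4, 5, -6};
    System.out.println(dotProduct(a, b)); // 4 - 10 - 18
  }
}
```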
    • int4DotProduct

      public static int int4DotProduct(byte[] a, byte[] b)
    • int4DotProductPacked

      public static int int4DotProductPacked(byte[] unpacked, byte[] packed)
      Dot product computed over int4 (values between [0,15]) bytes. The second vector is considered "packed" (i.e. every byte represents two values). The following packing is assumed:
         packed[0] = (raw[0] * 16) | raw[packed.length];
         packed[1] = (raw[1] * 16) | raw[packed.length + 1];
         ...
         packed[packed.length - 1] = (raw[packed.length - 1] * 16) | raw[2 * packed.length - 1];
       
      Parameters:
      unpacked - the unpacked vector, of even length
      packed - the packed vector, of length (unpacked.length + 1) / 2
      Returns:
      the value of the dot product of the two vectors
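
To make the packing layout above concrete, the sketch below packs a raw int4 vector with the documented formula and then computes the dot product by splitting each packed byte back into its high and low nibble. The class and the pack helper are hypothetical and only illustrate the layout; they are not part of VectorUtil.

```java
final class Int4PackedSketch {
  /** Packs raw int4 values (each in [0,15], even count) using the documented layout. */
  static byte[] pack(byte[] raw) {
    int half = raw.length / 2;
    byte[] packed = new byte[half];
    for (int i = 0; i < half; i++) {
      // High nibble holds raw[i]; low nibble holds raw[i + half].
      packed[i] = (byte) ((raw[i] << 4) | raw[i + half]);
    }
    return packed;
  }

  /** Dot product of an unpacked int4 vector with a packed one. */
  static int int4DotProductPacked(byte[] unpacked, byte[] packed) {
    int half = packed.length;
    int total = 0;
    for (int i = 0; i < half; i++) {
      int p = packed[i] & 0xFF; // undo sign extension of the byte
      total += unpacked[i] * (p >> 4);
      total += unpacked[i + half] * (p & 0x0F);
    }
    return total;
  }

  public static void main(String[] args) {
    byte[] unpacked = {1, 2, 3, 4};
    byte[] raw = {5, 6, 7, 8};
    // Same value as the plain dot product 1*5 + 2*6 + 3*7 + 4*8.
    System.out.println(int4DotProductPacked(unpacked, pack(raw)));
  }
}
```

The mask with 0xFF matters because a packed value such as 0xFF is a negative Java byte.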
    • int4BitDotProduct

      public static long int4BitDotProduct(byte[] q, byte[] d)
      Dot product computed over int4 (values between [0,15]) bytes and a binary vector.
      Parameters:
      q - the int4 query vector
      d - the binary document vector
      Returns:
      the dot product
    • xorBitCount

      public static int xorBitCount(byte[] a, byte[] b)
      XOR bit count computed over signed bytes.
      Parameters:
      a - bytes containing a vector
      b - bytes containing another vector, of the same dimension
      Returns:
      the value of the XOR bit count of the two vectors
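
The XOR bit count is the number of bit positions in which the two byte arrays differ (their Hamming distance). A byte-at-a-time sketch under a hypothetical class name; a production implementation would likely process wider words.

```java
final class XorBitCountSketch {
  /** Counts the bits that differ between a and b. */
  static int xorBitCount(byte[] a, byte[] b) {
    int count = 0;
    for (int i = 0; i < a.length; i++) {
      // Mask to 8 bits so sign extension does not add spurious set bits.
      count += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
    }
    return count;
  }

  public static void main(String[] args) {
    byte[] a = {0x0F, (byte) 0xFF};
    byte[] b = {0x00, 0x00};
    System.out.println(xorBitCount(a, b)); // 4 differing bits + 8 differing bits
  }
}
```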
    • dotProductScore

      public static float dotProductScore(byte[] a, byte[] b)
      Dot product score computed over signed bytes, scaled to be in [0, 1].
      Parameters:
      a - bytes containing a vector
      b - bytes containing another vector, of the same dimension
      Returns:
      the value of the similarity function applied to the two vectors
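
This page does not give the scaling formula. One scaling that maps a signed-byte dot product into [0, 1] divides by the largest possible magnitude (dimension times 2^14, doubled) and shifts by 0.5; the sketch below uses that as an assumption and should not be read as the library's definition. The class name is hypothetical.

```java
final class DotProductScoreSketch {
  static int dotProduct(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i];
    }
    return total;
  }

  /** Assumed scaling: 0.5 + dot / (dimension * 2^15). */
  static float dotProductScore(byte[] a, byte[] b) {
    // |a[i] * b[i]| is at most 128 * 128 = 2^14, so |dot| <= dimension * 2^14.
    float denom = (float) (a.length * (1 << 15));
    return 0.5f + dotProduct(a, b) / denom;
  }

  public static void main(String[] args) {
    System.out.println(dotProductScore(new byte[] {-128}, new byte[] {-128})); // largest product
    System.out.println(dotProductScore(new byte[] {-128}, new byte[] {127}));  // most negative product
    System.out.println(dotProductScore(new byte[] {0}, new byte[] {0}));       // zero dot product
  }
}
```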
    • scaleMaxInnerProductScore

      public static float scaleMaxInnerProductScore(float vectorDotProductSimilarity)
      Scales a raw dot-product similarity so that maximum-inner-product scores are never negative.
      Parameters:
      vectorDotProductSimilarity - the raw similarity between two vectors
      Returns:
      A scaled score preventing negative scores for maximum-inner-product
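
The exact mapping is not spelled out on this page. One monotonic mapping with the stated property sends negative similarities into (0, 1) and non-negative ones into [1, infinity); the sketch below uses it as an assumption, under a hypothetical class name.

```java
final class MaxInnerProductScaleSketch {
  /** Assumed mapping: 1 / (1 - s) for s < 0, and s + 1 otherwise. */
  static float scaleMaxInnerProductScore(float similarity) {
    if (similarity < 0f) {
      return 1f / (1f - similarity); // in (0, 1), approaches 0 as s goes to -infinity
    }
    return similarity + 1f; // at least 1
  }

  public static void main(String[] args) {
    System.out.println(scaleMaxInnerProductScore(-1f));
    System.out.println(scaleMaxInnerProductScore(0f));
    System.out.println(scaleMaxInnerProductScore(2f));
  }
}
```

Both branches meet at 1 when the similarity is 0, so the ordering of raw similarities is preserved.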
    • checkFinite

      public static float[] checkFinite(float[] v)
      Checks if a float vector only has finite components.
      Parameters:
      v - the float vector to check
      Returns:
      the vector for call-chaining
      Throws:
      IllegalArgumentException - if any component of the vector is not finite
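
Returning the input array allows the check to be chained in front of another call. A minimal sketch of that contract using Float.isFinite, under a hypothetical class name; the exception message is illustrative only.

```java
final class CheckFiniteSketch {
  /** Returns v unchanged, or throws if any component is NaN or infinite. */
  static float[] checkFinite(float[] v) {
    for (int i = 0; i < v.length; i++) {
      if (!Float.isFinite(v[i])) {
        throw new IllegalArgumentException("non-finite value at index " + i + ": " + v[i]);
      }
    }
    return v;
  }

  public static void main(String[] args) {
    float[] ok = checkFinite(new float[] {1f, -2.5f});
    System.out.println(ok.length);
    try {
      checkFinite(new float[] {1f, Float.NaN});
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```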
    • findNextGEQ

      public static int findNextGEQ(int[] buffer, int target, int from, int to)
      Given an array buffer that is sorted between indexes 0 inclusive and to exclusive, find the first array index whose value is greater than or equal to target. This index is guaranteed to be at least from. If there is no such array index, to is returned.
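
The contract can be illustrated with a simple linear scan that starts at from and stops at to. This is a sketch under a hypothetical class name; the search strategy the library actually uses is not described on this page.

```java
final class FindNextGeqSketch {
  /** First index in [from, to) whose value is >= target, or to if none exists. */
  static int findNextGEQ(int[] buffer, int target, int from, int to) {
    for (int i = from; i < to; i++) {
      if (buffer[i] >= target) {
        return i;
      }
    }
    return to;
  }

  public static void main(String[] args) {
    int[] buffer = {1, 3, 5, 7, 9};
    System.out.println(findNextGEQ(buffer, 6, 0, 5));  // index of 7
    System.out.println(findNextGEQ(buffer, 10, 0, 5)); // no such value, returns to
    System.out.println(findNextGEQ(buffer, 1, 2, 5));  // never less than from
  }
}
```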
    • minMaxScalarQuantize

      public static float minMaxScalarQuantize(float[] vector, byte[] dest, float scale, float alpha, float minQuantile, float maxQuantile)
      Scalar quantizes vector, putting the result into dest.
      Parameters:
      vector - the vector to quantize
      dest - the destination vector
      scale - the scaling factor
      alpha - the alpha value
      minQuantile - the lower quantile of the distribution
      maxQuantile - the upper quantile of the distribution
      Returns:
      the corrective offset that needs to be applied to the score
    • recalculateOffset

      public static float recalculateOffset(byte[] vector, float oldAlpha, float oldMinQuantile, float scale, float alpha, float minQuantile, float maxQuantile)
      Recalculates the offset for vector.
      Parameters:
      vector - the quantized vector whose offset is recalculated
      oldAlpha - the previous alpha value
      oldMinQuantile - the previous lower quantile
      scale - the scaling factor
      alpha - the alpha value
      minQuantile - the lower quantile of the distribution
      maxQuantile - the upper quantile of the distribution
      Returns:
      the new corrective offset