java.lang.Object
org.apache.lucene.sandbox.codecs.quantization.KMeans

public class KMeans extends Object
KMeans clustering algorithm for vectors
  • Field Details

  • Method Details

    • cluster

      public static KMeans.Results cluster(FloatVectorValues vectors, VectorSimilarityFunction similarityFunction, int numClusters) throws IOException
      Cluster vectors into a given number of clusters
      Parameters:
      vectors - float vectors
      similarityFunction - vector similarity function. For COSINE similarity, vectors must be normalized.
      numClusters - number of clusters to group the vectors into
      Returns:
      results of clustering: the produced centroids and, for each vector, its centroid
      Throws:
      IOException - if there is an error accessing vectors
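
The COSINE requirement above means each vector should have unit length before it is passed to cluster; for unit vectors, cosine similarity equals the dot product. The following is a minimal plain-Java sketch of that normalization step. The class CosineNormalization and its method are hypothetical names used only for illustration and are not part of this class; Lucene's VectorUtil offers an l2normalize method that serves the same purpose.

```java
/** Hypothetical helper, not part of Lucene: scales a vector to unit length. */
final class CosineNormalization {

  /** Returns a unit-length copy of the input; rejects zero vectors, which have no direction. */
  static float[] l2Normalize(float[] vector) {
    double sumOfSquares = 0;
    for (float component : vector) {
      sumOfSquares += (double) component * component;
    }
    if (sumOfSquares == 0) {
      throw new IllegalArgumentException("cannot normalize a zero-length vector");
    }
    float inverseNorm = (float) (1.0 / Math.sqrt(sumOfSquares));
    float[] normalized = new float[vector.length];
    for (int i = 0; i < vector.length; i++) {
      normalized[i] = vector[i] * inverseNorm;
    }
    return normalized;
  }

  public static void main(String[] args) {
    float[] unit = l2Normalize(new float[] {3f, 4f});
    System.out.println(unit[0] + ", " + unit[1]);
  }
}
```

The sketch returns a copy rather than normalizing in place, so the caller's original data stays available for other similarity functions.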
    • cluster

      public static KMeans.Results cluster(FloatVectorValues vectors, int numClusters, boolean assignCentroidsToVectors, long seed, KMeans.KmeansInitializationMethod initializationMethod, boolean normalizeCenters, int restarts, int iters, int sampleSize) throws IOException
      Expert: Cluster vectors into a given number of clusters
      Parameters:
      vectors - float vectors
      numClusters - number of clusters to group the vectors into
      assignCentroidsToVectors - if true, assign a centroid to every vector. Centroids are computed on a sample of the vectors; when this parameter is true, the results also report, for every vector, the centroid it belongs to.
      seed - random seed
      initializationMethod - k-means initialization method
      normalizeCenters - set to true for cosine distance to use spherical k-means, in which centers are normalized
      restarts - number of times to run the k-means algorithm
      iters - number of iterations to perform within a single run
      sampleSize - number of vectors to sample from all vectors; the k-means algorithm runs on this sample
      Returns:
      results of clustering: the produced centroids and, if assignCentroidsToVectors == true, the centroid of each vector
      Throws:
      IOException - if there is an error accessing vectors
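
To show how the expert parameters above could interact, here is a minimal, self-contained sketch of sampled k-means with restarts, written against plain float[][] arrays. It is not Lucene's implementation: the class KMeansSketch, its Result type, and all helper names are hypothetical. The sketch also assumes that centroids are initialized by picking random sample points (it has no initializationMethod parameter) and that the best restart is the one with the lowest total squared distance.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch for illustration; this is not Lucene's implementation. */
final class KMeansSketch {

  /** Centroids, their total squared distance to the sample, and optional per-vector ids. */
  static final class Result {
    float[][] centroids;
    double cost;
    int[] vectorCentroids; // null unless assignCentroidsToVectors was true
  }

  static Result cluster(float[][] vectors, int numClusters, boolean assignCentroidsToVectors,
      long seed, boolean normalizeCenters, int restarts, int iters, int sampleSize) {
    int n = Math.min(sampleSize, vectors.length);
    if (numClusters < 1 || numClusters > n || restarts < 1) {
      throw new IllegalArgumentException("need 1 <= numClusters <= sample size, restarts >= 1");
    }
    Random random = new Random(seed);
    float[][] sample = pickDistinct(vectors, n, random);
    Result best = new Result();
    for (int r = 0; r < restarts; r++) { // keep the run with the lowest cost
      float[][] centroids = runOnce(sample, numClusters, normalizeCenters, iters, random);
      double cost = assign(sample, centroids, new int[n]);
      if (best.centroids == null || cost < best.cost) {
        best.centroids = centroids;
        best.cost = cost;
      }
    }
    if (assignCentroidsToVectors) { // label every vector, not only the sample
      best.vectorCentroids = new int[vectors.length];
      assign(vectors, best.centroids, best.vectorCentroids);
    }
    return best;
  }

  /** Partial Fisher-Yates shuffle: picks n distinct rows uniformly at random. */
  static float[][] pickDistinct(float[][] rows, int n, Random random) {
    float[][] pool = rows.clone(); // shallow copy, so the caller's array order is untouched
    for (int i = 0; i < n; i++) {
      int j = i + random.nextInt(pool.length - i);
      float[] tmp = pool[i];
      pool[i] = pool[j];
      pool[j] = tmp;
    }
    return Arrays.copyOf(pool, n);
  }

  /** One run of Lloyd's algorithm from randomly chosen starting centroids. */
  static float[][] runOnce(float[][] sample, int k, boolean normalize, int iters, Random random) {
    int dim = sample[0].length;
    float[][] centroids = pickDistinct(sample, k, random);
    int[] assignment = new int[sample.length];
    for (int it = 0; it < iters; it++) {
      assign(sample, centroids, assignment);
      float[][] sums = new float[k][dim];
      int[] counts = new int[k];
      for (int i = 0; i < sample.length; i++) {
        counts[assignment[i]]++;
        for (int d = 0; d < dim; d++) sums[assignment[i]][d] += sample[i][d];
      }
      for (int c = 0; c < k; c++) {
        if (counts[c] == 0) continue; // empty cluster: keep its previous centroid
        double squaredNorm = 0;
        for (int d = 0; d < dim; d++) {
          sums[c][d] /= counts[c]; // mean of the points assigned to this cluster
          squaredNorm += (double) sums[c][d] * sums[c][d];
        }
        if (normalize && squaredNorm > 0) { // spherical k-means: unit-length centers
          for (int d = 0; d < dim; d++) sums[c][d] /= (float) Math.sqrt(squaredNorm);
        }
        centroids[c] = sums[c];
      }
    }
    return centroids;
  }

  /** Assigns each point to its nearest centroid; returns the total squared distance. */
  static double assign(float[][] points, float[][] centroids, int[] assignment) {
    double total = 0;
    for (int i = 0; i < points.length; i++) {
      double bestDistance = Double.MAX_VALUE;
      for (int c = 0; c < centroids.length; c++) {
        double distance = 0;
        for (int d = 0; d < points[i].length; d++) {
          double diff = points[i][d] - centroids[c][d];
          distance += diff * diff;
        }
        if (distance < bestDistance) { bestDistance = distance; assignment[i] = c; }
      }
      total += bestDistance;
    }
    return total;
  }
}
```

A call such as KMeansSketch.cluster(data, 2, true, 42L, false, 20, 10, 6) would run 20 restarts of 10 iterations each on a sample of at most 6 vectors and then label every vector. Fitting on a sample keeps the iteration cost bounded, which is why labelling all vectors is a separate, optional pass in this sketch.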