Class KMeans
java.lang.Object
org.apache.lucene.sandbox.codecs.quantization.KMeans
KMeans clustering algorithm for vectors

Nested Class Summary
Nested Classes
Modifier and Type     Class                               Description
static enum           KMeans.KmeansInitializationMethod   Kmeans initialization methods
static final record   KMeans.Results                      Results of KMeans clustering

Field Summary
Fields
Modifier and Type   Field                 Description
static final int    MAX_NUM_CENTROIDS
static final int    DEFAULT_RESTARTS
static final int    DEFAULT_ITRS
static final int    DEFAULT_SAMPLE_SIZE

Method Summary
Modifier and Type       Method and Description
static KMeans.Results   cluster(FloatVectorValues vectors, int numClusters, boolean assignCentroidsToVectors, long seed, KMeans.KmeansInitializationMethod initializationMethod, boolean normalizeCenters, int restarts, int iters, int sampleSize)
                        Expert: Cluster vectors into a given number of clusters
static KMeans.Results   cluster(FloatVectorValues vectors, VectorSimilarityFunction similarityFunction, int numClusters)
                        Cluster vectors into a given number of clusters

Field Details

MAX_NUM_CENTROIDS
public static final int MAX_NUM_CENTROIDS

DEFAULT_RESTARTS
public static final int DEFAULT_RESTARTS

DEFAULT_ITRS
public static final int DEFAULT_ITRS

DEFAULT_SAMPLE_SIZE
public static final int DEFAULT_SAMPLE_SIZE

Method Details

cluster
public static KMeans.Results cluster(FloatVectorValues vectors, VectorSimilarityFunction similarityFunction, int numClusters) throws IOException
Cluster vectors into a given number of clusters
Parameters:
vectors - float vectors
similarityFunction - vector similarity function. For COSINE similarity, vectors must be normalized.
numClusters - number of clusters to partition the vectors into
Returns:
results of clustering: the produced centroids and, for each vector, its centroid
Throws:
IOException - if there is an error accessing vectors
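
The COSINE note above means each vector should have unit L2 length before it is passed in. A minimal sketch of that normalization follows, assuming vectors are held as plain float[] arrays. UnitVectors and l2Normalize are hypothetical names used only for illustration and are not part of KMeans.

```java
/** Hypothetical helper, not part of Lucene: scales a vector to unit L2 length. */
final class UnitVectors {
  private UnitVectors() {}

  /**
   * Returns a new array holding v divided by its L2 norm.
   * A zero vector has no direction, so it is returned as an unchanged copy.
   */
  static float[] l2Normalize(float[] v) {
    // Accumulate in double to limit rounding error on long vectors.
    double sumSquares = 0.0;
    for (float x : v) {
      sumSquares += (double) x * x;
    }
    float[] out = v.clone();
    if (sumSquares == 0.0) {
      return out;
    }
    float inverseNorm = (float) (1.0 / Math.sqrt(sumSquares));
    for (int i = 0; i < out.length; i++) {
      out[i] *= inverseNorm;
    }
    return out;
  }
}
```

For example, UnitVectors.l2Normalize(new float[] {3f, 4f}) yields a vector close to (0.6, 0.8). Lucene also ships a utility for this, VectorUtil.l2normalize, which works in place; its Javadoc describes how it treats zero vectors.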

cluster
public static KMeans.Results cluster(FloatVectorValues vectors, int numClusters, boolean assignCentroidsToVectors, long seed, KMeans.KmeansInitializationMethod initializationMethod, boolean normalizeCenters, int restarts, int iters, int sampleSize) throws IOException
Expert: Cluster vectors into a given number of clusters
Parameters:
vectors - float vectors
numClusters - number of clusters to partition the vectors into
assignCentroidsToVectors - if true, assign centroids for all vectors. Centroids are computed on a sample of vectors. If this parameter is true, the results also report, for every vector, the centroid it belongs to.
seed - random seed
initializationMethod - Kmeans initialization method
normalizeCenters - set to true for cosine distance, to use spherical k-means where centers are normalized
restarts - how many times to run the Kmeans algorithm
iters - how many iterations to perform within a single run
sampleSize - size of the sample drawn from all vectors on which to run the Kmeans algorithm
Returns:
results of clustering: the produced centroids and, if assignCentroidsToVectors == true, also the centroid of each vector
Throws:
IOException - if there is an error accessing vectors
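
To make the roles of seed, restarts, iters and normalizeCenters concrete, here is a minimal sketch of Lloyd-style k-means over plain float[][] data. It is an illustration under stated assumptions, not Lucene's implementation: the class KMeansSketch and all of its methods are hypothetical, initialization is a simple seeded random pick of distinct vectors, and the best restart is assumed to be the one with the lowest total squared distance.

```java
/** Illustrative sketch only; this is NOT Lucene's KMeans. All names are hypothetical. */
final class KMeansSketch {
  private KMeansSketch() {}

  /** Runs k-means `restarts` times for `iters` iterations each and keeps the lowest-cost centroids. */
  static float[][] run(float[][] vectors, int numClusters, long seed,
                       boolean normalizeCenters, int restarts, int iters) {
    java.util.Random random = new java.util.Random(seed); // same seed, same result
    float[][] best = null;
    double bestCost = Double.POSITIVE_INFINITY;
    for (int r = 0; r < restarts; r++) {
      float[][] centroids = initRandom(vectors, numClusters, random);
      for (int it = 0; it < iters; it++) {
        update(vectors, centroids, normalizeCenters);
      }
      double cost = totalCost(vectors, centroids);
      if (cost < bestCost) {
        bestCost = cost;
        best = centroids;
      }
    }
    return best;
  }

  /** Picks k distinct input vectors as starting centroids (requires k <= vectors.length). */
  static float[][] initRandom(float[][] vectors, int k, java.util.Random random) {
    int[] idx = new int[vectors.length];
    for (int i = 0; i < idx.length; i++) idx[i] = i;
    float[][] centroids = new float[k][];
    for (int c = 0; c < k; c++) {
      int j = c + random.nextInt(idx.length - c); // partial Fisher-Yates shuffle
      int tmp = idx[c]; idx[c] = idx[j]; idx[j] = tmp;
      centroids[c] = vectors[idx[c]].clone();
    }
    return centroids;
  }

  /** One iteration: assign each vector to its nearest centroid, then recompute the means. */
  static void update(float[][] vectors, float[][] centroids, boolean normalizeCenters) {
    int dim = vectors[0].length;
    double[][] sums = new double[centroids.length][dim];
    int[] counts = new int[centroids.length];
    for (float[] v : vectors) {
      int c = nearest(v, centroids);
      counts[c]++;
      for (int d = 0; d < dim; d++) sums[c][d] += v[d];
    }
    for (int c = 0; c < centroids.length; c++) {
      if (counts[c] == 0) continue; // an empty cluster keeps its previous centroid
      double norm = 0.0;
      for (int d = 0; d < dim; d++) {
        sums[c][d] /= counts[c];
        norm += sums[c][d] * sums[c][d];
      }
      // Spherical k-means: rescale the mean to unit length when requested.
      double scale = (normalizeCenters && norm > 0.0) ? 1.0 / Math.sqrt(norm) : 1.0;
      for (int d = 0; d < dim; d++) centroids[c][d] = (float) (sums[c][d] * scale);
    }
  }

  /** Index of the centroid with the smallest squared Euclidean distance to v. */
  static int nearest(float[] v, float[][] centroids) {
    int best = 0;
    double bestDist = Double.POSITIVE_INFINITY;
    for (int c = 0; c < centroids.length; c++) {
      double dist = squaredDistance(v, centroids[c]);
      if (dist < bestDist) {
        bestDist = dist;
        best = c;
      }
    }
    return best;
  }

  /** Sum of squared distances from every vector to its nearest centroid. */
  static double totalCost(float[][] vectors, float[][] centroids) {
    double cost = 0.0;
    for (float[] v : vectors) cost += squaredDistance(v, centroids[nearest(v, centroids)]);
    return cost;
  }

  static double squaredDistance(float[] a, float[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }
}
```

In this sketch, more restarts reduce the chance of keeping a poor local optimum, while iters bounds the work done inside each run. With normalizeCenters set to true and unit-length inputs, every returned centroid is itself unit length, which is the spherical k-means behavior the parameter description refers to.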
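
The sampleSize and assignCentroidsToVectors descriptions imply two phases: centroids are fitted on a sample, and every vector is then optionally mapped to its nearest centroid. The sketch below illustrates only those two steps, assuming plain arrays and squared Euclidean distance. SampleThenAssign, sampleIndices and assignAll are hypothetical names, and the sampling strategy shown (a seeded partial shuffle) is an assumption rather than a description of what KMeans does internally.

```java
/** Illustrative sketch; names are hypothetical and this is not Lucene code. */
final class SampleThenAssign {
  private SampleThenAssign() {}

  /** Picks min(sampleSize, n) distinct indices in [0, n) with a seeded partial Fisher-Yates shuffle. */
  static int[] sampleIndices(int n, int sampleSize, long seed) {
    java.util.Random random = new java.util.Random(seed);
    int[] indices = new int[n];
    for (int i = 0; i < n; i++) indices[i] = i;
    int size = Math.min(sampleSize, n);
    for (int i = 0; i < size; i++) {
      int j = i + random.nextInt(n - i);
      int tmp = indices[i]; indices[i] = indices[j]; indices[j] = tmp;
    }
    return java.util.Arrays.copyOf(indices, size);
  }

  /** Maps every vector, sampled or not, to the index of its nearest centroid. */
  static int[] assignAll(float[][] vectors, float[][] centroids) {
    int[] assignment = new int[vectors.length];
    for (int i = 0; i < vectors.length; i++) {
      double bestDist = Double.POSITIVE_INFINITY;
      for (int c = 0; c < centroids.length; c++) {
        double dist = 0.0;
        for (int d = 0; d < vectors[i].length; d++) {
          double diff = vectors[i][d] - centroids[c][d];
          dist += diff * diff;
        }
        if (dist < bestDist) {
          bestDist = dist;
          assignment[i] = c;
        }
      }
    }
    return assignment;
  }
}
```

Fitting on a sample keeps the iterative phase cheap for large inputs, while the final assignment is a single pass over all vectors. Skipping that pass corresponds to leaving assignCentroidsToVectors false when only the centroids are needed.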