Class MergingHnswGraphBuilder

java.lang.Object
org.apache.lucene.util.hnsw.HnswGraphBuilder
org.apache.lucene.util.hnsw.MergingHnswGraphBuilder
All Implemented Interfaces:
HnswBuilder

public final class MergingHnswGraphBuilder extends HnswGraphBuilder
A graph builder that is used during segments' merging.

This builder uses a smart algorithm to merge multiple graphs into a single graph. The algorithm is based on the idea that if we know where we want to insert a node, we have a good idea of where we want to insert its neighbors.

The algorithm is based on the following steps:

  • Get all graphs that don't have deletions and sort them by size desc.
  • Copy the largest graph to the new graph (gL).
  • For each remaining small graph (gS):
    • Find the nodes that best cover gS: join set `j`. These nodes will be inserted into gL as usual: by searching gL to find the best candidates `w` to which connect the nodes.
    • For each remaining node in gS:
      • We provide eps to search in gL. We form `eps` by the union of the node's neighbors in gS and the node's neighbors' neighbors in gL. We also limit beamWidth (efConstruction to M*3)

We expect the size of join set `j` to be small, around 1/5 to 1/2 of the size of gS. For the rest of the nodes in gS, we expect savings by performing lighter searches in gL.

WARNING: This API is experimental and might change in incompatible ways in the next release.
  • Method Details

    • fromGraphs

      public static MergingHnswGraphBuilder fromGraphs(RandomVectorScorerSupplier scorerSupplier, int M, int beamWidth, long seed, HnswGraph[] graphs, int[][] ordMaps, int totalNumberOfVectors, BitSet initializedNodes) throws IOException
      Create a new HnswGraphBuilder that is initialized with the provided HnswGraph.
      Parameters:
      scorerSupplier - the scorer to use for vectors
      beamWidth - the number of nodes to explore in the search
      seed - the seed for the random number generator
      graphs - the graphs to merge
      ordMaps - the ordinal maps for the graphs
      totalNumberOfVectors - the total number of vectors in the new graph, this should include all vectors expected to be added to the graph in the future
      initializedNodes - the nodes will be initialized through the merging
      Returns:
      a new HnswGraphBuilder that is initialized with the provided HnswGraph
      Throws:
      IOException - when reading the graph fails
    • build

      public OnHeapHnswGraph build(int maxOrd) throws IOException
      Description copied from interface: HnswBuilder
      Adds all nodes to the graph up to the provided maxOrd.
      Specified by:
      build in interface HnswBuilder
      Overrides:
      build in class HnswGraphBuilder
      Parameters:
      maxOrd - The maximum ordinal (excluded) of the nodes to be added.
      Throws:
      IOException