SolrCloud Autoscaling Fault Tolerance

Autoscaling is deprecated

The autoscaling framework in its current form is deprecated and will be removed in Solr 9.0.

A new design for this feature is under development in SOLR-14613, with the goal of releasing it in Solr 9.0.

The autoscaling framework uses a few strategies to ensure that it can still trigger actions when the system changes unexpectedly.

Node Added or Lost Markers

Triggers execute on the node that runs the Overseer. If that node goes down, its nodeLost event would be lost, because no running trigger would be left to generate it. Similarly, if a node is added before an Overseer leader change has completed, the nodeAdded event would not be generated.

For this reason Solr implements additional mechanisms to ensure that these events are generated reliably.

With standard SolrCloud behavior, when a node joins a cluster its presence is marked by an ephemeral ZooKeeper path, /live_nodes/<nodeName>. With autoscaling, an ephemeral path is also created under /autoscaling/nodeAdded/<nodeName>. When a new Overseer leader starts, it runs the nodeAdded trigger (if one is configured) and discovers this ZooKeeper path, at which point it removes the path and generates a nodeAdded event.

When a node leaves the cluster, up to three of the remaining nodes try to create a persistent ZooKeeper path, /autoscaling/nodeLost/<nodeName>, and eventually one of them succeeds. When a new Overseer leader starts, it runs the nodeLost trigger (if one is configured) and discovers this ZooKeeper path, at which point it removes the path and generates a nodeLost event.
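The marker handling described above can be sketched with an in-memory stand-in for ZooKeeper. This is only an illustration under stated assumptions: the class MarkerStore, its method names, and the event strings are hypothetical and are not part of Solr's API, and the difference between ephemeral and persistent paths is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the marker paths kept in ZooKeeper.
class MarkerStore {
    // Node names with a marker under /autoscaling/nodeAdded/<nodeName>.
    private final Set<String> nodeAdded = new TreeSet<>();
    // Node names with a marker under /autoscaling/nodeLost/<nodeName>.
    private final Set<String> nodeLost = new TreeSet<>();

    // Called when a node joins: record a nodeAdded marker for it.
    void nodeJoined(String nodeName) {
        nodeAdded.add(nodeName);
    }

    // Several surviving nodes may race to create the same nodeLost marker.
    // Only the first attempt creates it; later attempts return false.
    boolean tryMarkLost(String nodeName) {
        return nodeLost.add(nodeName);
    }

    // What a newly started Overseer leader does in this sketch:
    // turn every marker into one event and remove the marker.
    List<String> drainOnOverseerStart() {
        List<String> events = new ArrayList<>();
        for (String node : nodeAdded) {
            events.add("nodeAdded:" + node);
        }
        for (String node : nodeLost) {
            events.add("nodeLost:" + node);
        }
        nodeAdded.clear();
        nodeLost.clear();
        return events;
    }
}
```

Because the markers are removed as they are consumed, a second call to drainOnOverseerStart() returns no events, so each marker yields at most one event in this sketch.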

Trigger State Checkpointing

Triggers generate events based on their internal state. If the Overseer leader goes down while the trigger is about to generate a new event, it’s likely that the event would be lost because a new trigger instance running on the new Overseer leader would start from a clean slate.

For this reason, each time a trigger executes, its internal state is persisted to ZooKeeper, and when a new Overseer starts, that state is restored.
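A minimal sketch of this checkpoint-and-restore cycle follows. It assumes a hypothetical trigger whose only state is the set of live nodes it last saw, and it uses a plain Map in place of ZooKeeper; the class CheckpointedTrigger and its methods are illustrative and do not reflect Solr's actual trigger classes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical trigger that reports nodes which disappeared since its last run.
class CheckpointedTrigger {
    private final String name;
    // Stand-in for ZooKeeper: trigger name -> persisted state.
    private final Map<String, Set<String>> store;
    private Set<String> lastLiveNodes;

    CheckpointedTrigger(String name, Map<String, Set<String>> store) {
        this.name = name;
        this.store = store;
        // Restore the previous state if a checkpoint exists; otherwise start empty.
        this.lastLiveNodes = new HashSet<>(store.getOrDefault(name, Set.of()));
    }

    // Compare the current live nodes with the remembered ones,
    // then checkpoint the new state before returning.
    Set<String> run(Set<String> liveNodes) {
        Set<String> lost = new TreeSet<>(lastLiveNodes);
        lost.removeAll(liveNodes);
        lastLiveNodes = new HashSet<>(liveNodes);
        store.put(name, new HashSet<>(lastLiveNodes));
        return lost;
    }
}
```

In this sketch, a second instance created with the same name and store picks up where the first one stopped, so a node that vanished around the time of the Overseer change is still reported. An instance created without a checkpoint has nothing to compare against and reports nothing on its first run.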

Trigger Event Queues

The autoscaling framework limits the rate at which events are processed using several different mechanisms. One is a locking mechanism that prevents concurrent processing of events, and another is a single-threaded executor that runs trigger actions.

This means that the processing of an event may take significant time, and during this time it’s possible that the Overseer may go down. In order to avoid losing events that were already generated but not yet fully processed, events are queued before processing is started.
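The effect of running actions on a single-threaded executor can be illustrated with the standard java.util.concurrent API. This sketch covers only the executor, not the locking mechanism, and the class SerialActionRunner and the "processed ..." strings are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical runner: one action per event, all on the same worker thread.
class SerialActionRunner {
    // Returns the actions in the order in which they completed.
    static List<String> runAll(List<String> events) {
        List<String> completed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (String event : events) {
            // Tasks are executed one at a time, in submission order.
            executor.execute(() -> completed.add("processed " + event));
        }
        executor.shutdown();
        try {
            // Wait for the queued actions; give up after a generous timeout.
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(completed);
    }
}
```

Since every action waits for the one before it, a slow action delays all later events, which is why an Overseer failure in the middle of processing is a realistic concern.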

A separate ZooKeeper queue is created for each trigger, and the events a trigger produces are put on its queue. When a new Overseer leader starts, it first checks these queues and processes the events accumulated there, and only then does it continue to run triggers normally. Queued events that fail processing during this "replay" stage are discarded.
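The replay behavior can be sketched as follows, with in-memory queues standing in for the ZooKeeper queues. The class TriggerEventQueues, its methods, and the use of a Predicate to represent event processing are assumptions made for this example, not Solr's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical per-trigger event queues with a replay step.
class TriggerEventQueues {
    // Trigger name -> events waiting to be processed, in arrival order.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();

    // Queue an event before any processing of it starts.
    void enqueue(String trigger, String event) {
        queues.computeIfAbsent(trigger, t -> new ArrayDeque<>()).add(event);
    }

    // Replay on a new Overseer leader: drain every queue.
    // The processor returns true on success. Events that fail
    // (false or an exception) are discarded rather than re-queued.
    List<String> replay(Predicate<String> processor) {
        List<String> processed = new ArrayList<>();
        for (Deque<String> queue : queues.values()) {
            String event;
            while ((event = queue.poll()) != null) {
                boolean ok;
                try {
                    ok = processor.test(event);
                } catch (RuntimeException e) {
                    ok = false;
                }
                if (ok) {
                    processed.add(event);
                }
            }
        }
        return processed;
    }

    // Number of events still waiting across all triggers.
    int pending() {
        int total = 0;
        for (Deque<String> queue : queues.values()) {
            total += queue.size();
        }
        return total;
    }
}
```

After replay() returns, pending() is zero whether or not every event succeeded, which mirrors the rule that failed events are dropped during replay.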