SolrCloud Query Routing And Read Tolerance | Apache Solr Reference Guide 7.1

SolrCloud is highly available and fault tolerant in reads and writes.

Read Side Fault Tolerance

In a SolrCloud cluster each individual node load balances read requests across all the replicas in collection. You still need a load balancer on the 'outside' that talks to the cluster, or you need a smart client which understands how to read and interact with Solr’s metadata in ZooKeeper and only requests the ZooKeeper ensemble’s address to start discovering to which nodes it should send requests. (Solr provides a smart Java SolrJ client called CloudSolrClient.)

Even if some nodes in the cluster are offline or unreachable, a Solr node will be able to correctly respond to a search request as long as it can communicate with at least one replica of every shard, or one replica of every relevant shard if the user limited the search via the shards or _route_ parameters. The more replicas there are of every shard, the more likely that the Solr cluster will be able to handle search results in the event of node failures.

zkConnected

A Solr node will return the results of a search request as long as it can communicate with at least one replica of every shard that it knows about, even if it can not communicate with ZooKeeper at the time it receives the request. This is normally the preferred behavior from a fault tolerance standpoint, but may result in stale or incorrect results if there have been major changes to the collection structure that the node has not been informed of via ZooKeeper (i.e., shards may have been added or removed, or split into sub-shards)

A zkConnected header is included in every search response indicating if the node that processed the request was connected with ZooKeeper at the time:

Solr Response with partialResults

{
  "responseHeader": {
    "status": 0,
    "zkConnected": true,
    "QTime": 20,
    "params": {
      "q": "*:*"
    }
  },
  "response": {
    "numFound": 107,
    "start": 0,
    "docs": [ "..." ]
  }
}

shards.tolerant

In the event that one or more shards queried are completely unavailable, then Solr’s default behavior is to fail the request. However, there are many use-cases where partial results are acceptable and so Solr provides a boolean shards.tolerant parameter (default false).

If shards.tolerant=true then partial results may be returned. If the returned response does not contain results from all the appropriate shards then the response header contains a special flag called partialResults.

The client can specify 'shards.info' along with the shards.tolerant parameter to retrieve more fine-grained details.

Example response with partialResults flag set to 'true':

Solr Response with partialResults

{
  "responseHeader": {
    "status": 0,
    "zkConnected": true,
    "partialResults": true,
    "QTime": 20,
    "params": {
      "q": "*:*"
    }
  },
  "response": {
    "numFound": 77,
    "start": 0,
    "docs": [ "..." ]
  }
}

SolrCloud Recoveries and Write Tolerance SolrCloud Configuration and Parameters