CDCR Configuration

The Source and Target configurations differ when the data centers are in separate clusters. "Cluster" here means separate ZooKeeper ensembles controlling disjoint Solr instances. Whether these data centers are physically separated is immaterial for this discussion.

As described in the section CDCR Architecture, two approaches are supported: uni-directional updates and bi-directional updates.

All CDCR configuration is done in the solrconfig.xml file. Because this is a per-collection configuration file, CDCR must be configured separately for each collection.

Uni-Directional Updates

Source Configuration

Here is a sample Source configuration, shown as a section of solrconfig.xml. The presence of the "replica" list (<lst name="replica">) causes CDCR to use this cluster as the Source, so it should not be present in the Target collections. Details about each setting are after the two examples. The source example has buffering disabled; the default is enabled:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">10.240.18.211:2181,10.240.18.212:2181</str>
    <!--
    If you have chrooted your Solr information at the target you must include the chroot, for example:
    <str name="zkHost">10.240.18.211:2181,10.240.18.212:2181/solr</str>
    -->
    <str name="source">collection1</str>
    <str name="target">collection1</str>
  </lst>

  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>

  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>

</requestHandler>

<!-- Modify the <updateLog> section of your existing <updateHandler>
     in your config as below -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
    <!--Any parameters from the original <updateLog> section -->
  </updateLog>

  <!-- Other configuration options such as autoCommit should still be present -->
</updateHandler>

Target Configuration

Here is a typical Target configuration.

The Target instance must configure an update processor chain that is specific to CDCR. The update processor chain must include the CdcrUpdateProcessorFactory. The task of this processor is to ensure that the version numbers attached to update requests coming from a CDCR Source SolrCloud are reused and not overwritten by the Target. A properly configured Target looks similar to this:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <!-- recommended for Target clusters -->
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-processor-chain</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="cdcr-processor-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Modify the <updateLog> section of your existing <updateHandler> in your
    config as below -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
    <!--Any parameters from the original <updateLog> section -->
  </updateLog>

  <!-- Other configuration options such as autoCommit should still be present -->

</updateHandler>
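
The distinction between the two uni-directional examples above comes down to whether the /cdcr request handler contains a "replica" list. As a minimal sketch of that rule, the following Python snippet classifies a solrconfig.xml string. The function name cdcr_role and the trimmed sample strings are hypothetical and exist only for illustration; this is not part of Solr.

```python
import xml.etree.ElementTree as ET

# Trimmed-down samples modeled on the configurations above (illustrative only).
SOURCE_SAMPLE = """
<config>
  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">10.240.18.211:2181,10.240.18.212:2181</str>
      <str name="source">collection1</str>
      <str name="target">collection1</str>
    </lst>
  </requestHandler>
</config>
"""

TARGET_SAMPLE = """
<config>
  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>
</config>
"""


def cdcr_role(solrconfig_xml: str) -> str:
    """Classify a solrconfig.xml string as 'source', 'target', or 'none'.

    Hypothetical helper: it only applies the rule stated above, that a
    "replica" list inside the CDCR request handler marks a Source.
    """
    root = ET.fromstring(solrconfig_xml)
    handler = root.find(".//requestHandler[@class='solr.CdcrRequestHandler']")
    if handler is None:
        return "none"  # no CDCR request handler configured at all
    has_replica = handler.find("lst[@name='replica']") is not None
    return "source" if has_replica else "target"


if __name__ == "__main__":
    print(cdcr_role(SOURCE_SAMPLE))
    print(cdcr_role(TARGET_SAMPLE))
```

A check like this can be run against a configuration before uploading it to ZooKeeper, to catch a "replica" list that was accidentally left in a Target collection. In a bi-directional setup every cluster carries a "replica" list, so the sketch would report "source" for both.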

Bi-Directional Updates

The configurations of Cluster 1 and Cluster 2 are identical except for the zkHost string: each cluster’s solrconfig.xml specifies the zkHost string of the other cluster.

Either cluster can act as Source or as Target at any given point in time, but a cluster cannot be both Source and Target at the same time.

Cluster 1 Configuration

Here is a sample Cluster 1 configuration, shown as a section of solrconfig.xml. The Cluster 2 zkHost string is specified in the CdcrRequestHandler declaration:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-processor-chain</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="cdcr-processor-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">10.240.19.241:2181,10.240.19.242:2181</str>
    <!--
    If you have chrooted your Solr information at the target you must include the chroot, for example:
    <str name="zkHost">10.240.19.241:2181,10.240.19.242:2181/solr</str>
    -->
    <str name="source">collection1</str>
    <str name="target">collection1</str>
  </lst>

  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>

  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>

</requestHandler>

<!-- Modify the <updateLog> section of your existing <updateHandler>
     in your config as below -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
    <!--Any parameters from the original <updateLog> section -->
  </updateLog>
</updateHandler>

Cluster 2 Configuration

The configuration of Cluster 2 is identical to that of Cluster 1, except that the Cluster 1 zkHost string is specified in the CdcrRequestHandler definition:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-processor-chain</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="cdcr-processor-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">10.250.18.211:2181,10.250.18.212:2181</str>
    <!--
    If you have chrooted your Solr information at the target you must include the chroot, for example:
    <str name="zkHost">10.250.18.211:2181,10.250.18.212:2181/solr</str>
    -->
    <str name="source">collection1</str>
    <str name="target">collection1</str>
  </lst>

  <lst name="replicator">
    <str name="threadPoolSize">8</str>
    <str name="schedule">1000</str>
    <str name="batchSize">128</str>
  </lst>

  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>

</requestHandler>

<!-- Modify the <updateLog> section of your existing <updateHandler>
     in your config as below -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
    <!--Any parameters from the original <updateLog> section -->
  </updateLog>
</updateHandler>

CDCR Configuration Parameters

The configuration details, defaults and options are as follows:

The Replica Element

CDCR can be configured to forward update requests to one or more Target collections. A Target collection is defined with a “replica” list as follows:

zkHost
The host address for ZooKeeper of the Target SolrCloud. Usually this is a comma-separated list of addresses to each node in the Target ZooKeeper ensemble. This parameter is required.
source
The name of the collection on the Source SolrCloud to be replicated. This parameter is required.
target
The name of the collection on the Target SolrCloud to which updates will be forwarded. This parameter is required.

The Replicator Element

The CDCR replicator is the component in charge of forwarding updates to the replicas. The replicator will monitor the update logs of the Source collection and will forward any new updates to the Target collection.

The replicator uses a fixed thread pool to forward updates to multiple replicas in parallel. If more than one replica is configured, one thread will forward a batch of updates from one replica at a time in a round-robin fashion. The replicator can be configured with a “replicator” list as follows:

threadPoolSize
The number of threads to use for forwarding updates. One thread per replica is recommended. The default is 2.
schedule
The delay in milliseconds for monitoring the update log(s). The default is 10.
batchSize
The number of updates to send in one batch. The optimal size depends on the size of the documents. Large batches of large documents can increase your memory usage significantly. The default is 128.

The updateLogSynchronizer Element

Expert: Non-leader nodes need to synchronize their update logs with their leader node from time to time in order to clean deprecated transaction log files. By default, such a synchronization process is performed every minute. The schedule of the synchronization can be modified with an “updateLogSynchronizer” list as follows:

schedule
The delay in milliseconds for synchronizing the update logs. The default is 60000.

If the updateLogSynchronizer element is omitted from the Source cluster, transaction logs may accumulate on non-leaders.

The Buffer Element

When buffering updates, the update logs will store all the updates indefinitely. It is best to disable buffering on both the Source and Target clusters during normal operation, because when buffering is enabled the Update Logs will grow without limit. Enabling buffering is intended for special maintenance periods. Buffering can be disabled at startup with a “buffer” list and the parameter “defaultState” as follows:

defaultState
The state of the buffer at startup. The default is enabled.

Buffering should be enabled only for maintenance windows

Buffering is designed to augment maintenance windows. The following points should be kept in mind:

  • When buffering is enabled, the Update Logs will grow without limit; they will never be purged.
  • During normal operation, the Update Logs will automatically accrue on the Source data center if the Target data center is unavailable; it is not necessary to enable buffering for CDCR to handle routine network disruptions.
    • For this reason, monitoring disk usage on the Source data center is recommended as an additional check that the Target data center is receiving updates.
  • For uni-directional updates, buffering should not be enabled on the Target data center as Update Logs would accrue without limit.
  • If buffering is enabled and then disabled, the Update Logs will be removed when their contents have been sent to the Target data center. This process may take some time and is triggered by additional updates to the Source cluster.
    • Update Log cleanup is not triggered until a new update is sent to the Source data center.
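
To make the defaults listed in this section concrete, here is a minimal sketch that reads a /cdcr request handler definition and fills in the documented default for every parameter that is omitted. The function effective_cdcr_settings and the DEFAULTS table are hypothetical names for illustration; the default values themselves are the ones stated in the parameter descriptions above.

```python
import xml.etree.ElementTree as ET

# Defaults as stated in the parameter descriptions above.
DEFAULTS = {
    "replicator": {"threadPoolSize": "2", "schedule": "10", "batchSize": "128"},
    "updateLogSynchronizer": {"schedule": "60000"},
    "buffer": {"defaultState": "enabled"},
}

# The Target /cdcr handler from the uni-directional example (illustrative copy).
TARGET_HANDLER = """
<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>
"""


def effective_cdcr_settings(handler_xml: str) -> dict:
    """Return the replicator, updateLogSynchronizer and buffer settings,
    using the documented default wherever the handler omits a value."""
    handler = ET.fromstring(handler_xml)
    settings = {}
    for list_name, defaults in DEFAULTS.items():
        values = dict(defaults)  # start from the documented defaults
        lst = handler.find(f"lst[@name='{list_name}']")
        if lst is not None:
            for item in lst.findall("str"):
                values[item.get("name")] = (item.text or "").strip()
        settings[list_name] = values
    return settings


if __name__ == "__main__":
    for name, values in effective_cdcr_settings(TARGET_HANDLER).items():
        print(name, values)
```

Running it on the Target handler shows the buffer state overridden to disabled while the replicator and updateLogSynchronizer values fall back to their defaults. This is only a reading aid for the configuration; it does not reflect how Solr itself parses the file.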

Initial Startup

Uni-Directional Approach

This is a general approach for initializing CDCR in a production environment. It’s based upon an approach taken by the initial working installation of CDCR and generously contributed to illustrate a "real world" scenario.

  • CDCR is used to keep a remote disaster-recovery instance available for production backup.
  • This example has 26 clouds with 200 million assets per cloud (15GB indexes). Total document count is over 4.8 billion.
    • Source and Target clouds were synched in 2-3 hour maintenance windows to establish the base index for the Targets.

As usual, it is good to start small. Sync a single cloud and monitor for a period of time before doing the others. You may need to adjust your settings several times before finding the right balance.

  • Before starting, stop or pause the indexers. This is best done during a small maintenance window.
  • Stop the SolrCloud instances at the Source.
  • Upload the modified solrconfig.xml to ZooKeeper on both Source and Target as appropriate, see the examples above.
  • Sync the index directories from the Source collection to the corresponding shard nodes of the Target collection. rsync works well for this.

    For example, if there are two shards on collection1 with 2 replicas for each shard, copy the corresponding index directories from:

    shard1replica1Source to shard1replica1Target
    shard1replica2Source to shard1replica2Target
    shard2replica1Source to shard2replica1Target
    shard2replica2Source to shard2replica2Target
  • Start ZooKeeper on the Target (DR).
  • Start SolrCloud on the Target (DR).
  • Start ZooKeeper on the Source.
  • Start SolrCloud on the Source. As a general rule, the Target (DR) should be started before the Source.
  • Activate CDCR on Source instance using the CDCR API:

    http://host:port/solr/<collection_name>/cdcr?action=START

    There is no need to run the /cdcr?action=START command on the Target.

  • Disable the buffer on the Target and Source:

    http://host:port/solr/<collection_name>/cdcr?action=DISABLEBUFFER
  • Re-enable indexing.
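
The two CDCR API calls in the steps above can be scripted. The following is a minimal sketch, assuming the URL pattern shown in the steps; the host names and the helper names cdcr_url, startup_calls and run are hypothetical. Building the URLs is kept separate from sending them, so the list can be reviewed before anything is executed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def cdcr_url(base_url: str, collection: str, action: str) -> str:
    """Build a CDCR API URL of the form
    http://host:port/solr/<collection_name>/cdcr?action=<ACTION>."""
    query = urlencode({"action": action})
    return f"{base_url.rstrip('/')}/solr/{collection}/cdcr?{query}"


def startup_calls(source_base: str, target_base: str, collection: str) -> list:
    """Return the calls from the steps above, in order:
    START on the Source only, then DISABLEBUFFER on Target and Source."""
    return [
        cdcr_url(source_base, collection, "START"),
        cdcr_url(target_base, collection, "DISABLEBUFFER"),
        cdcr_url(source_base, collection, "DISABLEBUFFER"),
    ]


def run(urls, timeout: float = 10.0) -> None:
    """Send each request in turn and print the HTTP status (not called here)."""
    for url in urls:
        with urlopen(url, timeout=timeout) as response:
            print(url, response.status)


if __name__ == "__main__":
    # Hypothetical hosts; replace with real Source and Target nodes.
    for url in startup_calls("http://source-host:8983",
                             "http://target-host:8983",
                             "collection1"):
        print(url)
```

Calling run(...) on the returned list would issue the requests; it is left out of the main block so that the sketch has no side effects when executed as is.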

Bi-Directional Approach

When using the bi-directional approach, it is highly recommended to enable CDCR on both cluster-collections before any indexing has taken place.

Based on the same example from uni-directional solution, let’s walk through the necessary steps:

  • Before you begin, stop or pause any indexing processes. This is best done during a small maintenance window.
  • Stop the SolrCloud instances in both Cluster 1 and Cluster 2.
  • Upload the modified solrconfig.xml to ZooKeeper on both Cluster 1 and Cluster 2 as appropriate, see the examples above in the section Bi-Directional Updates.
  • If documents were indexed prior to this exercise, sync the index directories from the Cluster 1 collection to the corresponding shard nodes of the Cluster 2 collection, or vice versa. The rsync utility works well for this if it’s available on your server. Check to be sure that the updated index is copied across.

    For example, if there are 2 shards on collection 'cluster1' (the updated collection) with 2 replicas for each shard, copy the corresponding index directories from:

    shard1replica1cluster1 to shard1replica1cluster2
    shard1replica2cluster1 to shard1replica2cluster2
    shard2replica1cluster1 to shard2replica1cluster2
    shard2replica2cluster1 to shard2replica2cluster2
  • Start ZooKeeper on Cluster 1.
  • Start ZooKeeper on Cluster 2.
  • Start SolrCloud on Cluster 1.
  • Start SolrCloud on Cluster 2.
  • If not present, create respective collections in both Cluster 1 and Cluster 2.
  • Activate CDCR on the Cluster 1 and Cluster 2 instances using the CDCR API:

    http://host:port/solr/<collection_name>/cdcr?action=START
  • Disable the buffer on Cluster 1 and Cluster 2:

    http://host:port/solr/<collection_name>/cdcr?action=DISABLEBUFFER
  • Re-enable indexing.
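
The index sync step above pairs each shard replica on one cluster with the same shard replica on the other. As a small sketch of that pairing, the function below enumerates the copy pairs for any number of shards and replicas. The function name index_copy_pairs is hypothetical, and the generated strings are the labels used in the example above, not real directory paths; the actual index directories depend on your installation.

```python
def index_copy_pairs(num_shards: int, replicas_per_shard: int,
                     from_label: str, to_label: str) -> list:
    """Return (from, to) labels pairing each shard replica on one
    cluster with the same shard replica on the other cluster."""
    pairs = []
    for shard in range(1, num_shards + 1):
        for replica in range(1, replicas_per_shard + 1):
            core = f"shard{shard}replica{replica}"
            pairs.append((f"{core}{from_label}", f"{core}{to_label}"))
    return pairs


if __name__ == "__main__":
    # Two shards with two replicas each, as in the example above.
    for src, dst in index_copy_pairs(2, 2, "cluster1", "cluster2"):
        print(f"{src} to {dst}")
```

Each pair corresponds to one copy operation, for example one rsync invocation per pair. The same function covers the uni-directional case when called with the labels "Source" and "Target".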

ZooKeeper Settings

With CDCR, the Target ZooKeepers will have connections from the Target clouds and the Source clouds. You may need to increase the maxClientCnxns setting in zoo.cfg.

## Set the maximum number of client connections to 800.
## maxClientCnxns=0 means no limit.
maxClientCnxns=800
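
Before enabling CDCR it can be useful to confirm what the Target zoo.cfg currently allows. The following is a minimal sketch that reads the maxClientCnxns value from the text of a zoo.cfg file and compares it with an expected connection count. The helper names and the expected count are assumptions for illustration; how many connections your Source and Target clouds actually open has to be measured in your own environment.

```python
SAMPLE_ZOO_CFG = """\
tickTime=2000
## Set the maximum number of client connections to 800.
## maxClientCnxns=0 means no limit.
maxClientCnxns=800
"""


def max_client_cnxns(zoo_cfg_text: str):
    """Return the configured maxClientCnxns as an int, or None if unset."""
    for line in zoo_cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "maxClientCnxns":
            return int(value.strip())
    return None


def limit_is_sufficient(limit, expected_connections: int):
    """Return True/False, or None when the limit is not set in the file
    (in that case ZooKeeper's built-in default applies)."""
    if limit is None:
        return None
    # A value of 0 means no limit.
    return limit == 0 or limit >= expected_connections


if __name__ == "__main__":
    limit = max_client_cnxns(SAMPLE_ZOO_CFG)
    # 500 is a hypothetical expected connection count.
    print(limit, limit_is_sufficient(limit, 500))
```

A None result from limit_is_sufficient means the setting is absent from the file, so the value in effect has to be looked up in the documentation for your ZooKeeper version.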