UpdateHandlers in SolrConfig

The settings in this section are configured in the <updateHandler> element in solrconfig.xml and may affect the performance of index updates. These settings affect how updates are done internally. <updateHandler> configurations do not affect the higher level configuration of RequestHandlers that process client update requests.

<updateHandler class="solr.DirectUpdateHandler2">
  ...
</updateHandler>

Commits

Data sent to Solr is not searchable until it has been committed to the index. Commits can be slow in some cases, and they should be done in isolation from other possible commit requests to avoid overwriting data, so it’s preferable to control when data is committed. Several options are available to control the timing of commits.

commit and softCommit

In Solr, a commit is an action which asks Solr to "commit" pending changes to the Lucene index files. By default, commit actions result in a "hard commit" of all the Lucene index files to stable storage (disk). When a client includes a commit=true parameter with an update request, all index segments affected by the adds and deletes in that update are written to disk as soon as index updates are completed.

If an additional flag softCommit=true is specified, then Solr performs a 'soft commit', meaning that Solr will commit your changes to the Lucene data structures quickly but not guarantee that the Lucene index files are written to stable storage. This is an implementation of Near Real Time storage, a feature that boosts document visibility, since you don’t have to wait for background merges and storage (to ZooKeeper, if using SolrCloud) to finish before moving on to something else. A full commit means that, if a server crashes, Solr will know exactly where your data was stored; a soft commit means that the data is stored, but the location information isn’t yet stored. The tradeoff is that a soft commit gives you faster visibility, because it does not wait for background merges to finish, at the cost of that durability guarantee.
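
As a rough illustration of the difference, the two kinds of commit can also be requested explicitly through XML update messages posted to an update request handler. This is a minimal sketch, assuming the XML update message format and its softCommit attribute are available in your Solr version; the two messages are alternatives and would be sent separately.

```xml
<!-- Message 1: hard commit. Pending changes are written to stable storage. -->
<commit/>

<!-- Message 2 (sent as a separate request): soft commit. Changes become
     searchable quickly, but the index files are not guaranteed to be
     on stable storage yet. -->
<commit softCommit="true"/>
```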

For more information about Near Real Time operations, see Near Real Time Searching.

autoCommit

These settings control how often pending updates will be automatically pushed to the index. An alternative to autoCommit is to use commitWithin, which can be defined when making the update request to Solr (i.e., when pushing documents), or in an update RequestHandler.

maxDocs
The number of updates that have occurred since the last commit.
maxTime
The number of milliseconds since the oldest uncommitted update.
maxSize
The maximum size of the transaction log (tlog) on disk, after which a hard commit is triggered. This is useful when the size of documents is unknown and the intention is to restrict the size of the transaction log to a reasonable size. Valid values can be bytes (default with no suffix), kilobytes (if defined with a k suffix, as in 25k), megabytes (m) or gigabytes (g).
openSearcher
Whether to open a new searcher when performing a commit. If this is false, the commit will flush recent index changes to stable storage, but does not cause a new searcher to be opened to make those changes visible. The default is true.

If any of the maxDocs, maxTime, or maxSize limits is reached, Solr automatically performs a commit operation. If the autoCommit tag is missing, then only explicit commits will update the index. The decision whether to use auto-commit or not depends on the needs of your application.

Determining the best auto-commit settings is a tradeoff between performance and accuracy. Settings that cause frequent updates will improve the accuracy of searches because new content will be searchable more quickly, but performance may suffer because of the frequent updates. Less frequent updates may improve performance but it will take longer for updates to show up in queries.

<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>30000</maxTime>
  <maxSize>512m</maxSize>
  <openSearcher>false</openSearcher>
</autoCommit>

You can also specify 'soft' autoCommits in the same way that you can specify 'soft' commits, except that instead of using autoCommit you set the autoSoftCommit tag.

<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>
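
The two tags are often used together, so that hard commits handle durability while soft commits handle visibility. The following is a sketch of one such combination, not a recommendation; the interval values are illustrative assumptions and should be tuned for your application.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit at most 15 seconds after the oldest uncommitted update.
       openSearcher=false flushes changes to stable storage without
       making them visible to searches. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit at most 2 seconds after the oldest uncommitted update,
       which makes recent changes searchable. -->
  <autoSoftCommit>
    <maxTime>2000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With settings like these, search visibility follows the shorter soft commit interval, while the less frequent hard commits bound how much work remains in the transaction log.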

commitWithin

The commitWithin settings allow forcing document commits to happen in a defined time period. This is used most frequently with Near Real Time Searching, and for that reason the default is to perform a soft commit. This does not, however, replicate new documents to follower servers in a leader/follower environment. If that’s a requirement for your implementation, you can force a hard commit by adding a parameter, as in this example:

<commitWithin>
  <softCommit>false</softCommit>
</commitWithin>

With this configuration, when you call commitWithin as part of your update message, it will automatically perform a hard commit every time.
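
For reference, commitWithin itself is supplied by the client, not in solrconfig.xml. A minimal sketch of an XML update message that uses it is shown here; the field names and values are hypothetical and depend on your schema.

```xml
<!-- Ask Solr to commit this document within 10 seconds (10000 ms). -->
<add commitWithin="10000">
  <doc>
    <!-- Hypothetical fields, for illustration only. -->
    <field name="id">example-doc-1</field>
    <field name="title">An example document</field>
  </doc>
</add>
```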

Event Listeners

The UpdateHandler section is also where update-related event listeners can be configured. These can be triggered to occur after any commit (event="postCommit") or only after optimize commands (event="postOptimize").

Users can write custom update event listener classes in Solr plugins. As of Solr 7.1, RunExecutableListener was removed for security reasons.
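
A listener registration might look like the following sketch. The class name com.example.solr.CommitNotifier is hypothetical; it stands in for a custom plugin class that implements Solr's SolrEventListener interface and is available on the core's classpath.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hypothetical custom listener, invoked after every commit. -->
  <listener event="postCommit" class="com.example.solr.CommitNotifier"/>

  <!-- The same hypothetical class, invoked only after optimize commands. -->
  <listener event="postOptimize" class="com.example.solr.CommitNotifier"/>
</updateHandler>
```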

Transaction Log

As described in the section RealTime Get, a transaction log is required for that feature. The update log is enabled by default and is configured in the updateHandler section of solrconfig.xml, in a section like:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

Three additional expert-level configuration settings affect indexing performance and how far a replica can fall behind on updates before it must enter into full recovery. See the section on write side fault tolerance for more information:

numRecordsToKeep
The number of update records to keep per log. The default is 100.
maxNumLogsToKeep
The maximum number of logs to keep. The default is 10.
numVersionBuckets
The number of buckets used to keep track of max version values when checking for re-ordered updates. Increase this value to reduce the cost of synchronizing access to version buckets during high-volume indexing. This requires (8 bytes (long) * numVersionBuckets) of heap space per Solr core, which is 512 KB at the default. The default is 65536.

An example, to be included under <config><updateHandler> in solrconfig.xml, employing the above advanced settings:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">500</int>
  <int name="maxNumLogsToKeep">20</int>
  <int name="numVersionBuckets">65536</int>
</updateLog>

Other Options

In some cases, complex updates (such as spatial/shape) may take a very long time to complete. In the default configuration, other updates that fall into the same internal version bucket will wait indefinitely. These outstanding requests may eventually pile up and lead to thread exhaustion and then to OutOfMemory errors.

The option versionBucketLockTimeoutMs in the updateHandler section helps to prevent that by specifying a limited timeout for such extremely long running update requests. If this limit is reached, the waiting update will fail, but it will not block all other updates indefinitely. See SOLR-12833 for more details.

There’s a memory cost associated with this setting. Values greater than the default 0 (meaning unlimited timeout) cause Solr to use a different internal implementation of the version bucket, which increases memory consumption from ~1.5MB to ~6.8MB per Solr core.

An example of specifying this option under the <config> section of solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  ...
  <int name="versionBucketLockTimeoutMs">10000</int>
</updateHandler>