Time Routed Aliases

Time Routed Aliases (TRAs) is a SolrCloud feature that manages an alias and a time sequential series of collections.

It automatically creates new collections and (optionally) deletes old ones as it routes documents to the correct collection based on its timestamp. This approach allows for indefinite indexing of data without degradation of performance otherwise experienced due to the continuous growth of a single index.

If you need to store a lot of timestamped data in Solr, such as logs or IoT sensor data, then this feature probably makes more sense than creating one sharded hash-routed collection.

How It Works

First you create a time routed aliases using the CREATEALIAS command with some router settings. Most of the settings are editable at a later time using the ALIASPROP command.

The first collection will be created automatically, along with an alias pointing to it. Each underlying Solr "core" in a collection that is a member of a TRA has a special core property referencing the alias. The name of each collection is comprised of the TRA name and the start timestamp (UTC), with trailing zeros and symbols truncated. Ideally, as a user of this feature, you needn’t concern yourself with the particulars of the collection naming pattern since both queries and updates may be done via the alias.

When adding data, you should usually direct documents to the alias (e.g., reference the alias name instead of any collection). The Solr server and CloudSolrClient will direct an update request to the first collection that an alias points to.

The collections list for a TRA is always reverse sorted, and thus the connection path of the request will route to the lead collection. Using CloudSolrClient is preferable as it can reduce the number of underlying physical HTTP requests by one. If you know that a particular set of documents to be delivered is going to a particular older collection then you could direct it there from the client side as an optimization but it’s not necessary. CloudSolrClient does not (yet) do this.

When processing an update for a TRA, Solr initializes its UpdateRequestProcessor chain as usual, but when DistributedUpdateProcessor (DUP) initializes, it detects that the update targets a TRA and injects TimeRoutedUpdateProcessor (TRUP) in front of itself. TRUP, in coordination with the Overseer, is the main part of a TRA, and must immediately precede DUP. It is not possible to configure custom chains with other types of UpdateRequestProcessors between TRUP and DUP.

TRUP first reads TRA configuration from the alias properties when it is initialized. As it sees each document, it checks for changes to TRA properties, updates its cached configuration if needed and then determines which collection the document belongs to:

  • If TRUP needs to send it to a time segment represented by a collection other than the one that the client chose to communicate with, then it will do so using mechanisms shared with DUP. Once the document is forwarded to the correct collection (i.e., the correct TRA time segment), it skips directly to DUP on the target collection and continues normally, potentially being routed again to the correct shard & replica within the target collection.
  • If it belongs in the current collection (which is usually the case if processing events as they occur), the document passes through to DUP. DUP does it’s normal collection-level processing that may involve routing the document to another shard & replica.
  • If the time stamp on the document is more recent than the most recent TRA segment, then a new collection needs to be added at the front of the TRA. TRUP will create this collection, add it to the alias and then forward the document to the collection it just created. This can happen recursively if more than one collection needs to be created.

    Each time a new collection is added, the oldest collections in the TRA are examined for possible deletion, if that has been configured. All this happens synchronously, potentially adding seconds to the update request and indexing latency. If router.preemptiveCreateMath is configured and if the document arrives within this window then it will occur asynchronously.

Any other type of update like a commit or delete is routed by TRUP to all collections. Generally speaking, this is not a performance concern. When Solr receives a delete or commit wherein nothing is deleted or nothing needs to be committed, then it’s pretty cheap.

Improvement Possibilities

This is a new feature of SolrCloud that can be expected to be improved. Some potential areas for improvement that are not implemented yet are:

  • Searches with time filters should only go to applicable collections.
  • Collections ought to be constrained by their size instead of or in addition to time. Based on the underlying design, this would only apply to the lead collection.
  • Ways to automatically optimize (or reduce the resources of) older collections that aren’t expected to receive more updates, and might have less search demand.
  • CloudSolrClient could route documents to the correct collection based on a timestamp instead always picking the latest.
  • Compatibility with CDCR.

Limitations & Assumptions

  • Only time routed aliases are supported. If you instead have some other sequential number, you could fake it as a time (e.g., convert to a timestamp assuming some epoch and increment).

    The smallest possible interval is one second. No other routing scheme is supported, although this feature was developed with considerations that it could be extended/improved to other schemes.

  • The underlying collections form a contiguous sequence without gaps. This will not be suitable when there are large gaps in the underlying data, as Solr will insist that there be a collection for each increment. This is due in part on Solr calculating the end time of each interval collection based on the timestamp of the next collection, since it is otherwise not stored in any way.
  • Avoid sending updates to the oldest collection if you have also configured that old collections should be automatically deleted. It could lead to exceptions bubbling back to the indexing client.