Copying Fields

You might want to interpret some document fields in more than one way. Solr has a mechanism for making copies of fields so that you can apply several distinct field types to a single piece of incoming information.

The name of the field you want to copy is the source, and the name of the copy is the destination. In schema.xml, it’s very simple to make copies of fields:

<copyField source="cat" dest="text" maxChars="30000" />

In this example, we want Solr to copy the cat field to a field named text. Fields are copied before analysis is done, meaning you can have two fields with identical original content, but which use different analysis chains and are stored in the index differently.

In the example above, if the text destination field has data of its own in the input documents, the contents of the cat field will be added as additional values – just as if all of the values had originally been specified by the client. Remember to configure your fields as multivalued="true" if they will ultimately get multiple values (either from a multivalued source or from multiple copyField directives).

A common usage for this functionality is to create a single "search" field that will serve as the default query field when users or clients do not specify a field to query. For example, title, author, keywords, and body may all be fields that should be searched by default, with copy field rules for each field to copy to a catchall field (for example, it could be named anything). Later you can set a rule in solrconfig.xml to search the catchall field by default. One caveat to this is your index will grow when using copy fields. However, whether this becomes problematic for you and the final size will depend on the number of fields being copied, the number of destination fields being copied to, the analysis in use, and the available disk space.

The maxChars parameter, an int parameter, establishes an upper limit for the number of characters to be copied from the source value when constructing the value added to the destination field. This limit is useful for situations in which you want to copy some data from the source field, but also control the size of index files.

Both the source and the destination of copyField can contain either leading or trailing asterisks, which will match anything. For example, the following line will copy the contents of all incoming fields that match the wildcard pattern *_t to the text field.:

<copyField source="*_t" dest="text" maxChars="25000" />

The copyField command can use a wildcard (*) character in the dest parameter only if the source parameter contains one as well. copyField uses the matching glob from the source field for the dest field name into which the source content is copied.

Copying is done at the stream source level and no copy feeds into another copy. This means that copy fields cannot be chained i.e., you cannot copy from here to there and then from there to elsewhere. However, the same source field can be copied to multiple destination fields:

<copyField source="here" dest="there"/>
<copyField source="here" dest="elsewhere"/>