Transforming Data

Streaming expressions provides a powerful set of functions for transforming result sets. This section of the user guide provides an overview of useful transformations applied to result sets.

Selecting and Adding Fields

The select function wraps another streaming expression can perform the following operations on each tuple in the stream:

  • Select a subset of fields

  • Map fields to new names

  • Compute new fields

Below is an example showing the select function wrapping a search function and mapping fields to new field names. The recNum function is a math expression which simply returns the current record number of the tuple. The select expression can call any math expression to compute new values.

select1

Below is an example using the div function to compute a new field from two existing fields:

select math

Filtering Tuples

The having function can be used to filter tuples in the stream based on boolean logic.

In the example below the having function is filtering the output of the facet function to only emit tuples that have count(*) greater than 20404.

having

Paging

The record number, added with the recNum function, can be filtered on to support paging.

In the example below the and function with nested lt and gt functions are used to select records within a specific record number range:

search page

Handling Nulls

The notNull and isNull functions can be used to either replace null values with different values, or to filter out tuples with null values.

The example below is using the isNull function inside of select function to replace null values with -1. The if function takes 3 parameters. The first is a boolean expression, in this case isNull. The if function returns the second parameter if the boolean function returns true, and the third parameter if it returns false. In this case isNull is always true because its checking for a field in the tuples that is not included in the result set.

select2

notNull and isNull can also be used with the having function to filter out tuples with null values.

The example below emits all the documents because it is evaluating isNull for a field that is not in the result set, which always returns true.

having2

The example below emits zero documents because it is evaluating notNull for a field that is not in the result set, which always returns false.

having3

Regex Matching and Filtering

The matches function can be used inside of a having function to test if a field in the record matches a specific regular expression. This allows for sophisticated regex matching over search results.

The example below uses the matches function to return all records where the complaint_type_s field ends with Commercial.

search matches

Sorting

The sort and top function can be used to resort a result set in memory. The sort function sorts and returns the entire result set based on the sort criteria. The top function can be used to return the top N values in a result set based on the sort criteria.

search resort

Rollups

The rollup and hashRollup functions can be used to perform aggregations over result sets. This is different then the facet, facet2D and timeseries aggregation functions which push the aggregations into the search engine using the JSON facet API.

The rollup function performs map-reduce style rollups, which requires the result stream be sorted by the the grouping fields. This allows for aggregations over very high cardinality fields. The hashRollup function performs rollups keeping all buckets in an in-memory hashmap. This requires enough memory to store all the distinct group by fields in memory, but does not require that the underlying stream be sorted.

The example below shows a visualization of the top 5 complaint types from a random sample of the nyc311 complaint database. The top function is used to select the top 5 complaint types based on the count(*) field output by the hashRollup.

hashRollup