Monte Carlo Simulations | Apache Solr Reference Guide 7.5

Monte Carlo simulations are commonly used to model the behavior of stochastic systems. This section describes how to perform both uncorrelated and correlated Monte Carlo simulations using the sampling capabilities of the probability distribution framework.

Uncorrelated Simulations

Uncorrelated Monte Carlo simulations model stochastic systems with the assumption that the underlying random variables move independently of each other. A simple example of a Monte Carlo simulation using two independently changing random variables is described below.

In this example a Monte Carlo simulation is used to determine the probability that a simple hinge assembly will fall within a required length specification.

The hinge has two components A and B. The combined length of the two components must be less then 5 centimeters to fall within specification.

A random sampling of lengths for component A has shown that its length conforms to a normal distribution with a mean of 2.2 centimeters and a standard deviation of .0195 centimeters.

A random sampling of lengths for component B has shown that its length conforms to a normal distribution with a mean of 2.71 centimeters and a standard deviation of .0198 centimeters.

let(componentA=normalDistribution(2.2, .0195),  (1)
    componentB=normalDistribution(2.71, .0198),  (2)
    simresults=monteCarlo(sampleA=sample(componentA),  (3)
                          sampleB=sample(componentB),
                          add(sampleA, sampleB),  (4)
                          100000),  (5)
    simmodel=empiricalDistribution(simresults),  (6)
    prob=cumulativeProbability(simmodel,  5))  (7)

The Monte Carlo simulation below performs the following steps:

1	A normal distribution with a mean of 2.2 and a standard deviation of .0195 is created to model the length of `componentA`.
2	A normal distribution with a mean of 2.71 and a standard deviation of .0198 is created to model the length of `componentB`.
3	The `monteCarlo` function samples from the `componentA` and `componentB` distributions and sets the values to variables `sampleA` and `sampleB`.
4	It then calls the `add(sampleA, sampleB)`* function to find the combined lengths of the samples.
5	The `monteCarlo` function runs a set number of times, 100000, and collects the results in an array. Each time the function is called new samples are drawn from the `componentA` and `componentB` distributions. On each run, the `add` function adds the two samples to calculate the combined length. The result of each run is collected in an array and assigned to the `simresults` variable.
6	An `empiricalDistribution` function is then created from the `simresults` array to model the distribution of the simulation results.
7	Finally, the `cumulativeProbability` function is called on the `simmodel` to determine the cumulative probability that the combined length of the components is 5 or less.

Based on the simulation there is .9994371944629039 probability that the combined length of a component pair will be 5 or less:

{
  "result-set": {
    "docs": [
      {
        "prob": 0.9994371944629039
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 660
      }
    ]
  }
}

Correlated Simulations

The simulation above assumes that the lengths of componentA and componentB vary independently. What would happen to the probability model if there was a correlation between the lengths of componentA and componentB?

In the example below a database containing assembled pairs of components is used to determine if there is a correlation between the lengths of the components, and how the correlation effects the model.

Before performing a simulation of the effects of correlation on the probability model its useful to understand what the correlation is between the lengths of componentA and componentB.

let(a=random(collection5, q="*:*", rows="5000", fl="componentA_d, componentB_d"), (1)
    b=col(a, componentA_d)), (2)
    c=col(a, componentB_d)),
    d=corr(b, c))  (3)

1	In the example, 5000 random samples are selected from a collection of assembled hinges. Each sample contains lengths of the components in the fields `componentA_d` and `componentB_d`.
2	Both fields are then vectorized. The componentA_d vector is stored in variable `b` and the componentB_d variable is stored in variable `c`.
3	Then the correlation of the two vectors is calculated using the `corr` function.

Note from the result that the outcome from corr is 0.9996931313216989. This means that componentA_d and *componentB_d are almost perfectly correlated.

{
  "result-set": {
    "docs": [
      {
        "d": 0.9996931313216989
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 309
      }
    ]
  }
}

Correlation Effects on the Probability Model

The example below explores how to use a multivariate normal distribution function to model how correlation effects the probability of hinge defects.

In this example 5000 random samples are selected from a collection containing length data for assembled hinges. Each sample contains the fields componentA_d and componentB_d.

Both fields are then vectorized. The componentA_d vector is stored in variable b and the componentB_d variable is stored in variable c.

An array is created that contains the means of the two vectorized fields.

Then both vectors are added to a matrix which is transposed. This creates an observation matrix where each row contains one observation of componentA_d and componentB_d. A covariance matrix is then created from the columns of the observation matrix with the cov function. The covariance matrix describes the covariance between componentA_d and componentB_d.

The multivariateNormalDistribution function is then called with the array of means for the two fields and the covariance matrix. The model for the multivariate normal distribution is stored in variable g.

The monteCarlo function then calls the function add(sample(g)) 50000 times and collections the results in a vector. Each time the function is called a single sample is drawn from the multivariate normal distribution. Each sample is a vector containing one componentA and componentB pair. The add function adds the values in the vector to calculate the length of the pair. Over the long term the samples drawn from the multivariate normal distribution will conform to the covariance matrix used to construct it.

Just as in the non-correlated example an empirical distribution is used to model probabilities of the simulation vector and the cumulativeProbability function is used to compute the cumulative probability that the combined component length will be 5 centimeters or less.

Notice that the probability of a hinge meeting specification has dropped to 0.9889517439980468. This is because the strong correlation between the lengths of components means that their lengths rise together causing more hinges to fall out of the 5 centimeter specification.

let(a=random(hinges, q="*:*", rows="5000", fl="componentA_d, componentB_d"),
    b=col(a, componentA_d),
    c=col(a, componentB_d),
    cor=corr(b,c),
    d=array(mean(b), mean(c)),
    e=transpose(matrix(b, c)),
    f=cov(e),
    g=multiVariateNormalDistribution(d, f),
    h=monteCarlo(add(sample(g)), 50000),
    i=empiricalDistribution(h),
    j=cumulativeProbability(i, 5))

When this expression is sent to the /stream handler it responds with:

{
  "result-set": {
    "docs": [
      {
        "j": 0.9889517439980468
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 599
      }
    ]
  }
}

Probability Distributions Time Series