Comet_maths Random Generator Module

Comet_maths Random Generator Module#

The purpose of the generate_sample module in comet_maths is to generate Monte Carlo (MC) samples. This mainly serves the needs of the MC uncertainty propagation module in punpy. Typically, one needs to generate a sample where each of the elements is randomly drawn from a Probability Density Function (PDF).

The most straigthforward case is when a sample has to be generated from a Gaussian PDF for a single mean and standard deviation. Within this module, the notation is as for the punpy module. i.e. using x for the mean (which is the input quantity in punpy) and u_x for the standard deviation (uncertainty in punpy). The generate_sample() function is used to generate the MC sample:

import comet_maths as cm
sample=cm.generate_sample(100,5,1)

The above example will generate a sample with 100 draws from a Gaussian distribution with 5 as mean and 1 as standard deviation. Other PDF are also available (see section below).

Often, a MC sample needs to be generated for more than a single input quantity. In this case, the generate_sample() function can still be used, where now x and u_x can be provided as a numpy arrays (with one or more dimensions), rather than a single number. However, when providing arrays, it is important to also provide the corr_x keyword to indicate how the different elements of the MC sample should be correlated:

import comet_maths as cm

x = np.array([330.,350.,370.,390.,410.])
u_x = np.array([10.1,12.3,14.7,16.2,18.4])

sample=cm.generate_sample(100,x,u_x,corr_x="rand")

We now get a sample with shape (100,5). There are thus 100 draws of the array with length 5. The corr_x here was set to “rand”, standing for a random correlation (i.e. the PDF’s of the different elements of the array are uncorrelated). It is also possible to set this to “syst”, which stands for a systematic correlation (i.e. the PDF’s of the different elements are fully correlated). Finally it is also possible to provide a correlation matrix for how the different elements of the array should be correlated. For these correlated options the MC sample is drawn from a joint PDF. Internally, comet_maths always generates independent random normally distributed samples first and then correlates them where necessary using the Cholesky decomposition method (see Random Generator Algorithm Theoretical Basis).

There are a number of other keywords for generate_sample(). The i keyword allows to generate a MC sample for a subset of the input data (i.e. x[i] and u_x[i] instead of x and u_x). The dtype keyword can be used to provide a numpy dtype for the output MC sample. The pdf_shape and pdf_params keywords will be detailed in the section below. When the comp_list keyword is set to True, a list of u_x and corr_x can be provided. Each of the elements of the lists will add variance to the resulting MC sample.

There are also a number of other functions available from the generate_sample module in comet_maths, though generally it is recommended to just use generate_sample(). generate_sample_random(), generate_sample_systematic() and generate_sample_correlated() simply do the same as the above but with slightly different inputs (e.g. no corr_x keyword for random and systematic). There is generate_sample_same() function that trivially generates a sample where u_x is zero and thus consists of repeats of the provided x array. Then, there is the generate_sample_cov() function where a covariance matrix is provided instead of the u_x and corr_x (which contain the same information).

Another useful function is correlate_sample_corr(). This function can be used to add correlation to a previously generated sample. This can particularly be useful when multiple MC samples were generated (e.g. for different variables, optionally each with their own internal correlation) and these different samples need to be correlated (e.g. because the different variables themselves are correlated). Note that by default the provided samples will be modified, though there is a keyword that allows to maintain the samples unchanged (at the cost of higher memory usage).

Finally, there is the generate_error_sample() function, which gives the differences between the sample generated by generate_sample() and x itself.

Different Probability Density Functions#

The standard probability density function in comet_maths is a Gaussian distribution. This means the generated MC samples will follow a gaussian distribution with the input quantity values as mean and uncertainties as standard deviation. However other probabilty density functions are also possible. Currently there are two additional options (with more to follow in the future).

The first alternative is a truncated Gaussian distribution. This distribution is just like the Gaussian one, except that there are no values outside a given minimum or maximum value. A typical use case of this distribution is when a certain input quantity can never be negative. In such a case the uncertainty propagation could be done like this:

sample=cm.generate_sample(100,x,u_x,corr_x="rand",pdf_shape="truncated_gaussian", pdf_params={"min":0.})

When the alternative probability density functions require additional parameters, these can be passed in the optional pdf_params dictionary. For the truncated Gaussian example, this dictionary can contain a value for “min” and “max” for the minimum and maximum allowed values respectively.

The second alternative is a tophat distribution. In this case the MC sample will have a uniform probabilty distribution from the value of the input quantity minus its uncertainty to the value of the input quantity plus its uncertainty. We note that for these modified probability density functions, the standard deviation of the MC sample is not the same as the uncertainty anymore.