# FOSM - a brief overview (with equations!)

Throughout the previous tutorial notebooks we have explored how parameter estimation affects posterior parameter and forecast uncertainty. This notebook goes through some of the detail of how these uncertainties are calculated by PEST++ and `pyemu`.

FOSM stands for “First Order, Second Moment”: the method propagates uncertainty using a first-order (linear) approximation of the model and the second moments (variances and covariances) of the relevant probability distributions. In PEST documentation (and other GMDSI tutorials), it is sometimes referred to as “linear analysis”. See also page 460 in Anderson et al. (2015).

Anderson, M. P., Woessner, W. W., & Hunt, R. J. (2015). Applied Groundwater Modeling: Simulation of Flow and Advective Transport (2nd ed.). Elsevier. https://linkinghub.elsevier.com/retrieve/pii/B9780080916385000018

Pages 461-465 of Anderson et al. use the PREDUNC equation of PEST to discuss an applied view of FOSM, what goes into it, and what it means in practice. Here we will look more closely at these. The objective is to get a better feel for what is going on under the hood in linear uncertainty analyses.

Side note: in Part 2 of this series of tutorial notebooks we demonstrate a complete FOSM and data-worth analysis workflow using `pyemu` and PEST++. The current notebook merely aims to provide a very high-level introduction to some of the concepts.

FOSM provides approximate mathematical characterisation of prior predictive probability distributions, and of posterior parameter and predictive probability distributions. It has other uses as well. It can be used to demonstrate how the history-matching process bestows worth on data. It can also be deployed to track the flow of information from field measurements of system state to parameters, and ultimately from parameters to model predictions.

It does all of these things by implementing Bayes equation under the following assumptions:

- The prior probability distribution of parameters is multiGaussian.
- “Measurement noise” (including structural noise) is also characterized by a Gaussian distribution.
- The relationships between model outputs that correspond to measurements of system state and parameters employed by a model can be approximated by the action of a matrix on a vector.
- Model outputs that correspond to predictions of management interest can be calculated using another matrix that acts on model parameters.

Ideally linear analysis is undertaken after a model has been calibrated. However, if a model is truly linear (which it never is), the outcomes of FOSM are independent of parameter values and can therefore, in theory, be applied with the user-supplied prior mean parameter values.

If calibration has been undertaken, then minimum-error variance (i.e. calibrated) parameter values should be assigned to parameters as their initial parameters in the “parameter data” section of the PEST control file on which linear analysis is based. The Jacobian matrix should be calculated using these parameters. And, if the uncertainty of a prediction is going to be examined, then the model output that pertains to this prediction must be included as an “observation” in the PEST input dataset; sensitivities of this model output to model parameters will therefore appear in the Jacobian matrix.

FOSM tasks may include:

- approximate parameter and predictive uncertainty quantification;
- data worth analysis;
- identifying parameters that are most salient for forecasts of interest;
- identifying parameter contributions to predictive uncertainty; and
- assessing parameter identifiability.

Outcomes of these analyses can provide easily understood insights into what history-matching can and cannot achieve with the available information. These insights can be used to streamline the data assimilation process and guide further site characterisation studies. Of particular interest is data worth analysis. The worth of data is measured by their ability to reduce the uncertainties of model predictions that we care about. Because the equations on which FOSM relies do not require that an observation value already be known, data worth can be assessed for data that have not yet been gathered.

But we are getting ahead of ourselves, let’s take this back to basics.

## The famous Bayes Rule in a nutshell:

We update our knowledge by comparing what we know/believe with measured/observed data. What we know now, is a function of what we knew before, compared to what we learned from measured data.

\[\underbrace{P(\boldsymbol{\theta}|\textbf{d})}_{\substack{\text{what we} \\ \text{know now}}} \propto \underbrace{\mathcal{L}(\boldsymbol{\theta} | \textbf{d})}_{\substack{\text{what we} \\ \text{learned}}} \underbrace{P(\boldsymbol{\theta})}_{\substack{\text{what we} \\ \text{knew}}}\]

We can also think of this graphically, as shown in Anderson et al. (2015), in slightly different notation but with the same equation and concept:

The problem is that, for real-world problems, the likelihood function (“what we learned”) is high-dimensional and non-parametric, requiring non-linear (typically Monte Carlo) integration for rigorous Bayes. Unfortunately, non-linear methods are computationally expensive and inefficient, as we will see in a subsequent notebook.

But, we can make some assumptions and greatly reduce computational burden. This is why we often suggest using these linear methods first before burning the silicon on the non-linear ones like Monte Carlo.

## How do we reduce the computational burden?

By assuming that:

### 1. There is an approximate linear relation between parameters and observations:

### $\mathbf{J} = \frac{\partial\text{obs}}{\partial\text{par}} \approx \text{constant}$

### 2. The parameter and forecast prior and posterior distributions are approximately Gaussian:

### $P(\boldsymbol{\theta}|\mathbf{d}) \approx \mathcal{N}(\overline{\boldsymbol{\mu}}_{\boldsymbol{\theta}},\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}})$

Armed with these two assumptions, one can derive from Bayes’ equation the Schur complement for conditional uncertainty propagation:

### $\underbrace{\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}}}_{\substack{\text{what we} \\ \text{know now}}} = \underbrace{\boldsymbol{\Sigma}_{\boldsymbol{\theta}}}_{\substack{\text{what we} \\ \text{knew}}} - \underbrace{\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\bf{J}^T\left[\bf{J}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\bf{J}^T + \boldsymbol{\Sigma}_{\boldsymbol{\epsilon}}\right]^{-1}\bf{J}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}}_{\text{what we learned}}$

### Highlights:

- There are no parameter values or observation values in these equations!
- “us + data” = $\overline{\Sigma}_{\theta}$; “us” = $\Sigma_{\theta}$. This accounts for information from both data and expert knowledge.
- The “$-$” on the right-hand side shows that we are (hopefully) collapsing the probability manifold in parameter space by “learning” from the data. Put another way, we are subtracting from the uncertainty we started with (the Prior uncertainty).
- Uncertainty in our measurements of the world is encapsulated in $\Sigma_{\epsilon}$. If the “observations” are highly uncertain, then parameter “learning” decreases because $\Sigma_{\epsilon}$ is in the denominator. Put another way, if our measured data are made (assumed) to be accurate and precise, then the uncertainty associated with the parameters that are constrained by these measured data is reduced - we “learn” more.
- What quantities are needed? $\bf{J}$, $\boldsymbol{\Sigma}_{\theta}$, and $\boldsymbol{\Sigma}_{\epsilon}$.
- The diagonals of $\Sigma_{\theta}$ and $\overline{\Sigma}_{\theta}$ hold the Prior and Posterior uncertainty (variance) of each adjustable parameter.
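As a concrete illustration, the Schur complement update can be written in a few lines of linear algebra. This is a toy example with made-up numbers (two parameters, one observation), not part of the PEST++ workflow:

```python
import numpy as np

# toy problem: 2 adjustable parameters, 1 observation
Sigma_theta = np.diag([0.25, 0.25])   # prior parameter covariance
Sigma_eps = np.array([[0.01]])        # observation noise covariance
J = np.array([[1.0, 0.5]])            # Jacobian (1 obs x 2 pars)

# "what we learned": the term subtracted from the prior
learned = Sigma_theta @ J.T @ np.linalg.inv(
    J @ Sigma_theta @ J.T + Sigma_eps) @ J @ Sigma_theta

# Schur complement: posterior = prior - learned
Sigma_theta_post = Sigma_theta - learned
print(np.diag(Sigma_theta_post))  # posterior variances < prior variances
```

Note that the posterior variances shrink even though no parameter or observation *values* appear anywhere in the calculation.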

# But what about forecasts?

We can use the same assumptions:

- prior forecast uncertainty (variance): $\sigma^2_{s} = \mathbf{y}^T\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{y}$
- posterior forecast uncertainty (variance): $\overline{\sigma}^2_{s} = \mathbf{y}^T\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}}\mathbf{y}$

### Highlights:

- Again, no parameter values or forecast values!
- What’s needed? $\bf{y}$, which is the **sensitivity of a given forecast** to each adjustable parameter. Each forecast will have its own $\bf{y}$.
- How do I get $\bf{y}$? The easiest way is to include your forecast(s) as observations in the control file - then we get the $\bf{y}$’s for free during the parameter estimation process.
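Under these assumptions, propagating uncertainty to a forecast is just a quadratic form. A toy sketch (the numbers are invented; in practice $\bf{y}$ is a row of the Jacobian and the covariance matrices come from PEST++):

```python
import numpy as np

# sensitivity of one forecast to each of 2 parameters
y = np.array([2.0, 1.0])

Sigma_prior = np.diag([0.25, 0.25])        # prior parameter covariance
Sigma_post = np.array([[0.056, -0.097],
                       [-0.097, 0.202]])   # a notional posterior

prior_var = y @ Sigma_prior @ y  # prior forecast variance
post_var = y @ Sigma_post @ y    # posterior forecast variance
print(prior_var, post_var)
```

The posterior forecast variance is smaller because the posterior parameter covariance is "tighter" than the prior.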

# Mechanics of calculating FOSM parameter and forecast uncertainty estimates

**in the PEST world:**

In the original PEST (i.e., not PEST++) documentation, FOSM is referred to as linear analysis. Implementing the various linear analyses relies on a suite of utility software and a series of user-input-heavy steps, as illustrated in the figure below.

**in PEST++**:

In the PEST++ world, life is much easier. By default, PEST++GLM implements FOSM on-the-fly (it can be deactivated if the user desires) and records parameter and forecast uncertainties throughout the parameter estimation process.

Let’s take a closer look and get a feel what is going on.

# FOSM with PEST++ Demo

In the tutorial directory there is a folder containing the outcomes of a PEST++GLM parameter estimation run. (These are based on the model and PEST setup constructed in the “*part1_freyberg_pilotpoints*” notebooks.) In the following section we will access several of these files using `pyemu`. It is assumed that the reader is familiar with the basics of `pyemu`.

Parameter estimation has already been undertaken with PEST++GLM, so we already have at our disposal a Jacobian matrix as well as the parameter and forecast uncertainty files written by PEST++GLM.

```
import sys
import os
import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import psutil
import shutil
sys.path.insert(0,os.path.join("..", "..", "dependencies"))
import pyemu
import flopy
assert "dependencies" in flopy.__file__
assert "dependencies" in pyemu.__file__
sys.path.insert(0,"..")
import herebedragons as hbd
plt.rcParams['font.size'] = 10
pyemu.plot_utils.font =10
```

The folder with model and pest files:

```
working_dir = "master_pp"
```

The PEST control file name:

```
pst_name = "freyberg_pp.pst"
```

Load the PEST control file:

```
pst = pyemu.Pst(os.path.join(working_dir, pst_name))
```

### Let’s look at the parameter uncertainty summary written by pestpp:

PEST++GLM records a parameter uncertainty file named *casename.par.usum.csv*. It records the prior and posterior means, bounds and standard deviations.

```
df = pd.read_csv(os.path.join(working_dir,pst_name.replace(".pst",".par.usum.csv")),index_col=0)
df.tail()
```

| name | prior_mean | prior_stdev | prior_lower_bound | prior_upper_bound | post_mean | post_stdev | post_lower_bound | post_upper_bound |
|---|---|---|---|---|---|---|---|---|
| rch_i:17_j:2_zone:1.0 | 0.0 | 0.150515 | -0.30103 | 0.30103 | 0.30103 | 0.149751 | 0.001528 | 0.600532 |
| rch_i:17_j:12_zone:1.0 | 0.0 | 0.150515 | -0.30103 | 0.30103 | -0.30103 | 0.150025 | -0.601081 | -0.000979 |
| rch_i:37_j:17_zone:1.0 | 0.0 | 0.150515 | -0.30103 | 0.30103 | -0.30103 | 0.150258 | -0.601546 | -0.000514 |
| rch_i:27_j:12_zone:1.0 | 0.0 | 0.150515 | -0.30103 | 0.30103 | -0.30103 | 0.150081 | -0.601192 | -0.000868 |
| rch_i:12_j:17_zone:1.0 | 0.0 | 0.150515 | -0.30103 | 0.30103 | -0.30103 | 0.149847 | -0.600725 | -0.001335 |

We can visualize this with probability distributions. In the plot below, prior parameter distributions are shown by the dashed grey lines. Posterior parameter distributions are the blue shaded areas. Each plot shows distributions for parameters in the same group:

```
par = pst.parameter_data
df_paru = pd.read_csv(os.path.join(working_dir,pst_name.replace(".pst",".par.usum.csv")),index_col=0)
fig, axes = plt.subplots(1, len(pst.adj_par_groups), figsize=(15,5))
for pargp, ax in zip(pst.adj_par_groups, axes):
    # plot prior and posterior distributions for all parameters in this group
    pars = par.loc[par.pargp==pargp].parnme.values
    df_par = df_paru.loc[pars,:]
    ax = pyemu.plot_utils.plot_summary_distributions(df_par, label_post=False, ax=ax)
    ax.set_title(pargp)
```

### There is a similar file for forecasts:

*casename.pred.usum.csv*

```
axes = pyemu.plot_utils.plot_summary_distributions(os.path.join(working_dir,pst_name.replace(".pst",".pred.usum.csv")),subplots=True)
```

### Where do the prior parameter distributions come from?

Prior parameter distributions can come from one of two sources.

- If no other information is provided, PEST++GLM assumes that all adjustable parameters are statistically independent. In this case, by default, the prior standard deviation of each parameter is calculated as a quarter of the difference between its upper and lower bounds in the PEST control file. (This is the case here.)
- Alternatively, the name of a prior parameter covariance matrix file can be provided with the `parcov()` control variable.
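We can check the quarter-of-the-bound-range default against the summary table above. The recharge parameters are log-transformed with bounds of 0.5 and 2.0 (the bound values here are inferred from the log10 bounds of -0.30103 and 0.30103 in the table, not read from the control file):

```python
import numpy as np

parlbnd, parubnd = 0.5, 2.0  # bounds of a log-transformed parameter

# default: prior standard deviation = a quarter of the (log10) bound range
prior_stdev = (np.log10(parubnd) - np.log10(parlbnd)) / 4.0
prior_var = prior_stdev ** 2
print(prior_stdev, prior_var)  # ~0.150515 and ~0.022655
```

The standard deviation matches the prior_stdev column above, and its square matches the diagonal of the prior covariance matrix shown further down.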

### Where do the prior forecast distributions come from?

At the first iteration of the parameter estimation process, PEST++GLM calculates sensitivities based on initial parameter values. These are used to determine the prior parameter and forecast uncertainty.

### Why are the posterior distributions different from the priors?

Recall Bayes’ Rule? By comparing model outputs to measured data we have “learnt” information about model parameters, thus “updating our prior” and reducing parameter (and forecast) uncertainty.

# FOSM with pyEMU

Now, `pyemu` does the same calculations, but also allows you to do other, more exciting things!

We need three ingredients for FOSM:

- parameter covariance matrix
- observation noise covariance matrix
- jacobian matrix

The `Schur` object is one of the primary objects for FOSM in pyEMU, and the only one we will talk about in this tutorial.

```
sc = pyemu.Schur(jco=os.path.join(working_dir,pst_name.replace(".pst",".jcb")),verbose=False)
```

Now that seemed too easy, right? Well, under the hood the `Schur` object found the control file (“freyberg_pp.pst”) and used it to build the prior parameter covariance matrix from the parameter bounds, and the observation noise covariance matrix from the observation weights. These are the `Schur.parcov` and `Schur.obscov` attributes.

The `Schur` object also found the “++forecasts()” optional pestpp argument in the control file, found the associated rows in the Jacobian matrix file and extracted those rows to serve as forecast sensitivity vectors:

```
sc.pst.pestpp_options['forecasts']
```

```
'headwater:4383.5,tailwater:4383.5,trgw-0-9-1:4383.5,part_time'
```

### The Jacobian Matrix and Forecast Sensitivity Vectors

Recall that a Jacobian matrix describes the changes in observations as a parameter is changed. The Jacobian matrix therefore has parameters in the columns and observations in the rows. The bulk of the matrix is made up of the difference in observations between a base run and a run in which the parameter at the column head was perturbed (typically by 1% of the base run value - controlled by the “parameter groups” information). Now we’ll display the Jacobian matrix as a `DataFrame`:

```
sc.jco.to_dataframe().loc[sc.pst.nnz_obs_names,:].head()
```

| | strinf | wel0 | wel2 | wel4 | wel5 | wel1 | wel3 | hk_i:12_j:2_zone:1.0 | hk_i:32_j:12_zone:1.0 | hk_i:2_j:17_zone:1.0 | ... | rch_i:2_j:7_zone:1.0 | rch_i:2_j:17_zone:1.0 | rch_i:7_j:17_zone:1.0 | rch_i:22_j:12_zone:1.0 | rch_i:22_j:2_zone:1.0 | rch_i:17_j:2_zone:1.0 | rch_i:17_j:12_zone:1.0 | rch_i:37_j:17_zone:1.0 | rch_i:27_j:12_zone:1.0 | rch_i:12_j:17_zone:1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gage-1:3652.5 | 2716.948265 | -4.548055 | -4.719683 | -5.232870 | -4.498392 | -4.689892 | -5.017089 | 3.445071 | 31.416424 | 15.240922 | ... | 190.031485 | 115.306638 | 118.042265 | 119.114189 | 223.313201 | 231.708826 | 138.388709 | 63.774836 | 113.687036 | 121.292411 |
| gage-1:3683.5 | 2774.639984 | -181.311316 | -166.978297 | -10.342791 | -87.590161 | -110.276389 | -32.921654 | 4.402686 | 32.094217 | 15.121772 | ... | 221.004388 | 134.915061 | 136.592731 | 130.330209 | 238.872431 | 252.221559 | 153.398371 | 69.859064 | 123.139058 | 138.266282 |
| gage-1:3712.5 | 2780.020916 | -276.473740 | -245.350759 | -20.599377 | -151.432872 | -185.385447 | -67.049198 | 4.654596 | 36.865047 | 18.845729 | ... | 229.133526 | 153.496081 | 156.089054 | 145.632341 | 242.263018 | 256.949881 | 169.704385 | 83.319417 | 138.394319 | 158.253214 |
| gage-1:3743.5 | 2783.890941 | -335.620872 | -291.893149 | -35.153415 | -197.641178 | -238.817336 | -101.203897 | 4.971822 | 46.180032 | 24.684773 | ... | 242.111479 | 176.567497 | 180.120737 | 164.500045 | 247.881326 | 264.289167 | 190.624789 | 98.052362 | 156.877133 | 182.798667 |
| gage-1:3773.5 | 2785.355604 | -374.821849 | -322.374110 | -50.609492 | -228.410582 | -276.715310 | -129.809297 | 5.299181 | 55.300079 | 29.438466 | ... | 256.583441 | 194.019730 | 198.089350 | 178.866029 | 255.292138 | 272.878837 | 207.569112 | 107.102969 | 170.503931 | 201.037663 |

5 rows × 65 columns
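The one-at-a-time perturbation scheme described above can be sketched with a simple finite-difference loop. The two-parameter `model` function here is a stand-in for illustration, not the Freyberg model:

```python
import numpy as np

def model(pars):
    # stand-in "model": two outputs computed from two parameters
    return np.array([pars[0] ** 2 + pars[1], 3.0 * pars[1]])

def jacobian(model, pars, frac=0.01):
    """Forward differences, one parameter at a time: J[i, j] = d(out_i)/d(par_j)."""
    base = model(pars)
    J = np.zeros((base.size, pars.size))
    for j in range(pars.size):
        perturbed = pars.copy()
        delta = frac * pars[j]  # perturb by 1% of the base value
        perturbed[j] += delta
        J[:, j] = (model(perturbed) - base) / delta
    return J

J = jacobian(model, np.array([2.0, 1.0]))
print(J)  # close to the analytic [[4, 1], [0, 3]]
```

Filling the matrix requires one model run per adjustable parameter (plus the base run), which is why Jacobian calculation dominates the cost of PEST++GLM.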

This reports the changes in observations due to a change in a parameter. We can also report how the forecasts of interest change as each parameter is perturbed. Note that `pyemu` extracted the forecast rows from the Jacobian on instantiation:

```
sc.forecasts.to_dataframe()
```

| | headwater:4383.5 | tailwater:4383.5 | trgw-0-9-1:4383.5 | part_time |
|---|---|---|---|---|
| strinf | 4.525783 | 15.049001 | 0.019712 | -31.396749 |
| wel0 | 14.641942 | 3.898550 | -0.100690 | -0.731877 |
| wel2 | 11.608075 | 10.579199 | -0.095968 | -21.285655 |
| wel4 | 18.162402 | 33.111962 | -0.175127 | -72.519641 |
| wel5 | 8.105663 | 17.741830 | -0.097510 | -36.384135 |
| ... | ... | ... | ... | ... |
| rch_i:17_j:2_zone:1.0 | -101.045614 | -61.271020 | 0.718966 | 3973.941965 |
| rch_i:17_j:12_zone:1.0 | -11.717836 | -7.108638 | 0.056490 | 52.716389 |
| rch_i:37_j:17_zone:1.0 | -0.778950 | -1.143194 | 0.004321 | 17.939060 |
| rch_i:27_j:12_zone:1.0 | -7.106006 | -8.015407 | 0.041517 | 183.053088 |
| rch_i:12_j:17_zone:1.0 | -4.895285 | -1.892905 | 0.020423 | -33.987577 |

65 rows × 4 columns

Each of these columns is a $\bf{y}$ vector used in the FOSM calculations…that’s it!

### The prior parameter covariance matrix - $\boldsymbol{\Sigma}_{\theta}$

Because we have inherent uncertainty in the parameters, the forecasts also have uncertainty. Here’s what we have defined for parameter uncertainty - the Prior. As discussed above, it was constructed on-the-fly from the parameter bounds in the control file:

```
sc.parcov.to_dataframe()
```

| | strinf | wel0 | wel2 | wel4 | wel5 | wel1 | wel3 | hk_i:12_j:2_zone:1.0 | hk_i:32_j:12_zone:1.0 | hk_i:2_j:17_zone:1.0 | ... | rch_i:2_j:7_zone:1.0 | rch_i:2_j:17_zone:1.0 | rch_i:7_j:17_zone:1.0 | rch_i:22_j:12_zone:1.0 | rch_i:22_j:2_zone:1.0 | rch_i:17_j:2_zone:1.0 | rch_i:17_j:12_zone:1.0 | rch_i:37_j:17_zone:1.0 | rch_i:27_j:12_zone:1.0 | rch_i:12_j:17_zone:1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| strinf | 0.25 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| wel0 | 0.00 | 0.238691 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| wel2 | 0.00 | 0.000000 | 0.238691 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| wel4 | 0.00 | 0.000000 | 0.000000 | 0.238691 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| wel5 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.238691 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| rch_i:17_j:2_zone:1.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.022655 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| rch_i:17_j:12_zone:1.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.022655 | 0.000000 | 0.000000 | 0.000000 |
| rch_i:37_j:17_zone:1.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.022655 | 0.000000 | 0.000000 |
| rch_i:27_j:12_zone:1.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.022655 | 0.000000 |
| rch_i:12_j:17_zone:1.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.022655 |

65 rows × 65 columns

Pages 463-464 of Anderson et al. (2015) spend some time on what is shown above.

For our purposes, a diagonal Prior - numbers only along the diagonal - shows that we expect the uncertainty for each parameter to result only from that parameter itself - there is no covariance with other parameters. The numbers themselves reflect “the innate parameter variability”, and enter the maths as the variance (squared standard deviation) around the parameter value. This is called the “C(p) matrix of innate parameter variability” in PEST parlance.

IMPORTANT POINT: Again, how did PEST++ and pyEMU get these variances shown in the diagonal? From the parameter bounds that were specified for each parameter in the PEST control file.

### The matrix of observation noise - $C(\epsilon)$

Forecast uncertainty has to take into account the noise/uncertainty in the observations. Similar to the parameter Prior (the $\Sigma_{\theta}$ matrix), it is a covariance matrix of the measurement error associated with the observations. This is the same as the $\Sigma_{\epsilon}$ that we discussed above. For our Freyberg problem, the $C(\epsilon)$ matrix looks like:

```
sc.obscov.to_dataframe().loc[sc.pst.nnz_obs_names,sc.pst.nnz_obs_names].head()
```

| | gage-1:3652.5 | gage-1:3683.5 | gage-1:3712.5 | gage-1:3743.5 | gage-1:3773.5 | gage-1:3804.5 | gage-1:3834.5 | gage-1:3865.5 | gage-1:3896.5 | gage-1:3926.5 | ... | trgw-0-3-8:3743.5 | trgw-0-3-8:3773.5 | trgw-0-3-8:3804.5 | trgw-0-3-8:3834.5 | trgw-0-3-8:3865.5 | trgw-0-3-8:3896.5 | trgw-0-3-8:3926.5 | trgw-0-3-8:3957.5 | trgw-0-3-8:3987.5 | trgw-0-3-8:4018.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gage-1:3652.5 | 40000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| gage-1:3683.5 | 0.0 | 40000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| gage-1:3712.5 | 0.0 | 0.0 | 40000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| gage-1:3743.5 | 0.0 | 0.0 | 0.0 | 40000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| gage-1:3773.5 | 0.0 | 0.0 | 0.0 | 0.0 | 40000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

5 rows × 37 columns

IMPORTANT POINT: How did PEST++ and pyEMU get these variances shown in the diagonal? From the weights that were specified for each observation in the PEST control file.

IMPORTANT POINT: You can use FOSM in the “pre-calibration” state to design an objective function (e.g. weights) to maximize forecast uncertainty reduction.

IMPORTANT POINT: In PEST++, if a given observation has a larger-than-expected residual, the variance of that observation is reset to the variance implied by the residual. That is, the diagonal elements of $\Sigma_{\epsilon}$ are reset according to the residuals.
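The weight-to-noise conversion is simple: a weight is treated as the inverse of the noise standard deviation, so the diagonal of $\Sigma_{\epsilon}$ is (1/weight)². A gage weight of 0.005 (inferred here from the 40000 variances shown above, not read from the control file) reproduces those diagonal entries:

```python
weight = 0.005         # observation weight (assumed value for illustration)
stdev = 1.0 / weight   # implied measurement noise standard deviation
variance = stdev ** 2  # diagonal entry of the observation noise covariance
print(variance)        # 40000.0
```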

## Posterior Parameter Uncertainty - $\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}}$

Okay, enough emphasis. Here’s the point. When we apply FOSM using the matrices above, we can see how our uncertainty changes during calibration, first for parameters and then for forecasts.

Here, we are updating parameter covariance following notional calibration as represented by the Jacobian matrix and both prior parameter and observation noise covariance matrices.

In other words, given prior parameter uncertainty and the inherent noise in measurements, we calculate the expected parameter uncertainty **after** calibration. This assumes that *calibration achieves a fit commensurate with measurement noise, parameter linearity, etc*.

The posterior parameter covariance matrix is stored as a `pyemu.Cov` object in the `sc.posterior_parameter` attribute. The diagonal of this matrix contains the posterior **variance** of each parameter; the off-diagonals contain the parameter covariances.

```
sc.posterior_parameter.to_dataframe().head()
```

| | strinf | wel0 | wel2 | wel4 | wel5 | wel1 | wel3 | hk_i:12_j:2_zone:1.0 | hk_i:32_j:12_zone:1.0 | hk_i:2_j:17_zone:1.0 | ... | rch_i:2_j:7_zone:1.0 | rch_i:2_j:17_zone:1.0 | rch_i:7_j:17_zone:1.0 | rch_i:22_j:12_zone:1.0 | rch_i:22_j:2_zone:1.0 | rch_i:17_j:2_zone:1.0 | rch_i:17_j:12_zone:1.0 | rch_i:37_j:17_zone:1.0 | rch_i:27_j:12_zone:1.0 | rch_i:12_j:17_zone:1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| strinf | 0.006971 | 0.004594 | 0.004228 | 0.000588 | 0.002349 | 0.003672 | 0.001565 | -0.001135 | -0.005610 | 0.000060 | ... | -0.001500 | -0.000722 | -0.000726 | -0.000746 | -0.001539 | -0.001735 | -0.000883 | -0.000382 | -0.000699 | -0.000737 |
| wel0 | 0.004594 | 0.176378 | -0.052375 | -0.008618 | -0.037732 | -0.048709 | -0.022273 | 0.001038 | 0.007925 | -0.003280 | ... | 0.000872 | 0.000442 | 0.000427 | 0.000363 | 0.000935 | 0.000851 | 0.000491 | 0.000094 | 0.000315 | 0.000409 |
| wel2 | 0.004228 | -0.052375 | 0.193076 | -0.010004 | -0.032230 | -0.039082 | -0.020505 | 0.000689 | 0.005073 | -0.001778 | ... | 0.000942 | 0.000499 | 0.000489 | 0.000407 | 0.000726 | 0.000772 | 0.000527 | 0.000167 | 0.000359 | 0.000477 |
| wel4 | 0.000588 | -0.008618 | -0.010004 | 0.213220 | -0.014696 | -0.012287 | -0.026425 | -0.000075 | -0.017551 | -0.004137 | ... | 0.000140 | -0.000374 | -0.000394 | -0.000244 | 0.000513 | 0.000774 | -0.000210 | -0.000294 | -0.000281 | -0.000399 |
| wel5 | 0.002349 | -0.037732 | -0.032230 | -0.014696 | 0.210984 | -0.032978 | -0.023670 | 0.000649 | -0.000570 | -0.003115 | ... | 0.000668 | 0.000037 | 0.000014 | 0.000024 | 0.000546 | 0.000647 | 0.000115 | -0.000110 | -0.000021 | -0.000002 |

5 rows × 65 columns

But…is calibration worth pursuing or not? Let’s explore what the notional calibration is expected to do for parameter uncertainty. We accomplish this by comparing prior and posterior parameter uncertainty. Using `.get_parameter_summary()` makes this easy:

```
df = sc.get_parameter_summary()
df.head()
```

| | prior_var | post_var | percent_reduction |
|---|---|---|---|
| strinf | 0.250000 | 0.006971 | 97.211534 |
| wel0 | 0.238691 | 0.176378 | 26.106274 |
| wel2 | 0.238691 | 0.193076 | 19.110462 |
| wel4 | 0.238691 | 0.213220 | 10.671466 |
| wel5 | 0.238691 | 0.210984 | 11.607882 |

We can plot that up:

```
df.percent_reduction.plot(kind="bar", figsize=(15,3));
```

### Do these results make sense? Why are some parameters unaffected by calibration?

As the name suggests, the `percent_reduction` column shows the percentage decrease in uncertainty expected through calibration for each parameter.
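The column is computed as 100 × (prior − posterior)/prior; checking it against the strinf row of the summary table confirms this:

```python
prior_var = 0.25     # strinf prior variance from the summary
post_var = 0.006971  # strinf posterior variance from the summary

percent_reduction = 100.0 * (prior_var - post_var) / prior_var
print(percent_reduction)  # ~97.21, matching the table
```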

From the plot above we can see that calibrating the model with the available data definitely reduces the uncertainty of some parameters; these parameters are informed by the observation data. However, calibration does not affect all parameters equally. For parameters that show little or no reduction, the available observation data contain no information that constrains them - calibration will not help us reduce their uncertainty.

## Forecast Uncertainty

So far we have seen that the uncertainty of some parameters will be reduced by calibration, while that of others will not. That’s great and all, but what we really care about are our forecast uncertainties. Do the parameters that are informed by calibration affect the forecasts of interest? And will calibrating reduce the uncertainty of these forecasts?

Let’s examine the prior and posterior variance of our forecasts. Recall that they are recorded as observations in the `Pst` control file and also listed in the pest++ `forecasts` control variable:

```
forecasts = sc.pst.forecast_names
forecasts
```

```
['headwater:4383.5', 'tailwater:4383.5', 'trgw-0-9-1:4383.5', 'part_time']
```

As before, `pyemu` has already done much of the heavy lifting. We can get a summary of the forecast prior and posterior variances with `.get_forecast_summary()`:

```
df = sc.get_forecast_summary()
df
```

| | prior_var | post_var | percent_reduction |
|---|---|---|---|
| headwater:4383.5 | 1.840786e+04 | 1.043837e+04 | 43.293944 |
| tailwater:4383.5 | 1.854221e+04 | 7.894541e+03 | 57.423952 |
| trgw-0-9-1:4383.5 | 1.106435e+01 | 4.617277e+00 | 58.268859 |
| part_time | 8.931397e+07 | 7.014413e+07 | 21.463434 |

And we can make a cheeky little plot of that. As you can see, unsurprisingly some forecasts benefit more from calibration than others. So, depending on the forecast of interest, calibration may or may not be worthwhile…

```
# get the forecast summary then make a bar chart of the percent_reduction column
fig = plt.figure()
ax = plt.subplot(111)
ax = df.percent_reduction.plot(kind='bar',ax=ax,grid=True)
ax.set_ylabel("percent uncertainty\nreduction from calibration")
ax.set_xlabel("forecast")
```

```
Text(0.5, 0, 'forecast')
```

## Parameter contribution to forecast uncertainty

Information flows from observations to parameters and then out to forecasts. Information contained in observation data constrains parameter uncertainty, which in turn constrains forecast uncertainty. For a given forecast, we can evaluate which parameter contributes most to its uncertainty. This is accomplished by assuming that a parameter (or group of parameters) is perfectly known and then assessing forecast uncertainty under that assumption. Comparing the uncertainty obtained in this manner with the forecast uncertainty under the base assumption (in which no parameter is perfectly known) yields the contribution from that parameter (or parameter group).

Now, this is a pretty big assumption - in practice a parameter is never perfectly known. Nevertheless, this metric can provide useful insights into the flow of information from data to forecast uncertainty, which can help guide data assimilation design as well as future data collection efforts.
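The mechanics can be sketched with the same linear algebra as before: “perfectly knowing” a parameter amounts to conditioning the parameter covariance matrix on it, and the contribution is the resulting drop in forecast variance. A toy two-parameter example (invented numbers; `pyemu` does the full calculation for parameter groups):

```python
import numpy as np

# toy posterior parameter covariance (2 parameters) and forecast sensitivity
Sigma = np.array([[0.20, 0.05],
                  [0.05, 0.10]])
y = np.array([2.0, 1.0])

base_var = y @ Sigma @ y  # forecast variance, no parameter perfectly known

# condition on parameter 2 being perfectly known: the remaining 1x1 block
# collapses to its conditional variance, and parameter 2 contributes nothing
Sigma_cond = Sigma[0, 0] - Sigma[0, 1] * Sigma[1, 0] / Sigma[1, 1]
cond_var = y[0] * Sigma_cond * y[0]

contribution = 100.0 * (base_var - cond_var) / base_var
print(contribution)  # percent of forecast variance attributed to parameter 2
```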

In `pyemu` we can evaluate parameter contributions to forecast uncertainty by groups of parameters using `.get_par_group_contribution()`:

```
par_contrib = sc.get_par_group_contribution()
par_contrib.head()
```

| | headwater:4383.5 | tailwater:4383.5 | trgw-0-9-1:4383.5 | part_time |
|---|---|---|---|---|
| base | 10438.368644 | 7894.540701 | 4.617277 | 7.014413e+07 |
| hk1 | 544.264823 | 191.279078 | 0.022420 | 1.243777e+06 |
| rchpp | 8897.659624 | 7731.591543 | 4.578167 | 6.915516e+07 |
| strinf | 10046.303913 | 7333.006237 | 4.490364 | 7.014090e+07 |
| wel | 10280.959926 | 7838.238188 | 4.566155 | 6.994418e+07 |

We can see the relative contribution by normalizing to the base case (in which no parameters/groups are perfectly known):

```
base = par_contrib.loc["base",:]
par_contrib = 100.0 * (base - par_contrib) / base
par_contrib.sort_index().head()
```

| | headwater:4383.5 | tailwater:4383.5 | trgw-0-9-1:4383.5 | part_time |
|---|---|---|---|---|
| base | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| hk1 | 94.785921 | 97.577071 | 99.514439 | 98.226827 |
| rchpp | 14.760056 | 2.064074 | 0.847050 | 1.409910 |
| strinf | 3.755996 | 7.112947 | 2.748656 | 0.004603 |
| wel | 1.507982 | 0.713183 | 1.107201 | 0.285045 |

Understanding the links between parameters and forecast uncertainties can be useful - in particular to gain insight into the system dynamics. But we are still missing a step: understanding what *observation* data affect the forecast. It is often more straightforward to quantify how observation information impacts forecast uncertainty, so that we can explore the worth of observation data directly.

# Data worth analysis

Note: We will **not** demonstrate data worth analysis here. See the respective notebook in Part 2 of these tutorials.

The worth of data is measured by their ability to reduce the uncertainties of model predictions that we care about. Linear analysis is particularly useful for exploring data worth. This is because the equations that it uses to calculate predictive uncertainty do not include terms that represent the actual values of observations or of parameters; only sensitivities of model outputs to parameters are required. Therefore, linear analysis can be used to assess the ability (or otherwise) of yet-ungathered data to reduce the uncertainties of decision-critical predictions.

### This is __Huge__. Let me say it again.

This means that potential field measurements that correspond to one or many outputs of a model can be assessed for their worth. For example, it is possible to assess the worth of observations of head in every single model cell at every time step of a model run with a relatively small computational burden. This makes linear analysis a useful tool for designing and comparing strategies for data-collection, when data acquisition seeks to reduce the uncertainties of one or a number of decision-critical predictions.

There are two main applications for data worth analysis:

- ranking of the relative worth of existing observations by calculating predictive uncertainty with selected individual or combined observations removed from a calibration dataset.
- ranking of the relative worth of **potential** new observations by calculating predictive uncertainty with selected individual or combined observations added to an existing calibration dataset.