
FOSM - a brief overview (with equations!)

Throughout the previous tutorial notebooks we have explored how parameter estimation affects posterior parameter and forecast uncertainty. This notebook goes through some of the detail of how these uncertainties are calculated by PEST++ and pyemu.

FOSM stands for “First Order, Second Moment”: the method relies on a first-order (linear) approximation of the model together with the second statistical moments (variances and covariances) of parameter and observation probability distributions. In PEST documentation (and other GMDSI tutorials), it is sometimes referred to as “linear analysis”. See also page 460 in Anderson et al. (2015).

Anderson, M. P., Woessner, W. W., & Hunt, R. J. (2015). Applied Groundwater Modeling: Simulation of Flow and Advective Transport (2nd ed.). Elsevier. https://linkinghub.elsevier.com/retrieve/pii/B9780080916385000018

Pages 461-465 of Anderson et al. use the PREDUNC equation of PEST to discuss an applied view of FOSM, what goes into it, and what it means in practice. Here we will look more closely at these. The objective is to get a better feel for what is going on under the hood in linear uncertainty analyses.

Side note: in Part 2 of this series of tutorial notebooks we demonstrate a complete FOSM and data worth analysis workflow using pyemu and PEST++. The current notebook merely aims to provide a high-level introduction to some of the concepts.

FOSM provides approximate mathematical characterisation of prior predictive probability distributions, and of posterior parameter and predictive probability distributions. It has other uses as well. It can be used to demonstrate how the history-matching process bestows worth on data. It can also be deployed to track the flow of information from field measurements of system state to parameters, and ultimately from parameters to model predictions.

It does all of these things by implementing Bayes equation under the following assumptions:

  • The prior probability distribution of parameters is multiGaussian.
  • “Measurement noise” (including structural noise) is also characterized by a Gaussian distribution.
  • The relationships between model outputs that correspond to measurements of system state and parameters employed by a model can be approximated by the action of a matrix on a vector.
  • Model outputs that correspond to predictions of management interest can be calculated using another matrix that acts on model parameters.

Ideally linear analysis is undertaken after a model has been calibrated. However, if a model is truly linear (which it never is), the outcomes of FOSM are independent of parameter values and can therefore, in theory, be applied with the user-supplied prior mean parameter values.

If calibration has been undertaken, then minimum-error-variance (i.e. calibrated) parameter values should be assigned as the initial values in the “parameter data” section of the PEST control file on which linear analysis is based. The Jacobian matrix should be calculated using these values. And, if the uncertainty of a prediction is going to be examined, then the model output that pertains to this prediction must be included as an “observation” in the PEST input dataset; sensitivities of this model output to model parameters will therefore appear in the Jacobian matrix.
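As a concrete (hypothetical) illustration of that setup step with pyemu - file names here are placeholders, and this is separate from the worked example later in this notebook:

import pyemu

# load an existing control file (placeholder name)
pst = pyemu.Pst("calibrated_model.pst")

# set initial parameter values to the calibrated (minimum error variance) values
# recorded in a PEST/PEST++ parameter value file (placeholder name)
pst.parrep("calibrated_model.par")

# predictions of interest must already be included as observations so that their
# sensitivities appear in the Jacobian matrix calculated from this control file
pst.write("calibrated_model_fosm.pst")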

FOSM tasks may include:

  • approximate parameter and predictive uncertainty quantification;
  • data worth analysis;
  • identifying parameters that are most salient for forecasts of interest;
  • identifying parameter contributions to predictive uncertainty; and
  • assessing parameter identifiability.

Outcomes of these analyses can provide easily understood insights into what history-matching can and cannot achieve with the available information. These insights can be used to streamline the data assimilation process and guide further site characterisation studies. Of particular interest is data worth analysis. The worth of data is measured by their ability to reduce the uncertainties of model predictions that we care about. Because the equations on which FOSM relies do not require that an observation value already be known, data worth can be assessed for as-yet-ungathered data.

But we are getting ahead of ourselves, let’s take this back to basics.

The famous Bayes Rule in a nutshell:

We update our knowledge by comparing what we know/believe with measured/observed data. What we know now is a function of what we knew before, combined with what we learned from measured data.

\[\underbrace{P(\boldsymbol{\theta}|\textbf{d})}_{\substack{\text{what we} \\ \text{know now}}} \propto \underbrace{\mathcal{L}(\boldsymbol{\theta} | \textbf{d})}_{\substack{\text{what we} \\ \text{learned}}} \underbrace{P(\boldsymbol{\theta})}_{\substack{\text{what we} \\ \text{knew}}}\]

We can also think of this graphically, as taken from Anderson et al. (2015) in slightly different notation but the same equation and concept:

The problem is that, for real-world problems, the likelihood function (“what we learned”) is high-dimensional and non-parametric, requiring non-linear (typically Monte Carlo) integration for rigorous Bayes. Unfortunately, non-linear methods are computationally expensive and inefficient, as we will see in a subsequent notebook.

But, we can make some assumptions and greatly reduce computational burden. This is why we often suggest using these linear methods first before burning the silicon on the non-linear ones like Monte Carlo.

How do we reduce the computational burden?

By assuming that:

1. There is an approximate linear relation between parameters and observations:

$\mathbf{J} = \frac{\partial\text{obs}}{\partial\text{par}} \approx \text{constant}$

2. The parameter and forecast prior and posterior distributions are approximately Gaussian:

$ P(\boldsymbol{\theta}|\mathbf{d}) \approx \mathcal{N}(\overline{\boldsymbol{\mu}}_{\boldsymbol{\theta}},\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}})$

Armed with these two assumptions, one can derive from Bayes equation the Schur complement for conditional uncertainty propagation (a small numerical sketch follows the highlights below):

$\underbrace{\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}}}_{\substack{\text{what we} \\ \text{know now}}} = \underbrace{\boldsymbol{\Sigma}_{\boldsymbol{\theta}}}_{\substack{\text{what we} \\ \text{knew}}} - \underbrace{\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\bf{J}^T\left[\bf{J}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\bf{J}^T + \boldsymbol{\Sigma}_{\boldsymbol{\epsilon}}\right]^{-1}\bf{J}\boldsymbol{\Sigma}_{\boldsymbol{\theta}}}_{\text{what we learned}}$

Highlights:

  1. There are no parameter values or observation values in these equations!
  2. “us + data” = $\overline{\Sigma}_{\theta}$; “us” = $\Sigma_{\theta}$. This accounts for information from both data and expert knowledge.
  3. The ‘-’ on the right-hand side shows that we are (hopefully) collapsing the probability manifold in parameter space by “learning” from the data. Or, put another way, we are subtracting from the uncertainty we started with (the Prior uncertainty).
  4. Uncertainty in our measurements of the world is encapsulated in $\Sigma_{\epsilon}$. If the “observations” are highly uncertain, then parameter “learning” decreases because $\Sigma_{\epsilon}$ is in the denominator. Put another way, if our measured data are made (assumed) to be accurate and precise, then uncertainty associated with the parameters that are constrained by these measured data is reduced - we “learn” more.
  5. What quantities are needed? $\bf{J}$, $\boldsymbol{\Sigma}_{\theta}$, and $\boldsymbol{\Sigma}_{\epsilon}$.
  6. The diagonals of $\Sigma_{\theta}$ and $\overline{\Sigma}_{\theta}$ contain the prior and posterior uncertainty (variance) of each adjustable parameter.
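To make the algebra concrete, here is a minimal numpy sketch of the Schur complement update. The matrices are tiny and entirely made up for illustration; note that the third parameter is given zero sensitivity, so the (notional) data cannot inform it:

import numpy as np

# made-up example: 3 parameters, 2 observations
J = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.5, 0.0]])            # Jacobian, d(obs)/d(par)
Sigma_theta = np.diag([0.25, 0.25, 0.25])  # prior parameter covariance
Sigma_eps = np.diag([0.1, 0.1])            # observation noise covariance

# "what we learned" from the notional data
learned = Sigma_theta @ J.T @ np.linalg.inv(J @ Sigma_theta @ J.T + Sigma_eps) @ J @ Sigma_theta

# posterior = prior - what we learned
Sigma_theta_post = Sigma_theta - learned

print(np.diag(Sigma_theta))       # prior variances
print(np.diag(Sigma_theta_post))  # posterior variances; the third is unchanged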

But what about forecasts?

We can use the same assumptions (a small numerical sketch follows below):

  • prior forecast uncertainty (variance): $\sigma^2_{s} = \mathbf{y}^T\boldsymbol{\Sigma}_{\boldsymbol{\theta}}\mathbf{y}$
  • posterior forecast uncertainty (variance): $\overline{\sigma}^2_{s} = \mathbf{y}^T\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}}\mathbf{y}$

Highlights:

  • Again, no parameter values or forecast values!
  • What’s needed? $\bf{y}$, which is the sensitivity of a given forecast to each adjustable parameter. Each forecast will have its own $\bf{y}$.
  • How do I get $\bf{y}$? the easiest way is to include your forecast(s) as an observation in the control file - then we get the $\bf{y}$’s for free during the parameter estimation process.
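And a matching sketch for the forecast equations, again with made-up numbers - the forecast sensitivity vector $\bf{y}$ maps a parameter covariance matrix into a single forecast variance:

import numpy as np

Sigma_theta = np.diag([0.25, 0.25, 0.25])       # prior parameter covariance (made up)
Sigma_theta_post = np.diag([0.02, 0.05, 0.25])  # posterior parameter covariance (made up)
y = np.array([2.0, -1.0, 0.5])                  # sensitivity of one forecast to each parameter

prior_forecast_var = y @ Sigma_theta @ y        # prior forecast variance
post_forecast_var = y @ Sigma_theta_post @ y    # posterior forecast variance
print(prior_forecast_var, post_forecast_var)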

Mechanics of calculating FOSM parameter and forecast uncertainty estimates

in the PEST world:

In the original PEST (i.e., not PEST++) documentation, FOSM is referred to as linear analysis. Implementing the various linear analyses relies on a suite of utility software and a series of user-input-heavy steps, as illustrated in the figure below.

in PEST++:

In the PEST++ world, life is much easier. By default, PEST++GLM implements FOSM on-the-fly (it can be deactivated if the user desires) and records parameter and forecast uncertainties throughout the parameter estimation process.

Let’s take a closer look and get a feel what is going on.

FOSM with PEST++ Demo

In the tutorial directory there is a folder containing the outcomes of a PEST++GLM parameter estimation run. (These are based on the model and PEST setup constructed in the “part1_freyberg_pilotpoints” notebooks.) In the following section we will access several of these files using pyemu. It is assumed that the reader is familiar with the basics of pyemu.

Parameter estimation has already been undertaken with PEST++GLM. So we already have at our disposal a jacobian matrix, and the parameter and forecast uncertainty files written by PEST++GLM.

import sys
import os
import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning) 

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt;
import psutil
import shutil

sys.path.insert(0,os.path.join("..", "..", "dependencies"))
import pyemu
import flopy
assert "dependencies" in flopy.__file__
assert "dependencies" in pyemu.__file__
sys.path.insert(0,"..")
import herebedragons as hbd

plt.rcParams['font.size'] = 10
pyemu.plot_utils.font =10

The folder with model and pest files:

working_dir = "master_pp"

The PEST control file name:

pst_name = "freyberg_pp.pst"

Load the PEST control file:

pst = pyemu.Pst(os.path.join(working_dir, pst_name))

Let’s look at the parameter uncertainty summary written by pestpp:

PEST++GLM records a parameter uncertainty file named casename.par.usum.csv. It records the prior and posterior means, bounds and standard deviations.

df = pd.read_csv(os.path.join(working_dir,pst_name.replace(".pst",".par.usum.csv")),index_col=0)
df.tail()
prior_mean prior_stdev prior_lower_bound prior_upper_bound post_mean post_stdev post_lower_bound post_upper_bound
name
rch_i:17_j:2_zone:1.0 0.0 0.150515 -0.30103 0.30103 0.30103 0.149751 0.001528 0.600532
rch_i:17_j:12_zone:1.0 0.0 0.150515 -0.30103 0.30103 -0.30103 0.150025 -0.601081 -0.000979
rch_i:37_j:17_zone:1.0 0.0 0.150515 -0.30103 0.30103 -0.30103 0.150258 -0.601546 -0.000514
rch_i:27_j:12_zone:1.0 0.0 0.150515 -0.30103 0.30103 -0.30103 0.150081 -0.601192 -0.000868
rch_i:12_j:17_zone:1.0 0.0 0.150515 -0.30103 0.30103 -0.30103 0.149847 -0.600725 -0.001335

We can visualize this with probability distributions. In the plot below, prior parameter distributions are shown by the dashed grey lines. Posterior parameter distributions are the blue shaded areas. Each plot shows distributions for parameters in the same group:

par = pst.parameter_data
df_paru = pd.read_csv(os.path.join(working_dir,pst_name.replace(".pst",".par.usum.csv")),index_col=0)

fig, axes = plt.subplots(1, len(pst.adj_par_groups), figsize=(15,5))

# one panel per adjustable parameter group: prior (dashed grey) and posterior (blue) distributions
for pargp, ax in zip(pst.adj_par_groups, axes):
    pars = par.loc[par.pargp==pargp].parnme.values
    df_par = df_paru.loc[pars,:]
    ax = pyemu.plot_utils.plot_summary_distributions(df_par, label_post=False, ax=ax)
    ax.set_title(pargp)

[Figure: prior (dashed grey) and posterior (blue shaded) parameter distributions, one panel per parameter group]

There is a similar file for forecasts:

casename.pred.usum.csv

axes = pyemu.plot_utils.plot_summary_distributions(os.path.join(working_dir,pst_name.replace(".pst",".pred.usum.csv")),subplots=True)

[Figure: prior and posterior probability distributions for each forecast]

Where do the prior parameter distributions come from?

Prior parameter distributions can come from one of two sources.

  1. If no other information is provided, PEST++GLM assumes that all adjustable parameters are statistically independent. In this case, by default, the prior standard deviation of each parameter is calculated as a quarter of the difference between its upper and lower bounds in the PEST control file. (This is the case here; a pyemu sketch of this default follows the list.)
  2. Alternatively, the name of a prior parameter covariance matrix file can be provided via the parcov control variable.
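For reference, the first (default) option can be reproduced with pyemu from the pst object loaded above. The sigma_range argument is the number of standard deviations assumed to span the distance between the bounds (4 by default, i.e. one standard deviation is a quarter of the bound width):

# diagonal prior covariance matrix implied by the parameter bounds
prior_cov = pyemu.Cov.from_parameter_data(pst, sigma_range=4.0)
prior_cov.to_dataframe().head()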

Where do the prior forecast distributions come from?

At the first iteration of the parameter estimation process, PEST++GLM calculates sensitivities based on the initial parameter values. These are combined with the prior parameter covariance matrix to calculate the prior forecast uncertainty.

Why are the posterior distributions different from the priors?

Recall Bayes’ Rule? By comparing model outputs to measured data we have “learnt” information about model parameters, thus “updating our prior” and reducing parameter (and forecast) uncertainty.

FOSM with pyEMU

Now, pyemu does the same calculations, but also allows you to do other, more exciting things!

We need three ingredients for FOSM:

  • parameter covariance matrix
  • observation noise covariance matrix
  • jacobian matrix

The Schur object is one of the primary objects for FOSM in pyEMU and the only one we will talk about in this tutorial.

sc = pyemu.Schur(jco=os.path.join(working_dir,pst_name.replace(".pst",".jcb")),verbose=False)

Now that seemed too easy, right? Well, under the hood the Schur object found the control file (“freyberg_pp.pst”) and used it to build the prior parameter covariance matrix from the parameter bounds, and the observation noise covariance matrix from the observation weights. These are the Schur.parcov and Schur.obscov attributes.

The Schur object also found the “++forecasts()” optional pestpp argument in the control file, found the associated rows in the Jacobian matrix file and extracted them to serve as forecast sensitivity vectors:

sc.pst.pestpp_options['forecasts']
'headwater:4383.5,tailwater:4383.5,trgw-0-9-1:4383.5,part_time'
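If you want to override any of these defaults, the same ingredients can be passed explicitly. Here is a sketch (reusing the working_dir, pst_name and pst objects defined above); pyemu.Cov.from_parameter_data and pyemu.Cov.from_observation_data build the covariance matrices from the bounds and weights, respectively:

# sketch: build the FOSM ingredients explicitly and hand them to the Schur object
parcov = pyemu.Cov.from_parameter_data(pst)    # prior parameter covariance from the bounds
obscov = pyemu.Cov.from_observation_data(pst)  # noise covariance from the observation weights

sc_explicit = pyemu.Schur(jco=os.path.join(working_dir, pst_name.replace(".pst", ".jcb")),
                          pst=pst,
                          parcov=parcov,
                          obscov=obscov,
                          forecasts=pst.forecast_names)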

The Jacobian Matrix and Forecast Sensitivity Vectors

Recall that a Jacobian matrix records the change in each observation as each parameter is changed. Therefore the Jacobian matrix has parameters in the columns and observations in the rows. The bulk of the matrix is made up of the difference in observations between a base run and a run in which the parameter at the column head was perturbed (typically by 1% from the base run value - controlled by the “parameter groups” information). Now we’ll display the Jacobian matrix as a DataFrame:

sc.jco.to_dataframe().loc[sc.pst.nnz_obs_names,:].head()
strinf wel0 wel2 wel4 wel5 wel1 wel3 hk_i:12_j:2_zone:1.0 hk_i:32_j:12_zone:1.0 hk_i:2_j:17_zone:1.0 ... rch_i:2_j:7_zone:1.0 rch_i:2_j:17_zone:1.0 rch_i:7_j:17_zone:1.0 rch_i:22_j:12_zone:1.0 rch_i:22_j:2_zone:1.0 rch_i:17_j:2_zone:1.0 rch_i:17_j:12_zone:1.0 rch_i:37_j:17_zone:1.0 rch_i:27_j:12_zone:1.0 rch_i:12_j:17_zone:1.0
gage-1:3652.5 2716.948265 -4.548055 -4.719683 -5.232870 -4.498392 -4.689892 -5.017089 3.445071 31.416424 15.240922 ... 190.031485 115.306638 118.042265 119.114189 223.313201 231.708826 138.388709 63.774836 113.687036 121.292411
gage-1:3683.5 2774.639984 -181.311316 -166.978297 -10.342791 -87.590161 -110.276389 -32.921654 4.402686 32.094217 15.121772 ... 221.004388 134.915061 136.592731 130.330209 238.872431 252.221559 153.398371 69.859064 123.139058 138.266282
gage-1:3712.5 2780.020916 -276.473740 -245.350759 -20.599377 -151.432872 -185.385447 -67.049198 4.654596 36.865047 18.845729 ... 229.133526 153.496081 156.089054 145.632341 242.263018 256.949881 169.704385 83.319417 138.394319 158.253214
gage-1:3743.5 2783.890941 -335.620872 -291.893149 -35.153415 -197.641178 -238.817336 -101.203897 4.971822 46.180032 24.684773 ... 242.111479 176.567497 180.120737 164.500045 247.881326 264.289167 190.624789 98.052362 156.877133 182.798667
gage-1:3773.5 2785.355604 -374.821849 -322.374110 -50.609492 -228.410582 -276.715310 -129.809297 5.299181 55.300079 29.438466 ... 256.583441 194.019730 198.089350 178.866029 255.292138 272.878837 207.569112 107.102969 170.503931 201.037663

5 rows × 65 columns

Each entry reports the change in an observation due to a change in a parameter. We can likewise report how forecasts of interest change as parameters are perturbed. Note that pyemu extracted the forecast rows from the Jacobian on instantiation:

sc.forecasts.to_dataframe()
headwater:4383.5 tailwater:4383.5 trgw-0-9-1:4383.5 part_time
strinf 4.525783 15.049001 0.019712 -31.396749
wel0 14.641942 3.898550 -0.100690 -0.731877
wel2 11.608075 10.579199 -0.095968 -21.285655
wel4 18.162402 33.111962 -0.175127 -72.519641
wel5 8.105663 17.741830 -0.097510 -36.384135
... ... ... ... ...
rch_i:17_j:2_zone:1.0 -101.045614 -61.271020 0.718966 3973.941965
rch_i:17_j:12_zone:1.0 -11.717836 -7.108638 0.056490 52.716389
rch_i:37_j:17_zone:1.0 -0.778950 -1.143194 0.004321 17.939060
rch_i:27_j:12_zone:1.0 -7.106006 -8.015407 0.041517 183.053088
rch_i:12_j:17_zone:1.0 -4.895285 -1.892905 0.020423 -33.987577

65 rows × 4 columns

Each of these columns is a $\bf{y}$ vector used in the FOSM calculations…that’s it!

The prior parameter covariance matrix - $\boldsymbol{\Sigma}_{\theta}$

Because we have inherent uncertainty in the parameters, the forecasts also have uncertainty. Here’s what we have defined for parameter uncertainty - the Prior. As discussed above, it was constructed on-the-fly from the parameter bounds in the control file:

sc.parcov.to_dataframe()
strinf wel0 wel2 wel4 wel5 wel1 wel3 hk_i:12_j:2_zone:1.0 hk_i:32_j:12_zone:1.0 hk_i:2_j:17_zone:1.0 ... rch_i:2_j:7_zone:1.0 rch_i:2_j:17_zone:1.0 rch_i:7_j:17_zone:1.0 rch_i:22_j:12_zone:1.0 rch_i:22_j:2_zone:1.0 rch_i:17_j:2_zone:1.0 rch_i:17_j:12_zone:1.0 rch_i:37_j:17_zone:1.0 rch_i:27_j:12_zone:1.0 rch_i:12_j:17_zone:1.0
strinf 0.25 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000
wel0 0.00 0.238691 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000
wel2 0.00 0.000000 0.238691 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000
wel4 0.00 0.000000 0.000000 0.238691 0.000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000
wel5 0.00 0.000000 0.000000 0.000000 0.238691 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
rch_i:17_j:2_zone:1.0 0.00 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.022655 0.000000 0.000000 0.000000 0.000000
rch_i:17_j:12_zone:1.0 0.00 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.022655 0.000000 0.000000 0.000000
rch_i:37_j:17_zone:1.0 0.00 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.022655 0.000000 0.000000
rch_i:27_j:12_zone:1.0 0.00 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.022655 0.000000
rch_i:12_j:17_zone:1.0 0.00 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.022655

65 rows × 65 columns

Pages 463-464 in Anderson et al. (2015) spend some time on what is shown above.

For our purposes, a diagonal Prior - numbers only along the diagonal - shows that we expect the uncertainty of each parameter to result only from itself - there is no covariance with other parameters. The numbers themselves reflect “the innate parameter variability” and enter the maths as a variance (the square of a standard deviation) for each parameter. This is called the “C(p) matrix of innate parameter variability” in PEST parlance.

IMPORTANT POINT: Again, how did PEST++ and pyEMU get the variances shown on the diagonal? From the parameter bounds that were specified for each parameter in the PEST control file.
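As a quick check, the diagonal can be reproduced from the bounds. The sketch below assumes the adjustable parameters are log-transformed (typical for these tutorials); if any are not, their bounds enter untransformed and the check needs adjusting:

# standard deviation = (log10(upper bound) - log10(lower bound)) / 4 for
# log-transformed parameters; the diagonal of the prior holds the variance (std**2)
adj_par = sc.pst.parameter_data.loc[sc.pst.adj_par_names, :]
implied_std = (np.log10(adj_par.parubnd) - np.log10(adj_par.parlbnd)) / 4.0

fosm_var = np.diag(sc.parcov.to_dataframe().loc[adj_par.parnme, adj_par.parnme].values)
print(np.allclose((implied_std**2).values, fosm_var))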

The matrix of observation noise - $C_{\epsilon}$

Forecast uncertainty has to take into account the noise/uncertainty in the observations. Similar to the parameter Prior (the $\Sigma_{\theta}$ matrix), it is a covariance matrix of measurement error associated with the observations. This is the same as the $\Sigma_{\epsilon}$ that we discussed above. For our Freyberg problem, the $C_{\epsilon}$ matrix looks like:

sc.obscov.to_dataframe().loc[sc.pst.nnz_obs_names,sc.pst.nnz_obs_names].head()
gage-1:3652.5 gage-1:3683.5 gage-1:3712.5 gage-1:3743.5 gage-1:3773.5 gage-1:3804.5 gage-1:3834.5 gage-1:3865.5 gage-1:3896.5 gage-1:3926.5 ... trgw-0-3-8:3743.5 trgw-0-3-8:3773.5 trgw-0-3-8:3804.5 trgw-0-3-8:3834.5 trgw-0-3-8:3865.5 trgw-0-3-8:3896.5 trgw-0-3-8:3926.5 trgw-0-3-8:3957.5 trgw-0-3-8:3987.5 trgw-0-3-8:4018.5
gage-1:3652.5 40000.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
gage-1:3683.5 0.0 40000.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
gage-1:3712.5 0.0 0.0 40000.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
gage-1:3743.5 0.0 0.0 0.0 40000.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
gage-1:3773.5 0.0 0.0 0.0 0.0 40000.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 37 columns

IMPORTANT POINT: How did PEST++ and pyEMU get the variances shown on the diagonal? From the weights that were specified for each observation in the PEST control file.

IMPORTANT POINT: You can use FOSM in the “pre-calibration” state to design an objective function (e.g. weights) to maximize forecast uncertainty reduction.

IMPORTANT POINT: In PEST++, if a given observation has a larger-than-expected residual, the variance of that observation is reset to the variance implied by the residual. That is, the diagonal elements of $\Sigma_{\epsilon}$ are reset according to the residuals.
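As a quick check of where those numbers come from - a sketch that assumes the weights in the control file have not been adjusted by the residual-based reset described above - the diagonal of sc.obscov can be reproduced from the observation weights:

# noise variance implied by each observation weight: (1 / weight)**2
obs = sc.pst.observation_data.loc[sc.pst.nnz_obs_names, :]
implied_var = (1.0 / obs.weight)**2

fosm_var = np.diag(sc.obscov.to_dataframe().loc[sc.pst.nnz_obs_names,
                                                sc.pst.nnz_obs_names].values)
print(np.allclose(implied_var.values, fosm_var))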

Posterior Parameter Uncertainty - $\overline{\boldsymbol{\Sigma}}_{\boldsymbol{\theta}}$

Okay, enough emphasis. Here’s the point. When we apply FOSM using the matrices above, we can see how our uncertainty changes during calibration, first for parameters and then for forecasts.

Here, we are updating parameter covariance following notional calibration as represented by the Jacobian matrix and both prior parameter and observation noise covariance matrices.

In other words, given prior parameter uncertainty and the inherent noise in measurements, we calculate the expected parameter uncertainty after calibration. This assumes that calibration achieves a fit commensurate with measurement noise, parameter linearity, etc.

The posterior parameter covariance matrix is stored as a pyemu.Cov object in the sc.posterior_parameter attribute. The diagonal of this matrix contains the posterior variance of each parameter; the off-diagonals contain the parameter covariances.

sc.posterior_parameter.to_dataframe().head()
strinf wel0 wel2 wel4 wel5 wel1 wel3 hk_i:12_j:2_zone:1.0 hk_i:32_j:12_zone:1.0 hk_i:2_j:17_zone:1.0 ... rch_i:2_j:7_zone:1.0 rch_i:2_j:17_zone:1.0 rch_i:7_j:17_zone:1.0 rch_i:22_j:12_zone:1.0 rch_i:22_j:2_zone:1.0 rch_i:17_j:2_zone:1.0 rch_i:17_j:12_zone:1.0 rch_i:37_j:17_zone:1.0 rch_i:27_j:12_zone:1.0 rch_i:12_j:17_zone:1.0
strinf 0.006971 0.004594 0.004228 0.000588 0.002349 0.003672 0.001565 -0.001135 -0.005610 0.000060 ... -0.001500 -0.000722 -0.000726 -0.000746 -0.001539 -0.001735 -0.000883 -0.000382 -0.000699 -0.000737
wel0 0.004594 0.176378 -0.052375 -0.008618 -0.037732 -0.048709 -0.022273 0.001038 0.007925 -0.003280 ... 0.000872 0.000442 0.000427 0.000363 0.000935 0.000851 0.000491 0.000094 0.000315 0.000409
wel2 0.004228 -0.052375 0.193076 -0.010004 -0.032230 -0.039082 -0.020505 0.000689 0.005073 -0.001778 ... 0.000942 0.000499 0.000489 0.000407 0.000726 0.000772 0.000527 0.000167 0.000359 0.000477
wel4 0.000588 -0.008618 -0.010004 0.213220 -0.014696 -0.012287 -0.026425 -0.000075 -0.017551 -0.004137 ... 0.000140 -0.000374 -0.000394 -0.000244 0.000513 0.000774 -0.000210 -0.000294 -0.000281 -0.000399
wel5 0.002349 -0.037732 -0.032230 -0.014696 0.210984 -0.032978 -0.023670 0.000649 -0.000570 -0.003115 ... 0.000668 0.000037 0.000014 0.000024 0.000546 0.000647 0.000115 -0.000110 -0.000021 -0.000002

5 rows × 65 columns

But…is calibration worth pursuing or not? Let’s explore what the notional calibration is expected to do for parameter uncertainty. We accomplish this by comparing prior and posterior parameter uncertainty. Using .get_parameter_summary() makes this easy:

df = sc.get_parameter_summary()
df.head()
prior_var post_var percent_reduction
strinf 0.250000 0.006971 97.211534
wel0 0.238691 0.176378 26.106274
wel2 0.238691 0.193076 19.110462
wel4 0.238691 0.213220 10.671466
wel5 0.238691 0.210984 11.607882

We can plot that up:

df.percent_reduction.plot(kind="bar", figsize=(15,3));

[Figure: bar chart of percent uncertainty reduction for each adjustable parameter]

Do these results make sense? Why are some parameters unaffected by calibration?

As the name suggests, the percent_reduction column shows the percentage decrease in uncertainty expected through calibration for each parameter.
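That column is easy to reproduce from the prior and posterior variances; a quick check using the summary DataFrame from the previous cell:

# percent_reduction = 100 * (prior variance - posterior variance) / prior variance
check = 100.0 * (df.prior_var - df.post_var) / df.prior_var
print(np.allclose(check.values, df.percent_reduction.values))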

From the plot above we can see that calibrating the model with the available data definitely reduces the uncertainty of some parameters - these parameters are informed by the observation data. However, calibration does not affect all parameters equally. For the parameters that show little or no reduction, the available observation data simply do not contain information that constrains them; calibration will not help us reduce their uncertainty.

Forecast Uncertainty

So far we have seen that the uncertainty of some parameters will be reduced, while that of others will not. That’s great and all, but what we really care about are our forecast uncertainties. Do the parameters that are informed by calibration affect the forecasts of interest? And will calibrating reduce the uncertainty of these forecasts?

Let’s examine the prior and posterior variance of our forecasts. Recall that they are recorded as observations in the Pst control file and also listed in the pest++ forecast control variable:

forecasts = sc.pst.forecast_names
forecasts
['headwater:4383.5', 'tailwater:4383.5', 'trgw-0-9-1:4383.5', 'part_time']

As before, pyemu has already done much of the heavy-lifting. We can get a summary of the forecast prior and posterior variances with .get_forecast_summary():

df = sc.get_forecast_summary()
df
prior_var post_var percent_reduction
headwater:4383.5 1.840786e+04 1.043837e+04 43.293944
tailwater:4383.5 1.854221e+04 7.894541e+03 57.423952
trgw-0-9-1:4383.5 1.106435e+01 4.617277e+00 58.268859
part_time 8.931397e+07 7.014413e+07 21.463434

And we can make a cheeky little plot of that. As you can see, unsurprisingly, some forecasts benefit more from calibration than others. So, depending on the forecast of interest, calibration may or may not be worthwhile…

# get the forecast summary then make a bar chart of the percent_reduction column
fig = plt.figure()
ax = plt.subplot(111)
ax = df.percent_reduction.plot(kind='bar',ax=ax,grid=True)
ax.set_ylabel("percent uncertainty\nreduction from calibration")
ax.set_xlabel("forecast")
Text(0.5, 0, 'forecast')

[Figure: bar chart of percent uncertainty reduction from calibration for each forecast]

Parameter contribution to forecast uncertainty

Information flows from observations to parameters and then out to forecasts. Information contained in observation data constrains parameter uncertainty, which in turn constrains forecast uncertainty. For a given forecast, we can evaluate which parameter contributes the most to uncertainty. This is accomplished by assuming a parameter (or group of parameters) is perfectly known and then assessing forecast uncertainty under that assumption. The contribution of that parameter (or parameter group) is then the difference between the uncertainty obtained in this manner and the forecast uncertainty under the base assumption (in which no parameter is perfectly known).

Now, this is a pretty big assumption - in practice a parameter is never perfectly known. Nevertheless, this metric can provide useful insights into the flow of information from data to forecast uncertainty, which can help guide data assimilation design as well as future data collection efforts.

In pyemu we can evaluate parameter contributions to forecast uncertainty with groups of parameters by type using .get_par_group_contribution():

par_contrib = sc.get_par_group_contribution()
par_contrib.head()
headwater:4383.5 tailwater:4383.5 trgw-0-9-1:4383.5 part_time
base 10438.368644 7894.540701 4.617277 7.014413e+07
hk1 544.264823 191.279078 0.022420 1.243777e+06
rchpp 8897.659624 7731.591543 4.578167 6.915516e+07
strinf 10046.303913 7333.006237 4.490364 7.014090e+07
wel 10280.959926 7838.238188 4.566155 6.994418e+07

We can see the relative contribution by normalizing to the base case (i.e. the case in which no parameters/groups are perfectly known):

base = par_contrib.loc["base",:]
par_contrib = 100.0 * (base - par_contrib) / base
par_contrib.sort_index().head()
headwater:4383.5 tailwater:4383.5 trgw-0-9-1:4383.5 part_time
base 0.000000 0.000000 0.000000 0.000000
hk1 94.785921 97.577071 99.514439 98.226827
rchpp 14.760056 2.064074 0.847050 1.409910
strinf 3.755996 7.112947 2.748656 0.004603
wel 1.507982 0.713183 1.107201 0.285045

Understanding the links between parameters and forecast uncertainties can be useful - in particular to gain insight into the system dynamics. But we are still missing a step to understand what observation data affect the forecast. It is often more straightforward to quantify how observation information impacts forecast uncertainty, so that we can explore the worth of observation data directly.

Data worth analysis

Note: We will not demonstrate data worth analysis here. See the respective notebook in Part2 of these tutorials.

The worth of data is measured by their ability to reduce the uncertainties of model predictions that we care about. Linear analysis is particularly useful for exploring data worth. This is because the equations that it uses to calculate predictive uncertainty do not include terms that represent the actual values of observations or of parameters; only sensitivities of model outputs to parameters are required. Therefore, linear analysis can be used to assess the ability (or otherwise) of yet-ungathered data to reduce the uncertainties of decision-critical predictions.

This is Huge. Let me say it again.

We can assess the relative worth of an observation without knowing the value of the observation.

This means that potential field measurements that correspond to one or many outputs of a model can be assessed for their worth. For example, it is possible to assess the worth of observations of head in every single model cell at every time step of a model run with a relatively small computational burden. This makes linear analysis a useful tool for designing and comparing strategies for data-collection, when data acquisition seeks to reduce the uncertainties of one or a number of decision-critical predictions.

There are two main applications for data worth analysis (the corresponding pyemu.Schur methods are sketched after this list):

  1. ranking of the relative worth of existing observations by calculating predictive uncertainty with selected individual or combined observations removed from a calibration dataset.
  2. ranking of the relative worth of potential new observations by calculating predictive uncertainty with selected individual or combined observations added to an existing calibration dataset.
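For orientation only (the full workflow is demonstrated in Part 2), the corresponding pyemu.Schur methods can be sketched as follows; the group label and observation names are placeholders to be replaced with names from your own control file:

# 1. worth of existing observations: recompute forecast uncertainty with
#    selected observations removed from the calibration dataset
removed_worth = sc.get_removed_obs_importance(
    obslist_dict={"some_group": ["obs_name_1", "obs_name_2"]})

# 2. worth of potential new observations: recompute forecast uncertainty with
#    selected (not-yet-collected) observations added to the dataset
added_worth = sc.get_added_obs_importance(
    obslist_dict={"some_group": ["obs_name_3", "obs_name_4"]})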