Goodness of fit

Outline

Topics

  • General notion of goodness-of-fit checks.
  • Specific example for Bayesian models: posterior predictive checks.
  • Limitations.

Rationale

Is your statistical model missing some critical aspect of the data? We have approached this question in a qualitative way earlier in the course. Today, we provide a more quantitative approach. In practice, both qualitative and quantitative model criticism are essential ingredients of an effective Bayesian data analysis.

What is goodness-of-fit?

  • Goodness-of-fit: a procedure to assess if a model is good (approximately well-specified) or bad (grossly mis-specified).
  • Applies to both Bayesian and non-Bayesian models; naturally, we focus on the former today.

Review: calibration

To understand today’s material we need to review the notion of calibration of credible intervals.

Question: for well-specified models, credible intervals are…

  1. calibrated for small data, calibrated for large data
  2. not calibrated for small data, calibrated for large data
  3. only approximately calibrated for both small and large data
  4. none of the above

Recall: in a well-specified Bayesian context, calibration holds for all dataset sizes (answer 1)!

This suggests calibration could be useful to detect mis-specification…
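To make this concrete, here is a minimal, self-contained simulation sketch (an illustrative assumption on my part, using a conjugate Beta-Bernoulli model rather than any model from the course): because the parameter is drawn from the prior and the data from the model, the marginal coverage of a 99% central credible interval matches its nominal level, whether the dataset is small or large.

```python
# A minimal, self-contained simulation (illustrative assumption: a conjugate
# Beta-Bernoulli model, not a model from the course) showing that credible
# intervals of a well-specified Bayesian model are calibrated for any dataset size.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
level = 0.99  # nominal level of the central credible interval

def coverage(n, replications=10_000):
    hits = 0
    for _ in range(replications):
        p = rng.beta(1, 1)            # draw the "true" parameter from the prior
        s = rng.binomial(n, p)        # simulate a dataset of size n (number of successes)
        # Central 99% credible interval from the Beta(1 + s, 1 + n - s) posterior.
        lo = beta.ppf((1 - level) / 2, 1 + s, 1 + n - s)
        hi = beta.ppf(1 - (1 - level) / 2, 1 + s, 1 + n - s)
        hits += (lo <= p <= hi)
    return hits / replications

for n in (1, 10, 1000):
    # Empirical coverage stays close to 99% whether n is small or large.
    print(f"n = {n:4d}: empirical coverage ≈ {coverage(n):.3f}")
```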

From calibration to goodness-of-fit

Question: Can we perform a goodness-of-fit check by verifying calibration on the latent variables \(X\)?

  1. Yes
  2. No

No: in a real data analysis scenario, \(x\) is unknown, so we cannot check if \(x\) is contained in a corresponding credible interval.

Suggestions?

Review: prediction

Question: Can we perform a goodness-of-fit check by verifying calibration on a prediction?

  1. Yes
  2. No

Yes: using a leave-one-out technique:

  • instead of giving all \(n\) data points to Stan, give only the first \(n-1\) data points;
  • leave data point \(n\) out (hence the name);
  • compute, say, a \(99\%\) credible interval \(C\) predicting the \(n\)-th observation based on data \(y_1, \dots, y_{n-1}\);
  • if \(y_n \notin C\), output a warning.

Posterior predictive check

  • Let \(C(y)\) denote a 99% credible interval for the next observation, computed from data \(y\).
  • Let \(y_{\backslash n}\) denote the data excluding point \(n\).
  • Output a warning if \(y_n \notin C(y_{\backslash n})\).

Proposition: if the model is well-specified, \[\mathbb{P}(Y_n \in C(Y_{\backslash n})) = 99\%.\]

Proof: this is a special case of our generic result on the calibration of credible intervals, applied here to the predictive credible interval for \(Y_n\).
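Below is a minimal sketch of the check just described, under an illustrative assumption of a conjugate normal model with known observation noise, so the predictive interval is available in closed form; in practice one would refit the model (e.g., in Stan) on \(y_1, \dots, y_{n-1}\) and take quantiles of posterior predictive draws instead. The prior and noise values are made up for illustration.

```python
# A minimal sketch of the leave-one-out posterior predictive check. Illustrative
# assumptions: a conjugate normal model with known observation noise, so the
# 99% predictive interval is available in closed form; in practice one would
# refit the model (e.g., in Stan) on y_1, ..., y_{n-1} and use quantiles of
# posterior predictive draws instead.
import numpy as np
from scipy.stats import norm

def loo_predictive_check(y, prior_mean=0.0, prior_sd=10.0, obs_sd=1.0, level=0.99):
    """Warn if the held-out point y[-1] falls outside the level-credible
    predictive interval computed from y[0], ..., y[-2]."""
    y_train, y_heldout = y[:-1], y[-1]
    # Conjugate normal-normal posterior update (known observation sd).
    post_prec = 1.0 / prior_sd**2 + len(y_train) / obs_sd**2
    post_var = 1.0 / post_prec
    post_mean = post_var * (prior_mean / prior_sd**2 + y_train.sum() / obs_sd**2)
    # Posterior predictive for the next observation: N(post_mean, post_var + obs_sd^2).
    pred_sd = np.sqrt(post_var + obs_sd**2)
    z = norm.ppf(1 - (1 - level) / 2)
    lo, hi = post_mean - z * pred_sd, post_mean + z * pred_sd
    if not (lo <= y_heldout <= hi):
        print(f"warning: y_n = {y_heldout:.2f} is outside the "
              f"{level:.0%} predictive interval ({lo:.2f}, {hi:.2f})")
    return lo, hi

rng = np.random.default_rng(2)
y = rng.normal(loc=3.0, scale=1.0, size=50)   # well-specified: a warning is unlikely
loo_predictive_check(y)
y_bad = np.append(y[:-1], 25.0)               # gross outlier: should trigger the warning
loo_predictive_check(y_bad)
```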

Question: what are the potential cause(s) of a posterior predictive “warning” (i.e., \(y_n \notin C(y_{\backslash n})\))?

  a. Model mis-specification.
  b. Posterior is not approximately normal.
  c. MCMC too slow and/or not enough samples.
  d. Bad luck.
  e. Software defect.
  1. a, b
  2. a, b, c
  3. a, c, e
  4. a, c, d, e
  5. None of the above
  a. Model mis-specification.
    • Yes, as the name of this page suggests!
  b. Posterior is not approximately normal.
    • No, that’s irrelevant to the present situation.
  c. MCMC too slow and/or not enough samples.
    • Yes, and we will talk more about it this week.
  d. Bad luck.
    • Yes, even in the well-specified case, there is a (100 − 99)% = 1% chance that the warning is issued (related to the so-called “type I error” in frequentist statistics).
  e. Software defect.
    • Yes, and we will talk more about it this week.
  • So the correct answer is a, c, d, e.
  • As you can see, there are several other possible causes (c, d, e) on top of the one we are interested in (a)
    • …which complicates the interpretation of posterior predictive checks.
    • We will see next some strategies to address c and e.