Goodness of fit
Outline
Topics
- General notion of goodness-of-fit checks.
- Specific example for Bayesian models: posterior predictive checks.
- Limitations.
Rationale
Is your statistical model missing some critical aspect of the data? We have approached this question in a qualitative way earlier in the course. Today, we provide a more quantitative approach. In practice, both qualitative and quantitative model criticism are essential ingredients of an effective Bayesian data analysis.
What is goodness-of-fit?
- Goodness-of-fit: a procedure to assess whether a model is good (approximately well-specified) or bad (grossly mis-specified).
- Applies to both Bayesian and non-Bayesian models; today we focus on the former.
Review: calibration
To understand today’s material we need to review the notion of calibration of credible intervals.
Question: for well-specified models, credible intervals are…
- calibrated for small data, calibrated for large data
- not calibrated for small data, calibrated for large data
- only approximately calibrated for both small and large data
- none of the above
Recall: in a Bayesian well-specified context, calibration holds for all dataset sizes!
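In symbols (notation here is illustrative): if the pair \((X, Y)\) is drawn from the model’s joint distribution and \(C_\alpha(y)\) denotes an \(\alpha\)-credible interval computed from the posterior given \(Y = y\), then, for any dataset size, \[\mathbb{P}(X \in C_\alpha(Y)) = \alpha.\]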
This suggests calibration could be useful to detect mis-specification…
From calibration to goodness-of-fit
Question: Can we build a goodness-of-fit check by checking calibration on the latent variables \(X\)?
- Yes
- No
No: in a real data analysis scenario, \(x\) is unknown, so we cannot check if \(x\) is contained in a corresponding credible interval.
Suggestions?
Review: prediction
- Recall that Bayesian models can be used to predict the next observation, \(y_{n+1}\).
- We did this…
- mathematically,
- in simPPLe,
- in the first quiz,
- and in this week’s exercise you will do it in Stan using generated quantities.
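For concreteness, here is a minimal sketch of a generated quantities block producing a posterior predictive draw of \(y_{n+1}\). The Bernoulli model, the uniform prior, and the names `y`, `p`, `y_next` are illustrative assumptions, not necessarily those used in the exercise.

```stan
data {
  int<lower=0> n;
  array[n] int<lower=0, upper=1> y;  // observed binary outcomes y_1, ..., y_n
}
parameters {
  real<lower=0, upper=1> p;          // success probability
}
model {
  p ~ uniform(0, 1);                 // prior (an assumption for this sketch)
  y ~ bernoulli(p);                  // likelihood
}
generated quantities {
  // one posterior predictive draw of the next observation per MCMC iteration
  int y_next = bernoulli_rng(p);
}
```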
Question: Can we build a goodness-of-fit check by checking calibration on a prediction?
- Yes
- No
Yes: using a leave-one-out technique:
- instead of giving all \(n\) data points to Stan, give only the first \(n-1\) data points,
- leave data point \(n\) out (hence the name).
- Compute, say, a \(99\%\) credible interval \(C\) predicting the \(n\)-th observation based on data \(y_1, \dots, y_{n-1}\).
- If \(y_n \notin C\): output a warning.
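One possible way to set this up in Stan (a sketch; the normal model, the priors, and the names `y_obs`, `y_pred` are illustrative assumptions): pass only the first \(n-1\) observations to Stan and draw the held-out observation in generated quantities. The interval \(C\) is then formed from the 0.5% and 99.5% quantiles of the `y_pred` draws, and a warning is issued if \(y_n\) falls outside it.

```stan
data {
  int<lower=2> n;
  vector[n - 1] y_obs;        // first n-1 observations; y_n is left out
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 10);         // weakly informative priors (assumptions for this sketch)
  sigma ~ exponential(1);
  y_obs ~ normal(mu, sigma);
}
generated quantities {
  // posterior predictive draw for the held-out n-th observation;
  // the 99% interval C is computed from these draws after sampling
  real y_pred = normal_rng(mu, sigma);
}
```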
Posterior predictive check
- Let \(C(y)\) denote a 99% credible interval for the next observation, computed from data \(y\).
- Let \(y_{\backslash n}\) denote the data excluding point \(n\).
- Output a warning if \(y_n \notin C(y_{\backslash n})\).
Proposition: if the model is well-specified, \[\mathbb{P}(Y_n \in C(Y_{\backslash n})) = 99\%.\]
Proof: special case of our generic result on calibration of credible intervals.
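In more detail (a sketch, in the notation above): when the model is well-specified, the pair \((Y_n, Y_{\backslash n})\) is distributed according to the model’s joint distribution, so applying the calibration result with \(Y_{\backslash n}\) playing the role of the observed data and \(Y_n\) playing the role of the unknown quantity yields \[\mathbb{P}(Y_n \in C(Y_{\backslash n})) = 99\%.\]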
Question: what are the potential cause(s) of a posterior predictive “warning” (i.e., \(y_n \notin C(y_{\backslash n})\))?
- a. Model mis-specification.
- b. Posterior is not approximately normal.
- c. MCMC too slow and/or not enough samples.
- d. Bad luck.
- e. Software defect.
- a, b
- a, b, c
- a, c, e
- a, c, d, e
- None of the above
- a. Model mis-specification.
  - Yes, as the name of this page suggests!
- b. Posterior is not approximately normal.
  - No, that’s irrelevant to the present situation.
- c. MCMC too slow and/or not enough samples.
  - Yes, and we will talk more about it this week.
- d. Bad luck.
  - Yes, even in the well-specified case, there is a 1% chance that the warning is issued (related to the so-called “type I error” in frequentist statistics).
- e. Software defect.
  - Yes, and we will talk more about it this week.
- So the correct answer is: a, c, d, e.
- As you can see, there are many other possible causes (c, d, e) on top of the one we are interested in (a)…
- …which complicates the interpretation of posterior predictive checks.
- We will see next some strategies to address c and e.