Checking correctness

Outline

Topics

  • Using calibration to check correctness of model code.

Rationale

Code implementing Bayesian models can get complex in real applications. Complex code invariably means software defects (bugs) will creep in. We review a powerful method to detect bugs in Bayesian inference software.

The topic covered here is typically known as “software testing”. However, we avoid the word “testing” here because it already has an established meaning (hypothesis testing) in the statistical literature.

From goodness-of-fit to correctness check

  • On the previous page, we developed a procedure for checking goodness-of-fit.
  • However, we identified several factors that can trigger a “warning”.

Question: how can we modify the previous page’s check to exclude “model mis-specification” as a potential cause?

  1. Use simulated data.
  2. Use a smaller dataset.
  3. Use a larger dataset.
  4. Repeat the test several times.
  5. None of the above.

The best answer is to use simulated data. When we simulate parameters from the prior and data from the likelihood, the fitted model is, by construction, exactly the data-generating process, so we are in the well-specified case. Any remaining calibration failure must therefore come from a software defect!

Using a larger dataset leads only to approximate calibration. Hence, when we hit a warning, we cannot be completely sure whether we simply need even more data or whether there is an actual bug. Larger datasets also make the check take longer to run, which is undesirable.
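
To make this concrete, here is a minimal sketch of such a calibration check on simulated data. The toy Beta(1,1)-Bernoulli model, the random-walk Metropolis-Hastings sampler standing in for the inference code under test, and the function `run_mcmc` are all illustrative assumptions, not part of these notes: we repeatedly draw a parameter from the prior, simulate data given that parameter, run the sampler, and check that nominal 90% credible intervals cover the true parameter at roughly the nominal rate.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_mcmc(y, n_iters=2000):
    """Random-walk Metropolis-Hastings for p in a Beta(1,1)-Bernoulli model.
    This stands in for the inference code whose correctness we want to check."""
    p = 0.5
    samples = np.empty(n_iters)
    n_ones, n_zeros = y.sum(), len(y) - y.sum()
    for i in range(n_iters):
        prop = p + rng.normal(scale=0.1)
        if 0.0 < prop < 1.0:  # proposals outside (0, 1) have zero posterior density
            # log posterior ratio; the uniform prior cancels
            log_ratio = (n_ones * (np.log(prop) - np.log(p))
                         + n_zeros * (np.log1p(-prop) - np.log1p(-p)))
            if np.log(rng.uniform()) < log_ratio:
                p = prop
        samples[i] = p
    return samples

# Calibration check on simulated data: the model is well specified by
# construction, so 90% credible intervals should cover the truth ~90% of the time.
n_replications, n_obs, covered = 200, 20, 0
for _ in range(n_replications):
    p_true = rng.uniform()                                # parameter drawn from the Beta(1,1) prior
    y = (rng.uniform(size=n_obs) < p_true).astype(int)    # data simulated given p_true
    samples = run_mcmc(y)
    lo, hi = np.quantile(samples, [0.05, 0.95])
    covered += int(lo <= p_true <= hi)

print(f"Empirical coverage of nominal 90% intervals: {covered / n_replications:.2f}")
# Coverage far from 0.90 would signal a defect in run_mcmc.
```

Because the data are simulated from the model itself, an empirical coverage well below (or above) the nominal level cannot be blamed on mis-specification; up to Monte Carlo error, it points to a bug in the inference code.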

Additional references

Note: there are more sophisticated methods for checking the correctness of MCMC code; see, for example:

  • The Exact Invariance Test in Bouchard-Côté, 2022, Section 10.5.
  • This line of work was initiated in Geweke, 2004. A reverse citation search on that article gives a comprehensive view of the literature on checking the correctness of MCMC algorithms.