Consistency of MCMC

Outline

Topics

MH’s consistency guarantee.
Notion of irreducibility.

Rationale

Consistency of MH is a key property that explains MCMC’s popularity. We will prove the key steps of this theorem later on, today we only state it.

Key theoretical guarantee of MH

We have the same type of result as we encountered in Simple Monte Carlo and SNIS.
Namely: for any approximation error tolerance, we can find a number of iterations \(M\) large enough such that we will be within that error tolerance with high probability after \(M\) iterations.
Recall that the name for the above property is consistency.
However to get consistency with MH we will need one extra assumption…

Additional assumption

Compared to SNIS, we need an additional assumption to get consistency.
Informally: the proposal should permit the algorithm to explore the whole state space.
Technical name: irreducibility.

Definition: (discrete case¹) Let \(\pi\) denote the posterior PMF, and \(X^{(1)}, X^{(2)}, \dots\), the random variables produced at each iteration of MH. We say the chain \(X^{(m)}\) is irreducible if for any states \(x, x'\) with \(\pi(x) > 0\) and \(\pi(x') > 0\), we can get from \(x\) to \(x'\) with positive probability, i.e., there exists a number of steps \(m\) such that \(\mathbb{P}(X^{(m)} = x' | X^{(0)} = x) > 0\).

Consistency of MH

Proposition: if \(\mathbb{E}_\pi|g(X)| < \infty\), and the chain \(X^{(m)}\) produced by MH is irreducible, then² \[\frac{1}{M} \sum_{m=1}^M g(X^{(m)}) \to \mathbb{E}_\pi[g(X)], \tag{1}\] as the number of MCMC iterations \(M\) goes to infinity.

Note: the right hand side of Equation 1 does not involve the initial distribution, i.e., the distribution we use to initialize our MCMC algorithm is eventually “forgotten” when the chain is irreducible.

Footnotes

For the generalization to continuous state spaces, see e.g., Geyer’s notes, page 5. In summary, the difficulty in continuous state space is that the probability of reaching an individual “destination point” \(x'\) is zero. The solution is the replace the destinations \(x'\) by positive probability events.↩︎
More precisely, let \(E = \{x : (X^{(m)}) \text{ initialized at }x, \frac{1}{M} \sum_{m=1}^M g(X^{(m)}) \to \mathbb{E}_\pi[g(X)]\}\). Then \(\pi(E) = 1\). This is known as Birkhoff’s ergodic theorem (combined here with the additional structure that the sequence is Markovian and irreducible to obtain a deterministic limit). For more information and a proof, see for example Douc et al., 2018, Theorem 5.2.1.↩︎