MH as a Markov chain
Outline
Topics
- Homogeneous Markov chains
- Transition kernel
- Initial distribution
- MH as a Markov chain
Rationale
Why is the MH ratio defined as \(\gamma(x')/\gamma(x)\)?
If we modified the MH ratio, would MCMC still “work”?
Recall: by “MCMC still work” we mean “MCMC still be consistent” i.e. enjoy a law of large numbers.
Markov chain definitions
To answer the question above:
- we need to study in more details law of large numbers for Markov chains.
- We will need a some concepts related to Markov chains
Recall: the random variables \(X^{(1)}, X^{(2)}, \dots\) are called a Markov chain if they admit the following “chain” graphical model.
Note: today we will assume the state space is discrete, we’ll relax this soon.
Definition: let \(K_m(x' | x) = \mathbb{P}(X^{(m+1)} = x' | X^{(m)}= x)\). We call \(K_m\) the transition kernel (for the \(m\)-th step).
Definition: if all the transition kernels are equal, \(K_m = K\) for some \(K\), we say the Markov chain is homogeneous.
Definition: the marginal distribution of the first state, \(X^{(1)}\) is called the initial distribution.
The MH algorithm as a Markov chain
Recall MH involves:
- a proposal \(q(x'|x)\)
- a MH ratio \(r(x, x') = \gamma(x') / \gamma(x)\), and
- an acceptance probability \(\alpha(x, x') = \min(1, r(x, x'))\).
Question: write the transition kernel \(K(x' | x)\), assuming \(x \neq x'\).
- \(q(x' | x)\)
- \(q(x' | x) \alpha(x, x')\)
- \(\alpha(x, x')\)
- \(\sum_{x'} q(x' | x) (1 - \alpha(x, x'))\)
- None of the above.
The probability to move to \(x' \neq x\) requires both:
- proposing to \(x'\), with probability \(q(x'|x)\), and,
- accepting the proposal, with probability \(\alpha(x, x')\).
The correct answer is \(q(x' | x) \alpha(x, x')\).
Intuition: the “and” in the hint suggests chain rule hence multiplication.
Mathematical derivation: first denote
- the current state by \(X\)
- the proposed state by \(X^* \sim q(\cdot|X)\)
- the next state by \(X'\) (equal to either \(X^*\) if the proposal is accepted, \(X\) otherwise).
\[\begin{align*} K(x'|x) &= \mathbb{P}(X' = x' | X = x) \\ &= \sum_{x^*} \mathbb{P}(X' = x', X^* = x^* | X = x) \;\;\text{(marginalization)} \\ &= \sum_{x^*} \mathbb{P}(X^* = x^* | X = x) \mathbb{P}(X' = x' | X^* = x^*, X = x) \;\;\text{(chain rule)}\\ &= \sum_{x^*} q(x^*|x) \mathbb{P}(X' = x' | X^* = x^*, X = x). \;\;\text{(definition of proposal)} \end{align*}\]
- Since \(x \neq x'\), we can only get to \(x'\) by accepting a proposal…
- …, i.e. when \(x' = x^*\).
- So there is only one term in the sum where \(\mathbb{P}(X' = x' | X^* = x^*, X = x) > 0\)…
- …, namely when \(x' = x^*\).
\[\begin{align*} K(x'|x) &= \sum_{x^*} q(x^*|x) \mathbb{P}(X' = x' | X^* = x^*, X = x) \\ &= q(x'|x) \mathbb{P}(X' = x' | X^* = x', X = x) \;\;\text{(see expl. above)} \\ &= q(x'|x) \alpha(x, x'). \;\;\text{(definition of accept probability)} \\ \end{align*}\]
Note: the case \(x = x'\) is a bit more complicated, because there are two situations where MH stays at the same state, namely,
- the proposal tells us to stay at the current state (possible with discrete proposals only) and we accept that proposal, or,
- the proposal tries to go somewhere (including \(x\)) but the proposal is rejected.
Notice that the two situations are disjoint, so using the additivity axiom:
\[K(x | x) = q(x|x) \alpha(x, x) + \sum_{x'} q(x'|x) ( 1- \alpha(x, x')).\]
The above equation might be impossible to numerically compute in certain situations. Fortunately, we do not need to compute it to run MCMC, we only need to simulate from \(K(\cdot | x)\). Why do we spend time writing expressions for \(K(x'|x)\) then? In order to do mathematical analysis, which gives us useful guarantees, such as law of large numbers and central limit theorems.