MH as a Markov chain

Outline

Topics

  • Homogeneous Markov chains
  • Transition kernel
  • Initial distribution
  • MH as a Markov chain

Rationale

Why is the MH ratio defined as \(\gamma(x')/\gamma(x)\)?

If we modified the MH ratio, would MCMC still “work”?

Recall: by “MCMC still work” we mean “MCMC still be consistent” i.e. enjoy a law of large numbers.

Markov chain definitions

To answer the question above:

  • we need to study laws of large numbers for Markov chains in more detail, and
  • we will need some concepts related to Markov chains.

Recall: the random variables \(X^{(1)}, X^{(2)}, \dots\) are called a Markov chain if they admit the following “chain” graphical model, i.e. each \(X^{(m+1)}\) is conditionally independent of \(X^{(1)}, \dots, X^{(m-1)}\) given \(X^{(m)}\).

Note: today we will assume the state space is discrete; we will relax this assumption soon.

Definition: let \(K_m(x' | x) = \mathbb{P}(X^{(m+1)} = x' | X^{(m)}= x)\). We call \(K_m\) the transition kernel (for the \(m\)-th step).

Definition: if all the transition kernels are equal, \(K_m = K\) for some \(K\), we say the Markov chain is homogeneous.

Definition: the marginal distribution of the first state, \(X^{(1)}\) is called the initial distribution.
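The three definitions above can be illustrated with a short sketch (assuming a discrete state space \(\{0, 1, 2\}\); the transition matrix and initial distribution below are made-up examples):

```python
import numpy as np

# A homogeneous Markov chain on the discrete state space {0, 1, 2}:
# a single transition kernel K (a stochastic matrix: row x is K(.|x),
# so each row sums to 1) used at every step, plus an initial distribution.
rng = np.random.default_rng(1)

K = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])
initial = np.array([1.0, 0.0, 0.0])  # X^(1) = 0 with probability 1

def simulate(n_steps):
    x = rng.choice(3, p=initial)   # draw X^(1) from the initial distribution
    path = [x]
    for _ in range(n_steps - 1):
        x = rng.choice(3, p=K[x])  # homogeneous: the same K at every step
        path.append(x)
    return path

path = simulate(20)
```

An inhomogeneous chain would instead use a different matrix \(K_m\) at each step \(m\).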

The MH algorithm as a Markov chain

Recall MH involves:

  • a proposal \(q(x'|x)\)
  • an MH ratio \(r(x, x') = \gamma(x') / \gamma(x)\), and
  • an acceptance probability \(\alpha(x, x') = \min(1, r(x, x'))\).
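In code, one MH step then looks as follows (a minimal sketch; the unnormalized target \(\gamma\) and the uniform proposal are made-up illustrative choices):

```python
import numpy as np

# One Metropolis-Hastings step on the discrete state space {0, 1, 2}.
# gamma is an unnormalized target; q_matrix[x] is the proposal q(.|x).
# The numbers here are illustrative, not from the notes.
rng = np.random.default_rng(0)

gamma = np.array([1.0, 2.0, 3.0])      # unnormalized target gamma(x)
q_matrix = np.full((3, 3), 1.0 / 3.0)  # uniform proposal q(x'|x)

def mh_step(x):
    x_star = rng.choice(3, p=q_matrix[x])  # propose x* ~ q(.|x)
    r = gamma[x_star] / gamma[x]           # MH ratio r(x, x*)
    alpha = min(1.0, r)                    # acceptance probability
    return x_star if rng.random() < alpha else x
```

Iterating `mh_step` produces the random variables \(X^{(1)}, X^{(2)}, \dots\) whose kernel we derive below.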

Question: write the transition kernel \(K(x' | x)\), assuming \(x \neq x'\).

  1. \(q(x' | x)\)
  2. \(q(x' | x) \alpha(x, x')\)
  3. \(\alpha(x, x')\)
  4. \(\sum_{x'} q(x' | x) (1 - \alpha(x, x'))\)
  5. None of the above.

The probability to move to \(x' \neq x\) requires both:

  1. proposing to \(x'\), with probability \(q(x'|x)\), and,
  2. accepting the proposal, with probability \(\alpha(x, x')\).

The correct answer is \(q(x' | x) \alpha(x, x')\).

Intuition: the “and” in the hint suggests the chain rule, hence multiplication.

Mathematical derivation: first denote

  • the current state by \(X\)
  • the proposed state by \(X^* \sim q(\cdot|X)\)
  • the next state by \(X'\) (equal to either \(X^*\) if the proposal is accepted, \(X\) otherwise).

\[\begin{align*} K(x'|x) &= \mathbb{P}(X' = x' | X = x) \\ &= \sum_{x^*} \mathbb{P}(X' = x', X^* = x^* | X = x) \;\;\text{(marginalization)} \\ &= \sum_{x^*} \mathbb{P}(X^* = x^* | X = x) \mathbb{P}(X' = x' | X^* = x^*, X = x) \;\;\text{(chain rule)}\\ &= \sum_{x^*} q(x^*|x) \mathbb{P}(X' = x' | X^* = x^*, X = x). \;\;\text{(definition of proposal)} \end{align*}\]

  • Since \(x \neq x'\), we can only get to \(x'\) by accepting a proposal,
  • i.e. the factor \(\mathbb{P}(X' = x' | X^* = x^*, X = x)\) is positive only when \(x^* = x'\),
  • so the sum collapses to a single term.

\[\begin{align*} K(x'|x) &= \sum_{x^*} q(x^*|x) \mathbb{P}(X' = x' | X^* = x^*, X = x) \\ &= q(x'|x) \mathbb{P}(X' = x' | X^* = x', X = x) \;\;\text{(see expl. above)} \\ &= q(x'|x) \alpha(x, x'). \;\;\text{(definition of accept probability)} \\ \end{align*}\]
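We can sanity-check this formula by Monte Carlo: run many single MH steps from a fixed \(x\) and compare the empirical transition frequencies to \(q(x'|x)\,\alpha(x, x')\) (the target and proposal below are made-up illustrative choices):

```python
import numpy as np

# Empirical check of K(x'|x) = q(x'|x) alpha(x, x') for x' != x:
# repeat one MH step from a fixed state x many times, then compare
# observed frequencies to the formula. Numbers are illustrative.
rng = np.random.default_rng(2)

gamma = np.array([1.0, 2.0, 3.0])  # unnormalized target
q = np.full((3, 3), 1.0 / 3.0)     # uniform proposal q(x'|x)

def mh_step(x):
    x_star = rng.choice(3, p=q[x])
    alpha = min(1.0, gamma[x_star] / gamma[x])
    return x_star if rng.random() < alpha else x

x = 2
n = 100_000
counts = np.zeros(3)
for _ in range(n):
    counts[mh_step(x)] += 1

for xp in range(3):
    if xp != x:
        predicted = q[x, xp] * min(1.0, gamma[xp] / gamma[x])
        # empirical frequency should match q(x'|x) alpha(x, x')
        assert abs(counts[xp] / n - predicted) < 0.01
```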

Note: the case \(x = x'\) is a bit more complicated, because there are two situations where MH stays at the same state, namely,

  1. the proposal tells us to stay at the current state (possible with discrete proposals only) and we accept that proposal, or,
  2. the proposal tries to go somewhere (including \(x\)) but the proposal is rejected.

Notice that the two situations are disjoint, so using the additivity axiom:

\[K(x | x) = q(x|x) \alpha(x, x) + \sum_{x^*} q(x^*|x) ( 1- \alpha(x, x^*)).\]

The above equation might be impossible to compute numerically in certain situations. Fortunately, we do not need to compute it to run MCMC; we only need to simulate from \(K(\cdot | x)\). Why spend time writing expressions for \(K(x'|x)\), then? To do mathematical analysis, which gives us useful guarantees such as laws of large numbers and central limit theorems.
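As a concrete illustration (with a made-up target and a uniform, hence symmetric, proposal), we can assemble the full kernel from these expressions and verify two basic facts on this small example: each row of \(K\) is a probability distribution, and the normalized target \(\pi \propto \gamma\) is stationary, \(\pi K = \pi\):

```python
import numpy as np

# Assemble the full MH kernel K: off-diagonal entries are q * alpha,
# and the diagonal combines an accepted self-proposal with all rejections,
# as in the formula for K(x|x) above. Numbers are illustrative.
gamma = np.array([1.0, 2.0, 3.0])  # unnormalized target
q = np.full((3, 3), 1.0 / 3.0)     # uniform (symmetric) proposal
S = len(gamma)

alpha = np.minimum(1.0, gamma[None, :] / gamma[:, None])  # alpha(x, x')
K = q * alpha                                             # q(x'|x) alpha(x, x')
for x in range(S):
    # K(x|x) = q(x|x) alpha(x, x) + sum over x* of q(x*|x) (1 - alpha(x, x*))
    K[x, x] = q[x, x] * alpha[x, x] + (q[x] * (1 - alpha[x])).sum()

pi = gamma / gamma.sum()                 # normalized target
assert np.allclose(K.sum(axis=1), 1.0)   # each row of K is a distribution
assert np.allclose(pi @ K, pi)           # pi is stationary for K
```

The stationarity check previews the kind of guarantee the mathematical analysis delivers: for this symmetric proposal, the MH kernel leaves the (normalized) target invariant.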