Kernel alternations

Outline

Topics

  • Alternations of MCMC kernels
  • Code implementation
  • Alternations preserve invariance

Rationale

Alternation is another way to combine MCMC kernels. It is mathematically slightly more involved than mixtures, but can yield faster mixing (in some situations, much faster).

Here we only introduce kernel alternation; we will explain why alternations can be much faster when we discuss non-reversibility.

Alternation of MCMC kernels

Example

We use the same example as the one we used for MCMC mixtures.

Code
# prior: Beta(alpha, beta)
alpha = 1
beta = 2 

# observations: binomial draws
n_successes = 3 
n_trials = 3

gamma_beta_binomial = function(p) {
  if (p < 0 || p > 1) return(0.0)
  dbeta(p, alpha, beta) * dbinom(x = n_successes, size = n_trials, prob = p)
}
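By Beta-binomial conjugacy, this unnormalized posterior is proportional to a \(\text{Beta}(\alpha + 3, \beta + 0) = \text{Beta}(4, 2)\) density. A quick numeric check (this check is my addition, not part of the original notes):

```r
# Check (added): by conjugacy, gamma_beta_binomial should be proportional to
# the Beta(alpha + n_successes, beta + n_trials - n_successes) = Beta(4, 2) density.
alpha = 1; beta = 2
n_successes = 3; n_trials = 3
gamma_beta_binomial = function(p) {
  if (p < 0 || p > 1) return(0.0)
  dbeta(p, alpha, beta) * dbinom(x = n_successes, size = n_trials, prob = p)
}
p_grid = c(0.1, 0.3, 0.5, 0.7, 0.9)
ratios = sapply(p_grid, gamma_beta_binomial) / dbeta(p_grid, 4, 2)
max(ratios) - min(ratios)  # ~0: the ratio is constant (the missing normalizing constant)
```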

Intuition

Setup: Denote the two kernels we want to use by \(K_1\) and \(K_2\).

Idea: at each MCMC iteration, with \(x\) denoting the current state,

  1. Sample from \(K_1\) starting from the current state \(x\).
  2. Sample from \(K_2\) starting from the state output by \(K_1\).

Coding up kernel alternation

Here is an example of how to implement the alternation. It is based on the same normal-proposal MH kernel as before, instantiated with standard deviations 1 and 2 to obtain \(K_1\) and \(K_2\) respectively.

Code
kernel = function(gamma, current_point, proposal_sd) {
  dim = length(current_point)
  proposal = rnorm(dim, mean = current_point, sd = proposal_sd) 
  ratio = gamma(proposal) / gamma(current_point) 
  if (runif(1) < ratio) {
    return(proposal)
  } else {
    return(current_point)
  }
}
# alternation of two MH kernels (normal proposals with sd 1 and sd 2)
mcmc_alternation = function(gamma, initial_point, n_iters) {
  samples = numeric(n_iters)
  current_point = initial_point
  for (i in 1:n_iters) {
    # sample from K_1 (proposal sd = 1)
    intermediate_point = kernel(gamma, current_point, 1)
    # sample from K_2 (proposal sd = 2), starting from the state output by K_1
    current_point = kernel(gamma, intermediate_point, 2)
    samples[i] = current_point
  }
  return(samples)
}

samples = mcmc_alternation(gamma_beta_binomial, 0.5, 1000)
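As a sanity check (this comparison is my addition; the exact posterior \(\text{Beta}(4, 2)\) follows from conjugacy), a longer, self-contained run of the same alternation scheme should reproduce the exact posterior mean \(4/6 \approx 0.667\):

```r
# Sanity check of the alternation sampler (added, self-contained):
# the exact posterior here is Beta(4, 2), with mean 4/6 ~= 0.667.
set.seed(1)
alpha = 1; beta = 2
n_successes = 3; n_trials = 3
gamma_beta_binomial = function(p) {
  if (p < 0 || p > 1) return(0.0)
  dbeta(p, alpha, beta) * dbinom(x = n_successes, size = n_trials, prob = p)
}
# one MH step with a normal proposal (same construction as in the notes)
mh_step = function(gamma, current_point, proposal_sd) {
  proposal = rnorm(1, mean = current_point, sd = proposal_sd)
  if (runif(1) < gamma(proposal) / gamma(current_point)) proposal else current_point
}
n_iters = 10000
samples = numeric(n_iters)
current_point = 0.5
for (i in 1:n_iters) {
  current_point = mh_step(gamma_beta_binomial, current_point, 1)  # K_1
  current_point = mh_step(gamma_beta_binomial, current_point, 2)  # K_2
  samples[i] = current_point
}
mean(samples)  # should be close to 4/6
```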

Mathematical notation for kernel alternations

Question: write a mathematical expression for the alternation \(K_\text{alt}\) in terms of \(K_1\) and \(K_2\).

  1. \(K_\text{alt}(x' | x) = \sum_{\check x} K_1(\check x | x) K_2(x' | \check x)\)
  2. \(K_\text{alt}(x' | x) = \sum_{\check x} K_1(\check x | x') K_2(x | \check x)\)
  3. \(K_\text{alt}(x' | x) = \sum_{i=1}^2 0.5 K_i(x' | x).\)
  4. \(K_\text{alt}(x' | x) = \sum_{i=1}^2 0.5 K_i(x | x').\)
  5. None of the above.

The first one.

We use the following notation:

  • \(X_1\): state before applying \(K_\text{alt}\)
  • \(X_2\): intermediate state obtained from \(K_1\) (called intermediate_point in the code)
  • \(X_3\): state after applying \(K_\text{alt}\)

We have:

\[\begin{align*} K_\text{alt}(x' | x) &= \mathbb{P}(X_3 = x' | X_1 = x) \\ &= \sum_{\check x} \mathbb{P}(X_3 = x', X_2 = \check x | X_1 = x) \;\;\text{(marginalization)} \\ &= \sum_{\check x} \mathbb{P}(X_2 = \check x | X_1 = x) \mathbb{P}(X_3 = x' | X_2 = \check x, X_1 = x) \;\;\text{(chain rule)} \\ &= \sum_{\check x} \mathbb{P}(X_2 = \check x | X_1 = x) \mathbb{P}(X_3 = x' | X_2 = \check x) \;\;\text{(Markov assumption)} \\ &= \sum_{\check x} K_1(\check x | x) K_2(x' | \check x) \;\;\text{(by definition)}. \end{align*}\]
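On a finite state space, the sum in the last line is just a matrix product: storing \(K_1\) and \(K_2\) as row-stochastic matrices (rows indexed by the current state), the alternation is \(K_1 K_2\). A small numeric illustration (the matrices are arbitrary examples of mine):

```r
# Two arbitrary row-stochastic matrices on a 3-point state space
K1 = matrix(c(0.5, 0.3, 0.2,
              0.1, 0.6, 0.3,
              0.2, 0.2, 0.6), nrow = 3, byrow = TRUE)
K2 = matrix(c(0.7, 0.2, 0.1,
              0.3, 0.4, 0.3,
              0.1, 0.3, 0.6), nrow = 3, byrow = TRUE)
# entry (x, x') of the product sums over the intermediate state, as in the derivation
K_alt = K1 %*% K2
rowSums(K_alt)  # all 1: the alternation is again a transition matrix
```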

Invariance result

Alternation preserves invariance

Proposition: if \(K_1\) and \(K_2\) are \(\pi\)-invariant, then their alternation \(K_\text{alt}\) is \(\pi\)-invariant.

Proof:

\[\begin{align*} \sum_{x} \pi(x) K_\text{alt}(x' | x) &= \sum_{x} \pi(x) \sum_{\check x} K_1(\check x | x) K_2(x' | \check x) \\ &= \sum_{\check x} K_2(x' | \check x) \sum_{x} \pi(x) K_1(\check x | x) \\ &= \sum_{\check x} K_2(x' | \check x) \pi(\check x) \;\;\text{(invariance of $K_1$)} \\ &= \pi(x') \;\;\text{(invariance of $K_2$)}. \end{align*}\]
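The proposition can also be checked numerically. The sketch below (my own example) builds two discrete Metropolis kernels for the same target \(\pi\) from symmetric proposals, then verifies \(\pi K_1 = \pi\), \(\pi K_2 = \pi\), and \(\pi (K_1 K_2) = \pi\):

```r
pi_target = c(0.2, 0.3, 0.5)

# Discrete Metropolis kernel: off-diagonal moves x -> y accepted with
# probability min(1, pi(y)/pi(x)); the diagonal absorbs the rejections.
make_metropolis = function(pi_target, Q) {
  n = length(pi_target)
  K = matrix(0, n, n)
  for (x in 1:n) for (y in 1:n) if (x != y)
    K[x, y] = Q[x, y] * min(1, pi_target[y] / pi_target[x])
  diag(K) = 1 - rowSums(K)
  K
}

Q1 = matrix(1/3, 3, 3)                                     # uniform proposal
Q2 = matrix(c(0, .5, .5, .5, 0, .5, .5, .5, 0), 3, byrow = TRUE)  # neighbor proposal
K1 = make_metropolis(pi_target, Q1)
K2 = make_metropolis(pi_target, Q2)

stopifnot(all(abs(pi_target %*% K1 - pi_target) < 1e-12))          # K_1 is pi-invariant
stopifnot(all(abs(pi_target %*% K2 - pi_target) < 1e-12))          # K_2 is pi-invariant
stopifnot(all(abs(pi_target %*% (K1 %*% K2) - pi_target) < 1e-12)) # so is the alternation
```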