Kernel alternations

Outline

Topics

  • Alternations of MCMC kernels
  • Code implementation
  • Alternations preserve invariance

Rationale

Alternation is another way to combine MCMC kernels. It is mathematically slightly more involved than mixtures, but can yield faster mixing (in some situations, much faster).

Here we only introduce kernel alternation; we will explain why alternations can be much faster when we discuss non-reversibility.

Alternation of MCMC kernels

Example

We use the same example as the one we used for MCMC mixtures.

Code
# prior: Beta(alpha, beta)
alpha = 1
beta = 2 

# observations: binomial draws
n_successes = 3 
n_trials = 3

gamma_beta_binomial = function(p) {
  if (p < 0 || p > 1) return(0.0)
  dbeta(p, alpha, beta) * dbinom(x = n_successes, size = n_trials, prob = p)
}
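By Beta-binomial conjugacy, this unnormalized posterior is proportional to a \(\text{Beta}(\alpha + 3, \beta + 0) = \text{Beta}(4, 2)\) density. A quick numeric check (this check is my addition, not part of the original notes):

```r
# Check (added): by conjugacy, gamma_beta_binomial should be proportional to
# the Beta(alpha + n_successes, beta + n_trials - n_successes) = Beta(4, 2) density.
alpha = 1; beta = 2
n_successes = 3; n_trials = 3
gamma_beta_binomial = function(p) {
  if (p < 0 || p > 1) return(0.0)
  dbeta(p, alpha, beta) * dbinom(x = n_successes, size = n_trials, prob = p)
}
p_grid = c(0.1, 0.3, 0.5, 0.7, 0.9)
ratios = sapply(p_grid, gamma_beta_binomial) / dbeta(p_grid, 4, 2)
max(ratios) - min(ratios)  # ~0: the ratio is constant (the missing normalizing constant)
```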

Intuition

Setup: Denote the two kernels we want to use by \(K_1\) and \(K_2\).

Idea: at each MCMC iteration, with \(x\) denoting the current state,

  1. Sample from \(K_1\) starting from the current state \(x\).
  2. Sample from \(K_2\) starting from the state output by \(K_1\).

Coding up kernel alternation

Here is an example of how to implement the alternation. It is based on the same normal-proposal MH kernel as before, instantiated with standard deviations 1 and 2 to obtain \(K_1\) and \(K_2\) respectively.

Code
kernel = function(gamma, current_point, proposal_sd) {
  dim = length(current_point)
  proposal = rnorm(dim, mean = current_point, sd = proposal_sd) 
  ratio = gamma(proposal) / gamma(current_point) 
  if (runif(1) < ratio) {
    return(proposal)
  } else {
    return(current_point)
  }
}
# alternation of two MH kernels (normal proposals with sd 1 and sd 2)
mcmc_alternation = function(gamma, initial_point, n_iters) {
  samples = numeric(n_iters)
  current_point = initial_point
  for (i in 1:n_iters) {
    # sample from K_1 (proposal sd = 1)
    intermediate_point = kernel(gamma, current_point, 1)
    # sample from K_2 (proposal sd = 2), starting from the state output by K_1
    current_point = kernel(gamma, intermediate_point, 2)
    samples[i] = current_point
  }
  return(samples)
}

samples = mcmc_alternation(gamma_beta_binomial, 0.5, 1000)
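As a sanity check (this comparison is my addition; the exact posterior \(\text{Beta}(4, 2)\) follows from conjugacy), a longer, self-contained run of the same alternation scheme should reproduce the exact posterior mean \(4/6 \approx 0.667\):

```r
# Sanity check of the alternation sampler (added, self-contained):
# the exact posterior here is Beta(4, 2), with mean 4/6 ~= 0.667.
set.seed(1)
alpha = 1; beta = 2
n_successes = 3; n_trials = 3
gamma_beta_binomial = function(p) {
  if (p < 0 || p > 1) return(0.0)
  dbeta(p, alpha, beta) * dbinom(x = n_successes, size = n_trials, prob = p)
}
# one MH step with a normal proposal (same construction as in the notes)
mh_step = function(gamma, current_point, proposal_sd) {
  proposal = rnorm(1, mean = current_point, sd = proposal_sd)
  if (runif(1) < gamma(proposal) / gamma(current_point)) proposal else current_point
}
n_iters = 10000
samples = numeric(n_iters)
current_point = 0.5
for (i in 1:n_iters) {
  current_point = mh_step(gamma_beta_binomial, current_point, 1)  # K_1
  current_point = mh_step(gamma_beta_binomial, current_point, 2)  # K_2
  samples[i] = current_point
}
mean(samples)  # should be close to 4/6
```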

Mathematical notation for kernel alternations

Question: write a mathematical expression for the alternation \(K_\text{alt}\) in terms of \(K_1\) and \(K_2\).

  1. \(K_\text{alt}(x' | x) = \sum_{\check x} K_1(\check x | x) K_2(x' | \check x)\)
  2. \(K_\text{alt}(x' | x) = \sum_{\check x} K_1(\check x | x') K_2(x | \check x)\)
  3. \(K_\text{alt}(x' | x) = \sum_{i=1}^2 0.5 K_i(x' | x).\)
  4. \(K_\text{alt}(x' | x) = \sum_{i=1}^2 0.5 K_i(x | x').\)
  5. None of the above.

The first one.

We use the following notation:

  • \(X_1\): state before applying \(K_\text{alt}\)
  • \(X_2\): intermediate state obtained from \(K_1\) (called intermediate_point in the code)
  • \(X_3\): state after applying \(K_\text{alt}\)

We have:

\[\begin{align*} K_\text{alt}(x' | x) &= \mathbb{P}(X_3 = x' | X_1 = x) \\ &= \sum_{\check x} \mathbb{P}(X_3 = x', X_2 = \check x | X_1 = x) \;\;\text{(marginalization)} \\ &= \sum_{\check x} \mathbb{P}(X_2 = \check x | X_1 = x) \mathbb{P}(X_3 = x' | X_2 = \check x, X_1 = x) \;\;\text{(chain rule)} \\ &= \sum_{\check x} \mathbb{P}(X_2 = \check x | X_1 = x) \mathbb{P}(X_3 = x' | X_2 = \check x) \;\;\text{(Markov assumption)} \\ &= \sum_{\check x} K_1(\check x | x) K_2(x' | \check x) \;\;\text{(by definition)}. \end{align*}\]
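On a finite state space, the sum in the last line is just a matrix product: storing \(K_1\) and \(K_2\) as row-stochastic matrices (rows indexed by the current state), the alternation is \(K_1 K_2\). A small numeric illustration (the matrices are arbitrary examples of mine):

```r
# Two arbitrary row-stochastic matrices on a 3-point state space
K1 = matrix(c(0.5, 0.3, 0.2,
              0.1, 0.6, 0.3,
              0.2, 0.2, 0.6), nrow = 3, byrow = TRUE)
K2 = matrix(c(0.7, 0.2, 0.1,
              0.3, 0.4, 0.3,
              0.1, 0.3, 0.6), nrow = 3, byrow = TRUE)
# entry (x, x') of the product sums over the intermediate state, as in the derivation
K_alt = K1 %*% K2
rowSums(K_alt)  # all 1: the alternation is again a transition matrix
```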

Invariance result

Alternation preserves invariance

Proposition: if \(K_1\) and \(K_2\) are \(\pi\)-invariant, then their alternation \(K_\text{alt}\) is \(\pi\)-invariant.

Proof:

\[\begin{align*} \sum_{x} \pi(x) K_\text{alt}(x' | x) &= \sum_{x} \pi(x) \sum_{\check x} K_1(\check x | x) K_2(x' | \check x) \\ &= \sum_{\check x} K_2(x' | \check x) \sum_{x} \pi(x) K_1(\check x | x) \\ &= \sum_{\check x} K_2(x' | \check x) \pi(\check x) \;\;\text{(invariance of $K_1$)} \\ &= \pi(x') \;\;\text{(invariance of $K_2$)}. \end{align*}\]
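The proposition can also be checked numerically. The sketch below (my own example) builds two discrete Metropolis kernels for the same target \(\pi\) from symmetric proposals, then verifies \(\pi K_1 = \pi\), \(\pi K_2 = \pi\), and \(\pi (K_1 K_2) = \pi\):

```r
pi_target = c(0.2, 0.3, 0.5)

# Discrete Metropolis kernel: off-diagonal moves x -> y accepted with
# probability min(1, pi(y)/pi(x)); the diagonal absorbs the rejections.
make_metropolis = function(pi_target, Q) {
  n = length(pi_target)
  K = matrix(0, n, n)
  for (x in 1:n) for (y in 1:n) if (x != y)
    K[x, y] = Q[x, y] * min(1, pi_target[y] / pi_target[x])
  diag(K) = 1 - rowSums(K)
  K
}

Q1 = matrix(1/3, 3, 3)                                     # uniform proposal
Q2 = matrix(c(0, .5, .5, .5, 0, .5, .5, .5, 0), 3, byrow = TRUE)  # neighbor proposal
K1 = make_metropolis(pi_target, Q1)
K2 = make_metropolis(pi_target, Q2)

stopifnot(all(abs(pi_target %*% K1 - pi_target) < 1e-12))          # K_1 is pi-invariant
stopifnot(all(abs(pi_target %*% K2 - pi_target) < 1e-12))          # K_2 is pi-invariant
stopifnot(all(abs(pi_target %*% (K1 %*% K2) - pi_target) < 1e-12)) # so is the alternation
```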