HMC as involution

Outline

Topics

  • Inspiration from continuous time dynamics
  • Leap-frog integrator
  • “Flip trick”

Rationale

We can complete our goal of writing formally HMC, as a special case of the MH with deterministic proposal we just saw.

Plan

Inspiration from physics

  • Physics gives us precise expression for quantifying the notion of “rolling a ball”.
  • As often, it is easier to describe how things change rather than how thing are,
    • i.e. to provide an ordinary differential equation (ODE).
  • ODE for the continuous time evolution of frictionless ball’s position and momentum \((x^{(t)}, p^{(t)})\): \[\begin{align*} x' = \frac{\mathrm{d}x^{(t)}}{\mathrm{d}t} &= p \;\;\text{(in a small time interval the change in position is equal to the momentum)} \\ p' = \frac{\mathrm{d}p^{(t)}}{\mathrm{d}t} &= - \nabla U(x) \;\;\text{(the velocity decreases proportionally to the steepness of the landscape)}. \end{align*}\]

Recall: the gradient \(\nabla U(x) = (\partial U/\partial x_1, \dots, \partial U/\partial x_d)\) is a vector pointing towards the direction of steepest ascent.

Flip trick

  • At the very end of the trajectory, in the analysis we flip the momentum: \[T_\text{flip}(x, p) = (x, -p).\]

  • This will allow us to prove the involution property:
    • After the momentum sign flip,
    • the ball will retrace its step!
  • Since we alternate between an HMC kernel and sampling a new momentum
    • …this flipped momentum will be overwritten right away,
    • i.e. this is just a device for theoretical analysis in the vanilla HMC algorithm.

Notation

  • let \(\epsilon\) denote a small time interval (discretization),
  • let \(L\) denote the number of small steps we will use,
    • hence, \(t = \epsilon L\).

Leap frog integrator

Key idea: update only one of \(\{x, p\}\) at the time to get triangular Jacobian matrices (easier to analyze).

Drift operator

  • Let the current momentum move the current position a bit.
  • Discretization of \(x' = p\): \[T_\text{drift}(x, p) = (x + \epsilon p, p).\]

Question: compute the Jacobian determinant of \(T_\text{drift}\).

  1. \(x\)
  2. \(p\)
  3. \(1\)
  4. \(xp\)
  5. None of the above.

Correct answer is 1.

First, we compute the Jacobian matrix:

\[\begin{align*} \nabla T_\text{drift}(x, p) &= \begin{bmatrix} \dfrac{\partial T_1}{\partial x} & \dfrac{\partial T_1}{\partial p}\\ \dfrac{\partial T_2}{\partial x} & \dfrac{\partial T_2}{\partial p} \end{bmatrix} \\ &= \begin{bmatrix} 1 & \epsilon \\ 0 & 1 \end{bmatrix}. \end{align*}\]

Hence, the Jacobian determinant is 1.

Kick operator

  • Let the steepness of the landscape at current position update the momentum a bit.
  • Discretization of \(p' = - \nabla U\): \[T_\text{kick}(x, p) = \left(x, p + \frac{\epsilon}{2} \nabla \log \pi(x)\right).\]
# example: a normal
# This function return gradient_x of log pi(x)
gradient = function(x) {
  -2*x # = - 0.5 x^2 / sigma^2, i.e. a normal with variance sigma^2 = 0.5
}

epsilon = 0.1

kick = function(s) {
  x = s[[1]]
  p = s[[2]]
  c(x, p + epsilon * gradient(x) / 2)
}

flip = function(s) {
  x = s[[1]]
  p = s[[2]]
  c(x, -p)
}

initial = c(1.1, 2.3) 

# flip o kick o flip o kick
#   where 'o' denotes 'function composition',
#   which recall are read right to left
newpoint = kick(initial) 
flipped = flip(newpoint)
traceback = kick(flipped)
fipped_again = flip(traceback)

initial
[1] 1.1 2.3
newpoint
[1] 1.10 2.19
flipped
[1]  1.10 -2.19
traceback
[1]  1.1 -2.3
fipped_again
[1] 1.1 2.3

Ah! Ah! we come back where we started!

I.e., the code above illustrates that \(T_\text{flip}\circ T_\text{kick}\circ T_\text{flip}\circ T_\text{kick}= I\), which is easy to show and a building block for showing that the full proposal is an involution. (And a great way to check correctness of involution-based MH algorithms!)

We have the same property for drifts:

drift = function(s) {
  x = s[[1]]
  p = s[[2]]
  c(x + epsilon * p, p)
}


initial = c(1.1, 2.3) 

# flip o kick o flip o kick
newpoint = drift(initial) 
flipped = flip(newpoint)
traceback = drift(flipped)
fipped_again = flip(traceback)

initial
[1] 1.1 2.3
newpoint
[1] 1.33 2.30
flipped
[1]  1.33 -2.30
traceback
[1]  1.1 -2.3
fipped_again
[1] 1.1 2.3

Leap frog:

  • Alternate between kick and drift.
  • Do so in a palindromic fashion to obtain involution after flip. \[T_\text{leap-frog}= T_\text{kick}\circ T_\text{drift}\circ T_\text{kick}.\]
L = 5

hmc_proposal = function(s) {
  for (i in 1:L) {
    s = kick(s)
    s = drift(s)
    s = kick(s)
  }
  flip(s)
}


newpoint = hmc_proposal(initial) 
apply_twice = hmc_proposal(newpoint)

initial 
[1] 1.1 2.3
newpoint
[1]  1.8957642 -0.7389151
apply_twice
[1] 1.1 2.3

Proposition: \(T = T_\text{flip}\circ T_\text{leap-frog}^L\) is an involution, where \(T^n = T^{n-1} \circ T\).

Proof idea: use the property \(T_\text{flip}\circ T_\text{kick}\circ T_\text{flip}\circ T_\text{kick}= I\) we encountered (and similar one for the drift) and the fact that \(T_\text{leap-frog}^l\) is a palindrome (can be read both ways, like “madam” or “nurses run”).

Proposition: \(T = T_\text{flip}\circ T_\text{leap-frog}^L\) has Jacobian one.

Proof idea: show that each constituent has Jacobian determinant one, so the product of these is also one.

Putting it all together

Recall the high-level organization of HMC is the deterministic alternation of:

  • a complicated kernel \(K_\text{hmc}\): “rolling the ball”,
  • a simple kernel \(K_\text{gibbs}\): sampling the momentum.

We can now define \(K_\text{hmc}\): it it the deterministic proposal MH with proposal \(T = T_\text{flip}\circ T_\text{leap-frog}^L\).

Reference

See Neal, 2012 for an in-depth tutorial on HMC, in particular how to tune it.