HMC as involution

Outline

Topics

Inspiration from continuous time dynamics
Leap-frog integrator
“Flip trick”

Rationale

We can complete our goal of writing formally HMC, as a special case of the MH with deterministic proposal we just saw.

Plan

How to build and involution for “rolling a ball for time \(t\) starting at current location and momentum”?
Start with an exact description of the ball’s motion (in continuous time).
Discretize time carefully to be able to compute trajectory while maintaining the involution property.

Inspiration from physics

Physics gives us precise expression for quantifying the notion of “rolling a ball”.
As often, it is easier to describe how things change rather than how thing are,
- i.e. to provide an ordinary differential equation (ODE).
ODE for the continuous time evolution of frictionless ball’s position and momentum \((x^{(t)}, p^{(t)})\): \[\begin{align*} x' = \frac{\mathrm{d}x^{(t)}}{\mathrm{d}t} &= p \;\;\text{(in a small time interval the change in position is equal to the momentum)} \\ p' = \frac{\mathrm{d}p^{(t)}}{\mathrm{d}t} &= - \nabla U(x) \;\;\text{(the velocity decreases proportionally to the steepness of the landscape)}. \end{align*}\]

Recall: the gradient \(\nabla U(x) = (\partial U/\partial x_1, \dots, \partial U/\partial x_d)\) is a vector pointing towards the direction of steepest ascent.

Flip trick

At the very end of the trajectory, in the analysis we flip the momentum: \[T_\text{flip}(x, p) = (x, -p).\]

This will allow us to prove the involution property:
- After the momentum sign flip,
- the ball will retrace its step!
Since we alternate between an HMC kernel and sampling a new momentum…
- …this flipped momentum will be overwritten right away,
- i.e. this is just a device for theoretical analysis in the vanilla HMC algorithm.

Notation

let \(\epsilon\) denote a small time interval (discretization),
let \(L\) denote the number of small steps we will use,
- hence, \(t = \epsilon L\).

Leap frog integrator

Key idea: update only one of \(\{x, p\}\) at the time to get triangular Jacobian matrices (easier to analyze).

Drift operator

Let the current momentum move the current position a bit.
Discretization of \(x' = p\): \[T_\text{drift}(x, p) = (x + \epsilon p, p).\]

Question: compute the Jacobian determinant of \(T_\text{drift}\).

Click for choices

\(x\)
\(p\)
\(1\)
\(xp\)
None of the above.

Click for answer

Correct answer is 1.

First, we compute the Jacobian matrix:

\[\begin{align*} \nabla T_\text{drift}(x, p) &= \begin{bmatrix} \dfrac{\partial T_1}{\partial x} & \dfrac{\partial T_1}{\partial p}\\ \dfrac{\partial T_2}{\partial x} & \dfrac{\partial T_2}{\partial p} \end{bmatrix} \\ &= \begin{bmatrix} 1 & \epsilon \\ 0 & 1 \end{bmatrix}. \end{align*}\]

Hence, the Jacobian determinant is 1.

Kick operator

Let the steepness of the landscape at current position update the momentum a bit.
Discretization of \(p' = - \nabla U\): \[T_\text{kick}(x, p) = \left(x, p + \frac{\epsilon}{2} \nabla \log \pi(x)\right).\]

# example: a normal
# This function return gradient_x of log pi(x)
gradient = function(x) {
  -2*x # = - 0.5 x^2 / sigma^2, i.e. a normal with variance sigma^2 = 0.5
}

epsilon = 0.1

kick = function(s) {
  x = s[[1]]
  p = s[[2]]
  c(x, p + epsilon * gradient(x) / 2)
}

flip = function(s) {
  x = s[[1]]
  p = s[[2]]
  c(x, -p)
}

initial = c(1.1, 2.3) 

# flip o kick o flip o kick
#   where 'o' denotes 'function composition',
#   which recall are read right to left
newpoint = kick(initial) 
flipped = flip(newpoint)
traceback = kick(flipped)
fipped_again = flip(traceback)

initial

[1] 1.1 2.3

newpoint

[1] 1.10 2.19

flipped

[1]  1.10 -2.19

traceback

[1]  1.1 -2.3

fipped_again

[1] 1.1 2.3

Ah! Ah! we come back where we started!

I.e., the code above illustrates that \(T_\text{flip}\circ T_\text{kick}\circ T_\text{flip}\circ T_\text{kick}= I\), which is easy to show and a building block for showing that the full proposal is an involution. (And a great way to check correctness of involution-based MH algorithms!)

We have the same property for drifts:

drift = function(s) {
  x = s[[1]]
  p = s[[2]]
  c(x + epsilon * p, p)
}


initial = c(1.1, 2.3) 

# flip o kick o flip o kick
newpoint = drift(initial) 
flipped = flip(newpoint)
traceback = drift(flipped)
fipped_again = flip(traceback)

initial

[1] 1.1 2.3

newpoint

[1] 1.33 2.30

flipped

[1]  1.33 -2.30

traceback

[1]  1.1 -2.3

fipped_again

[1] 1.1 2.3

Leap frog:

Alternate between kick and drift.
Do so in a palindromic fashion to obtain involution after flip. \[T_\text{leap-frog}= T_\text{kick}\circ T_\text{drift}\circ T_\text{kick}.\]

L = 5

hmc_proposal = function(s) {
  for (i in 1:L) {
    s = kick(s)
    s = drift(s)
    s = kick(s)
  }
  flip(s)
}


newpoint = hmc_proposal(initial) 
apply_twice = hmc_proposal(newpoint)

initial

[1] 1.1 2.3

newpoint

[1]  1.8957642 -0.7389151

apply_twice

[1] 1.1 2.3

Proposition: \(T = T_\text{flip}\circ T_\text{leap-frog}^L\) is an involution, where \(T^n = T^{n-1} \circ T\).

Proof idea: use the property \(T_\text{flip}\circ T_\text{kick}\circ T_\text{flip}\circ T_\text{kick}= I\) we encountered (and similar one for the drift) and the fact that \(T_\text{leap-frog}^l\) is a palindrome (can be read both ways, like “madam” or “nurses run”).

Proposition: \(T = T_\text{flip}\circ T_\text{leap-frog}^L\) has Jacobian one.

Proof idea: show that each constituent has Jacobian determinant one, so the product of these is also one.

Putting it all together

Recall the high-level organization of HMC is the deterministic alternation of:

a complicated kernel \(K_\text{hmc}\): “rolling the ball”,
a simple kernel \(K_\text{gibbs}\): sampling the momentum.

We can now define \(K_\text{hmc}\): it it the deterministic proposal MH with proposal \(T = T_\text{flip}\circ T_\text{leap-frog}^L\).

Reference

See Neal, 2012 for an in-depth tutorial on HMC, in particular how to tune it.