Why HMC?

Outline

Topics

Motivation behind Hamiltonian Monte Carlo (HMC)
Computational cost of MCMC methods
Computational cost scaling for popular Monte Carlo methods on normal targets

Rationale

We have used Stan heavily in the second half of the course. Stan uses an algorithm called NUTS which is based on HMC.

We will now spend time understanding HMC. This will illustrate two important ideas used to construct advanced MCMC methods (involutions, augmentation).

We start by motivating HMC.

Why is HMC popular: the short answer

For many models, HMC is less computationally costly than the basic MH algorithm we saw earlier.

Notion of computational cost

We first have to define what we mean by an MCMC algorithm being less “costly” than another one.

Question: is the time to compute one iteration (in e.g., in seconds or FLOPS) a reasonable notion of computational cost for MCMC algorithms?

Click for answer

No, it is incomplete.
For example, according to this incomplete metric, an algorithm with proposal \(q(x' | x) = \mathbb{1}[x = x']\), would be really fast (can skip the MH ratio calculation!)…
- … but is useless because its effective sample size will be 1 no matter how long you run it.
What we care about: how long does it take to get to a certain Monte Carlo standard error.
- i.e.: seconds per effective sample (seconds/ESS; lower is better).

Scaling of Monte Carlo methods as a function of dimensionality

Terminology: to differentiate HMC from “the basic MH algorithm we saw earlier”, the latter is often called random walk MH (to avoid confusion since HMC is just an MH algorithm with a fancy proposal).

Proposition: (Beskos et al., 2010) For \(d\)-dimensional i.i.d. normal targets,

optimally tuned HMC’s cost (seconds/ESS) scales in \(O(d^{5/4})\), whereas,
optimally tuned random walk MH’ cost (seconds/ESS) scales in \(O(d^2)\).

Notes:

This means as \(d\) increases, there is an increasing performance gap in favour of HMC.
We would not use HMC for normal targets…
- …but many real models are approximately normal,
- so this result is a useful rule of thumb.