Why HMC?

Outline

Topics

  • Motivation behind Hamiltonian Monte Carlo (HMC)
  • Computational cost of MCMC methods
  • Computational cost scaling for popular Monte Carlo methods on normal targets

Rationale

We have used Stan heavily in the second half of the course. Stan uses an algorithm called NUTS, which is based on HMC.

We will now spend time understanding HMC. This will illustrate two important ideas used to construct advanced MCMC methods (involutions, augmentation).

We start by motivating HMC.

Notion of computational cost

We first have to define what we mean by one MCMC algorithm being less “costly” than another.

Question: is the time to compute one iteration (e.g., in seconds or FLOPS) a reasonable notion of computational cost for MCMC algorithms?

  • No, it is incomplete.
  • For example, according to this incomplete metric, an algorithm with the proposal \(q(x' | x) = \mathbb{1}[x = x']\) would be really fast (it can even skip the MH ratio calculation!)…
    • … but is useless because its effective sample size will be 1 no matter how long you run it.
  • What you care about: how long it takes to reach a given Monte Carlo standard error.
    • i.e.: seconds per effective sample (seconds/ESS; lower is better); a sketch of computing this metric follows this list.
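To make this concrete, below is a minimal sketch (plain NumPy, not Stan) of computing seconds/ESS. The AR(1) chain is a stand-in for MCMC output, and the truncated-autocorrelation ESS estimator is a deliberately simple illustrative choice; packages used in practice employ more careful estimators.

```python
import time

import numpy as np

def ess(chain):
    """Crude ESS estimate: n / (1 + 2 * sum of autocorrelations,
    truncated at the first non-positive lag)."""
    x = np.asarray(chain) - np.mean(chain)
    n = len(x)
    acf = np.correlate(x, x, mode="full")[n - 1:] / (x @ x)  # acf[0] == 1
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, n):
        if acf[k] <= 0:
            break
        tau += 2.0 * acf[k]
    return n / tau

# Stand-in for MCMC output: an AR(1) chain with autocorrelation rho,
# whose true ESS is roughly n * (1 - rho) / (1 + rho).
rng = np.random.default_rng(0)
rho, n = 0.9, 10_000
t0 = time.perf_counter()
chain = np.empty(n)
chain[0] = rng.standard_normal()
for i in range(1, n):
    chain[i] = rho * chain[i - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
elapsed = time.perf_counter() - t0  # "seconds" = time spent sampling

e = ess(chain)
print(f"ESS ~ {e:.0f} out of {n} samples")
print(f"seconds/ESS ~ {elapsed / e:.2e}  (lower is better)")
```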

Scaling of Monte Carlo methods as a function of dimensionality

Terminology: to differentiate HMC from the basic MH algorithm we saw earlier, the latter is often called random walk MH. (This avoids confusion, since HMC itself is just an MH algorithm with a fancy proposal.) A minimal sketch of random walk MH follows.
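The sketch below uses a one-dimensional target; the normal proposal and the step size are illustrative choices, not the course's exact code.

```python
import numpy as np

def random_walk_mh(log_target, x0, n_iters, step=1.0, seed=1):
    """Random walk MH: propose by symmetrically perturbing the current point."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_target(x0)
    chain = np.empty(n_iters)
    for i in range(n_iters):
        x_prop = x + step * rng.standard_normal()  # random walk proposal
        lp_prop = log_target(x_prop)
        # The proposal is symmetric, so the MH acceptance ratio reduces
        # to a ratio of target densities.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = x_prop, lp_prop
        chain[i] = x
    return chain

# Example: standard normal target.
chain = random_walk_mh(lambda x: -0.5 * x**2, x0=0.0, n_iters=10_000, step=2.4)
```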

Proposition: (Beskos et al., 2010) For \(d\)-dimensional i.i.d. normal targets,

  • optimally tuned HMC’s cost (seconds/ESS) scales in \(O(d^{5/4})\), whereas,
  • optimally tuned random walk MH’s cost (seconds/ESS) scales in \(O(d^2)\).

Notes:

  • This means that as \(d\) increases, the performance gap in favour of HMC widens (see the arithmetic sketch below).
  • We would not use HMC for normal targets in practice (they can be sampled i.i.d. directly); the normal target is a tractable benchmark for studying how cost scales with dimension.
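As a back-of-the-envelope illustration of this widening gap (the asymptotic statement ignores constants, so only the growth rate is meaningful):

```python
# Ratio of the two scalings, d^2 / d^(5/4) = d^(3/4): roughly how much more
# expensive optimally tuned random walk MH is than HMC, up to constants.
for d in [10, 100, 1_000, 10_000]:
    print(f"d = {d:>6}: cost ratio ~ d^(3/4) = {d ** 0.75:,.0f}")
```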