Autodiff
Outline
Topics
- Reverse mode automatic differentiation (autodiff)
- Using Stan as a stand-alone autodiff system
Rationale
As we just saw, HMC requires computing the gradient (vector of partial derivatives) at each iteration. But notice we did not have to manually derive the gradient when writing Stan models.
Why? Thanks to autodiff!
Automatic differentiation
- A Stan program specifies \(\log \gamma(x)\), a function from \(\mathbb{R}^d\) to \(\mathbb{R}\).
- Stan knows how to differentiate “primitive operations” such as \(+, \cdots, \exp, \dots\)
- Stan knows how to recurse based on the chain rule: \((f\circ g)' = (f' \circ g) g'\).
Reverse-mode autodiff: see wikipedia for details; in summary,
- Stan uses known derivatives for primitive operations, combined with chain rule to compute \(\nabla \log \gamma\) automatically.
- The “reverse-mode” variant of autodiff allows us to do that computation with the same running time as computing \(\log \gamma\).1
Using Stan as a stand-alone autodiff system
If you just need just the gradient of the log density of your Stan model and not HMC (e.g., to develop new inference algorithms), use BridgeStan (bridges Stan with R, Julia, Rust, etc).
Footnotes
In contrast, both forward mode autodiff and numerical differentiation would be \(d\) time slower at computing \(\nabla \log \gamma\) compared to \(\log \gamma\).↩︎