Optimizing the KL
Outline
Topics
- Black-box / automatic differentiation variational inference (ADVI)
- Coordinate ascent variational inference (CAVI)
Rationale
We have now identified our objective function, the ELBO. We still need to pick a numerical method to optimize it.
Overview
- Before ~2015, the user had to carry out model-specific mathematical derivations each time they wanted to apply VI to a new model.
- This changed with the advent of “black box methods” such as ADVI.
- In this course we focus on black-box methods since they are easier to use.
- However, CAVI is still useful as it can be much faster in practice.
Black box methods
- Idea: use a gradient descent method to minimize \(L(\phi)\).
- Difficulty: the objective function \(L\) involves an integral (an expectation) with respect to \(q\). How do we compute its gradient?
- Solution:
- Approximate the gradient using a Monte Carlo method.
- Feed that gradient estimate into a Stochastic Gradient Descent (SGD) algorithm.
- Convergence guarantees typically require that this gradient estimate be unbiased.
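As a minimal sketch of these steps, the following uses a single-sample reparameterization-trick estimator (one common way to obtain an unbiased Monte Carlo gradient) fed into SGD. The 1-D Gaussian target, the Gaussian variational family, and all constants are illustrative assumptions, not part of the course material:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D target: unnormalized log-density of N(2, 0.5^2)
# (in a real model this would be the joint log-density).
def log_p_grad(z):
    return -(z - 2.0) / 0.25

# Variational family q_phi = N(m, s^2); optimize t = log(s) so SGD is unconstrained.
m, t = 0.0, 0.0
for k in range(5000):
    s = np.exp(t)
    eps = rng.standard_normal()        # eps ~ N(0, 1)
    z = m + s * eps                    # reparameterized sample, z ~ q_phi
    # Single-sample unbiased estimate of the gradient of
    # L(phi) = KL(q_phi || p) + const = -E_q[log p(z)] - H(q_phi);
    # the Gaussian entropy H contributes the analytic -1/s term.
    g_m = -log_p_grad(z)
    g_t = (-eps * log_p_grad(z) - 1.0 / s) * s   # chain rule for t = log(s)
    lr = 0.1 / np.sqrt(k + 1)          # decaying (Robbins-Monro) step sizes
    m -= lr * g_m
    t -= lr * g_t

print(m, np.exp(t))  # m, s should approach the target's mean 2.0 and sd 0.5
```

Unbiasedness here comes from the reparameterization trick; score-function (REINFORCE) estimators are an alternative when reparameterization is not available.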
References
- See Blei et al., 2018.