Installing and running Stan
Outline
Topics
- What is Stan?
- Links to install.
- A peek at the first question of next week’s exercise (running Stan on your laptop).
Rationale
SNIS and simPPLe are very flexible and relatively easy to understand, but they can be very slow, as we experienced in the exercise on hierarchical models.
Stan is an alternative way to approximate posterior distributions, with complementary properties:
| | SNIS | Stan |
|---|---|---|
| Speed | Slow | Faster [1] |
| Flexibility | Very flexible | Less flexible [2] |
| Easy to understand? | Simple | More complex [3] |
From a pedagogical point of view, it is useful to learn about SNIS first. For real-world models, however, one has to use Stan or some other advanced inference method, because SNIS scales poorly.
What is Stan?
- Stan is the most popular PPL as of 2024.
- Review the notes on “what is a PPL.”
- Stan uses Markov chain Monte Carlo (MCMC) to approximate the posterior distribution.
- Think of MCMC as a drop-in replacement for SNIS.
- We will talk about it in more detail soon.
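To make the comparison concrete, here is a minimal SNIS sketch in base R for the toy model used later on this page (a uniform prior on p, and 3 successes out of 3 trials). It assumes the prior is used as the proposal, so the weights are simply the likelihood. The variable names (`n_particles`, `snis_mean`) are ours, and this is an illustration rather than simPPLe's actual code.

```r
set.seed(1)

n <- 3                 # number of trials
k <- 3                 # number of successes
n_particles <- 100000  # number of SNIS samples (assumed value)

# Propose from the prior: Beta(1, 1) is the same as Uniform(0, 1).
p <- runif(n_particles)

# Unnormalized weights: the binomial likelihood of the data for each proposed p.
w <- dbinom(k, size = n, prob = p)

# Self-normalized estimate of the posterior mean of p.
snis_mean <- sum(w * p) / sum(w)
print(snis_mean)
```

For this model the exact posterior is Beta(4, 1), whose mean is 4/5, so the estimate should land close to 0.8. Stan targets the same posterior, but uses MCMC instead of weighted prior samples.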
Installing Stan
You will need Stan installed to complete next week’s exercise. Don’t wait until next week: install it today!
There are two main steps to install Stan:
Let us know on Piazza if you encounter any issues! This week, our priority will be resolving Stan installation issues. Next week, our priority will be replying to questions about the material.
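Before posting, a quick check along the following lines can tell you whether R can find the rstan package at all. This is only a sketch, assuming you installed the R interface (rstan). It does not test the C++ toolchain, which is only exercised when a model is compiled. The variable name `rstan_installed` is ours.

```r
# TRUE if R can load the rstan package, FALSE otherwise.
rstan_installed <- requireNamespace("rstan", quietly = TRUE)

if (rstan_installed) {
  # Report the version of the R interface and of Stan itself.
  cat("rstan version:", as.character(packageVersion("rstan")), "\n")
  cat("Stan version: ", rstan::stan_version(), "\n")
} else {
  cat("rstan was not found; the installation is not complete yet.\n")
}
```

Including the printed versions in your Piazza post makes installation issues easier to diagnose.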
Running Stan
We present two methods for running Stan in the next two sections: either from an R script, or from a notebook.
Template: to quickly get started, download the following templates which you can use as a starting point for either R script or notebook.
From an R script
First, copy and paste the following code into a file called beta_binomial.stan:
data {
  int<lower=0> n;         // number of trials
  int<lower=0,upper=n> k; // number of successes
}

parameters {
  real<lower=0,upper=1> p;
}

model {
  // prior
  p ~ beta(1, 1);

  // likelihood
  k ~ binomial(n, p);
}
Second, run Stan as follows:
require(rstan)

fit = stan(
  seed = 123,
  file = "beta_binomial.stan",  # Stan program
  data = list(n=3, k=3),        # named list of data
  iter = 1000                   # iterations per chain (including warmup)
)
The first question of next week’s exercise will be to report the posterior median, which appears in the “50%” column of the output of the following R command:
print(fit)
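As a sanity check on the MCMC output, note that this particular model is simple enough to solve exactly: by standard conjugacy, a Beta(1, 1) prior combined with a binomial likelihood gives a Beta(1 + k, 1 + n - k) posterior. The sketch below computes the exact median with base R's `qbeta`. If a Stan fit named `fit` is present in the workspace, it also reads the “50%” entry from the summary matrix for comparison. The names `analytic_median` and `mcmc_median` are ours.

```r
n <- 3  # number of trials
k <- 3  # number of successes

# Exact posterior median from the conjugate Beta(1 + k, 1 + n - k) posterior.
analytic_median <- qbeta(0.5, 1 + k, 1 + n - k)
cat("analytic median:", analytic_median, "\n")

# Only runs if the `fit` object created by stan() above exists.
if (exists("fit") && inherits(fit, "stanfit")) {
  # summary(fit)$summary is a matrix: one row per parameter, one column per statistic.
  mcmc_median <- summary(fit)$summary["p", "50%"]
  cat("MCMC median:    ", mcmc_median, "\n")
}
```

The two numbers should agree up to Monte Carlo error. A large discrepancy usually points to a typo in the Stan program or in the data list.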
From a notebook
To see an example of how to integrate Stan code inside Quarto (R Markdown would work the same way), see the source code used for the next page in the notes, available on GitHub.
Footnotes
[1] For example, for the model used in the exercise on hierarchical models, Stan can extract 1000 effective samples in less than a second, whereas doing so with SNIS/simPPLe required several minutes!
[2] For example, Stan does not support latent integer-valued random variables, whereas simPPLe does.
[3] simPPLe is a few dozen lines of code, whereas Stan has millions of lines of code.