Installing and running Stan

Outline

Topics

What is Stan?
Links to install.
Peek at the first question of next week’s exercise (running Stan on your laptop).

Rationale

SNIS and simPPLe are very flexible and relatively easy to understand, but they can be very slow, as we experienced in the exercise on hierarchical models.

Stan is an alternative way to approximate posterior distributions, with complementary properties:

	SNIS	Stan
Speed	Slow	Faster¹
Flexibility	Very flexible	Less flexible²
Easy to understand?	Simple	More complex³

From a pedagogical point of view, it is useful to learn about SNIS first. For real-world models, however, one has to use Stan or some other advanced inference method, because SNIS scales poorly.

What is Stan?

Stan is the most popular PPL as of 2024.
Review the notes on “what is a PPL.”
Stan uses Markov chain Monte Carlo (MCMC) to approximate the posterior distribution.
- Think of MCMC as a drop-in replacement for SNIS.
- We will talk about it in more detail soon.

Installing Stan

You will need Stan installed to complete next week’s exercise. Don’t wait until next week: install it today!

There are two main steps to install Stan:

Let us know on Piazza if you encounter any issues! This week, our priority will be resolving Stan installation issues. Next week, our priority will be replying to questions about the material.
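Once the installation is done, a quick check along the following lines may help confirm that R can find Stan before you start the exercise. This is only a sketch: it assumes Stan is accessed through the rstan package (the interface used later on this page), and the variable name `rstan_available` is ours.

```r
# Check whether the rstan package can be found, without attaching it
rstan_available <- requireNamespace("rstan", quietly = TRUE)

if (rstan_available) {
  # stan_version() reports the version of Stan bundled with rstan
  message("rstan found; Stan version: ", rstan::stan_version())
} else {
  message("rstan not found: revisit the installation steps or ask on Piazza")
}
```

Note that finding the package does not guarantee that models compile; running the small example in the next section is the more complete test.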

Running Stan

We present two methods for running Stan in the next two sections: from an R script or from a notebook.

Template: to get started quickly, download the following templates, which you can use as a starting point for either an R script or a notebook.

From an R script

First, copy and paste the following code into a file called beta_binomial.stan:

beta_binomial.stan

data {
  int<lower=0> n;         // number of trials
  int<lower=0,upper=n> k; // number of successes
}

parameters {
  real<lower=0,upper=1> p;
}

model {
  // prior
  p ~ beta(1,1);

  // likelihood
  k ~ binomial(n, p);
}

Second, run Stan as follows:

library(rstan)

fit = stan(
  seed = 123,
  file = "beta_binomial.stan",  # Stan program
  data = list(n=3, k=3),        # named list of data
  iter = 1000                   # iterations per chain, including warmup
)

The first question of next week’s exercise will be to report the posterior median, which appears in the “50%” column of the output of the following R command:

print(fit)
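If you prefer to pull the number out programmatically rather than read it off the printed table, something like the following sketch should work. It inlines the same beta-binomial model as a string (via the `model_code` argument of `stan`) so that it runs on its own; the names `model_code`, `median_from_summary` and `median_from_draws` are ours, and `refresh = 0` is only there to silence the progress output.

```r
library(rstan)

# Same beta-binomial model as above, inlined so the example is self-contained
model_code <- "
data {
  int<lower=0> n;
  int<lower=0,upper=n> k;
}
parameters {
  real<lower=0,upper=1> p;
}
model {
  p ~ beta(1, 1);
  k ~ binomial(n, p);
}
"

fit <- stan(
  model_code = model_code,
  data = list(n = 3, k = 3),
  seed = 123,
  iter = 1000,
  refresh = 0  # suppress progress output
)

# Option 1: read the "50%" column of the summary table for parameter p
median_from_summary <- summary(fit)$summary["p", "50%"]

# Option 2: compute the median directly from the posterior draws of p
draws <- rstan::extract(fit)$p
median_from_draws <- median(draws)

print(c(summary = median_from_summary, draws = median_from_draws))
```

The two values should agree closely, since both are computed from the same post-warmup draws.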

From a notebook

To see an example of how to integrate Stan code inside Quarto (R Markdown works the same way), see the source code used for the next page in the notes, available on GitHub.

Footnotes

1. For example, for the model used in the exercise on hierarchical models, Stan can extract 1000 effective samples in less than a second, whereas doing so with SNIS/simPPLe required several minutes!
2. For example, Stan does not support latent integer-valued random variables, whereas simPPLe does.
3. simPPLe is a few dozen lines of code, whereas Stan has millions of lines of code.