Installing and running Stan

Outline

Topics

  • What is Stan?
  • Links to install.
  • Peek at the first question of next week’s exercise (running Stan on your laptop).

Rationale

SNIS and simPPLe are very flexible and relatively easy to understand, but they can be very slow, as we experienced in the exercise on hierarchical models.

Stan is an alternative way to approximate posterior distributions, with complementary properties:

                      SNIS            Stan
Speed                 Slow            Faster [1]
Flexibility           Very flexible   Less flexible [2]
Easy to understand?   Simple          More complex [3]

From a pedagogical point of view, it is useful to learn SNIS first. For real-world models, however, one has to use Stan or some other advanced inference method, because SNIS scales poorly.

What is Stan?

Installing Stan

You will need Stan installed to complete next week’s exercise. Don’t wait until next week: install it today!

There are two main steps to install Stan:

  1. Configuring a C++ toolchain
  2. Installing RStan

Let us know on Piazza if you encounter any issues! This week, our priority will be resolving Stan installation issues. Next week, our priority will be replying to questions about the material.
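
Once both steps are done, a quick check from the R console can confirm that R can find the package and a compiler. The sketch below is only a rough heuristic and is not part of the official installation instructions: it assumes that the toolchain exposes `make` and either `g++` or `clang++` on the search path, which may not hold on every platform. The variable names are illustrative.

```r
# Rough post-installation check (a heuristic sketch; it installs nothing).

# Can R find the rstan package?
has_rstan <- requireNamespace("rstan", quietly = TRUE)

# Are common build tools visible on the PATH? Sys.which() returns "" for
# tools it cannot find. The tool names are assumptions; yours may differ.
tools_found <- Sys.which(c("make", "g++", "clang++"))

if (has_rstan) {
  message("rstan found, version ", as.character(utils::packageVersion("rstan")))
} else {
  message("rstan not found: revisit step 2 (Installing RStan)")
}

missing_tools <- names(tools_found)[!nzchar(tools_found)]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
}
```

Most machines have only one of `g++` and `clang++`, so a single missing compiler in the last message is not a problem by itself. Including the messages printed here in a Piazza post may make installation issues easier to diagnose.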

Running Stan

We present two methods for running Stan in the next two sections: from an R script or from a notebook.

Template: to get started quickly, download the following templates, which you can use as a starting point for either an R script or a notebook.

From an R script

First, copy and paste the following code into a file called beta_binomial.stan:

beta_binomial.stan
data {
  int<lower=0> n;         // number of trials
  int<lower=0,upper=n> k; // number of successes
}

parameters {
  real<lower=0,upper=1> p;
}

model {
  // prior
  p ~ beta(1,1);

  // likelihood
  k ~ binomial(n, p);
}

Second, run Stan as follows:

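library(rstan)  # unlike require(), stops with an error if rstan is missing

fit <- stan(
  seed = 123,                   # fix the random seed for reproducibility
  file = "beta_binomial.stan",  # Stan program
  data = list(n = 3, k = 3),    # named list of data
  iter = 1000                   # iterations per chain, including warmup
)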

The first question of next week’s exercise asks you to report the posterior median, which appears in the “50%” column of the output of the following R command:

print(fit)
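
As a sanity check on the value reported by Stan, this particular model can also be solved exactly. The sketch below assumes the standard beta-binomial conjugacy result (a Beta(a, b) prior with k successes in n trials gives a Beta(a + k, b + n - k) posterior), which is not derived in these notes. If a `stanfit` object named `fit` from the call above is in the workspace, the sketch also computes the median of the posterior draws using `rstan::extract`; the other variable names are illustrative.

```r
# Exact posterior median for the beta-binomial model, as a cross-check.
# Assumes conjugacy: Beta(a, b) prior + Binomial(n, p) likelihood with k
# successes gives a Beta(a + k, b + n - k) posterior.
a <- 1; b <- 1   # prior: p ~ beta(1, 1)
n <- 3; k <- 3   # data passed to stan()

exact_median <- qbeta(0.5, a + k, b + n - k)
message("Exact posterior median: ", round(exact_median, 4))

# Optional comparison with the MCMC output, if `fit` is available.
if (exists("fit") && methods::is(fit, "stanfit")) {
  draws <- rstan::extract(fit, pars = "p")$p   # post-warmup draws of p
  message("MCMC posterior median:  ", round(median(draws), 4))
}
```

The two values should agree up to Monte Carlo error; a large discrepancy would suggest a problem with the model file or the data list.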

From a notebook

To see an example of how to integrate Stan code inside Quarto (R Markdown works the same way), see the source code of the next page in the notes, available on GitHub.

Footnotes

  1. For example, for the model used in the exercise on hierarchical models, Stan can produce 1000 effective samples in less than a second, whereas doing so with SNIS/simPPLe required several minutes!

  2. For example, Stan does not support latent integer-valued random variables, whereas simPPLe does.

  3. simPPLe is a few dozen lines of code, whereas Stan has millions of lines of code.