Installing and running Stan

Outline

Topics

  • What is Stan?
  • Installing Stan.

Rationale

SNIS and simPPLe are very flexible and relatively easy to understand, but they can be very slow.

Stan is an alternative way to approximate posterior distributions, with complementary properties:

                      SNIS            Stan
Speed                 Slow            Faster [1]
Flexibility           Very flexible   Less flexible [2]
Easy to understand?   Simple          More complex [3]

From a pedagogical point of view, it is useful to learn SNIS first. For real-world models, however, one would typically use Stan or another advanced inference method, because SNIS scales poorly.

What is Stan?

Installing Stan and CmdStanR [4]

You will need Stan installed to complete the exercises and clicker questions after the quiz. Don’t wait until the last minute: install it today!

Go to the Stan instruction page and make sure to select CmdStanR.

Note: the documentation says that you should use

remotes::install_github("stan-dev/cmdstanr")

However, this only works if the R package remotes is already installed. You can instead use:

install.packages("cmdstanr", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))
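
Note that installing the cmdstanr R package does not by itself install CmdStan, the underlying program that compiles and runs Stan models. The following is a minimal sketch of the remaining steps, assuming cmdstanr is already installed and that your machine can build C++ code. The value cores = 2 is an arbitrary choice, and the Stan instruction page remains the authoritative reference if anything here differs from it.

```r
# Assumes the cmdstanr R package was installed as described above.
library(cmdstanr)

# Check that a C++ toolchain able to compile Stan programs is available.
check_cmdstan_toolchain()

# Download and build CmdStan itself (needed once; this can take several minutes).
# cores = 2 is an arbitrary choice to speed up the build.
install_cmdstan(cores = 2)

# Confirm which CmdStan installation cmdstanr will use.
cmdstan_path()
cmdstan_version()
```

If the last two commands print a path and a version number, the installation is ready for the example in the next section.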

Let us know on Piazza if you encounter any issues!

Running Stan via CmdStanR

First, copy and paste the following code into a file called beta_binomial.stan:

beta_binomial.stan
data {
  int<lower=0> n;         // number of trials
  int<lower=0,upper=n> k; // number of successes
}

parameters {
  real<lower=0,upper=1> p;
}

model {
  // prior
  p ~ beta(1,1);

  // likelihood
  k ~ binomial(n, p);
}

Second, run Stan as follows:

# library() stops with an error if cmdstanr is missing, whereas require() only warns
suppressPackageStartupMessages(library(cmdstanr))
mod = cmdstan_model("beta_binomial.stan")

# create a directory where the samples will be saved
dir.create("stan_out", showWarnings = FALSE)

fit = mod$sample(
  seed = 1,
  chains = 1,
  refresh = 500, # how often to report iteration progress
  output_dir = "stan_out", # where the samples will be saved
  data = list(n=3, k=3) # named list of data
)

Running MCMC with 1 chain...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 0.0 seconds.

The first question of the next exercise will be to report the posterior median, which appears in the “median” column of the output of the following R command:

print(fit)
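
As a sanity check on the Stan output, note that this particular model happens to have a closed-form posterior: a beta(1, 1) prior combined with a binomial likelihood gives a beta(1 + k, 1 + n - k) posterior. The sketch below uses only base R to compute the exact posterior median for the same data; the variable names are chosen here for illustration. Because Stan's answer is a Monte Carlo approximation, the two should be close but not identical.

```r
# Same data as passed to mod$sample() above.
n = 3
k = 3

# Conjugacy: a beta(1, 1) prior and a binomial likelihood give a
# beta(1 + k, 1 + n - k) posterior.
alpha_post = 1 + k
beta_post = 1 + (n - k)

# Exact posterior median from the beta quantile function.
exact_median = qbeta(0.5, alpha_post, beta_post)
print(exact_median)

# Optional, if `fit` from the previous step is still in your session:
# median(fit$draws("p"))
```

Most models of practical interest have no such closed form, which is why an approximation method like Stan is needed in the first place.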

Old method: RStan

For reference, we archive here the old instructions for RStan. They may still work on some platforms.

Footnotes

  1. MCMC algorithms such as the one used by Stan often scale polynomially in the dimension, whereas SNIS scales exponentially.

  2. For example, Stan does not support latent integer-valued random variables, whereas simPPLe does.

  3. simPPLe is a few dozen lines of code, whereas Stan has millions of lines of code.

  4. Older versions of these instructions used RStan instead of cmdstanr. Unfortunately, RStan seems to have abruptly stopped working (at least on my machine) circa macOS 15.7.4, and development seems to have moved from RStan to cmdstanr. As a result, we are currently migrating from RStan to cmdstanr.