Stan: hands on

Outline

We revisit the time-varying Ariane 1 rocket failure probability model to practice the Stan syntax introduced this week.

suppressPackageStartupMessages(require(rstan))
suppressPackageStartupMessages(require(ggplot2))
suppressPackageStartupMessages(require(dplyr))

set.seed(1)

df = read.csv(url("https://raw.githubusercontent.com/UBC-Stat-ML/web447/main/data/launches.csv")) %>% filter(LV.Type == "Ariane 1")
success_indicators = df$Suc_bin
rmarkdown::paged_table(df)
plot(success_indicators, xlab = "Launch index i")

Model

Recall the model we discussed previously (with the difference that we now use a standard deviation of 10):

\[\begin{align*} \text{slope} &\sim \mathcal{N}(0, 10) \\ \text{intercept} &\sim \mathcal{N}(0, 10) \\ \theta(i) &= \text{logistic}(\text{slope} \cdot i + \text{intercept}) \\ y_i &\sim {\mathrm{Bern}}(\theta(i)) \end{align*}\]
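To build intuition for the logistic link, here is a small sketch (with made-up parameter values, not posterior estimates) showing how the success probability \(\theta(i)\) evolves over the launch index:

```r
# Illustrative (made-up) parameter values -- not posterior estimates
slope     = 0.5
intercept = -0.7

# plogis is R's logistic function: plogis(x) = 1 / (1 + exp(-x))
i = 1:10
theta = plogis(slope * i + intercept)
round(theta, 2)
```

A positive slope pushes \(\theta(i)\) towards 1 as \(i\) grows, i.e., later launches are more likely to succeed.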

…which you also implemented in simPPLe as part of exercise 4:

logistic_regression = function() {
  intercept = simulate(Norm(0, 10))
  slope     = simulate(Norm(0, 10))
  for (i in seq_along(success_indicators)){
    success_probability = plogis(intercept + i*slope)
    observe(success_indicators[i], Bern(success_probability))
  }
  return(c(intercept, slope))
}

Translation into Stan

Question 1: use the template below to translate the above model into Stan. Set the seed to 1, run 10000 MCMC iterations, and report the posterior mean of the slope parameter.

data { 
  int N; 
  array[N] int y; 
}

parameters { 
  real slope;
  real intercept;
}

model {
  // complete this
}

The value closest to the posterior mean of the slope parameter is…

  1. 0.6
  2. 0.3
  3. 0.1
  4. -0.7
  5. None of the above

data { 
  int N; 
  array[N] int y; 
}

parameters { 
  real slope;
  real intercept;
}

model {
  slope ~ normal(0, 10);
  intercept ~ normal(0, 10);
  for (i in 1:N) {
    y[i] ~ bernoulli(inv_logit(intercept + slope * i));
  }
}
# `logistic` refers to the compiled Stan model above
# (e.g., set via the Stan chunk's output.var, or rstan::stan_model())
fit = sampling(
  logistic, 
  data = list( 
          y = success_indicators, 
          N = length(success_indicators)
        ), 
  chains = 1,
  iter = 10000       
)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 5e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.046 seconds (Warm-up)
Chain 1:                0.046 seconds (Sampling)
Chain 1:                0.092 seconds (Total)
Chain 1: 
fit
Inference for Stan model: anon_model.
1 chains, each with iter=10000; warmup=5000; thin=1; 
post-warmup draws per chain=5000, total post-warmup draws=5000.

           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
slope      0.58    0.01 0.44 -0.12  0.26  0.52  0.82  1.61  1009    1
intercept -0.71    0.06 1.89 -4.46 -1.90 -0.64  0.55  2.86  1056    1
lp__      -5.59    0.03 1.19 -8.85 -6.07 -5.23 -4.74 -4.41  1195    1

Samples were drawn using NUTS(diag_e) at Thu Mar  7 11:04:48 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Hence the correct choice was 0.6.
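If you prefer to compute the posterior mean programmatically rather than reading it off the printout, the post-warmup draws can be extracted from the fit object (a sketch, assuming the `fit` object produced above):

```r
# Extract post-warmup draws as a named list of arrays
posterior = rstan::extract(fit)

# Posterior mean of the slope
mean(posterior$slope)
```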

Using the posterior

Question 2: compute a 95% credible interval on the slope parameter.

Use the output of print on the fit object.

  1. \([-4.5, 2.9]\)
  2. \([-0.3, 1.4]\)
  3. \([-0.1, 1.6]\)
  4. \([0.2, 0.9]\)
  5. None of the above

Looking again at the printout of the fit object:

fit
Inference for Stan model: anon_model.
1 chains, each with iter=10000; warmup=5000; thin=1; 
post-warmup draws per chain=5000, total post-warmup draws=5000.

           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
slope      0.58    0.01 0.44 -0.12  0.26  0.52  0.82  1.61  1009    1
intercept -0.71    0.06 1.89 -4.46 -1.90 -0.64  0.55  2.86  1056    1
lp__      -5.59    0.03 1.19 -8.85 -6.07 -5.23 -4.74 -4.41  1195    1

Samples were drawn using NUTS(diag_e) at Thu Mar  7 11:04:48 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

We can get a quantile-based credible interval by looking at the columns 2.5% and 97.5% (i.e., chopping off \((100\% - 95\%)/2 = 2.5\%\) on each side): this yields \([-0.12, 1.61]\).
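The same interval can be computed directly from the draws (a sketch, assuming the `fit` object from Question 1):

```r
posterior = rstan::extract(fit)

# Quantile-based 95% credible interval: chop off 2.5% on each side
quantile(posterior$slope, probs = c(0.025, 0.975))
```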

Question 3: translate “your updated belief that the Ariane 1 rockets were improving” into a mathematical expression.

  1. \(\mathbb{E}[\text{slope}| Y = y] > 0\)
  2. \(\mathbb{E}[\text{slope}] > 0\)
  3. \(\mathbb{P}(\text{slope} > 0 | Y = y)\)
  4. \(\mathbb{P}(\text{slope} > 0)\)
  5. None of the above

“Ariane 1 rockets were improving” translates to the event \((\text{slope} > 0)\). Hence our “updated belief,” which is just the posterior probability, is \(\mathbb{P}(\text{slope} > 0 | Y = y)\).

Question 4: estimate the numerical value of the expression in the last question.

  1. 0.65
  2. 0.75
  3. 0.85
  4. 0.95
  5. None of the above

Here we want to add a new random variable to our model, defined as: \[\text{is\_slope\_positive} = \mathbb{1}[\text{slope} > 0].\] Why? Because the posterior mean of this indicator is exactly the required probability: \[\mathbb{E}[\mathbb{1}[\text{slope} > 0]|Y = y] = \mathbb{P}(\text{slope} > 0 | Y = y),\] so we will be able to read it off from the printout of the fit object.

Now, how do we add the is_slope_positive random variable to our Stan model?

Since the random variable is defined using “=” rather than “\(\sim\)”, we can do this using a transformed parameters block, introduced earlier this week.
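For reference, a sketch of what the transformed parameters version would look like (the ternary expression evaluates to 1 when the slope is positive and 0 otherwise):

```stan
transformed parameters {
  real is_slope_positive = slope > 0 ? 1 : 0;
}
```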

Notice, however, that is_slope_positive is not used within the model block. In such cases, an alternative is to use a generated quantities block, which we demonstrate here:


data { 
  int N; 
  array[N] int y; 
}

parameters { 
  real slope;
  real intercept;
}

model {
  slope ~ normal(0, 10);
  intercept ~ normal(0, 10);
  for (i in 1:N) {
    y[i] ~ bernoulli(inv_logit(intercept + slope * i));
  }
}

generated quantities {
  real is_slope_positive = slope > 0 ? 1 : 0;
}
# `logistic_with_generated` refers to the compiled Stan model above
# (e.g., set via the Stan chunk's output.var, or rstan::stan_model())
fit = sampling(
  logistic_with_generated, 
  data = list( 
          y = success_indicators, 
          N = length(success_indicators)
        ), 
  chains = 1,
  iter = 10000       
)

SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 5e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 0.047 seconds (Warm-up)
Chain 1:                0.049 seconds (Sampling)
Chain 1:                0.096 seconds (Total)
Chain 1: 
fit
Inference for Stan model: anon_model.
1 chains, each with iter=10000; warmup=5000; thin=1; 
post-warmup draws per chain=5000, total post-warmup draws=5000.

                   mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
slope              0.57    0.01 0.41 -0.11  0.30  0.53  0.81  1.51  1216    1
intercept         -0.74    0.05 1.81 -4.30 -1.87 -0.72  0.40  2.74  1314    1
is_slope_positive  0.95    0.01 0.23  0.00  1.00  1.00  1.00  1.00  1535    1
lp__              -5.51    0.03 1.12 -8.47 -5.95 -5.21 -4.71 -4.41  1241    1

Samples were drawn using NUTS(diag_e) at Thu Mar  7 11:05:05 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Hence the correct answer is 0.95, the posterior mean of is_slope_positive.
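As a sanity check, the same probability can be estimated directly from the draws, without relying on the generated quantities block (a sketch, assuming the `fit` object printed above):

```r
posterior = rstan::extract(fit)

# Monte Carlo estimate of P(slope > 0 | Y = y):
# the proportion of posterior draws with a positive slope
mean(posterior$slope > 0)
```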