<- function() {
init_fun list(theta1 = 40, theta2 = 1.0, theta3 = 0.25)
}
Exercise 7: Introduction to MCMC
Q.1: Installing and running Stan
Report the posterior interquartile range in the mini Stan exercise from the notes.
Q.2: Regression/classification in Stan
In this question, you will analyze the sun spots data set we encountered earlier in the course but using different priors and with Stan.
First, access the data as follows:
ex07_import_sunspots.R
suppressPackageStartupMessages(require(ggplot2))
suppressPackageStartupMessages(require(dplyr))
= read.csv(
df "https://raw.githubusercontent.com/UBC-Stat-ML/web447/1e345149a5b698ccdf0a7e9b0aeabec2463c50ca/data/sunspots-SN_m_tot_V2.0.csv",
sep = ";", header=FALSE) %>%
mutate(count = ceiling(V4)) %>%
rename(year = V3) %>%
filter(year > 2005)
= df$count
counts = df$year time
Second, write a Stan model following the same structure as the model from the last time we analyzed this data except that the following priors should be used:
- \(\theta_1 \sim U(0, 200)\)
- \(\theta_2 \sim U(0,10)\)
- \(\theta_3 \sim U(0, 2\pi)\)
The likelihood is \[Y_i \mid \theta_1, \theta_2, \theta_3 \sim \text{Poisson}(\theta_1 (\sin(\theta_2 i + \theta_3) + 1.1)).\]
Notes:
You should perform regression with
counts
as the response variable andtime
as the explanatory variable.Note that inside the likelihood we use
1.1
instead of1.0
as before.You will need to initialize your Stan sampler appropriately. When calling
stan()
, specify an additional argumentinit = init_fun
, whereinit_fun()
is given below.You may need to restrict the range of your parameters in the
parameters{}
block. E.g.,real<lower=0, upper=2*pi()> theta3;
.
Here is a reasonable init_fun()
(you may copy this code):
Run Stan for 2,500 iterations and report a histogram on each quantity of interest separately (i.e., \(\theta_1\), \(\theta_2\), \(\theta_3\)), as well as a pairplot for each pair of parameters (i.e., three pairplots in total).
Q.3: A simple MCMC algorithm
In this question, you will implement a random-walk Metropolis algorithm with a standard normal proposal.
Start with the following scaffold code that encodes the posterior distribution of interest. More specifically, this encodes an un-normalized distribution \(\gamma(p)\) given by prior times likelihood.
ex07_scaffold.R
# prior: Beta(alpha, beta)
= 1
alpha = 2
beta
# observations: binomial draws
= 3
n_successes = 3
n_trials
= function(p) {
gamma_beta_binomial if (p < 0 || p > 1) return(0.0)
dbeta(p, alpha, beta) * dbinom(x = n_successes, size = n_trials, prob = p)
}
Complete the blanks in the following:
# simple Metropolis-Hastings algorithm (normal proposal)
= function(gamma, initial_point, n_iters) {
simple_mh = numeric(n_iters)
samples = length(initial_point)
dim
# TODO
return(samples)
}
Use this function to compute the posterior mean and median of the model encoded in gamma_beta_binomial
.
Compute the posterior mean and median of \(p\) using:
- 1500 iterations of MCMC,
- analytic computations (for the posterior mean only) based on the practice quiz 1 question.