Exercise 9: MCMC Hacking

Overview

In this exercise, you will implement a custom MCMC sampler for a Bayesian change point detection model.

Getting the data

Copy the data below into a file; you will need it in Q3.

# Source: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
sms_data = c(
  13,24,8,24,7,35,14,11,15,11,22,22,11,57,11,19,29,6,19,
  12,22,12,18,72,32,9,7,13,19,23,27,20,6,17,13,10,14,6,
  16,15,7,2,15,15,19,70,49,7,53,22,21,31,19,11,18,20,12,
  35,17,23,17,4,2,31,30,13,27,0,39,37,5,14,13,22)

Problem formulation

We use a dataset and model inspired by Davidson-Pilon, 2013.

The data we analyze records, for each of 74 days, the number of text messages sent and received by Davidson-Pilon. The raw data looks like this:

plot(sms_data, xlab = "Day", ylab = "Number of text messages that day.")

Goal: determine whether this user’s behaviour changed over that time period, and if so, when.

This is a simple example of a change point detection task.

Bayesian model

Change point models are related to mixture models. However, instead of each data point being free to pick which of the two clusters it belongs to, change point models use a different cluster membership mechanism, defined as follows:

  1. Pick a change point \(C\) uniformly among the days \(d \in \{1, 2, \dots, 74\}\).
  2. The likelihood for days before or at the change point \(C\) uses the parameter of cluster 1.
  3. The likelihood for days after the change point \(C\) uses the parameter of cluster 2.

Mathematical description:

\[\begin{align*} \lambda_i &\sim {\mathrm{Exp}}(1/100), \text{ for }i \in \{1, 2\}, \;\;\text{(parameters for 2 clusters)} \\ C &\sim {\mathrm{Unif}}\{1, 2, \dots 74\}, \;\;\text{(change point)} \\ Y_d &\sim {\mathrm{Poisson}}(\lambda_{\mathbb{1}[d > C] + 1}), \text{ for }d \in \{1, 2, \dots, 74\} \;\;\text{(likelihood)}. \end{align*}\]
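A quick way to build intuition for this generative process is to simulate one dataset from it. The sketch below follows the three prior/likelihood lines above (variable names are illustrative):

```r
# Minimal sketch: simulate one dataset from the change point model
set.seed(123)

n_days = 74
rates = rexp(2, rate = 1/100)         # lambda_1, lambda_2 ~ Exp(1/100)
change_point = sample.int(n_days, 1)  # C ~ Unif{1, ..., 74}

# Y_d ~ Poisson(lambda_1) for d <= C, Poisson(lambda_2) for d > C
cluster = ifelse(seq_len(n_days) > change_point, 2, 1)
y = rpois(n_days, lambda = rates[cluster])
```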

R implementation:

We provide an implementation of the joint distribution of this model below. You will need it in this exercise.

set.seed(1)

log_joint = function(rates, change_point, y) {
  
  # Return log(0.0) if parameters are outside of the support
  if (rates[[1]] < 0 || rates[[2]] < 0 || change_point < 1 || change_point > length(y)) 
    return(-Inf)
  
  log_prior = 
    dexp(rates[[1]], 1/100, log = TRUE) + 
    dexp(rates[[2]], 1/100, log = TRUE)
  
  log_likelihood = 0.0
  for (i in seq_along(y)) {
    # Days up to and including the change point use cluster 1's rate
    rate = if (i <= change_point) rates[[1]] else rates[[2]]
    log_likelihood = log_likelihood + dpois(y[[i]], rate, log = TRUE)
  }
  
  return(log_prior + log_likelihood)
}

Explanation: the argument…

  • rates takes in a vector of length two, with the two components corresponding to \(\lambda_1\) and \(\lambda_2\).
  • change_point takes in an integer, corresponding to \(C\) in the mathematical description above.
  • y takes in a vector of non-negative integers, corresponding to \(Y = (Y_1, Y_2, \dots, Y_{74})\).

Notice also that we return the log of the joint distribution, so each term in the joint is computed on the log scale using the log = TRUE option.
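As a quick sanity check, log_joint can be called on a few toy inputs (the values below are illustrative): it should return a finite value on the support and -Inf outside it.

```r
# Toy sanity checks for log_joint (illustrative values)
toy_y = c(13, 24, 8, 24, 7, 35)

log_joint(c(20, 25), 3, toy_y)    # finite: valid rates and change point
log_joint(c(-1, 25), 3, toy_y)    # -Inf: negative rate
log_joint(c(20, 25), 10, toy_y)   # -Inf: change point beyond length(y)
```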

Q.1: A custom MCMC sampler

  1. Define mathematically an irreducible, \(\pi\)-invariant MCMC algorithm to sample from the change point model’s posterior.

Start by defining a kernel \(K_1\) modifying only the rates parameter, then a second one, \(K_2\), modifying the change_point variable. Write the proposal \(q_k\) for \(k \in \{1, 2\}\) for each kernel. Then specify a kernel \(K\) based on \(K_1\) and \(K_2\).

Note: to get good mixing in Q3, your discrete proposal should have a standard deviation greater than 5.
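To fix ideas, a single Metropolis–Hastings step for the change_point variable might look like the sketch below. The rounded-normal random walk with sd = 10 is just one possible choice satisfying the note above, not the required answer; log_joint is the function provided earlier.

```r
# One Metropolis-Hastings step for change_point, holding rates fixed.
# The rounded-normal random walk is symmetric, so the acceptance ratio
# needs no proposal correction; log_joint returns -Inf off the support,
# so out-of-range proposals are rejected automatically.
mh_step_change_point = function(rates, change_point, y) {
  proposal = change_point + round(rnorm(1, mean = 0, sd = 10))
  log_accept_ratio = log_joint(rates, proposal, y) - log_joint(rates, change_point, y)
  if (log(runif(1)) < log_accept_ratio) proposal else change_point
}
```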

  2. Prove that the MCMC algorithm you defined in part 1 is \(\pi\)-invariant.

  3. Implement the MCMC algorithm you described mathematically in R, following the template below:

mcmc = function(rates, change_point, y, n_iterations) {
  change_point_trace = rep(-1, n_iterations)
  
  for (i in 1:n_iterations) {
    # TODO: implement a MCMC sampler
  }
  
  # Return:
  # - the trace of the change points (for question 1) 
  # - the rates at the last iteration (for question 2)
  return(
    list(
      change_point_trace = change_point_trace, 
      last_iteration_rates = rates
    )
  )
}

Note:

  • Make sure to use the input arguments rates and change_point as the initial values of your MCMC algorithm (this will be important for Q2).
  • The function returns two pieces of information:
    • The list of samples (“trace”) for the parameter of interest, change_point (used for Q3).
    • The rate parameter at the very end of the MCMC algorithm (used for Q2).

Q.2: MCMC correctness testing

We now subject our MCMC implementation to an “exact invariance test” to validate its correctness.

  1. Start by completing the function below to perform forward simulation on the same Bayesian model as in Q1. The input argument synthetic_data_size specifies the size of the dataset to generate.
forward = function(synthetic_data_size) {
  
  # TODO: implement forward simulation
  
  return(list(
    rates = rates,
    change_point = change_point,
    data = data
  ))
}

Recall that each iteration of the exact invariance test starts with forward sampling, followed by either zero or several iterations of MCMC (we will use 200). We provide this function to do this for you:

forward_posterior = function(synthetic_data_size, n_mcmc_iters) {
  initial = forward(synthetic_data_size)
  
  if (n_mcmc_iters > 0) {
    samples = mcmc(initial$rates, initial$change_point, initial$data, n_mcmc_iters)
    return(samples$last_iteration_rates[[1]])
  } else {
    return(initial$rates[[1]])
  }
}
  2. Apply the test to your code by completing the lines below.
# Note: we use synthetic datasets with only 5 observations to speed things up
forward_only = replicate(1000, forward_posterior(5, 0))
with_mcmc = replicate(1000, forward_posterior(5, 200))

# TODO: perform 2-samples t-test or Kolmogorov-Smirnov test
#       to see if forward_only and with_mcmc follow the same distribution. 

Report the p-value. Recall that if it is “tiny” (e.g., less than 0.01), this suggests there may be a bug in your code.
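If you are unfamiliar with the API, here is the two-sample Kolmogorov–Smirnov test applied to synthetic draws (not the exercise data): two samples from the same distribution should typically give a large p-value.

```r
# ks.test API illustration on synthetic continuous samples
set.seed(2)
a = rexp(1000, rate = 1/100)
b = rexp(1000, rate = 1/100)
ks.test(a, b)$p.value   # typically large, since a and b share a distribution
```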

What to do if you suspect there is a bug?

First, try to narrow down whether the putative bug is in the change-point MCMC move, in the rates MCMC move, or in the forward sampling function.

To do so, temporarily turn off the sampler \(K_1\) and run the test. Then do the same for \(K_2\).

If the low p-value only shows up for one of the \(K_i\), that suggests the bug might be located there. If it shows up for each separately, then there could be a common mistake, or the bug could be in the forward simulation (or testing) code.
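One convenient way to switch kernels on and off is sketched below, with hypothetical flag arguments (use_rates_move and use_change_point_move are illustrative names, not part of the required interface). The kernel bodies are left as TODOs; only the toggling structure is shown.

```r
# Hypothetical sketch: flags to disable one kernel at a time while testing
mcmc = function(rates, change_point, y, n_iterations,
                use_rates_move = TRUE, use_change_point_move = TRUE) {
  change_point_trace = rep(-1, n_iterations)
  for (i in 1:n_iterations) {
    if (use_rates_move) {
      # TODO: K_1 update on rates
    }
    if (use_change_point_move) {
      # TODO: K_2 update on change_point
    }
    change_point_trace[[i]] = change_point
  }
  list(change_point_trace = change_point_trace, last_iteration_rates = rates)
}
```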

  3. If you identified a bug using this method, show the part of the code that had the bug (before and after fixing), as well as the p-value (before and after). If no bugs were found, temporarily introduce one and report the same information. Don’t forget to fix all bugs before moving to the next question.

Q.3: Using your sampler for data analysis

Perform 10,000 iterations of MCMC on the text message data.

samples = mcmc(...)

Recall that you can access the list of sampled change points using samples$change_point_trace.

  1. Create two trace plots for the change point parameter: one for the full trace, and one for the second half of the samples (via [5000:10000]). Comment on the mixing behaviour.

  2. Install the following package to compute the effective sample size:

install.packages("mcmcse")
require("mcmcse")

mcmcse::ess(...)

Again, report it for both the full set of samples and the subset coming from the second half.
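For intuition on what ess reports, here it is on a synthetic autocorrelated trace (an AR(1) process, illustrative and unrelated to the exercise data): strong autocorrelation drives the effective sample size well below the raw number of samples.

```r
require("mcmcse")

# ESS of a strongly autocorrelated synthetic trace (AR(1), ar = 0.9)
set.seed(3)
trace = as.numeric(arima.sim(model = list(ar = 0.9), n = 10000))
mcmcse::ess(trace)   # far below the 10000 raw samples
```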

  3. Produce a histogram from the second half of the samples. Briefly comment on the results in light of the note below.

Note: this dataset comes with the additional information that “the 45th day corresponds to Christmas, and I moved away to Toronto the next month, leaving a girlfriend behind” (Davidson-Pilon, 2013).