Exercise 6: Hierarchical models

We consider the same model as in the hierarchical model lecture, except that we change the rate of the exponential from \(1/10000\) to \(1/1000\) to avoid issues specific to the Laplace approximation.[1]

Q.1: Change of variable

The parameters in the model are constrained: each \(p_i\) and \(\mu\) must lie in \((0, 1)\), and \(s\) must satisfy \(s > 0\). So if we sampled from a Laplace approximation performed on the \((p, \mu, s)\) parameterization, we would sometimes draw samples violating the constraints. In fact, the probability that the Laplace approximation's samples give valid parameters might be vanishingly small.

To avoid this, we will perform a change of variable. Consider the following constraining transformation \(c(\cdot)\):

logistic = plogis # alias: a more intuitive name for the logistic function

constrain = function(unconstrained_params) {
  # here, n coincides with the number of rows in the 
  # data frame (i.e., the number of rocket types)
  n = length(unconstrained_params) - 2
  c(
    # first n+1: transform to (0, 1)
    logistic(unconstrained_params[1:(n+1)]), 
    # last one: to (0, inf)
    exp(unconstrained_params[n+2])
  )
}
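As a quick sanity check, we can apply the transformation to an arbitrary unconstrained vector (the input values below are made up for illustration; the definitions are repeated so the snippet is self-contained) and confirm the output lands in the constrained space:

```r
logistic = plogis
constrain = function(unconstrained_params) {
  n = length(unconstrained_params) - 2
  c(logistic(unconstrained_params[1:(n + 1)]),
    exp(unconstrained_params[n + 2]))
}

u <- c(-1.2, 0.5, 2.0, 0.0, log(1000))  # pretend we have n = 3 rocket types
theta <- constrain(u)
theta
# the first four entries (p_1, p_2, p_3, mu) are in (0, 1);
# the last entry (s) is positive: exp(log(1000)) = 1000
```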

The inverse of this mapping, \(c^{-1}\), defines an unconstrained reparameterization. Determine the density of the prior under the unconstrained parameterization. In other words, if, say, \(S \sim {\mathrm{Exp}}(1/1000)\), what is the density of \(c^{-1}(S)\)? Do the same for the other parameters.

Note that in both the constrained and unconstrained parameterizations, we will use the same ordering of the parameters: \((p_1, \dots, p_n, \mu, s)\).

Hint: use a change of variable as used in integration or STAT 302 for transformations of random variables.
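For the exponential component, for example, if \(S \sim \mathrm{Exp}(\lambda)\) and \(X = c^{-1}(S) = \log S\), the change-of-variable formula gives \(f_X(x) = f_S(e^x)\,e^x = \lambda e^{x} e^{-\lambda e^{x}}\). This sketch (the helper name `dlogexp` is ours, not part of the exercise) checks that formula numerically against the derivative of the CDF, without giving away the derivations for the other parameters:

```r
rate <- 1/1000
# density of X = log(S), S ~ Exp(rate), by the change-of-variable formula:
# f_X(x) = f_S(exp(x)) * |d exp(x)/dx| = rate * exp(-rate * exp(x)) * exp(x)
dlogexp <- function(x) rate * exp(x) * exp(-rate * exp(x))

# numerical check: f_X should match the derivative of
# P(X <= x) = P(S <= exp(x)) = pexp(exp(x), rate)
x0 <- 7; h <- 1e-4
numeric_deriv <- (pexp(exp(x0 + h), rate) - pexp(exp(x0 - h), rate)) / (2 * h)
```

The same recipe (density of the original variable times the Jacobian of the constraining map) applies to the logistic components.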

Q.2 Laplace approximation of a hierarchical model

We will use a subset of the launch data shown in class: download it here

suppressPackageStartupMessages(require("dplyr"))
df = read.csv("failure_counts.csv") %>% 
      filter(grepl("^Delta", LV.Type)) 

Notice that we have restricted the data to the rocket types whose name starts with the prefix Delta.

Create a function called build_objective following this template:

build_objective = function(df) {
  n = nrow(df)
  # return the objective function:
  function(unconstrained_params) {
    # compute log gamma
    # TODO
    return(log_gamma)
  }
}
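The template relies on an R closure: the inner function captures `n` and `df` from the enclosing call, so the returned objective carries the data with it. Here is a toy example of the pattern only (deliberately unrelated to the actual objective, which is the exercise):

```r
build_toy = function(df) {
  n = nrow(df)                  # captured by the returned function
  function(x) -sum((x - n)^2)   # toy "log density", maximized at x = n
}
obj = build_toy(data.frame(a = 1:3))  # n = 3 is baked into obj
obj(c(3, 3))  # 0, the maximum value
```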

Q.3 Interpretation

We provide the following code which computes the Laplace approximation, and then samples from it. Make sure you understand what the code does and run it. Interpret the results. Are the different posterior means on the \(p_i\) very different or similar? Does that make sense in the current context?

laplace = function(df) {
  n = nrow(df)
  init = numeric(n + 2) 
  sample_laplace(maximize(build_objective(df), init), 1000)
}

maximize = function(objective, start) {
  res = nlm(f = function(x) -objective(x), start, hessian = TRUE) 
  # the negative sign here is because nlm minimizes, so we negate the
  # objective, and we convert back below...
  list(
    max = -res$minimum,
    argmax = res$estimate, 
    hessian = -res$hessian
  )
}

sample_laplace = function(opt, M) {
  mu = opt$argmax
  k  = length(mu)
  
  # hint: to understand these steps, compare to the univariate case
  sigma = solve(-opt$hessian)
  L = t(chol(sigma)) # <- this is like taking the square root

  constrained_samples = matrix(NA, nrow = M, ncol = k)
  
  for (i in 1:M) {
    z = rnorm(k)
    # again, use the univariate case to understand this:
    unconstrained_sample = mu + (L %*% z)
    constrained_samples[i, ] = constrain(unconstrained_sample)
  }
  
  return(constrained_samples)
}

s = laplace(df)
colMeans(s) # report the mean over M samples for each parameter
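To see why the `L %*% z` step works (the hint about the univariate case): if \(z \sim N(0, I)\) and \(L L^\top = \Sigma\), then \(Lz \sim N(0, \Sigma)\), just as \(\sigma z \sim N(0, \sigma^2)\) in one dimension. A standalone check with an arbitrary example covariance matrix:

```r
set.seed(1)
sigma <- matrix(c(2, 0.8, 0.8, 1), 2, 2)  # arbitrary example covariance
L <- t(chol(sigma))                       # lower-triangular "square root"
Z <- matrix(rnorm(2 * 1e5), nrow = 2)     # standard normal draws
X <- L %*% Z                              # transformed draws
round(cov(t(X)), 2)                       # empirical covariance, close to sigma
```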

Q.4 Adding an outlier

We now add an artificial outlier and rerun the inference. Comment on the new results. Do they differ from the previous question? Does that make sense?

df2 = rbind(df,
            data.frame(
              numberOfFailures = 100,
              numberOfLaunches = 100,
              LV.Type = "Fake-Delta"
            ))

s = laplace(df2)
colMeans(s)

Footnotes

  1. More specifically, for the choice \(1/10000\), the Hessian at the optimum is not negative definite. This means that we would need MCMC to approximate the original model.