Exercise 4: the joy of probabilistic inference
Make sure to read the simPPLe setup page before you begin.
Q.1: logistic rocket improvement
Consider the Ariane 1 data we used this week,
success_indicators = c(1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1)
and the model described on the same page.
Recall that we discussed a model where the reliability of the rocket changes over time. This allows us to incorporate, for example, the fact that engineering teams implement fixes based on past launches, and that the probability of success should therefore increase.
- Write a function called logistic_regression containing a simPPLe probabilistic programming description of the model described in class (a minimal sketch is given after this list). Your function should return a vector containing 3 elements, in the following order:
  - the intercept (\(\in \mathbb{R}\)),
  - the slope (\(\in \mathbb{R}\)),
  - a prediction of whether one more launch would be a success (1) or a failure (0) (\(\in \{0, 1\}\)).
- Follow the instructions in the appendix below to get some helper functions. Use these functions to reproduce the lecture’s bivariate posterior plot over the intercept and slope parameters (a usage sketch follows the appendix code).
- Estimate the probability that the next launch is a success, given the data, under the logistic model (see the estimation sketch after this list).
- Create a variant of the same model where the slope is set to zero. Estimate the probability that the next launch is a success, given the data, under this simplified model.
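To make the first part concrete, here is a minimal sketch of what logistic_regression could look like. It assumes the simulate() and observe() simPPLe primitives from the setup page, the distr package’s Norm and Binom constructors (Binom with size = 1 playing the role of a Bernoulli), standard normal priors on both parameters, and base R’s plogis() as the logistic function; the priors and parameterization used in class may differ.

logistic_regression = function() {
  # priors (assumed standard normal here; substitute the priors from class)
  intercept = simulate(Norm(0, 1))
  slope = simulate(Norm(0, 1))
  # likelihood: one Bernoulli observation per launch, indexed by launch number
  n = length(success_indicators)
  for (i in 1:n) {
    observe(success_indicators[i], Binom(size = 1, prob = plogis(intercept + slope * i)))
  }
  # posterior predictive draw for launch n + 1
  next_launch = simulate(Binom(size = 1, prob = plogis(intercept + slope * (n + 1))))
  c(intercept = intercept, slope = slope, prediction = next_launch)
}

For the estimation parts, the predictive probability is a self-normalized importance sampling average of the prediction coordinate, using posterior_particles() from the appendix:

particles = posterior_particles(logistic_regression, 10000)
sum(particles$weights * particles$samples[, "prediction"]) / sum(particles$weights)

The simplified variant only requires replacing the slope’s prior draw with slope = 0; everything else stays the same.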
Q.2: choosing a model
You debate with your friend whether the logistic model or the simplified model (with the slope equal to zero) should be preferred. To settle the debate, write a unified model which gives prior probability 1/2 to the simplified model and 1/2 to the logistic model. Estimate the posterior probability of the logistic model under the unified model, given the same data as in Q.1 (a sketch follows).
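A minimal sketch of one way to write this unified model, under the same assumptions as the logistic_regression sketch in Q.1 (the lecture’s simulate()/observe() primitives, distr’s constructors, and assumed standard normal priors):

unified_model = function() {
  # fair prior over the two models: 1 = logistic, 0 = simplified (slope fixed at zero)
  model_indicator = simulate(Binom(size = 1, prob = 0.5))
  intercept = simulate(Norm(0, 1))
  slope = if (model_indicator == 1) simulate(Norm(0, 1)) else 0
  for (i in 1:length(success_indicators)) {
    observe(success_indicators[i], Binom(size = 1, prob = plogis(intercept + slope * i)))
  }
  c(model_indicator = model_indicator, intercept = intercept, slope = slope)
}

The posterior probability of the logistic model is then the self-normalized weighted average of model_indicator:

particles = posterior_particles(unified_model, 10000)
sum(particles$weights * particles$samples[, "model_indicator"]) / sum(particles$weights)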
Appendix
For Q.1.2, you will need to copy the following code and paste it into a file called simple_utils.R. We demonstrated the use of these functions in the lectures.
simple_utils.R
posterior_particles = function(ppl_function, number_of_iterations) {
  weight <<- 1.0
  # run the program once to discover the dimension and names of the output
  sample = ppl_function()
  dimension = length(sample)
  samples = matrix(0, nrow = number_of_iterations, ncol = dimension,
                   dimnames = list(1:number_of_iterations, names(sample)))
  weights = rep(0, number_of_iterations)
  for (i in 1:number_of_iterations) {
    weight <<- 1.0 # reset the weight accumulator
    sample = ppl_function()
    samples[i,] = sample
    weights[i] = weight
  }
  return(list(samples=samples, weights=weights))
}

ess = function(particles){
  w = particles$weights
  return(effective_sample_size(w))
}

effective_sample_size = function(w){
  (sum(w)^2)/sum(w^2)
}

representative_sample = function(snis_output, percentile=0.9999){
  # keep the highest-weight particles carrying almost all the normalized mass
  ess = effective_sample_size(snis_output$weights)
  idx_ordered_weights = order(snis_output$weights, decreasing = TRUE)
  acc_norm_weights = cumsum(snis_output$weights[idx_ordered_weights])/sum(snis_output$weights)
  reduced_sample_size = max(round(ess), max(which(acc_norm_weights < percentile)))
  idx_subset = idx_ordered_weights[1:reduced_sample_size]
  list(samples = snis_output$samples[idx_subset,], weights = snis_output$weights[idx_subset])
}

weighted_scatter_plot = function(
    snis_output,
    plot_idxs = 1:2,
    base_color_hex = hcl.colors(1, palette = "viridis"),
    plot_options = list(xlab="Param 1", ylab="Param 2")
){
  base_color = col2rgb(base_color_hex)/255
  # find the subset with almost all the mass
  snis_subset = representative_sample(snis_output)
  # linear transform of weights to [0,1]
  extreme_weights = range(snis_subset$weights, na.rm = T)
  alphas = (snis_subset$weights-extreme_weights[1])/diff(extreme_weights)
  # create colors with transparencies and plot
  points_color_alphas = rgb(base_color[1],base_color[2],base_color[3], alphas)
  call_args=c(
    list(x=snis_subset$samples[, plot_idxs], col=points_color_alphas),
    plot_options
  )
  do.call(plot, call_args)
}
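As a usage sketch for Q.1.2 (assuming logistic_regression is defined as in the sketch above, with the intercept and slope as its first two output coordinates), the bivariate posterior plot can be reproduced along these lines:

source("simple_utils.R")
particles = posterior_particles(logistic_regression, 10000)
ess(particles) # check the effective sample size before trusting the plot
weighted_scatter_plot(particles, plot_idxs = 1:2,
                      plot_options = list(xlab = "intercept", ylab = "slope"))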