Exercise 4: the joy of probabilistic inference
Make sure to read the simPPLe setup page before you begin.
Q.1: logistic rocket improvement
Consider the Ariane 1 data we used this week,
```r
success_indicators = c(1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1)
```
and the model described on the same page.
Recall that we discussed a model where the reliability of the rocket changes over time. This allows us to incorporate, for example, the fact that engineering teams implement fixes based on past launches, so the probability of success should increase over time.
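In symbols, and up to the exact priors used in class: the outcome \(y_i \in \{0, 1\}\) of launch \(i\) is modelled as \(y_i \sim \text{Bern}(p_i)\) with \(p_i = \text{logistic}(\text{intercept} + \text{slope} \cdot i)\), where \(\text{logistic}(x) = 1/(1 + e^{-x})\); a positive slope encodes reliability improving from one launch to the next.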
- Write a function called `logistic_regression` containing a simPPLe probabilistic programming description of the model described in class (a hedged sketch appears after this list). Your function should return a vector containing 3 elements, in the following order:
  - the intercept (\(\in \mathbb{R}\)),
  - the slope (\(\in \mathbb{R}\)),
  - a prediction of whether one more launch would have been a success (1) or a failure (0) (\(\in \{0, 1\}\)).
- Follow the instructions in the appendix below to get some helper functions. Use these functions to reproduce the lecture’s bivariate posterior plot over the intercept and slope parameters.
- Estimate the probability that the next launch is a success given the data under the logistic model.
- Create a variant of the same model in which the slope is set to zero. Estimate the probability that the next launch is a success given the data under this simplified model.
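If you are unsure where to start, here is a minimal sketch of the first bullet. It assumes simPPLe's `simulate`/`observe` primitives (sourced from `simple.R`, per the setup page) and the `distr` package's `Norm` and `Bern` constructors; the standard normal priors are placeholders, so substitute whatever priors were used in class. The slope-zero variant from the last bullet is the same program with the `slope` line replaced by `slope = 0.0`.

```r
suppressPackageStartupMessages(require(distr))
source("simple.R") # simPPLe: provides simulate() and observe()

success_indicators = c(1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1)

logistic_regression = function() {
  # placeholder priors -- substitute the ones from class
  intercept = simulate(Norm(0, 1))
  slope     = simulate(Norm(0, 1))
  n = length(success_indicators)
  for (i in 1:n) {
    # plogis() is the logistic function 1/(1 + exp(-x))
    observe(success_indicators[i], Bern(plogis(intercept + slope * i)))
  }
  # simulate whether launch n + 1 would be a success
  prediction = simulate(Bern(plogis(intercept + slope * (n + 1))))
  c(intercept = intercept, slope = slope, prediction = prediction)
}
```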
Q.2: choosing a model
You debate with your friend whether the logistic model or the simplified model (with the slope equal to zero) should be preferred. To settle the debate, write a unified model that gives probability 1/2 to the simplified model and 1/2 to the logistic model. Estimate the posterior probability that the logistic model is preferred under the unified model, given the same data as in Q.1.
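Here is one hedged way to set up such a model in simPPLe, under the same assumed `simulate`/`observe` API and placeholder priors as the Q.1 sketch: draw a model indicator from a \(\text{Bern}(1/2)\), force the slope to zero when the simplified model is picked, and return the indicator so its posterior mean can be read off the weighted samples.

```r
unified_model = function() {
  # 1 = logistic model, 0 = simplified (slope-zero) model
  model_indicator = simulate(Bern(0.5))
  intercept = simulate(Norm(0, 1)) # placeholder prior, as in the Q.1 sketch
  slope = if (model_indicator == 1) simulate(Norm(0, 1)) else 0.0
  for (i in seq_along(success_indicators)) {
    observe(success_indicators[i], Bern(plogis(intercept + slope * i)))
  }
  c(model_indicator = model_indicator)
}

# SNIS estimate of the posterior probability of the logistic model,
# using posterior_particles() from the appendix below
post = posterior_particles(unified_model, 100000)
sum(post$weights * post$samples[, "model_indicator"]) / sum(post$weights)
```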
Appendix
For Q.1.2, you will need to copy the following code and paste it into a file called `simple_utils.R`. We demonstrated the use of these functions in the lectures.
simple_utils.R
```r
posterior_particles = function(ppl_function, number_of_iterations) {
  # run the probabilistic program once to discover the output dimension
  weight <<- 1.0
  sample = ppl_function()
  dimension = length(sample)
  samples = matrix(0, nrow = number_of_iterations, ncol = dimension,
                   dimnames = list(1:number_of_iterations, names(sample)))
  weights = rep(0, number_of_iterations)
  for (i in 1:number_of_iterations) {
    weight <<- 1.0 # reset the weight accumulator
    sample = ppl_function()
    samples[i,] = sample
    weights[i] = weight
  }
  return(list(samples = samples, weights = weights))
}

# effective sample size of a set of weighted particles
ess = function(particles){
  w = particles$weights
  return(effective_sample_size(w))
}

effective_sample_size = function(w){
  (sum(w)^2)/sum(w^2)
}

# keep only the particles carrying almost all of the probability mass
representative_sample = function(snis_output, percentile = 0.9999){
  ess = effective_sample_size(snis_output$weights)
  idx_ordered_weights = order(snis_output$weights, decreasing = TRUE)
  acc_norm_weights = cumsum(snis_output$weights[idx_ordered_weights])/sum(snis_output$weights)
  reduced_sample_size = max(round(ess), max(which(acc_norm_weights < percentile)))
  idx_subset = idx_ordered_weights[1:reduced_sample_size]
  list(samples = snis_output$samples[idx_subset,], weights = snis_output$weights[idx_subset])
}

# scatter plot of two coordinates of the samples, with point opacity
# proportional to the (normalized) particle weight
weighted_scatter_plot = function(
    snis_output,
    plot_idxs = 1:2,
    base_color_hex = hcl.colors(1, palette = "viridis"),
    plot_options = list(xlab = "Param 1", ylab = "Param 2")
){
  base_color = col2rgb(base_color_hex)/255

  # find the subset with almost all the mass
  snis_subset = representative_sample(snis_output)

  # linear transform of weights to [0,1]
  extreme_weights = range(snis_subset$weights, na.rm = TRUE)
  alphas = (snis_subset$weights - extreme_weights[1])/diff(extreme_weights)

  # create colors with transparencies and plot
  points_color_alphas = rgb(base_color[1], base_color[2], base_color[3], alphas)
  call_args = c(
    list(x = snis_subset$samples[, plot_idxs], col = points_color_alphas),
    plot_options
  )
  do.call(plot, call_args)
}
```
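For reference, here is a sketch of how these helpers fit together, assuming the `logistic_regression` function from the Q.1 sketch above: `posterior_particles` repeatedly runs the probabilistic program, recording each returned vector alongside its accumulated weight, and the resulting weighted samples feed the diagnostics, the plot, and the SNIS estimates.

```r
source("simple_utils.R")

post = posterior_particles(logistic_regression, 10000)

# effective sample size: a sanity check on the SNIS approximation
ess(post)

# bivariate posterior over (intercept, slope); opacity encodes particle weight
weighted_scatter_plot(post, plot_idxs = 1:2,
                      plot_options = list(xlab = "intercept", ylab = "slope"))

# SNIS estimate of P(next launch is a success | data)
sum(post$weights * post$samples[, "prediction"]) / sum(post$weights)
```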