Exercise 2: Bayesian inference first contact

Grading

Our priority with the weekly exercises is to provide timely feedback and an incentive to stay on top of the material so that lectures can be more effective.

We will select one or more questions to grade in more detail. For the other question(s), we will use the following participation-centric binary scheme:

  • 1 point if something reasonable was attempted,
  • 0 otherwise.

Goals

  • Build a probability model for a concrete example.
  • Introduce the concept of Bayes estimators.

Setup

This exercise is centered around the following scenario:

  • You are consulting for a satellite operator
  • They are about to send a $100M satellite on a Delta 7925H rocket

  • Data: as of Jan 2025, Delta 7925H rockets have been launched 3 times, with 0 failed launches
    • Note: Delta 7925H is not reusable, so each rocket is “copy-built” from the same blueprint
  • Should you recommend buying a $2M insurance policy?

Convention: use 1 for a success, 0 for a failure.

Q.1: define a Bayesian model

In order to perform inference on the unknown quantities, we must specify how they relate to the data; i.e., we need a probabilistic model. Assume that every Delta 7925H rocket has the same probability \(p\) of success. For simplicity, let us assume that \(p\) is allowed to take values on an evenly spaced grid \[ p \in \left\{\frac{k}{K}: k\in \{0,\dots,K\}\right\} \] for some fixed \(K\in\mathbb{N}\). Furthermore, we have access to a collection of numbers \(\rho_k\in[0,1]\) such that¹ \[ \forall k\in\{0,\dots,K\}:\ \mathbb{P}\left(p=\frac{k}{K}\right) = \rho_k. \tag{1}\]

Let \(Y_i\) denote a binary variable with \(Y_i=1\) encoding a success, and \(Y_i=0\) a failure. We assume that, conditionally on \(p\), the \(Y_i\)’s are independent of each other.

We will use the following prior: \[ \rho_k \propto \frac{k}{K}\left(1-\frac{k}{K}\right). \tag{2}\] From now on, use \(K = 20\).
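
For reference, here is a minimal R sketch of this grid prior; the names K, grid, and rho are placeholders, and the last line simply rescales Equation 2 so that the probabilities sum to one:

    # Grid of possible values for p and the prior of Equation 2, normalized to sum to 1.
    K <- 20
    grid <- (0:K) / K          # evenly spaced grid: 0, 1/K, 2/K, ..., 1
    rho <- grid * (1 - grid)   # unnormalized weights rho_k from Equation 2
    rho <- rho / sum(rho)      # normalize so that the rho_k sum to 1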

  1. What are the unknown quantities in this scenario? And what is the data?
  2. Write the joint distribution of this model (use the \(\sim\) notation).

Q.2: posterior and point estimates

To help you answer the following questions, create the two vectors:

  • prior_probabilities, where entry \(i\) contains the prior probability \(\rho_{i-1}\) defined in Q.1 (the minus one reflects the fact that R indexing starts at 1), and
  • realizations, a vector of possible realizations of \(p\) in the same order, namely \((0, 1/K, 2/K, \dots, 1)\).
  1. Plot the prior PMF. Do you think this is a reasonable prior? Hint: use the same type of plot as used last week to plot PMFs.
  2. Let \(\pi_k = \mathbb{P}(p = k/K | Y_{1:3} = (1, 1, 1))\) denote the posterior probabilities, for \(k \in \{0, 1, 2, \dots, K\}\). Create a vector posterior_probabilities where entry \(i\) is \(\pi_{i-1}\). Plot the posterior PMF.
  3. What is the posterior mode?
  4. Write a function that computes the posterior mean of \(p\). Hint: you should obtain \(\mathbb{E}[p | Y_{1:3} = (1, 1, 1)] \approx 0.7\); a sketch of the grid computation is given after this list.
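
If you are unsure where to start, here is a hedged sketch of one way to organize the computation in R. It assumes the vectors prior_probabilities and realizations defined above, and uses the fact that, under the model of Q.1, the likelihood of observing three successes is \(p^3\); the variable names are only suggestions.

    # Bayes rule on the grid: posterior is proportional to prior times likelihood.
    likelihood <- realizations^3                     # P(Y_{1:3} = (1,1,1) | p) = p^3
    posterior_probabilities <- prior_probabilities * likelihood
    posterior_probabilities <- posterior_probabilities / sum(posterior_probabilities)

    # Posterior mean of p as a weighted average over the grid.
    posterior_mean <- sum(posterior_probabilities * realizations)
    posterior_mean   # should be approximately 0.7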

Q.3: Bayes action

Let \(a\in\{0,1\}\) be a binary variable denoting the decision of buying the insurance (\(a=1\)) or not (\(a=0\)).

  1. Based on the problem description from the Setup Section, define a loss function \(L(a, y)\) that summarizes the cost of having taken decision \(a\in\{0,1\}\) depending on whether the next launch is successful (\(y = 1\)) or not (\(y = 0\)). Hint: use indicator functions (i.e. binary functions taking either the value zero or one).
  2. We now consider the expected loss under the posterior predictive distribution: \[ \mathcal{L}(a) := \mathbb{E}[L(a,Y_4)|Y_{1:3}=(1, 1, 1)] \] Write \(\mathcal{L}(a)\) in terms of \(\mathbb{P}\left(Y_4=1 \middle| Y_{1:3}=(1, 1, 1) \right)\). Important: you can use without proof that \(\mathbb{P}\left(Y_4=1 \middle| Y_{1:3}=(1, 1, 1) \right)\) is the same as the posterior mean, which we computed earlier to be \(\approx 0.7\) for our choice of prior.²
  3. Formulate a recommendation to the owner of the satellite (again, you can use without proof that \(\mathbb{P}\left(Y_4=1 \middle| Y_{1:3}=(1, 1, 1) \right) \approx 0.7\)); a numerical sketch of the expected-loss comparison follows this list.
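
To make the comparison concrete, here is a minimal R sketch of the expected-loss calculation, with amounts in millions of dollars. It assumes one particular loss function: the policy costs 2 regardless of the outcome and fully reimburses the 100 lost on a failed launch; the names p_success and expected_loss are placeholders, and you should adapt the code to the loss function you defined in part 1.

    # Expected loss under the posterior predictive distribution, in millions of dollars.
    # Assumption: insurance costs 2 and fully covers the 100 satellite if the launch fails.
    p_success <- 0.7   # P(Y_4 = 1 | Y_{1:3} = (1,1,1)), as given above

    expected_loss <- function(a) {
      if (a == 1) 2 else 100 * (1 - p_success)
    }

    expected_loss(0)   # expected cost without insurance: about 30
    expected_loss(1)   # cost with insurance: 2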

Footnotes

  1. Notice that in Equation 1 we are using the lowercase \(p\) as a random variable, i.e., we are starting to move away from the probability-theory capitalization convention towards the Bayesian convention, where the same capitalization is used for both the random variable and its realization, as discussed in the first week.↩︎

  2. The proof is as follows, where here we do disambiguate between the random variable \(P\) and its realization \(p\) (not to be confused with \(\mathbb{P}\) and the PMF \(p(\cdot)\)!): \[\begin{align*} \mathbb{P}(Y_4 = 1 | Y_{1:3} = \boldsymbol{1}) &= \sum_p \mathbb{P}(P = p, Y_4 = 1 | Y_{1:3} = \boldsymbol{1}) \;\;\text{(additivity axiom)} \\ &= \sum_p \mathbb{P}(P = p | Y_{1:3} = \boldsymbol{1}) \mathbb{P}(Y_4 = 1 | P = p, Y_{1:3} = \boldsymbol{1}) \;\;\text{(chain rule)} \\ &= \sum_p \pi(p) \mathbb{P}(Y_4 = 1 | P = p, Y_{1:3} = \boldsymbol{1}) \;\;\text{(definition)} \\ &= \sum_p \pi(p) \mathbb{P}(Y_4 = 1 | P = p) \;\;\text{(conditional independence)} \\ &= \sum_p \pi(p) p \;\;\text{(since each launch is assumed to be Bernoulli)} \\ &= \mathbb{E}[P | Y_{1:3} = \boldsymbol{1}]. \end{align*}\] Note, however, that this argument is very specific to the Bernoulli likelihood model and will not generalize. We will cover the general method for computing predictive distributions in class; see the lecture notes.↩︎