```
source("../../solutions/simple.R")
source("../blocks/simple_utils.R")
suppressPackageStartupMessages(require("dplyr"))
set.seed(1)
= read.csv("../data/launches.csv") %>% filter(LV.Type == "Ariane 1")
df = df$Suc_bin
success_indicators ::paged_table(df) rmarkdown
```

# Bernoulli regression

## Outline

### Topics

- A first example of a Bayesian model based on a linear model and a Bernoulli likelihood.
- Prior construction via prior predictive distribution.
- Approximation of the posterior using a PPL.
- Visualizing a posterior distribution over functions.

### Rationale

- This is our first example of a Bayesian General Linear Model (GLM).
- GLMs are the probably the most common models.

## Example

- The Ariane 1 is an expandable rocket launched 11 times between 1979 and 1986.
- It failed 2 times and was successful the 9 other launches.
- So far our models treat the launches as iid given the success probability \(p\).
- Can we do better?

`plot(success_indicators, xlab = "Launch index i")`

- From this “Exploratory Data Analysis” (EDA), it seems plausible that the success probability is increasing with time.
- Matches with intuition: after a failure, some corrections are made.

## Building a better model

- Each observation is binary, so the likelihood still has to be Bernoulli.
- We will denote its parameter by \(\theta \in [0, 1]\).

- What we will change is the prior.
**Old model:**\(\theta\) is shared by all launches (“constant over the**launch index**” \(i \in \{1, 2, \dots, 11\}\))**New model:**\(\theta\) changes from one launch to the next.- i.e., \(\theta\) is a function of the index \(i \in \{1, 2, \dots, 11\}\), denoted \(\theta(i)\).

- Question: what kind of function should we start with?

- i.e., \(\theta\) is a function of the index \(i \in \{1, 2, \dots, 11\}\), denoted \(\theta(i)\).

Click for answer

**General modelling principle:** start with something simple!

**In our context:** start with a linear function.

### Structure of the model

- We need to build a prior over linear functions.
- Recall: enough to describe how to forward simulate a dataset.
- Forward simulation process:
- simulate an intercept,
- simulate a slope,
- this determines \(\theta(i)\) for each \(i \in \{1, 2, \dots, 11\}\).
- Simulate \(y_i \sim {\mathrm{Bern}}(\theta(i))\) independently but not identically.

- I.e. we have reduce the problem to that of sampling two real numbers.
- Reasonable prior for a first try: the normal distribution.

### Prior (first attempt)

**Let us draw one random linear function:**

- Math: (not yet final) \[\begin{align*} \text{slope} &\sim \mathcal{N}(0, 1) \\ \text{intercept} &\sim \mathcal{N}(0, 1) \\ \theta(i) &= \text{slope} \cdot i + \text{intercept} \end{align*}\]
- Forward simulation code: (not yet final)

```
set.seed(1)
plot(success_indicators, ylab = "success probability", xlab = "Launch index i")
= 1:length(success_indicators)
xs = simulate(Norm(0, 1))
intercept = simulate(Norm(0, 1))
slope
lines(intercept + slope * xs)
```

**What is the problem in the above?**

Click for answer

- The function \(\theta(i)\) can take values smaller than zero or greater than one.
- This creates problem when we feed the parameter to the Bernoulli, which expects a number between 0 and 1.

**Fix:**

- we cannot use just a linear function…
- … instead we can compose a linear function with the
**logistic**or**sigmoid**function.**Logistic function**: maps real numbers \(r \in (-\infty, \infty)\) into \((0, 1)\).- Math: \[\text{logistic}(r) = \frac{1}{1 + e^{-r}}.\]
- In R:
`plogis(r)`

.

```
= seq(-5, 5, 0.01)
rs plot(rs, plogis(rs), type = 'l', xlab = "r", ylab = "logistic(r)")
```

### Prior (second, final attempt)

- Math: \[\begin{align*} \text{slope} &\sim \mathcal{N}(0, 1) \\ \text{intercept} &\sim \mathcal{N}(0, 1) \\ \theta(i) &= \text{logistic}(\text{slope} \cdot i + \text{intercept}) \end{align*}\]
- Forward simulation code:

```
set.seed(1)
plot(success_indicators, ylab = "success probability", xlab = "Launch index i")
= 1:length(success_indicators)
xs = simulate(Norm(0, 1))
intercept = simulate(Norm(0, 1))
slope
lines(plogis(intercept + slope * xs))
```

## Prior predictive

Let us repeat what we did in the last section 50 times to see several draws from the prior at once (using `alpha`

to make the lines translucent):

```
set.seed(1)
plot(success_indicators, ylab = "success probability", xlab = "Launch index i")
= 1:length(success_indicators)
xs
for (i in 1:50) {
= simulate(Norm(0, 1))
intercept = simulate(Norm(0, 1))
slope lines(plogis(intercept + slope * xs), col = rgb(red = 0, green = 0, blue = 0, alpha = 0.5))
}
```

- Hard to get intuition about a prior by just staring at the mathematical formulas.
- Simulating from the prior can help figuring out if the prior is reasonable or not.
- This is known as the
**prior predictive.**

- This is known as the
- Exploration:
- so far I used a mean of zero and standard deviation of 1 for the slope and intercept priors.
- Let us try a prior that is more “vague”, with standard deviation of 10 for both the slope and intercept priors:

```
set.seed(1)
plot(success_indicators, ylab = "success probability", xlab = "Launch index i")
= 1:length(success_indicators)
xs
for (i in 1:50) {
= simulate(Norm(0, 10))
intercept = simulate(Norm(0, 10))
slope lines(plogis(intercept + slope * xs), col = rgb(red = 0, green = 0, blue = 0, alpha = 0.5))
}
```

## Posterior distribution

- In this week’s exercise, you will implement the model described above in simPPLe.
- Here is a peak of the posterior distribution you should obtain:

```
source("../../solutions/sol04_logistic_regression.R")
= posterior_particles(logistic_regression, 1000)
posterior weighted_scatter_plot(posterior, plot_options = list(xlab="intercept parameter", ylab="slope parameter"))
```

- Again, this is a bit hard to interpret.
- Let us plot similarly to what we did with the prior predictive:
- For each sample \(x^{(m)}= (\text{intercept}^{(m)}, \text{slope}^{(m)})\) with weight \(w^{(m)}\),
- Draw the curve \(\text{logistic}(\text{slope}^{(m)}\cdot i + \text{intercept}^{(m)})\)…
- with
`alpha`

value (transparency) proportional to the corresponding weight \(w^{(m)}\).

```
set.seed(1)
plot(success_indicators, ylab = "success probability", xlab = "Launch index i")
= 1:length(success_indicators)
xs
= posterior$samples
samples = posterior$weights / sum(posterior$weights)
norm_weights
for (i in 1:nrow(samples)) {
= samples[i, 1]
intercept = samples[i, 2]
slope = norm_weights[i]
pr lines(plogis(intercept + slope * xs), col = rgb(red = 0, green = 0, blue = 0, alpha = pr*20))
}
```

## Terminology

The model we just reviewed is an instance of Bayesian logistic regression, a method for classification.

Some terminology from classification:

- output variables: instances of which we try to “predict”
- also known as “target”, “label”, “predicted variable”, “regressand”, …
- sometimes observed (“
**training**instances”), sometimes unobserved (“**test**instances”) - in our example?

- input variables: what we use as the basis of each prediction
- also known as “independent variables”, “covariates”, “predictor”, “regressors”, “feature”,..
- typically always observed (both at training and test time)

- parameters: auxiliary quantities that encode a function mapping inputs to (information on) output.