Bayes rule

Outline

Topics

  • Bayes rule for discrete models
  • Visual intuition

Rationale

First example of computing a posterior distribution, a key concept in Bayesian statistics.

Running example

  • Imagine a bag with 3 coins each with a different probability parameter \(p\)
  • Coin \(i\in \{0, 1, 2\}\) has bias \(i/2\)—in other words:
    • First coin: bias is \(0/2 = 0\) (i.e. both sides are “heads”, \(p = 0\))
    • Second coin: bias is \(1/2 = 0.5\) (i.e. standard coin, \(p = 1/2\))
    • Third coin: bias is \(2/2 = 1\) (i.e. both sides are “tails”, \(p = 1\))

  • Consider the following two-step sampling process:
    • Step 1: pick one of the three coins, but do not look at it!
    • Step 2: flip the coin 4 times
  • Mathematically, this probability model can be written as follows: \[ \begin{align*} X &\sim {\mathrm{Unif}}\{0, 1, 2\} \\ Y_i | X &\sim {\mathrm{Bern}}(X/2) \end{align*} \tag{1}\]
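A minimal simulation sketch of this two-step process, following Equation 1 (Python; the function name is illustrative and not part of the notes, and outcomes are encoded as 0 = “heads”, 1 = “tails”, matching the observation below):

```python
import random

def sample_coin_flips(n_flips=4):
    """Simulate the two-step process of Equation 1 (illustrative sketch).

    Step 1: pick a coin index X uniformly from {0, 1, 2} (do not look at it).
    Step 2: flip that coin n_flips times; each flip is Bern(X/2),
            encoded as 0 = "heads", 1 = "tails".
    """
    x = random.choice([0, 1, 2])          # X ~ Unif{0, 1, 2}
    p = x / 2                             # bias of the chosen coin
    y = [1 if random.random() < p else 0  # Y_i | X ~ Bern(X/2)
         for _ in range(n_flips)]
    return x, y

# One draw from the model (in the exercise, x would stay hidden):
x, y = sample_coin_flips()
print(x, y)
```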

Consider the second question in the first exercise:

Suppose now that you observe the outcome of the 4 coin flips, but not the type of coin that was picked. Say you observe: “heads”, “heads”, “heads”, “heads” = [0, 0, 0, 0]. Given that observation, what is the probability that you picked the standard coin (i.e., the one with \(p = 1/2\))?

Strategy

Denote the observation by \(y_{1:4} = (0, 0, 0, 0)\). In the rest of the argument we will always fix \(y_{1:4}\) to that value.

  1. Attack the more general problem \(\pi(x) = \mathbb{P}(X = x | Y_{1:4} = y_{1:4})\) for all hypotheses \(x \in \{0, 1, 2\}\) instead of just the requested \(x = 1\) (corresponding to the “standard coin”).
  2. By definition of conditioning: \[\pi(x) = \frac{\mathbb{P}(X = x, Y_{1:4} = y_{1:4})}{\mathbb{P}(Y_{1:4} = y_{1:4})}.\] Let us call the numerator \[\gamma(x) = \mathbb{P}(X = x, Y_{1:4} = y_{1:4}),\] and the denominator, \[Z = \mathbb{P}(Y_{1:4} = y_{1:4}).\]
  3. Start by computing \(\gamma(x)\) for all \(x\), using the chain rule.
  4. Note \(Z = \gamma(0) + \gamma(1) + \gamma(2)\) (why?).
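Carrying out these steps numerically is a short computation; here is a minimal sketch (Python, function name illustrative) that computes \(\gamma(x)\) for each \(x\), sums them to get \(Z\), and normalizes to obtain \(\pi(x)\):

```python
def posterior(y):
    """Sketch of the strategy above: compute gamma(x) for each x, then normalize.

    gamma(x) = P(X = x) * prod_i P(Y_i = y_i | X = x)   (chain rule, step 3)
    Z        = gamma(0) + gamma(1) + gamma(2)           (step 4)
    pi(x)    = gamma(x) / Z                             (step 2)
    """
    gammas = []
    for x in [0, 1, 2]:
        prior = 1 / 3                  # P(X = x): uniform over the three coins
        p = x / 2                      # P(Y_i = 1 | X = x) under Bern(X/2)
        likelihood = 1.0
        for y_i in y:
            likelihood *= p if y_i == 1 else (1 - p)
        gammas.append(prior * likelihood)
    Z = sum(gammas)
    return [g / Z for g in gammas]

# Posterior given four "heads" (encoded as 0):
print(posterior([0, 0, 0, 0]))
# pi = [16/17, 1/17, 0]: the standard coin has posterior probability 1/17 ≈ 0.059.
```

The normalization in the last step is exactly why computing \(Z\) as the sum of the \(\gamma(x)\) values suffices: the joint probabilities over all three hypotheses must account for every way the observation \(y_{1:4}\) can arise.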