# Point estimates

## Outline

### Topics

- Common point estimates:
  - Posterior mean.
  - Posterior mode.

### Rationale

It is often necessary to summarize the posterior distribution with a single “best guess”, even though, as we will see, this hides important information, namely our uncertainty about that guess.

## Definitions

- Let \(\pi(x) = \mathbb{P}(X = x | Y = y)\) denote a posterior PMF.

**Point estimate:** Instead of plotting the full information in \(\pi\), we can report a “location” summary, such as the mean of the posterior \(\pi\).

### Posterior mean

Recall the mean is computed from a PMF via \[\sum_x x\; \pi(x),\] where the sum is over \(\{x : \pi(x) > 0 \}\).

**Notation:** the posterior mean is denoted \(\mathbb{E}[X | Y = y] = \sum_x x\; \pi(x)\).
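As a quick sketch of this formula (the support and PMF values below are hypothetical, chosen only for illustration), the mean is the probability-weighted sum over the support:

```python
# Mean of a discrete PMF: sum of x * pi(x) over the support {x : pi(x) > 0}.
# The support and probabilities below are hypothetical, for illustration only.
support = [1, 2, 3]
pi = [0.5, 0.25, 0.25]  # a PMF: non-negative entries summing to 1

mean = sum(x * p for x, p in zip(support, pi))
print(mean)  # 1 * 0.5 + 2 * 0.25 + 3 * 0.25 = 1.75
```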

### Example 1

Compute \(\mathbb{E}[X | Y = (1, 1)]\) in the bag-of-coins example.

- Imagine a bag with 3 coins, each with a different probability parameter \(p\)
- Coin \(i\in \{0, 1, 2\}\) has bias \(i/2\)—in other words:
  - First coin: bias is \(0/2 = 0\) (i.e. both sides are “heads”, \(p = 0\))
  - Second coin: bias is \(1/2 = 0.5\) (i.e. standard coin, \(p = 1/2\))
  - Third coin: bias is \(2/2 = 1\) (i.e. both sides are “tails”, \(p = 1\))

- Consider the following two-step sampling process:
  - Step 1: pick one of the three coins, but do not look at it!
  - Step 2: flip the coin twice

- Mathematically, this probability model can be written as follows: \[ \begin{align*} X &\sim {\mathrm{Unif}}\{0, 1, 2\} \\ Y_i | X &\sim {\mathrm{Bern}}(X/2) \end{align*} \tag{1}\]

Which of the following is the answer?

- 0.5
- 1.8
- 2.25
- 3.5
- None of the above

First, compute the unnormalized posterior \(\gamma \propto \pi\): \[\gamma = (\gamma(0), \gamma(1), \gamma(2)) = (1/3) (0^2, (1/2)^2, 1^2),\] then normalize: \[\pi = \gamma / Z = (0, 1/5, 4/5).\] Finally, compute the conditional expectation: \[\mathbb{E}[X | Y = (1, 1)] = \sum_x x \; \pi(x) = (0, 1, 2) \cdot (0, 1/5, 4/5) = 9/5 = 1.8.\]
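The exhaustive calculation above can be sketched in a few lines of code (a minimal sketch; the variable names are ours, not part of the notes):

```python
# Exhaustive computation of the posterior mean for the bag-of-coins example.
# Prior: X ~ Unif{0, 1, 2}; likelihood: each flip is Bern(X/2); data: Y = (1, 1).
xs = [0, 1, 2]
prior = 1 / 3
gamma = [prior * (x / 2) ** 2 for x in xs]   # unnormalized posterior gamma(x)
Z = sum(gamma)                               # normalization constant
pi = [g / Z for g in gamma]                  # pi = (0, 1/5, 4/5)

posterior_mean = sum(x * p for x, p in zip(xs, pi))
print(posterior_mean)  # close to the exact answer 9/5 = 1.8
```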

**Common mistake:** forgetting the normalization step: \[\sum_x x \; \pi(x) \neq \sum_x x \; \gamma(x).\]

- This “common mistake” highlights that we really need \(Z\) to compute posterior expectations using the exact, exhaustive approach (i.e. the method we are using here).
- When we talk more about Monte Carlo methods, we will see that they allow us to approximate expectations without having to compute \(Z\)!
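To give a taste, here is one simple Monte Carlo scheme (rejection sampling; a sketch under our own naming, not the notes' method): simulate the model forward and keep only the runs whose flips match the observed data, so \(Z\) never needs to be computed.

```python
import random

random.seed(1)

# Approximate E[X | Y = (1, 1)] by rejection sampling: simulate the model
# forward, keep runs whose flips match the observation, and average X over
# the kept runs.  The normalization constant Z is never computed.
kept = []
for _ in range(100_000):
    x = random.choice([0, 1, 2])                              # X ~ Unif{0, 1, 2}
    flips = [int(random.random() < x / 2) for _ in range(2)]  # Y_i | X ~ Bern(X/2)
    if flips == [1, 1]:
        kept.append(x)

estimate = sum(kept) / len(kept)
print(estimate)  # close to the exact answer 9/5 = 1.8
```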

### Posterior mode

The **mode** is the location of the “tallest stick” in the PMF.

**Notation:** \(\operatorname{arg\,max}_x \pi(x),\) i.e. the point \(x\) that achieves the maximum of \(\pi\).

In the Bayesian context, the mode of a posterior PMF is also known as the Maximum A Posteriori (MAP) estimator.
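For the running example, the MAP estimate can be read off the posterior computed in Example 1 (a minimal sketch; the variable names are ours):

```python
# MAP estimate: the argmax of the posterior PMF from Example 1.
xs = [0, 1, 2]
pi = [0.0, 1 / 5, 4 / 5]  # posterior pi(x) given Y = (1, 1)

# Pick the x with the largest posterior probability.
map_estimate = max(zip(xs, pi), key=lambda pair: pair[1])[0]
print(map_estimate)  # 2
```

Note that the MAP (2) differs from the posterior mean (1.8): different location summaries of the same posterior can disagree.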

### Example 2

You will practice computing the posterior mean/mode in question 2 of the exercises.