Point estimates

Outline

Topics

  • Common point estimates:
    • Posterior mean.
    • Posterior mode.

Rationale

It is often necessary to summarize the posterior distribution with a single “best guess”, even though, as we will see, this hides important information: namely, our uncertainty about that guess.

Definitions

  • Let \(\pi(x) = \mathbb{P}(X = x | Y = y)\) denote a posterior PMF.
  • Point estimate: Instead of plotting the full information in \(\pi\), we can report a “location” summary such as the mean of the posterior \(\pi\).

Posterior mean

Recall the mean is computed from a PMF via \[\sum_x x\; \pi(x),\] where the sum is over \(\{x : \pi(x) > 0 \}\).

Notation: the posterior mean is denoted \(\mathbb{E}[X | Y = y] = \sum_x x\; \pi(x)\).

Example 1

Compute \(\mathbb{E}[X | Y = (1, 1)]\) in the bag of coin example.

  • Imagine a bag with 3 coins, each with a different probability parameter \(p\), where \(p\) denotes the probability of “tails” (we record \(Y_i = 1\) for tails and \(Y_i = 0\) for heads)
  • Coin \(i\in \{0, 1, 2\}\) has bias \(i/2\); in other words:
    • First coin: bias is \(0/2 = 0\) (i.e. both sides are “heads”, \(p = 0\))
    • Second coin: bias is \(1/2 = 0.5\) (i.e. a standard coin, \(p = 1/2\))
    • Third coin: bias is \(2/2 = 1\) (i.e. both sides are “tails”, \(p = 1\))

  • Consider the following two-step sampling process:
    • Step 1: pick one of the three coins, but do not look at it!
    • Step 2: flip the coin twice, yielding \(Y = (Y_1, Y_2)\)
  • Mathematically, this probability model can be written as follows: \[ \begin{align*} X &\sim {\mathrm{Unif}}\{0, 1, 2\} \\ Y_i | X &\sim {\mathrm{Bern}}(X/2), \quad i \in \{1, 2\} \end{align*} \tag{1}\]
  1. 0.5
  2. 1.8
  3. 2.25
  4. 3.5
  5. None of the above

First, compute the unnormalized posterior \(\gamma \propto \pi\): \[\gamma = (\gamma(0), \gamma(1), \gamma(2)) = (1/3) (0^2, (1/2)^2, 1^2) = (0, 1/12, 1/3),\] then normalize by \(Z = \sum_x \gamma(x) = 5/12\): \[\pi = \gamma / Z = (0, 1/5, 4/5).\] Finally, compute the conditional expectation: \[\mathbb{E}[X | Y = (1, 1)] = \sum_x x \; \pi(x) = (0, 1, 2) \cdot (0, 1/5, 4/5) = 9/5 = 1.8.\]
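To make the arithmetic concrete, here is a minimal Python sketch (assuming numpy; the code and variable names are illustrative, not part of the original example) that reproduces the exhaustive calculation above:

```python
import numpy as np

xs = np.array([0, 1, 2])          # coin indices (support of the prior)
prior = np.full(3, 1/3)           # X ~ Unif{0, 1, 2}
likelihood = (xs / 2) ** 2        # P(Y = (1, 1) | X = x) = (x/2)^2

gamma = prior * likelihood        # unnormalized posterior: (0, 1/12, 1/3)
Z = gamma.sum()                   # normalization constant: 5/12
pi = gamma / Z                    # posterior: (0, 1/5, 4/5)

posterior_mean = np.sum(xs * pi)  # E[X | Y = (1, 1)]
print(posterior_mean)             # 1.8
```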

Common mistake: forgetting the normalization step: \[\sum_x x \; \pi(x) \neq \sum_x x \; \gamma(x).\]

  • This “common mistake” highlights that we really need \(Z\) to compute posterior expectations using the exact, exhaustive approach (i.e. the method we are using here).
  • When we talk more about Monte Carlo methods, we will see that these methods allow us to approximate expectations without having to compute \(Z\)! (See the sketch below for a preview.)
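As a preview of that idea, here is a hedged sketch (again assuming numpy; not from the original notes) that approximates the same posterior mean by simulating the two-step sampling process and keeping only the runs whose flips match the observed \(Y = (1, 1)\). Note that it never computes \(Z\):

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 100_000

# Simulate the generative process; keep runs consistent with the data.
accepted = []
for _ in range(n_samples):
    x = rng.integers(0, 3)              # Step 1: pick a coin uniformly
    y = rng.binomial(1, x / 2, size=2)  # Step 2: flip it twice
    if y[0] == 1 and y[1] == 1:         # condition on Y = (1, 1)
        accepted.append(x)

print(np.mean(accepted))                # close to 9/5 = 1.8
```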

Posterior mode

The mode is the location of the “tallest stick” in the PMF.

Notation: \(\operatorname{arg\,max}_x \pi(x),\) i.e. the point \(x\) that achieves the maximum of \(\pi\).

In the Bayesian context, the mode of a posterior PMF is also known as the Maximum A Posteriori (MAP) estimator.
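Continuing the running example, a short sketch (assuming numpy; illustrative only) that extracts the MAP estimate from the posterior PMF computed in Example 1:

```python
import numpy as np

xs = np.array([0, 1, 2])
pi = np.array([0.0, 1/5, 4/5])    # posterior from Example 1

map_estimate = xs[np.argmax(pi)]  # location of the "tallest stick"
print(map_estimate)               # 2
```

Note that for this posterior the MAP estimate (2) differs from the posterior mean (9/5 = 1.8) computed in Example 1.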

Example 2

You will practice computing the posterior mean/mode in question 2 of the exercises.