Decision theoretic point estimation

Outline

Topics

  • Deriving a point estimate from decision theory.

Rationale

We saw in week 2 some examples of point estimates (the posterior mean and the posterior mode).

These are actually special cases of decision theory, each arising from a specific choice of loss function.

This page provides a general framework to answer the question: “how should we summarize a posterior distribution with a single point?”

Setup

The decision theoretic setup from week 2.
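
For reference (restating the week 2 definition so this page is self-contained): given an observation \(Y\), an unknown \(X\), an action space \(A\), and a loss function \(L\), the Bayes estimator reports the action minimizing the posterior expected loss:

\[ \delta_{\text{B}}(Y) = \operatorname{arg\,min}\{ \mathbb{E}[L(a, X) | Y] : a \in A \}. \]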

Example

Assume: a squared loss, \(L(a, x) = (a - x)^2\), where \(a \in A = \mathbb{R}\).

Some initial simplification of the objective function:

\[ \begin{aligned} \delta_{\text{B}}(Y) &= \operatorname{arg\,min}\{ \mathbb{E}[L(a, X) | Y] : a \in A \} \\ &= \operatorname{arg\,min}\{ \mathbb{E}[(X - a)^2 | Y] : a \in A \} \\ &= \operatorname{arg\,min}\{ \mathbb{E}[X^2 | Y] - 2a\mathbb{E}[X | Y] + a^2 : a \in A \} \\ &= \operatorname{arg\,min}\{ - 2a\mathbb{E}[X | Y] + a^2 : a \in A \} \end{aligned} \]

In the last step, the \(\mathbb{E}[X^2 | Y]\) term is dropped because it does not depend on \(a\), so it does not affect the minimizer.

Question: under the squared loss, \(\delta_{\text{B}}\) can be simplified to…

  1. \(\int x f(x|y) \mathrm{d}x\)
  2. \(\operatorname{arg\,max}\{ f(x|y) : x \in \mathbb{R}\}\)
  3. \(\int y f(x|y) \mathrm{d}y\)
  4. \(\operatorname{arg\,max}\{ f(x|y) : y \in \mathbb{R}\}\)
  5. None of the above

Idea: think of \(\mathbb{E}[X|Y]\) as a constant computed from the posterior. To minimize the last expression above, take the derivative with respect to \(a\) and set it to zero:

\[ \begin{aligned} -2 \mathbb{E}[X|Y] + 2a = 0 \iff a = \mathbb{E}[X|Y]. \end{aligned} \]

Since the objective is convex in \(a\) (its second derivative is \(2 > 0\)), this stationary point is the global minimum. Hence the Bayes estimator under the squared loss is the posterior mean, \(\delta_{\text{B}}(Y) = \mathbb{E}[X|Y] = \int x f(x|Y) \mathrm{d}x\), i.e., option 1 above.
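
As a numerical sanity check (a sketch added here, not part of the original derivation), the following Python snippet uses a hypothetical Beta-Binomial model: with a \(\text{Beta}(\alpha, \beta)\) prior on \(X\) and \(y\) successes out of \(n\) trials, the posterior \(f(x|y)\) is \(\text{Beta}(\alpha + y, \beta + n - y)\). It approximates \(\mathbb{E}[(X - a)^2 | Y = y]\) on a grid of actions \(a\) and checks that the minimizer agrees with the posterior mean, up to Monte Carlo and grid error. All model choices and numbers below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Beta-Binomial example: Beta(alpha, beta) prior on X,
# y successes out of n trials, so f(x|y) is Beta(alpha + y, beta + n - y).
alpha, beta, n, y = 2.0, 2.0, 10, 7
post_a, post_b = alpha + y, beta + n - y

# Monte Carlo draws from the posterior f(x|y).
x = rng.beta(post_a, post_b, size=100_000)

# Approximate the posterior expected loss E[(X - a)^2 | Y = y]
# on a grid of candidate actions a, then take the minimizer.
grid = np.linspace(0.0, 1.0, 1001)
expected_loss = np.array([np.mean((x - a) ** 2) for a in grid])
a_star = grid[expected_loss.argmin()]

print(f"arg min of expected squared loss: {a_star:.3f}")
print(f"posterior mean (alpha+y)/(alpha+beta+n): {post_a / (post_a + post_b):.3f}")
```

This also follows from the decomposition \(\mathbb{E}[(X - a)^2 | y] = \operatorname{Var}(X | y) + (\mathbb{E}[X | y] - a)^2\), which is minimized exactly at \(a = \mathbb{E}[X | y]\).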