Decision-theoretic point estimation
Outline
Topics
- Deriving a point estimate from decision theory.
Rationale
We saw in week 2 some examples of point estimates (the posterior mean and the posterior mode).
These are in fact special cases of decision theory, each arising from a specific choice of loss function.
This page provides a general framework for answering the question: “how should we summarize a posterior distribution with a single point?”
Setup
Example
Assume: a square loss, \(L(a, x) = (a - x)^2\), where \(a \in A = \mathbb{R}\).
Some initial simplification of the objective function…
\[ \begin{aligned} \delta_{\text{B}}(Y) &= \operatorname{arg\,min}\{ \mathbb{E}[L(a, X) | Y] : a \in A \} \\ &= \operatorname{arg\,min}\{ \mathbb{E}[(X - a)^2 | Y] : a \in A \} \\ &= \operatorname{arg\,min}\{ \mathbb{E}[X^2 | Y] - 2a\mathbb{E}[X | Y] + a^2 : a \in A \} \\ &= \operatorname{arg\,min}\{ - 2a\mathbb{E}[X | Y] + a^2 : a \in A \}, \end{aligned} \]
where the last step drops \(\mathbb{E}[X^2 | Y]\) because it does not depend on \(a\).
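A quick Monte Carlo sketch of this simplification (the Beta sample standing in for the posterior of \(X\) given \(Y\) is purely illustrative, not from the notes): dropping the constant \(\mathbb{E}[X^2 | Y]\) shifts the objective curve but does not move its minimizer.

```python
import numpy as np

# Made-up posterior sample standing in for X | Y (illustrative only).
rng = np.random.default_rng(1)
x_post = rng.beta(3.0, 7.0, size=100_000)

a_grid = np.linspace(0.0, 1.0, 1001)

# Full objective: Monte Carlo estimate of E[(X - a)^2 | Y] for each a.
full = np.array([np.mean((x_post - a) ** 2) for a in a_grid])

# Simplified objective: -2 a E[X|Y] + a^2, with E[X|Y] estimated by the
# sample average; the constant E[X^2|Y] has been dropped.
simplified = -2.0 * a_grid * x_post.mean() + a_grid ** 2

# Both objectives are minimized at (essentially) the same a.
print(a_grid[full.argmin()], a_grid[simplified.argmin()])
```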
Question: under a square loss, \(\delta_{\text{B}}\) can be simplified to…
- \(\int x f(x|y) \mathrm{d}x\)
- \(\operatorname{arg\,max}\{ f(x|y) : x \in \mathbb{R}\}\)
- \(\int y f(x|y) \mathrm{d}y\)
- \(\operatorname{arg\,max}\{ f(x|y) : y \in \mathbb{R}\}\)
- None of the above
Idea: think of \(\mathbb{E}[X|Y]\) as a constant computed from the posterior. To minimize the last expression above, take the derivative with respect to \(a\) and set it to zero:
\[ \begin{aligned} -2 \mathbb{E}[X|Y] + 2a = 0 \quad\Longleftrightarrow\quad a = \mathbb{E}[X|Y]. \end{aligned} \] Hence, under the square loss, the Bayes estimator is the posterior mean: \(\delta_{\text{B}}(Y) = \mathbb{E}[X|Y] = \int x f(x|Y) \mathrm{d}x\).
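To make this concrete, here is a minimal numerical sketch (assuming, purely for illustration, a Beta(3, 7) posterior for \(X\) given \(Y\)): minimizing \(a \mapsto \mathbb{E}[(X - a)^2 \mid Y]\) numerically recovers the analytic posterior mean \(3/(3+7) = 0.3\).

```python
from scipy import stats
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Illustrative posterior (an assumption, not from the notes):
# X | Y ~ Beta(3, 7), whose mean is 3 / (3 + 7) = 0.3.
alpha, beta = 3.0, 7.0
posterior = stats.beta(alpha, beta)

def expected_square_loss(a):
    # E[(X - a)^2 | Y], computed by numerical integration against f(x|y).
    return quad(lambda x: (x - a) ** 2 * posterior.pdf(x), 0.0, 1.0)[0]

# Minimize the expected loss over actions a in [0, 1].
result = minimize_scalar(expected_square_loss, bounds=(0.0, 1.0), method="bounded")

print(result.x)                  # ~ 0.3
print(alpha / (alpha + beta))    # posterior mean: 0.3
```

The same recipe applies to other losses: swapping the integrand for \(|x - a|\) would instead be minimized by the posterior median.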