Decision theory

Outline

Topics

  • Decision theory: making decisions in the face of uncertainty.
    • First, without observed data.
    • Second, incorporating observed data.
  • Review of properties of expectation useful for decision theory calculations.

Rationale

We talked a lot about the “Bayesian recipe.” Here we formalize the last step of that recipe: using a posterior and a loss function to make a decision.

Later, we will see that the point estimates and credible sets covered this week are actually special cases of the decision-theoretic framework (i.e., they emerge when specific loss functions are selected).

Decision theory without data

Your friend proposes the following game (a good model for a lottery!):

  • She flips a standard coin:
    • if it’s a “heads”: you give her $10
    • if it’s a “tails”: she gives you $9.
  • Your decision: to play or not to play?

Mathematical formulation

  • Set of actions: \(A = \{0, 1\}\) (\(1 =\) play, \(0 =\) do not play).
  • Probability model: \(X \sim {\mathrm{Bern}}(1/2)\).
  • Loss function:
    • \(L(a, x)\): how much do I lose if I pick action \(a \in A\) and encounter realization \(X = x\)?
  • Mathematical solution: minimize over \(a \in A\) the expected loss \[L(a) = \mathbb{E}[L(a, X)].\]

Example

Compute \(\operatorname{arg\,min}L(a)\) and \(\min L(a)\).

  1. \(1\) and \(1\)
  2. \(1\) and \(-1/2\)
  3. \(1\) and \(1/2\)
  4. \(0\) and \(0\)
  5. None of the above

The loss function is given by:

| Outcome | Do not play (\(a = 0\)) | Play (\(a = 1\)) |
|---|---|---|
| Heads (\(x = 1\)) | \(0\) | \(10\) |
| Tails (\(x = 0\)) | \(0\) | \(-9\) |

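Averaging over the two equally likely outcomes gives \(L(0) = 0\) and \(L(1) = (10 - 9)/2 = 0.5\). Therefore, the action that minimizes \(L(a)\) is \(a = 0\) (i.e., \(a = 0\) is the “argmin”) and gives value \(L(0) = 0\), which is choice 4.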

It is useful and important to answer this easy question in excruciating mathematical detail. This will prepare us for more complex decision scenarios.

Let us start by writing the loss function mathematically using the indicator function, defined as: \[\mathbb{1}[\text{some condition}] = \left\{ \begin{array}{ll} 1 & \text{if the condition is true,} \\ 0 & \text{if the condition is false.} \end{array} \right.\] This gives us: \[L(a, x) = \mathbb{1}[a = 0] \cdot (\$0) + \mathbb{1}[a = 1, x = 0] \cdot (-\$9) + \mathbb{1}[a = 1, x = 1] \cdot (\$10).\]

Let us simplify the expected loss function, i.e. \[\mathbb{E}[L(a, X)] = \mathbb{E}[\mathbb{1}[a = 0] \cdot (\$0) + \mathbb{1}[a = 1, X = 0] \cdot (-\$9) + \mathbb{1}[a = 1, X = 1] \cdot (\$10)].\] To do that simplification, first use linearity of expectation, which, as you will recall from probability theory, tells us that for any random variables \(X_1\) and \(X_2\) and any constant \(c\), we have \(\mathbb{E}[X_1 + c X_2] = \mathbb{E}[X_1] + c \mathbb{E}[X_2]\).

In our context, using linearity with \(c=1\) allows us to simplify the above to: \[\underbrace{\mathbb{E}[\mathbb{1}[a = 0] \cdot (\$0)]}_0 + \mathbb{E}[\mathbb{1}[a = 1, X = 0] \cdot (-\$9)] + \mathbb{E}[\mathbb{1}[a = 1, X = 1] \cdot (\$10)].\]

Trick 1: To deal with each remaining term, use that: \(\mathbb{1}[a = 1, X = 1] = \mathbb{1}[a = 1]\mathbb{1}[X = 1]\) (check!), and then use linearity with \(c = \$ 10 \cdot \mathbb{1}[a = 1]\) to get: \[\mathbb{E}[\mathbb{1}[a = 1, X = 1] \cdot (\$10)] = \$ 10 \cdot \mathbb{1}[a = 1] \mathbb{E}[\mathbb{1}[X = 1]].\]

LOTUS: Finally, we have to compute \(\mathbb{E}[\mathbb{1}[X = 1]]\). From our review of the Law of the Unconscious Statistician, this is done as follows: \[\mathbb{E}[\mathbb{1}[X = 1]] = \sum_x \mathbb{1}[x = 1] p_X(x) = 0 + 1 \cdot p_X(1) = 1/2.\]
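The tails term is handled by the same two steps. The worked version below is filled in here for completeness; it uses only the factorization \(\mathbb{1}[a = 1, X = 0] = \mathbb{1}[a = 1]\mathbb{1}[X = 0]\), linearity with \(c = -\$9 \cdot \mathbb{1}[a = 1]\), and LOTUS: \[\mathbb{E}[\mathbb{1}[a = 1, X = 0] \cdot (-\$9)] = -\$9 \cdot \mathbb{1}[a = 1] \, \mathbb{E}[\mathbb{1}[X = 0]],\] with \[\mathbb{E}[\mathbb{1}[X = 0]] = \sum_x \mathbb{1}[x = 0] p_X(x) = 1 \cdot p_X(0) + 0 = 1/2.\]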

Trick 2: you will often take the expectation of an indicator function. You will always find that \(\mathbb{E}[\mathbb{1}[\text{an event}]] = \mathbb{P}(\text{an event})\).
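As a quick sanity check, LOTUS and Trick 2 can be verified numerically for the coin-flip model. This is a minimal sketch, not part of the original notes: the helper names `expectation` and `probability` are hypothetical, and the PMF is stored as a plain dictionary with exact fractions.

```python
from fractions import Fraction

# PMF of X ~ Bern(1/2): x = 1 is heads, x = 0 is tails.
p_X = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def expectation(g, pmf):
    """LOTUS: E[g(X)] = sum over x of g(x) * p_X(x). (Hypothetical helper.)"""
    return sum(g(x) * p for x, p in pmf.items())

def probability(event, pmf):
    """P(event): add up p_X(x) over the x where the event holds. (Hypothetical helper.)"""
    return sum(p for x, p in pmf.items() if event(x))

def is_heads(x):
    return x == 1

# Left-hand side: expectation of the indicator 1[X = 1].
lhs = expectation(lambda x: int(is_heads(x)), p_X)
# Right-hand side: probability of the event {X = 1}.
rhs = probability(is_heads, p_X)
```

Both `lhs` and `rhs` equal 1/2 here, matching the LOTUS computation above.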

Putting it all together: \[L(a) = \mathbb{E}[L(a, X)] = -\$ 9 \cdot \mathbb{1}[a = 1] (1/2) + \$ 10 \cdot \mathbb{1}[a = 1] (1/2) = \$0.5 \cdot \mathbb{1}[a = 1].\]
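The same calculation can be carried out by brute-force enumeration, which is a useful check on the algebra. The sketch below is illustrative only: the function names `loss` and `expected_loss` are assumptions introduced here, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# PMF of X ~ Bern(1/2): x = 1 is heads, x = 0 is tails.
p_X = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def loss(a, x):
    """Loss in dollars for action a (1 = play, 0 = do not play) and outcome x."""
    if a == 0:
        return 0          # not playing never costs anything
    return 10 if x == 1 else -9  # heads: pay $10; tails: win $9

def expected_loss(a, pmf):
    """L(a) = E[L(a, X)] = sum over x of L(a, x) * p_X(x)."""
    return sum(loss(a, x) * p for x, p in pmf.items())

actions = [0, 1]
expected = {a: expected_loss(a, p_X) for a in actions}
# The argmin over the action set A.
best_action = min(actions, key=lambda a: expected[a])
```

Here `expected[1]` is 1/2 and `expected[0]` is 0, so `best_action` is 0, in agreement with \(L(a) = \$0.5 \cdot \mathbb{1}[a = 1]\).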

Decision theory with data

  • We now consider the case where we have some data \(Y\) that give us information about \(X\).
  • The actions \(A\) and loss \(L(a, x)\) are the same as before.

How do we modify the last section to take into account the data \(Y\)?

  • Do everything the same, except:
  • use the conditional PMF \(p_{X|Y}\) instead of the PMF \(p_X\).

Definition: the Bayes estimator

Mathematically: use an action \(a \in A\) minimizing the posterior expected loss, i.e. \[\operatorname{arg\,min}_{a \in A} \mathbb{E}[L(a, X) | Y = y],\] where: \[\mathbb{E}[L(a, X) | Y = y] = \sum_x L(a, x) p_{X|Y}(x|y).\]
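To make the definition concrete, here is a minimal sketch for the coin game. The notes do not specify how \(Y\) relates to \(X\), so the observation model below is purely an assumption for illustration: \(Y\) is a hint that matches \(X\) with probability 4/5. The function names are likewise hypothetical.

```python
from fractions import Fraction

# Prior p_X from the coin-flip game: x = 1 is heads, x = 0 is tails.
prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def likelihood(y, x):
    """ASSUMED p_{Y|X}(y|x): the hint Y equals X with probability 4/5."""
    return Fraction(4, 5) if y == x else Fraction(1, 5)

def loss(a, x):
    """Same loss as before: a = 1 is play, a = 0 is do not play."""
    if a == 0:
        return 0
    return 10 if x == 1 else -9

def posterior(y):
    """Bayes rule: p_{X|Y}(x|y) is proportional to p_X(x) * p_{Y|X}(y|x)."""
    joint = {x: prior[x] * likelihood(y, x) for x in prior}
    evidence = sum(joint.values())
    return {x: joint[x] / evidence for x in joint}

def posterior_expected_loss(a, y):
    """E[L(a, X) | Y = y] = sum over x of L(a, x) * p_{X|Y}(x|y)."""
    post = posterior(y)
    return sum(loss(a, x) * post[x] for x in post)

def bayes_action(y, actions=(0, 1)):
    """Action minimizing the posterior expected loss given Y = y."""
    return min(actions, key=lambda a: posterior_expected_loss(a, y))
```

Under this assumed model, a hint of tails (`y = 0`) gives a posterior expected loss of -26/5 for playing, so `bayes_action(0)` is 1, while a hint of heads (`y = 1`) gives 31/5, so `bayes_action(1)` is 0. The only change from the no-data case is that the posterior replaces the prior in the sum.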

Example

You will apply the Bayes estimator in question 3 of this week’s exercises.