Expectation is the main tool to translate a posterior distribution into the various outputs of Bayesian inference (point estimate, credible intervals, prediction, action).
Expectation of a single random variable
Recall: \[\mathbb{E}[X] = \sum_x x p_X(x),\] where the sum is over the point masses of \(X\), i.e. \(\{x : p_X(x) > 0\}\).
Example: compute \(\mathbb{E}[X]\) if \(X \sim {\mathrm{Bern}}(p)\), with \(p = 0.8\).
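For a quick numerical check, the definitional sum can be evaluated by brute force; here is a minimal Python sketch (the names `p` and `pmf` are just illustrative):

```python
# E[X] = sum over the point masses x of x * p_X(x)
p = 0.8
pmf = {0: 1 - p, 1: p}  # point masses of X ~ Bern(p)

expectation = sum(x * prob for x, prob in pmf.items())
print(expectation)  # 0.8, i.e. E[X] = p
```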
Law of the Unconscious Statistician
Proposition: if \(g\) is some function, \[\mathbb{E}[g(X)] = \sum_x g(x) p_X(x).\]
Example: compute \(\mathbb{E}[X^2]\) if \(X \sim {\mathrm{Bern}}(p)\), and hence \(\operatorname{Var}[X] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2\).
Question: Compute \(\mathbb{E}[1/(X+1)]\), where \(X \sim {\mathrm{Bern}}(1/3)\).
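The same brute-force summation handles any \(g\) via LOTUS. Here is a small Python sketch for the \(X^2\) example above, using \(p = 0.8\) as before (the helper name `lotus` is just illustrative); the same helper can be reused for the question by swapping in a different \(g\) and PMF:

```python
# LOTUS for a single random variable: E[g(X)] = sum_x g(x) p_X(x)
def lotus(g, pmf):
    return sum(g(x) * prob for x, prob in pmf.items())

p = 0.8
pmf = {0: 1 - p, 1: p}  # X ~ Bern(p)

e_x  = lotus(lambda x: x,    pmf)  # E[X]   = p
e_x2 = lotus(lambda x: x**2, pmf)  # E[X^2] = p  (since 0^2 = 0 and 1^2 = 1)
var  = e_x2 - e_x**2               # Var[X] = p - p^2 = p(1 - p)
print(e_x2, var)                   # approximately 0.8 and 0.16
```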
Expectation of a function of several random variables
Let us go back to our running example:
Imagine a bag with 3 coins, each with a different probability parameter \(p\)
Coin \(i\in \{0, 1, 2\}\) has bias \(i/2\)—in other words:
First coin: bias is \(0/2 = 0\) (i.e. both sides are “heads”, \(p = 0\))
Second coin: bias is \(1/2 = 0.5\) (i.e. standard coin, \(p = 1/2\))
Third coin: bias is \(2/2 = 1\) (i.e. both sides are “tails”, \(p = 1\))
Consider the following two-step sampling process:
Step 1: pick one of the three coins, but do not look at it!
Step 2: flip the coin 4 times
Mathematically, this probability model can be written as follows: \[
\begin{align*}
X &\sim {\mathrm{Unif}}\{0, 1, 2\} \\
Y_i | X &\sim {\mathrm{Bern}}(X/2)
\end{align*}
\tag{1}\]
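The model above (Equation 1) is straightforward to simulate by following the two steps directly; here is a minimal Python sketch (the function name `sample_model` is just illustrative):

```python
import random

def sample_model(n_flips=4):
    # Step 1: pick one of the three coins uniformly at random (without looking at it)
    x = random.choice([0, 1, 2])  # X ~ Unif{0, 1, 2}
    # Step 2: flip that coin n_flips times; coin x has bias x/2
    ys = [1 if random.random() < x / 2 else 0 for _ in range(n_flips)]  # Y_i | X ~ Bern(X/2)
    return x, ys

print(sample_model())  # e.g. (1, [0, 1, 1, 0])
```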
Example: computing \(\mathbb{E}[X (Y_1+1)]\) (similar to what you will be doing in the exercise in question 1.1)
Note: this is of the form \(\mathbb{E}[g(\dots)]\), so we can use the Law of the Unconscious Statistician.
How to do it:
first, identify \(g\); here it is \(g(x, y_1, \dots, y_4) = x(y_1+1)\) (in the exercise it is slightly different)
denote by \(p\) the joint PMF of all the random variables in the model; LOTUS then gives \[\mathbb{E}[X(Y_1+1)] = \sum_{x} \sum_{y_1} \cdots \sum_{y_4} g(x, y_1, \dots, y_4)\, p(x, y_1, \dots, y_4).\]
Each sum runs over the point masses of the corresponding PMF as before, e.g. \(x \in \{0, 1, 2\}\); a brute-force evaluation of this sum is sketched after this list.
Recall: \(p(x, y_1, y_2, y_3, y_4)\) can be computed using the chain rule.
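Putting these steps together, the iterated LOTUS sum can be evaluated by brute force. Here is a Python sketch assuming, as the two-step sampling process suggests, that the four flips are conditionally independent given \(X\) (all function names are just illustrative):

```python
from itertools import product

def p_x(x):
    return 1 / 3  # prior: X ~ Unif{0, 1, 2}

def p_y_given_x(y, x):
    bias = x / 2  # coin x has bias x/2
    return bias if y == 1 else 1 - bias  # Y_i | X = x ~ Bern(x/2)

def joint(x, ys):
    # chain rule: p(x, y1, ..., y4) = p_X(x) * prod_i p(y_i | x)
    prob = p_x(x)
    for y in ys:
        prob *= p_y_given_x(y, x)
    return prob

def g(x, ys):
    return x * (ys[0] + 1)  # g(x, y1, ..., y4) = x (y1 + 1)

# LOTUS: iterate over all point masses of (X, Y1, Y2, Y3, Y4)
expectation = sum(
    g(x, ys) * joint(x, ys)
    for x in [0, 1, 2]
    for ys in product([0, 1], repeat=4)
)
print(expectation)  # 11/6, approximately 1.833
```

Enumerating all \(3 \times 2^4 = 48\) combinations is cheap here; for larger models one would typically resort to Monte Carlo instead.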
Recall the decision tree: how can we visualize the above equation?
Question: Assuming the paths are summed left to right in the above diagram, what is the last term in the LOTUS iterated sum? Use the convention ‘false’ = 0, ‘true’ = 1.