


  • Expectation for discrete random models
  • Law of the Unconscious Statistician


Expectation is the main tool for translating a posterior distribution into the various outputs of Bayesian inference (point estimates, credible intervals, predictions, actions).

Expectation of a single random variable

Recall: \[\mathbb{E}[X] = \sum_x x p_X(x),\] where the sum is over the point masses of \(X\), i.e. \(\{x : p_X(x) > 0\}\).

Example: compute \(\mathbb{E}[X]\) if \(X \sim {\mathrm{Bern}}(p)\), with \(p = 0.8\).

flowchart TD
S -- 0.2 --> S__and__X_false["X=false"]
S -- 0.8 --> S__and__X_true["X=true"]
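The definition above can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper name `expectation` and the dictionary representation of the PMF are assumptions made for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expectation(pmf):
    """Return E[X] = sum_x x * p_X(x) for a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Bernoulli(p) with p = 0.8, using the convention false = 0, true = 1.
p = Fraction(4, 5)
bern_pmf = {0: 1 - p, 1: p}

print(expectation(bern_pmf))  # 4/5
```

Only the branch `X=true` contributes to the sum, since the branch `X=false` is multiplied by the value 0.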

Law of the Unconscious Statistician

Proposition: if \(g\) is some function, \[\mathbb{E}[g(X)] = \sum_x g(x) p_X(x).\]

Example: compute \(\mathbb{E}[X^2]\) if \(X \sim {\mathrm{Bern}}(p)\), and hence \(\operatorname{Var}[X] = \mathbb{E}[X^2] - (\mathbb{E}[X])^2\).
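As a sanity check for this example, here is a small sketch of LOTUS for a single discrete random variable. The function names `lotus` and `bernoulli_pmf` are hypothetical, and the value \(p = 0.8\) is borrowed from the earlier example as an assumption, since this example leaves \(p\) general.

```python
from fractions import Fraction

def lotus(g, pmf):
    """Return E[g(X)] = sum_x g(x) * p_X(x) for a PMF given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

def bernoulli_pmf(p):
    """PMF of Bern(p), with false = 0 and true = 1."""
    return {0: 1 - p, 1: p}

p = Fraction(4, 5)  # assumed value, taken from the earlier example
pmf = bernoulli_pmf(p)

mean = lotus(lambda x: x, pmf)              # E[X]
second_moment = lotus(lambda x: x**2, pmf)  # E[X^2]
variance = second_moment - mean**2          # Var[X] = E[X^2] - (E[X])^2

print(second_moment, variance)  # 4/5 4/25
```

Since \(x^2 = x\) for \(x \in \{0, 1\}\), the second moment equals \(p\), and the variance matches the closed form \(p(1-p)\).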

Question: Compute \(\mathbb{E}[1/(X+1)]\), where \(X \sim {\mathrm{Bern}}(1/3)\).

  1. 3/5
  2. 3/4
  3. 5/6
  4. 1/3
  5. None of the above

Use \(g(x) = 1/(x+1)\) in LOTUS:

\[ \mathbb{E}[1/(X+1)] = \sum_{x\in\{0, 1\}} g(x) p(x) = \frac{2}{3} \frac{1}{0+1} + \frac{1}{3}\frac{1}{1+1} = \frac{5}{6}. \]

Expectation of a function of several random variables

Let us go back to our running example:

  • Imagine a bag with 3 coins each with a different probability parameter \(p\)
  • Coin \(i\in \{0, 1, 2\}\) has bias \(i/2\)—in other words:
    • First coin: bias is \(0/2 = 0\) (i.e. both sides are “heads”, \(p = 0\))
    • Second coin: bias is \(1/2 = 0.5\) (i.e. standard coin, \(p = 1/2\))
    • Third coin: bias is \(2/2 = 1\) (i.e. both sides are “tails”, \(p = 1\))

  • Consider the following two-step sampling process:
    • Step 1: pick one of the three coins, but do not look at it!
    • Step 2: flip the coin 4 times
  • Mathematically, this probability model can be written as follows: \[ \begin{align*} X &\sim {\mathrm{Unif}}\{0, 1, 2\} \\ Y_i | X &\sim {\mathrm{Bern}}(X/2), \quad i \in \{1, \dots, 4\} \end{align*} \tag{1}\]
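The two-step process can also be simulated directly. The sketch below is only an illustration under stated assumptions: the function name `sample_model` is hypothetical, the flips are drawn independently given the coin, and Python's standard `random` module stands in for the physical bag and coin.

```python
import random

def sample_model(rng, n_flips=4):
    """Draw one realization (x, [y_1, ..., y_n]) from the two-step process."""
    # Step 1: pick one of the three coins uniformly, X ~ Unif{0, 1, 2}.
    x = rng.randrange(3)
    # Step 2: flip the chosen coin; each flip is 1 with probability x / 2.
    ys = [int(rng.random() < x / 2) for _ in range(n_flips)]
    return x, ys

rng = random.Random(1)  # fixed seed so that reruns give the same draws
for _ in range(3):
    print(sample_model(rng))
```

When \(x = 0\) every flip is 0 and when \(x = 2\) every flip is 1, because `rng.random()` returns a value in \([0, 1)\).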

Example: compute \(\mathbb{E}[X (Y_1+1)]\) (similar to what you will do in question 1.1 of the exercise).

Note: this is of the form \(\mathbb{E}[g(\dots)]\), so we can use the Law of the Unconscious Statistician.

How to do it:

  • First, identify \(g\): here, \(g(x, y_1, \dots, y_4) = x(y_1+1)\) (in the exercise it is slightly different).
  • Denote by \(p\) the joint PMF of all the random variables in the model.
  • Compute the expectation using \[\mathbb{E}[g(X, Y_1, \dots, Y_4)] = \sum_x \sum_{y_1} \sum_{y_2} \dots \sum_{y_4} g(x, y_1, \dots, y_4) p(x, y_1, y_2, y_3, y_4).\]
  • Each sum runs over the point masses of the corresponding random variable, as before, e.g. \(x \in \{0, 1, 2\}\).
  • Recall: \(p(x, y_1, y_2, y_3, y_4)\) can be computed using the chain rule.
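The recipe above can be sketched in Python as follows. This is an illustrative sketch rather than the course's reference solution: the names `joint_pmf` and `g` are assumptions, and for simplicity the sums run over all 0/1 values of each \(y_i\), so some terms have probability zero and contribute nothing.

```python
from fractions import Fraction
from itertools import product

def joint_pmf(x, ys):
    """p(x, y_1, ..., y_4) = p(x) * prod_i p(y_i | x), by the chain rule."""
    prob = Fraction(1, 3)      # X ~ Unif{0, 1, 2}
    p_true = Fraction(x, 2)    # Y_i | X ~ Bern(X / 2)
    for y in ys:
        prob *= p_true if y == 1 else 1 - p_true
    return prob

def g(x, ys):
    """The function inside the expectation: g(x, y_1, ..., y_4) = x * (y_1 + 1)."""
    return x * (ys[0] + 1)

# LOTUS: iterate over x and over every (y_1, ..., y_4) in {0, 1}^4.
result = sum(
    g(x, ys) * joint_pmf(x, ys)
    for x in range(3)
    for ys in product([0, 1], repeat=4)
)

print(result)  # 11/6
```

The value agrees with a direct calculation: \(\mathbb{E}[X(Y_1+1)] = \mathbb{E}[X] + \mathbb{E}[X Y_1] = 1 + \mathbb{E}[X^2]/2 = 1 + 5/6\).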

Recall the decision tree. How can the above equation be visualized on it?

flowchart TD
S__and__X_0 -- 1.0 --> S__and__X_0__and__Y1_false["Y1=false"]
S__and__X_2__and__Y1_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true["Y2=true"]
S -- 0.33 --> S__and__X_0["X=0"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_true["Y4=true"]
S -- 0.33 --> S__and__X_1["X=1"]
S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false["Y3=false"]
S__and__X_1__and__Y1_true__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false["Y3=false"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1 -- 0.5 --> S__and__X_1__and__Y1_false["Y1=false"]
S__and__X_1__and__Y1_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false["Y2=false"]
S__and__X_1__and__Y1_false__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true["Y2=true"]
S__and__X_0__and__Y1_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false["Y2=false"]
S__and__X_1__and__Y1_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false["Y2=false"]
S__and__X_1__and__Y1_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true["Y2=true"]
S__and__X_2 -- 1.0 --> S__and__X_2__and__Y1_true["Y1=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_1 -- 0.5 --> S__and__X_1__and__Y1_true["Y1=true"]
S -- 0.33 --> S__and__X_2["X=2"]
S__and__X_1__and__Y1_true__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false["Y3=false"]
S__and__X_2__and__Y1_true__and__Y2_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_false__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false["Y3=false"]
S__and__X_0__and__Y1_false__and__Y2_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false["Y3=false"]

Question: Assuming the paths are summed left to right in the above diagram, what is the last term in the LOTUS iterated sum? Use the convention ‘false’ = 0, ‘true’ = 1.

  1. 4/3
  2. 3/4
  3. 4
  4. 1/3
  5. None of the above.

The last term, \(g(x, y_1, y_2, y_3, y_4) p(x, y_1, y_2, y_3, y_4)\), for \(x = 2, y_1 = y_2 = y_3 = y_4 = 1\), is the product of \[ g(2, 1, 1, 1, 1) = 2(1 + 1) = 4, \] and, via the chain rule, \[ p(2, 1, 1, 1, 1) = \frac{1}{3} \times 1 \times 1 \times 1 \times 1 = \frac{1}{3}, \] so the last term is \(4 \times \frac{1}{3} = \frac{4}{3}\).
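To see every term of the iterated sum, and not only the last one, the paths of the tree can be enumerated. The sketch below assumes that "left to right" corresponds to increasing \(x\), then ‘false’ before ‘true’ for each flip, and it keeps only paths with positive probability, as in the diagram. The names `joint_pmf` and `terms` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def joint_pmf(x, ys):
    """Chain rule: p(x) times the product of p(y_i | x) along the path."""
    prob = Fraction(1, 3)
    p_true = Fraction(x, 2)
    for y in ys:
        prob *= p_true if y == 1 else 1 - p_true
    return prob

# One entry per root-to-leaf path: (path, g(path) * p(path)).
terms = []
for x in range(3):
    for ys in product([0, 1], repeat=4):
        p = joint_pmf(x, ys)
        if p > 0:  # zero-probability paths are not drawn in the tree
            terms.append(((x, *ys), x * (ys[0] + 1) * p))

last_path, last_term = terms[-1]
print(len(terms))            # 18
print(last_path, last_term)  # (2, 1, 1, 1, 1) 4/3
```

The 18 paths are one for \(x = 0\), sixteen for \(x = 1\), and one for \(x = 2\), matching the leaves of the diagram.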