Chain rule

Outline

Topics

  • Mathematical statement
  • Visual intuition on a decision tree
  • Special names for the pieces of chain rule (joint and conditional PMFs)
  • Conditional independence

Rationale

The chain rule allows us to compute the probability that the forward sampling function takes a given path.

The chain rule seems innocent but is used heavily in Bayesian statistics. It is also the building block for Bayes' rule.

Proposition

If \(E_1\) and \(E_2\) are any events, \[\mathbb{P}(E_1, E_2) = \mathbb{P}(E_1) \mathbb{P}(E_2 | E_1).\]

This is true in any order, i.e. we also have \(\mathbb{P}(E_1, E_2) = \mathbb{P}(E_2) \mathbb{P}(E_1 | E_2)\).
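A short justification, assuming the usual definition of conditional probability and \(\mathbb{P}(E_1) > 0\) so that the conditional is well defined: multiplying both sides of the definition by \(\mathbb{P}(E_1)\) gives the proposition.

\[\mathbb{P}(E_2 | E_1) = \frac{\mathbb{P}(E_1, E_2)}{\mathbb{P}(E_1)} \quad \Longrightarrow \quad \mathbb{P}(E_1, E_2) = \mathbb{P}(E_1) \mathbb{P}(E_2 | E_1).\]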

Generalization

For any events \(E_1, E_2, E_3 \dots\),

\[\mathbb{P}(E_1, E_2, E_3 \dots) = \mathbb{P}(E_1) \mathbb{P}(E_2 | E_1) \mathbb{P}(E_3 | E_1, E_2) \dots.\]
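The generalization can be checked numerically on a small example. The sketch below is only an illustration: the joint distribution over three binary outcomes uses arbitrary made-up weights, and the helper names `prob` and `cond` are hypothetical, not part of the course material. It compares the left-hand side of the chain rule with the product of conditionals, using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over three binary outcomes.
# The weights 1..8 are arbitrary; they are normalized to sum to 1.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
outcomes = list(product([0, 1], repeat=3))
total = sum(weights)
joint = {o: Fraction(w, total) for o, w in zip(outcomes, weights)}


def prob(event):
    """Probability of the set of outcomes o for which event(o) is True."""
    return sum(p for o, p in joint.items() if event(o))


def cond(event, given):
    """P(event | given), assuming P(given) > 0."""
    return prob(lambda o: event(o) and given(o)) / prob(given)


def E1(o):
    return o[0] == 1


def E2(o):
    return o[1] == 1


def E3(o):
    return o[2] == 1


# Left-hand side: P(E1, E2, E3).
lhs = prob(lambda o: E1(o) and E2(o) and E3(o))

# Right-hand side: P(E1) P(E2 | E1) P(E3 | E1, E2).
rhs = prob(E1) * cond(E2, E1) * cond(E3, lambda o: E1(o) and E2(o))

print(lhs, rhs)  # 2/9 2/9
```

Exact `Fraction` arithmetic is used so that the two sides can be compared with `==` rather than up to floating-point error.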

Visual intuition

Chain rule: the probability of a node is the product of the edge labels on the path to the root.

flowchart TD
S__and__X_0 -- 1.0 --> S__and__X_0__and__Y1_false["Y1=false"]
S__and__X_2__and__Y1_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true["Y2=true"]
S -- 0.33 --> S__and__X_0["X=0"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_true["Y4=true"]
S -- 0.33 --> S__and__X_1["X=1"]
S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false["Y3=false"]
S__and__X_1__and__Y1_true__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false["Y3=false"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1 -- 0.5 --> S__and__X_1__and__Y1_false["Y1=false"]
S__and__X_1__and__Y1_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false["Y2=false"]
S__and__X_1__and__Y1_false__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true["Y2=true"]
S__and__X_0__and__Y1_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false["Y2=false"]
S__and__X_1__and__Y1_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false["Y2=false"]
S__and__X_1__and__Y1_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true["Y2=true"]
S__and__X_2 -- 1.0 --> S__and__X_2__and__Y1_true["Y1=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_1 -- 0.5 --> S__and__X_1__and__Y1_true["Y1=true"]
S -- 0.33 --> S__and__X_2["X=2"]
S__and__X_1__and__Y1_true__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false["Y3=false"]
S__and__X_2__and__Y1_true__and__Y2_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_false__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false["Y3=false"]
S__and__X_0__and__Y1_false__and__Y2_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false["Y3=false"]
style S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true fill:#f9f,stroke:#333,stroke-width:4px

Poll: what is the probability of the highlighted node?

  1. 1/2
  2. 1/4
  3. 1/8
  4. 1/24
  5. None of the above

Multiplying the four edge labels on the path leading to the node, we get: \(1/2 \cdot 1/2 \cdot 1/2 \cdot 1/3 = 1/24\).

Poll: what is the event corresponding to that node?

  1. \((Y_3 = 1)\)
  2. \((Y_3 = 1, Y_2 = 1)\)
  3. \((Y_3 = 1, Y_2 = 1)\)
  4. \((Y_3 = 1, Y_2 = 1, Y_1 = 1)\)
  5. \((Y_3 = 1, Y_2 = 1, Y_1 = 1, X = 1)\)

Recall that the event is the intersection of all node labels on the path to the root, hence the event is \((Y_3 = 1, Y_2 = 1, Y_1 = 1, X = 1)\).

The calculation we did visually in the previous poll is, mathematically: \[\mathbb{P}(Y_3 = 1, Y_2 = 1, Y_1 = 1, X = 1) = \mathbb{P}(X = 1) \mathbb{P}(Y_1 = 1 | X = 1) \mathbb{P}(Y_2 = 1 | X = 1, Y_1 = 1) \mathbb{P}(Y_3 = 1 | X = 1, Y_1 = 1, Y_2 = 1).\]
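The same product can be computed programmatically. The following is a minimal sketch, assuming the edge labels read off the tree: each edge out of the root has label 1/3 (shown as 0.33), and an edge into \(Y_i = \text{true}\) under \(X = x\) has label \(x/2\). The function names `edge_x`, `edge_y` and `node_probability` are hypothetical helpers introduced for illustration.

```python
from fractions import Fraction
from itertools import product


def edge_x(x):
    """Label of the edge from the root S to the node X=x (0.33 in the tree)."""
    return Fraction(1, 3)


def edge_y(y, x):
    """Label of an edge into a node Y_i=y located under X=x."""
    p_true = Fraction(x, 2)  # 0 under X=0, 1/2 under X=1, 1 under X=2
    return p_true if y else 1 - p_true


def node_probability(x, ys):
    """Chain rule: multiply the edge labels on the path from the root."""
    prob = edge_x(x)
    for y in ys:
        prob *= edge_y(y, x)
    return prob


# Highlighted node: X=1, Y1=true, Y2=true, Y3=true.
highlighted = node_probability(1, [True, True, True])
print(highlighted)  # 1/24

# Sanity check: the probabilities of all depth-4 leaves sum to one.
# Branches with probability zero (not drawn in the tree) contribute nothing.
leaf_total = sum(
    node_probability(x, ys)
    for x in range(3)
    for ys in product([False, True], repeat=4)
)
print(leaf_total)  # 1
```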

Joint PMF

We will often encounter expressions that are a conjunction (intersection/and) of events involving several random variables. A handy notation for that is the joint PMF.

For example, here is the joint PMF of \((X, Y_1, Y_2, Y_3)\):

\[p(x, y_1, y_2, y_3) = \mathbb{P}(X = x, Y_1 = y_1, Y_2 = y_2, Y_3 = y_3).\]

Sometimes we put the random variables in question as subscript, for example \(p_{X, Y_1}(x, y)\) for the joint PMF of \(X\) and \(Y_1\).

Conditional PMF

Similarly, here is an example of a conditional PMF: \[p_{Y_1|X}(y | x) = \mathbb{P}(Y_1 = y | X = x).\]
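To make the two notations concrete, here is a small sketch that tabulates a joint PMF and derives a conditional PMF from it. It assumes the model used in the tree (\(X\) uniform on \(\{0, 1, 2\}\), each \(Y_i\) equal to 1 with probability \(X/2\) given \(X\)). The dictionary layout and the function names are illustrative choices, not the course's code.

```python
from fractions import Fraction
from itertools import product


def p_y_given_x(y, x):
    """P(Y_i = y | X = x) under the assumed model Y_i | X ~ Bern(X/2)."""
    p_one = Fraction(x, 2)
    return p_one if y == 1 else 1 - p_one


# Joint PMF p(x, y1, y2, y3), stored as a dictionary keyed by (x, y1, y2, y3).
joint = {}
for x, y1, y2, y3 in product(range(3), [0, 1], [0, 1], [0, 1]):
    joint[(x, y1, y2, y3)] = (
        Fraction(1, 3)
        * p_y_given_x(y1, x)
        * p_y_given_x(y2, x)
        * p_y_given_x(y3, x)
    )


def p_x_y1(x, y):
    """Joint PMF of (X, Y1), obtained by summing out y2 and y3."""
    return sum(p for (xx, yy1, _, _), p in joint.items() if xx == x and yy1 == y)


def p_x(x):
    """Marginal PMF of X, obtained by summing out y1 as well."""
    return p_x_y1(x, 0) + p_x_y1(x, 1)


def p_y1_given_x(y, x):
    """Conditional PMF of Y1 given X: joint divided by marginal."""
    return p_x_y1(x, y) / p_x(x)


print(p_x_y1(1, 1))        # 1/6
print(p_y1_given_x(1, 1))  # 1/2
```

Dividing the joint PMF of \((X, Y_1)\) by the marginal of \(X\) recovers the conditional PMF, which is the chain rule rearranged.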

Conditional independence

The model was specified as:

\[ \begin{align*} X &\sim {\mathrm{Unif}}\{0, 1, 2\} \\ Y_i | X &\sim {\mathrm{Bern}}(X/2) \end{align*} \] i.e. by giving \(\mathbb{P}(X = x)\) and \(\mathbb{P}(Y_i = y | X = x)\) for all \(x\) and \(y\).

Question: how did we go from \(\mathbb{P}(Y_2 = 1 | X = 1, Y_1 = 1)\) (in our chain rule computation) to \(\mathbb{P}(Y_2 = 1 | X = 1)\) (model specification)?

Definition: \(V\) and \(W\) are conditionally independent given \(Z\) if \[\mathbb{P}(V = v, W = w | Z = z) = \mathbb{P}(V = v | Z = z) \mathbb{P}(W = w | Z = z).\]

Exercise: show the above definition is equivalent to:

\[\mathbb{P}(V = v | W = w, Z = z) = \mathbb{P}(V = v | Z = z).\]
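As a numerical illustration (not a proof of the exercise), the sketch below builds the joint PMF of \((X, Y_1, Y_2)\) under the assumption that \(Y_1\) and \(Y_2\) are drawn independently given \(X\), as the tree's edge labels suggest. It then checks the factorization in the definition for every \(x\), and shows that \(Y_1\) and \(Y_2\) are nevertheless not independent once \(X\) is summed out. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def bern(y, p):
    """PMF of a Bernoulli(p) variable evaluated at y in {0, 1}."""
    return p if y == 1 else 1 - p


# Joint PMF of (X, Y1, Y2), assuming Y1 and Y2 are drawn independently given X.
joint = {
    (x, y1, y2): Fraction(1, 3) * bern(y1, Fraction(x, 2)) * bern(y2, Fraction(x, 2))
    for x, y1, y2 in product(range(3), [0, 1], [0, 1])
}


def prob(pred):
    """Probability of the outcomes (x, y1, y2) satisfying pred."""
    return sum(p for o, p in joint.items() if pred(*o))


def factorizes_given_x():
    """Check P(Y1=v, Y2=w | X=x) = P(Y1=v | X=x) P(Y2=w | X=x) for all x, v, w."""
    for x, v, w in product(range(3), [0, 1], [0, 1]):
        p_z = prob(lambda X, Y1, Y2: X == x)
        both = prob(lambda X, Y1, Y2: X == x and Y1 == v and Y2 == w) / p_z
        left = prob(lambda X, Y1, Y2: X == x and Y1 == v) / p_z
        right = prob(lambda X, Y1, Y2: X == x and Y2 == w) / p_z
        if both != left * right:
            return False
    return True


print(factorizes_given_x())  # True

# Without conditioning on X, the factorization fails.
p_both = prob(lambda X, Y1, Y2: Y1 == 1 and Y2 == 1)
p_y1 = prob(lambda X, Y1, Y2: Y1 == 1)
p_y2 = prob(lambda X, Y1, Y2: Y2 == 1)
print(p_both, p_y1 * p_y2)  # 5/12 1/4
```

Intuitively, observing \(Y_1 = 1\) carries information about \(X\), which in turn changes the probability of \(Y_2 = 1\); once \(X\) is known, \(Y_1\) adds nothing further.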