Outline
Topics
Mathematical statement
Visual intuition on a decision tree
Special names for the pieces of the chain rule (joint and conditional PMFs)
Conditional independence
Rationale
The chain rule lets us compute the probability that the forward sampling function takes a given path.
The chain rule seems innocent, but it is used heavily in Bayesian statistics. It is also the building block for Bayes' rule.
Proposition
If \(E_1\) and \(E_2\) are any events, \[\mathbb{P}(E_1, E_2) = \mathbb{P}(E_1) \mathbb{P}(E_2 | E_1).\]
This is true in any order, i.e. we also have \(\mathbb{P}(E_1, E_2) = \mathbb{P}(E_2) \mathbb{P}(E_1 | E_2)\) .
Generalization
For any events \(E_1, E_2, E_3, \dots\) ,
\[\mathbb{P}(E_1, E_2, E_3, \dots) = \mathbb{P}(E_1) \mathbb{P}(E_2 | E_1) \mathbb{P}(E_3 | E_1, E_2) \cdots.\]
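As a sanity check, the generalized chain rule can be verified numerically on a small example. The sketch below is illustrative only: the fair six-sided die, the events `E1`, `E2`, `E3`, and the helper names `prob`, `cond_prob`, and `chain_rule_product` are assumptions that do not come from the lecture.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
OUTCOMES = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform distribution."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def cond_prob(event, given):
    """P(event | given); only defined when P(given) > 0."""
    return prob(event & given) / prob(given)

def chain_rule_product(events):
    """Compute P(E1) * P(E2 | E1) * P(E3 | E1, E2) * ... for a list of events."""
    product = Fraction(1)
    seen = OUTCOMES  # intersection of the events processed so far
    for event in events:
        product *= cond_prob(event, seen)
        seen = seen & event
    return product

E1 = frozenset({2, 4, 6})        # the roll is even
E2 = frozenset({3, 4, 5, 6})     # the roll is greater than 2
E3 = frozenset({1, 2, 3, 4, 5})  # the roll is at most 5

joint = prob(E1 & E2 & E3)                   # direct computation of P(E1, E2, E3)
product = chain_rule_product([E1, E2, E3])   # chain rule, in the order E1, E2, E3
reordered = chain_rule_product([E3, E1, E2]) # chain rule, in a different order
print(joint, product, reordered)
```

All three quantities agree, which illustrates that the factorization holds for any ordering of the events.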
Visual intuition
Chain rule: the probability of a node is the product of the edge labels on the path to the root.
flowchart TD
S__and__X_0 -- 1.0 --> S__and__X_0__and__Y1_false["Y1=false"]
S__and__X_2__and__Y1_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true["Y2=true"]
S -- 0.33 --> S__and__X_0["X=0"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_true["Y4=true"]
S -- 0.33 --> S__and__X_1["X=1"]
S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false["Y3=false"]
S__and__X_1__and__Y1_true__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false["Y3=false"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1 -- 0.5 --> S__and__X_1__and__Y1_false["Y1=false"]
S__and__X_1__and__Y1_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false["Y2=false"]
S__and__X_1__and__Y1_false__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true["Y2=true"]
S__and__X_0__and__Y1_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false["Y2=false"]
S__and__X_1__and__Y1_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false["Y2=false"]
S__and__X_1__and__Y1_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true["Y2=true"]
S__and__X_2 -- 1.0 --> S__and__X_2__and__Y1_true["Y1=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_1 -- 0.5 --> S__and__X_1__and__Y1_true["Y1=true"]
S -- 0.33 --> S__and__X_2["X=2"]
S__and__X_1__and__Y1_true__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false["Y3=false"]
S__and__X_2__and__Y1_true__and__Y2_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_false__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_true__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true["Y3=true"]
S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"]
S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"]
S__and__X_1__and__Y1_false__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false["Y3=false"]
S__and__X_0__and__Y1_false__and__Y2_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false["Y3=false"]
style S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true fill:#f9f,stroke:#333,stroke-width:4px
Poll: what is the probability of the highlighted node?
1/2
1/4
1/8
1/24
None of the above
Multiplying the labels of the four edges on the path from the root to the node, we get: \(1/3 \cdot 1/2 \cdot 1/2 \cdot 1/2 = 1/24\) .
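The same multiplication can be sketched in code. This assumes the edge labels come from the model the tree depicts, namely \(X\) uniform on \(\{0, 1, 2\}\) and each \(Y_i\) given \(X\) Bernoulli with parameter \(X/2\); the function name `path_probability` is hypothetical.

```python
from fractions import Fraction

def path_probability(x, ys):
    """Product of the edge labels on the path X=x, Y1=ys[0], Y2=ys[1], ...

    Assumes X ~ Unif{0, 1, 2} and Y_i | X ~ Bern(X/2), as in the tree.
    """
    if x not in (0, 1, 2):
        return Fraction(0)
    prob = Fraction(1, 3)    # edge out of the root: P(X = x)
    p_true = Fraction(x, 2)  # label of every "true" edge below X = x
    for y in ys:
        prob *= p_true if y == 1 else 1 - p_true
    return prob

# The highlighted node: X = 1, Y1 = true, Y2 = true, Y3 = true.
highlighted = path_probability(1, [1, 1, 1])
print(highlighted)  # 1/24
```

Exact fractions are used instead of floats so that the rounded edge label 0.33 does not introduce error.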
Poll: what is the event corresponding to that node?
\((Y_3 = 1)\)
\((Y_3 = 1, Y_2 = 1)\)
\((Y_3 = 1, Y_2 = 1, Y_1 = 1)\)
\((Y_3 = 1, Y_2 = 1, Y_1 = 1, X = 1)\)
Recall that the event is the intersection of all node labels on the path to the root, hence the event is \((Y_3 = 1, Y_2 = 1, Y_1 = 1, X = 1)\) .
The calculation we did visually in the first poll is, mathematically: \[\mathbb{P}(Y_3 = 1, Y_2 = 1, Y_1 = 1, X = 1) = \mathbb{P}(X = 1) \mathbb{P}(Y_1 = 1 | X = 1) \mathbb{P}(Y_2 = 1 | X = 1, Y_1 = 1) \mathbb{P}(Y_3 = 1 | X = 1, Y_1 = 1, Y_2 = 1).\]
Joint PMF
We will often encounter expressions that are a conjunction (intersection/and) of statements about several random variables. A handy notation for these is the joint PMF.
For example, here is the joint PMF of \((X, Y_1, Y_2, Y_3)\) :
\[p(x, y_1, y_2, y_3) = \mathbb{P}(X = x, Y_1 = y_1, Y_2 = y_2, Y_3 = y_3).\]
Sometimes we put the random variables in question as subscript, for example \(p_{X, Y_1}(x, y)\) for the joint PMF of \(X\) and \(Y_1\) .
Conditional PMF
Similarly, here is an example of a conditional PMF: \[p_{Y_1|X}(y | x) = \mathbb{P}(Y_1 = y | X = x).\]
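To make the two notations concrete, here is a minimal sketch that tabulates the joint PMF of \((X, Y_1, Y_2, Y_3)\) and recovers the conditional PMF \(p_{Y_1|X}\) from it by summing out the other variables. It assumes the model depicted in the tree (\(X\) uniform on \(\{0, 1, 2\}\), each \(Y_i\) given \(X\) Bernoulli with parameter \(X/2\)); the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def joint_pmf(x, y1, y2, y3):
    """p(x, y1, y2, y3) = P(X = x, Y1 = y1, Y2 = y2, Y3 = y3), via the chain rule."""
    if x not in (0, 1, 2) or any(y not in (0, 1) for y in (y1, y2, y3)):
        return Fraction(0)
    p = Fraction(1, 3)       # P(X = x)
    p_true = Fraction(x, 2)  # P(Y_i = 1 | X = x)
    for y in (y1, y2, y3):
        p *= p_true if y == 1 else 1 - p_true
    return p

def marginal_x(x):
    """P(X = x), obtained by summing the joint PMF over y1, y2, y3."""
    return sum(joint_pmf(x, *ys) for ys in product((0, 1), repeat=3))

def joint_x_y1(x, y):
    """p_{X, Y1}(x, y), obtained by summing the joint PMF over y2, y3."""
    return sum(joint_pmf(x, y, y2, y3) for y2, y3 in product((0, 1), repeat=2))

def conditional_y1_given_x(y, x):
    """p_{Y1 | X}(y | x) = p_{X, Y1}(x, y) / P(X = x)."""
    return joint_x_y1(x, y) / marginal_x(x)

total = sum(joint_pmf(x, *ys) for x in (0, 1, 2) for ys in product((0, 1), repeat=3))
print(total)                          # a PMF sums to 1
print(conditional_y1_given_x(1, 1))   # 1/2
```

The conditional PMF recovered this way matches the edge labels of the tree, for example \(p_{Y_1|X}(1 | 1) = 1/2\).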
Conditional independence
The model was specified as:
\[
\begin{align*}
X &\sim {\mathrm{Unif}}\{0, 1, 2\} \\
Y_i | X &\sim {\mathrm{Bern}}(X/2)
\end{align*}
\] i.e. by specifying \(\mathbb{P}(X = x)\) and \(\mathbb{P}(Y_i = y | X = x)\) for all \(x\) and \(y\) .
Question: how did we go from \(\mathbb{P}(Y_2 = 1 | X = 1, Y_1 = 1)\) (in our chain rule computation) to \(\mathbb{P}(Y_2 = 1 | X = 1)\) (model specification)?
Definition: \(V\) and \(W\) are conditionally independent given \(Z\) if, for all \(v\), \(w\), and \(z\), \[\mathbb{P}(V = v, W = w | Z = z) = \mathbb{P}(V = v | Z = z) \mathbb{P}(W = w | Z = z).\]
Exercise: show the above definition is equivalent to:
\[\mathbb{P}(V = v | W = w, Z = z) = \mathbb{P}(V = v | Z = z).\]
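The second form can be checked numerically on the running example. The sketch below assumes, as the tree suggests, that the \(Y_i\) are generated independently of each other once \(X\) is fixed, so it confirms consistency with the definition rather than proving it. The helper names `joint_pmf` and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def joint_pmf(x, y1, y2):
    """P(X = x, Y1 = y1, Y2 = y2), assuming X ~ Unif{0, 1, 2}, Y_i | X ~ Bern(X/2)."""
    p = Fraction(1, 3)
    p_true = Fraction(x, 2)
    for y in (y1, y2):
        p *= p_true if y == 1 else 1 - p_true
    return p

def prob(predicate):
    """Probability of the event {(x, y1, y2) : predicate(x, y1, y2)}."""
    return sum(
        (joint_pmf(x, y1, y2)
         for x, y1, y2 in product((0, 1, 2), (0, 1), (0, 1))
         if predicate(x, y1, y2)),
        Fraction(0),
    )

# P(Y2 = 1 | X = 1, Y1 = 1) versus P(Y2 = 1 | X = 1)
lhs = (prob(lambda x, y1, y2: x == 1 and y1 == 1 and y2 == 1)
       / prob(lambda x, y1, y2: x == 1 and y1 == 1))
rhs = (prob(lambda x, y1, y2: x == 1 and y2 == 1)
       / prob(lambda x, y1, y2: x == 1))
print(lhs, rhs)  # 1/2 1/2

# Without conditioning on X, Y1 and Y2 are not independent in this model.
p_both = prob(lambda x, y1, y2: y1 == 1 and y2 == 1)
p_y1 = prob(lambda x, y1, y2: y1 == 1)
p_y2 = prob(lambda x, y1, y2: y2 == 1)
print(p_both, p_y1 * p_y2)  # 5/12 1/4
```

The last two lines show why the word "conditionally" matters: the factorization holds given \(X\), but not marginally.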