# Chain rule

## Outline

### Topics

- Mathematical statement
- Visual intuition on a decision tree
- Special names for the pieces of the chain rule (joint and conditional PMFs)
- Conditional independence

### Rationale

The chain rule allows us to compute the probability that the forward sampling function takes a given path.

The chain rule seems innocent, but it is used heavily in Bayesian statistics. It is also the building block for Bayes' rule.

## Proposition

If \(E_1\) and \(E_2\) are any events, \[\mathbb{P}(E_1, E_2) = \mathbb{P}(E_1) \mathbb{P}(E_2 | E_1).\]

This is true in any order, i.e. we also have \(\mathbb{P}(E_1, E_2) = \mathbb{P}(E_2) \mathbb{P}(E_1 | E_2)\).
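
As a quick sanity check, here is a minimal Python sketch that verifies the proposition in both orders on a single fair die. The sample space, the events (`e1`: the roll is even, `e2`: the roll exceeds 3) and the helpers `prob` and `cond_prob` are illustrative choices made for this example, not part of the course material.

```python
from fractions import Fraction

# Hypothetical sample space: one fair six-sided die.
outcomes = range(1, 7)

def prob(event):
    """P(event) under the uniform distribution on the die outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given):
    """P(event | given), assuming P(given) > 0."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

def e1(o):
    return o % 2 == 0   # the roll is even

def e2(o):
    return o > 3        # the roll exceeds 3

joint = prob(lambda o: e1(o) and e2(o))   # P(E1, E2)
order_1 = prob(e1) * cond_prob(e2, e1)    # P(E1) P(E2 | E1)
order_2 = prob(e2) * cond_prob(e1, e2)    # P(E2) P(E1 | E2)

print(joint, order_1, order_2)  # 1/3 1/3 1/3
```

Exact fractions are used instead of floats so that the equalities can be checked without rounding error.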

## Generalization

For any events \(E_1, E_2, E_3 \dots\),

\[\mathbb{P}(E_1, E_2, E_3 \dots) = \mathbb{P}(E_1) \mathbb{P}(E_2 | E_1) \mathbb{P}(E_3 | E_1, E_2) \dots.\]
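
The generalization can be sketched the same way for any finite list of events. The example below assumes a hypothetical sample space of two fair dice and three arbitrarily chosen events; `prob` and `chain_rule_product` are names introduced here for illustration only. Each factor is computed as a ratio of joint probabilities, so the product telescopes, which is also why the identity holds.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two fair six-sided dice, 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(events):
    """Probability that all events in the list occur together."""
    hits = sum(1 for o in outcomes if all(e(o) for e in events))
    return Fraction(hits, len(outcomes))

def chain_rule_product(events):
    """Multiply P(E1) P(E2 | E1) P(E3 | E1, E2) ... term by term."""
    result = Fraction(1)
    for i in range(len(events)):
        # P(E_{i+1} | E_1, ..., E_i) = P(E_1, ..., E_{i+1}) / P(E_1, ..., E_i);
        # this assumes the conditioning event has positive probability.
        result *= prob(events[: i + 1]) / prob(events[:i])
    return result

events = [
    lambda o: o[0] % 2 == 0,     # E1: the first die is even
    lambda o: o[0] + o[1] >= 8,  # E2: the sum is at least 8
    lambda o: o[1] == 6,         # E3: the second die shows 6
]

print(prob(events))                # 1/12
print(chain_rule_product(events))  # 1/12
```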

## Visual intuition

**Chain rule:** the probability of a node is the product of the edge labels on the path to the root.

## Poll: what is the probability of the node in red?

- 1/2
- 1/4
- 1/8
- 1/24
- None of the above

Multiplying the four edge labels on the path leading to the node, we get: \(1/2 \cdot 1/2 \cdot 1/2 \cdot 1/3 = 1/24\).

## Poll: what is the event corresponding to that node?

- \((Y_3 = 1)\)
- \((Y_3 = 1, Y_2 = 1)\)
- \((Y_3 = 1, Y_2 = 1, Y_1 = 1)\)
- \((Y_3 = 1, Y_2 = 1, Y_1 = 1, X = 1)\)

Recall that the event is the intersection of all node labels on the path to the root, hence the event is \((Y_3 = 1, Y_2 = 1, Y_1 = 1, X = 1)\).

Mathematically, the calculation we did visually in the first poll is: \[\mathbb{P}(Y_3 = 1, Y_2 = 1, Y_1 = 1, X = 1) = \mathbb{P}(X = 1) \mathbb{P}(Y_1 = 1 | X = 1) \mathbb{P}(Y_2 = 1 | X = 1, Y_1 = 1) \mathbb{P}(Y_3 = 1 | X = 1, Y_1 = 1, Y_2 = 1).\]
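
The same product can be sketched in Python. This assumes the model stated in the conditional independence section (\(X\) uniform on \(\{0, 1, 2\}\), \(Y_i | X\) Bernoulli with parameter \(X/2\)), and that each \(Y_i\) depends on the earlier variables only through \(X\). The function names `p_x`, `p_y_given_x` and `path_probability` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def p_x(x):
    """P(X = x) for X ~ Unif{0, 1, 2} (assumed model)."""
    return Fraction(1, 3) if x in (0, 1, 2) else Fraction(0)

def p_y_given_x(y, x):
    """P(Y_i = y | X = x) for Y_i | X ~ Bern(X / 2) (assumed model)."""
    p = Fraction(x, 2)
    return p if y == 1 else 1 - p

def path_probability(x, ys):
    """Multiply the edge labels along the path X = x, Y_1 = ys[0], Y_2 = ys[1], ..."""
    result = p_x(x)
    for y in ys:
        result *= p_y_given_x(y, x)
    return result

# Probability of the node (X = 1, Y_1 = 1, Y_2 = 1, Y_3 = 1).
print(path_probability(1, (1, 1, 1)))  # 1/24

# Sanity check: the probabilities of all leaves of the tree sum to 1.
total = sum(
    path_probability(x, ys)
    for x in (0, 1, 2)
    for ys in product((0, 1), repeat=3)
)
print(total)  # 1
```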

## Joint PMF

We will often encounter expressions in the form of a conjunction (intersection/and) of events involving several random variables. A handy notation for that is the **joint PMF**.

For example, here is the joint PMF of \((X, Y_1, Y_2, Y_3)\):

\[p(x, y_1, y_2, y_3) = \mathbb{P}(X = x, Y_1 = y_1, Y_2 = y_2, Y_3 = y_3).\]

Sometimes we put the random variables in question as subscripts, for example \(p_{X, Y_1}(x, y)\) for the joint PMF of \(X\) and \(Y_1\).
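
A joint PMF is simply a function of the values taken by the random variables. The sketch below writes it as a Python function, assuming the model from the conditional independence section with the \(Y_i\) conditionally independent given \(X\). The names `joint_pmf` and `joint_pmf_x_y1` are hypothetical, and the second function obtains \(p_{X, Y_1}\) by summing the full joint over the remaining variables, a standard step that is not covered in the text above.

```python
from fractions import Fraction
from itertools import product

def joint_pmf(x, y1, y2, y3):
    """p(x, y1, y2, y3), assuming X ~ Unif{0, 1, 2} and Y_i | X ~ Bern(X / 2),
    with the Y_i conditionally independent given X."""
    if x not in (0, 1, 2) or any(y not in (0, 1) for y in (y1, y2, y3)):
        return Fraction(0)  # values outside the support have probability 0
    p = Fraction(x, 2)
    result = Fraction(1, 3)
    for y in (y1, y2, y3):
        result *= p if y == 1 else 1 - p
    return result

def joint_pmf_x_y1(x, y):
    """p_{X, Y_1}(x, y), obtained by summing the full joint over y2 and y3."""
    return sum(joint_pmf(x, y, y2, y3) for y2, y3 in product((0, 1), repeat=2))

print(joint_pmf(1, 1, 1, 1))  # 1/24
print(joint_pmf_x_y1(1, 1))   # 1/6
```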

## Conditional PMF

Similarly, here is an example of a conditional PMF: \[p_{Y_1|X}(y | x) = \mathbb{P}(Y_1 = y | X = x).\]
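
A conditional PMF can be recovered from a joint PMF by dividing by the probability of the conditioning event. The sketch below illustrates this on a table for \((X, Y_1)\), filled in under the assumption \(X \sim \mathrm{Unif}\{0, 1, 2\}\), \(Y_1 | X \sim \mathrm{Bern}(X/2)\). The names `joint_x_y1`, `pmf_x` and `cond_pmf_y1_given_x` are hypothetical.

```python
from fractions import Fraction

# Joint PMF of (X, Y_1) as a table keyed by (x, y),
# assuming X ~ Unif{0, 1, 2} and Y_1 | X ~ Bern(X / 2).
joint_x_y1 = {
    (x, y): Fraction(1, 3) * (Fraction(x, 2) if y == 1 else 1 - Fraction(x, 2))
    for x in (0, 1, 2)
    for y in (0, 1)
}

def pmf_x(x):
    """P(X = x), obtained by summing the joint table over y."""
    return sum(p for (xx, _), p in joint_x_y1.items() if xx == x)

def cond_pmf_y1_given_x(y, x):
    """p_{Y_1 | X}(y | x) = p_{X, Y_1}(x, y) / P(X = x), assuming P(X = x) > 0."""
    return joint_x_y1[(x, y)] / pmf_x(x)

for x in (0, 1, 2):
    print(x, cond_pmf_y1_given_x(1, x))
# 0 0
# 1 1/2
# 2 1
```

The printed values match the Bernoulli parameters \(x/2\) used to build the table.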

## Conditional independence

The model was specified as:

\[ \begin{align*} X &\sim {\mathrm{Unif}}\{0, 1, 2\} \\ Y_i | X &\sim {\mathrm{Bern}}(X/2) \end{align*} \] i.e. by specifying \(\mathbb{P}(X = x)\) and \(\mathbb{P}(Y_i = y | X = x)\) for all \(x\) and \(y\).

**Question:** how did we go from \(\mathbb{P}(Y_2 = 1 | X = 1, Y_1 = 1)\) (in our chain rule computation) to \(\mathbb{P}(Y_2 = 1 | X = 1)\) (model specification)?

**Definition:** \(V\) and \(W\) are conditionally independent given \(Z\) if, for all \(v\), \(w\) and \(z\), \[\mathbb{P}(V = v, W = w | Z = z) = \mathbb{P}(V = v | Z = z) \mathbb{P}(W = w | Z = z).\]

**Exercise:** show the above definition is equivalent to:

\[\mathbb{P}(V = v | W = w, Z = z) = \mathbb{P}(V = v | Z = z).\]
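
Before attempting the proof, both forms can be checked numerically with \(V = Y_1\), \(W = Y_2\), \(Z = X\). This is only a sketch and not a substitute for the exercise: the joint PMF in `joint` is built under the assumption that \(Y_1\) and \(Y_2\) are conditionally independent given \(X\), so the check only confirms that the two forms agree on this example. The second form is skipped wherever the conditioning event has probability zero. All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def joint(z, v, w):
    """P(X = z, Y_1 = v, Y_2 = w), assuming X ~ Unif{0, 1, 2} and
    Y_1, Y_2 | X ~ Bern(X / 2), conditionally independent given X."""
    p = Fraction(z, 2)
    bern = lambda y: p if y == 1 else 1 - p
    return Fraction(1, 3) * bern(v) * bern(w)

# All (z, v, w) triples in the support.
triples = list(product((0, 1, 2), (0, 1), (0, 1)))

def prob(pred):
    """Probability of the event described by the predicate pred(z, v, w)."""
    return sum(joint(z, v, w) for z, v, w in triples if pred(z, v, w))

product_form_ok = True       # P(V, W | Z) = P(V | Z) P(W | Z)
conditioning_form_ok = True  # P(V | W, Z) = P(V | Z)

for z0, v0, w0 in triples:
    p_z = prob(lambda z, v, w: z == z0)
    p_v_given_z = prob(lambda z, v, w: z == z0 and v == v0) / p_z
    p_w_given_z = prob(lambda z, v, w: z == z0 and w == w0) / p_z
    p_vw_given_z = joint(z0, v0, w0) / p_z
    product_form_ok &= (p_vw_given_z == p_v_given_z * p_w_given_z)

    p_wz = prob(lambda z, v, w: z == z0 and w == w0)
    if p_wz > 0:  # the conditional is undefined when P(W = w, Z = z) = 0
        p_v_given_wz = joint(z0, v0, w0) / p_wz
        conditioning_form_ok &= (p_v_given_wz == p_v_given_z)

print(product_form_ok, conditioning_form_ok)  # True True
```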