Review of more probability theory concepts, contextualized in decision trees: outcome, event, sample space, partitions, conditional probability
Rationale
We will use decision trees to provide visualization for a bunch of complex concepts such as forward simulation, posterior inference, importance sampling, probabilistic programming, etc.
Running example
Imagine a bag with 3 coins each with a different probability parameter \(p\)
Coin \(i\in \{0, 1, 2\}\) has bias \(i/2\)—in other words:
First coin: bias is \(0/2 = 0\) (i.e. both sides are “heads”, \(p = 0\))
Second coin: bias is \(1/2 = 0.5\) (i.e. standard coin, \(p = 1/2\))
Third coin: bias is \(2/2 = 1\) (i.e. both sides are “tails”, \(p = 1\))
Consider the following two steps sampling process
Step 1: pick one of the three coins, but do not look at it!
Step 2: flip the coin 4 times
Mathematically, this probability model can be written as follows: \[
\begin{align*}
X &\sim {\mathrm{Unif}}\{0, 1, 2\} \\
Y_i | X &\sim {\mathrm{Bern}}(X/2)
\end{align*}
\tag{1}\]
Decision tree
Decision tree: a recursive classification of all possible scenarios
Nodes in the tree are “groups of scenarios” which we call events
Children of a node partitions an event into an exhaustive set of sub-cases,
i.e. \(E_1, E_2, \dots\) is a partition of \(E\).
In the decision tree below, we partitioned events until we get events at the leaves each containing a single scenario
We call one individual scenario an outcome
We call the set of all outcomes the sample space, \(S\), and put it at the root of decision trees.
Nodes and events
To describe the event corresponding to a node \(v\) in the tree:
trace the path from the node \(v\) to the root
take the intersection of all node labels.
Example: find the node in the above tree corresponding to the event \((X = 1) \cap (Y_1 = 0)\).
Probability notation review:
\((X = 1) = \{s \in S : X(s) = 1\}\)
\((X = 1, Y_1 = 0) = (X = 1) \cap (Y_1 = 0)\)
Edges and conditional probabilities
When there is an edge from events \(E_1\) to \(E_2\), we annotate it with \(\mathbb{P}(E_2 | E_1)\).
Recall: conditional probability of \(E_2\) given \(E_1\)
take the edge from \(E_1 = (X = 1)\) to \(E_2 = (X = 1, Y_1 = 0)\). \[\mathbb{P}(E_2 | E_1) = \frac{\mathbb{P}(E_1 \cap E_2)}{\mathbb{P}(E_1)} = \frac{\mathbb{P}(E_2)}{\mathbb{P}(E_1)} = \mathbb{P}(Y_1 = 0 | X = 1)\]
Translating \(\mathbb{P}(Y_1 = 0 | X = 1)\) into words: “the probability that the first flip is ‘heads’ \((Y_1 = 0)\) given that you picked the standard coin \((X = 1)\).’’
Hence the edge from \(E_1\) to \(E_2\) is labelled \(1/2\).