flowchart TD S__and__X_0 -- 1.0 --> S__and__X_0__and__Y1_false["Y1=false"] S__and__X_2__and__Y1_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true["Y2=true"] S -- 0.33 --> S__and__X_0["X=0"] S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_false["Y4=false"] S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true__and__Y4_false["Y4=false"] S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"] S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_true["Y4=true"] S -- 0.33 --> S__and__X_1["X=1"] S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"] S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false__and__Y4_true["Y4=true"] S__and__X_1__and__Y1_false__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false["Y3=false"] S__and__X_1__and__Y1_true__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false["Y3=false"] S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"] S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false__and__Y4_false["Y4=false"] S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true__and__Y4_true["Y4=true"] S__and__X_1 -- 0.5 --> S__and__X_1__and__Y1_false["Y1=false"] S__and__X_1__and__Y1_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false["Y2=false"] S__and__X_1__and__Y1_false__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true["Y3=true"] S__and__X_1__and__Y1_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true["Y2=true"] S__and__X_0__and__Y1_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false["Y2=false"] S__and__X_1__and__Y1_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false["Y2=false"] S__and__X_1__and__Y1_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true["Y2=true"] S__and__X_2 -- 1.0 --> S__and__X_2__and__Y1_true["Y1=true"] S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true__and__Y4_false["Y4=false"] S__and__X_1 -- 0.5 --> S__and__X_1__and__Y1_true["Y1=true"] S -- 0.33 --> S__and__X_2["X=2"] S__and__X_1__and__Y1_true__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false["Y3=false"] S__and__X_2__and__Y1_true__and__Y2_true -- 1.0 --> S__and__X_2__and__Y1_true__and__Y2_true__and__Y3_true["Y3=true"] S__and__X_1__and__Y1_false__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true["Y3=true"] S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false__and__Y4_false["Y4=false"] S__and__X_1__and__Y1_true__and__Y2_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true["Y3=true"] S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"] S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_false__and__Y4_true["Y4=true"] S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_true__and__Y3_false__and__Y4_true["Y4=true"] S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true__and__Y4_false["Y4=false"] S__and__X_1__and__Y1_true__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_true["Y3=true"] S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false -- 0.5 --> S__and__X_1__and__Y1_true__and__Y2_false__and__Y3_false__and__Y4_false["Y4=false"] S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_true__and__Y4_true["Y4=true"] S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_true__and__Y3_true__and__Y4_true["Y4=true"] S__and__X_1__and__Y1_false__and__Y2_false -- 0.5 --> S__and__X_1__and__Y1_false__and__Y2_false__and__Y3_false["Y3=false"] S__and__X_0__and__Y1_false__and__Y2_false -- 1.0 --> S__and__X_0__and__Y1_false__and__Y2_false__and__Y3_false["Y3=false"]
Decision trees
Outline
Topics
- Decision trees
- Review of more probability theory concepts, contextualized in decision trees: outcome, event, sample space, partitions, conditional probability
Rationale
We will use decision trees to provide visualization for a bunch of complex concepts such as forward simulation, posterior inference, importance sampling, probabilistic programming, etc.
Running example
- Imagine a bag with 3 coins each with a different probability parameter \(p\)
- Coin \(i\in \{0, 1, 2\}\) has bias \(i/2\)—in other words:
- First coin: bias is \(0/2 = 0\) (i.e. both sides are “heads”, \(p = 0\))
- Second coin: bias is \(1/2 = 0.5\) (i.e. standard coin, \(p = 1/2\))
- Third coin: bias is \(2/2 = 1\) (i.e. both sides are “tails”, \(p = 1\))
- Consider the following two steps sampling process
- Step 1: pick one of the three coins, but do not look at it!
- Step 2: flip the coin 4 times
- Mathematically, this probability model can be written as follows: \[ \begin{align*} X &\sim {\mathrm{Unif}}\{0, 1, 2\} \\ Y_i | X &\sim {\mathrm{Bern}}(X/2) \end{align*} \tag{1}\]
Decision tree
- Decision tree: a recursive classification of all possible scenarios
- Nodes in the tree are “groups of scenarios” which we call events
- Children of a node partitions an event into an exhaustive set of sub-cases,
- i.e. \(E_1, E_2, \dots\) is a partition of \(E\).
- In the decision tree below, we partitioned events until we get events at the leaves each containing a single scenario
- We call one individual scenario an outcome
- We call the set of all outcomes the sample space, \(S\), and put it at the root of decision trees.
Nodes and events
To describe the event corresponding to a node \(v\) in the tree:
- trace the path from the node \(v\) to the root
- take the intersection of all node labels.
Example: find the node in the above tree corresponding to the event \((X = 1) \cap (Y_1 = 0)\).
Probability notation review:
- \((X = 1) = \{s \in S : X(s) = 1\}\)
- \((X = 1, Y_1 = 0) = (X = 1) \cap (Y_1 = 0)\)
Edges and conditional probabilities
When there is an edge from events \(E_1\) to \(E_2\), we annotate it with \(\mathbb{P}(E_2 | E_1)\).
Recall: conditional probability of \(E_2\) given \(E_1\)
\[\mathbb{P}(E_2 | E_1) = \frac{\mathbb{P}(E_1 \cap E_2)}{\mathbb{P}(E_1)}\]
Example:
- take the edge from \(E_1 = (X = 1)\) to \(E_2 = (X = 1, Y_1 = 0)\). \[\mathbb{P}(E_2 | E_1) = \frac{\mathbb{P}(E_1 \cap E_2)}{\mathbb{P}(E_1)} = \frac{\mathbb{P}(E_2)}{\mathbb{P}(E_1)} = \mathbb{P}(Y_1 = 0 | X = 1)\]
- Translating \(\mathbb{P}(Y_1 = 0 | X = 1)\) into words: “the probability that the first flip is ‘heads’ \((Y_1 = 0)\) given that you picked the standard coin \((X = 1)\).’’
- Hence the edge from \(E_1\) to \(E_2\) is labelled \(1/2\).