Prediction

Outline

Topics

  • Prediction using decision trees
  • Example

Rationale

Often we do not care so much about the “parameters” themselves, but rather about predicting future observations.

Example: coins in a bag

Consider the setup from last week: a bag with 3 coins, from which one coin is drawn at random and flipped 3 times, yielding $Y_{1:3} = (1, 1, 1)$ (in the following, let $1$ denote a vector of ones).

Question: given that you have seen 3 heads, what is the probability that the next flip is also heads?

Mathematically: compute $P(Y_4 = 1 \mid Y_{1:3} = 1)$. This task is known as “prediction”.

General approach

Key message: In Bayesian statistics, prediction and parameter estimation are treated in the exact same way!

Idea: add $Y_4$ to the unobserved random variables, i.e., set $\tilde{X} = (X, Y_4)$.

Then, to compute $P(Y_4 = 1 \mid Y_{1:3} = 1)$, use the same techniques as last week (decision tree, chain rule, axioms of probability).
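
Concretely (just the definition of conditional probability plus summation over the unknown coin $X$, nothing beyond last week's axioms):

$$
P(Y_4 = 1 \mid Y_{1:3} = 1)
= \frac{P(Y_{1:3} = 1,\, Y_4 = 1)}{P(Y_{1:3} = 1)}
= \frac{\sum_x P(X = x,\, Y_{1:3} = 1,\, Y_4 = 1)}{\sum_x \sum_{i \in \{0,1\}} P(X = x,\, Y_{1:3} = 1,\, Y_4 = i)}.
$$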

Example, continued

Use the following picture to help you compute $P(Y_4 = 1 \mid Y_{1:3} = 1)$.

Notation: let $\gamma(i) = P(Y_{1:3} = 1,\, Y_4 = i)$.

Question: compute $\gamma(0)$.

  1. 0.02
  2. 0.06
  3. 0.52
  4. 0.94
  5. None of the above
  • There is only one way to get $(Y_{1:3} = 1, Y_4 = 0)$: the drawn coin has to be the standard coin, i.e., $(Y_{1:3} = 1, Y_4 = 0) = (X = 1, Y_{1:3} = 1, Y_4 = 0)$.
  • To compute the probability of that path we can multiply the edge probabilities (why?): $\gamma(0) = P(X = 1, Y_{1:3} = 1, Y_4 = 0) = (1/3) \times (1/2)^4 = 1/48 \approx 0.02$ (see the check below).
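
A quick numerical check, as a sketch: it assumes the standard coin is fair and that each of the 3 coins is drawn with prior probability $1/3$, as in last week's setup.

```python
from fractions import Fraction

prior = Fraction(1, 3)    # P(X = 1): uniform prior over the 3 coins (assumed)
p_heads = Fraction(1, 2)  # standard coin is fair (assumed from last week)

# gamma(0) = P(X = 1) * P(three heads, then one tail | standard coin)
gamma_0 = prior * p_heads**3 * (1 - p_heads)
print(gamma_0, float(gamma_0))  # 1/48 ≈ 0.0208
```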

Question: compute $\gamma(1)$.

  1. 0.11
  2. 0.35
  3. 0.52
  4. 0.94
  5. None of the above
  • Twist: two distinct paths are compatible with the event: $(Y_{1:4} = 1) = (X = 2, Y_{1:4} = 1) \cup (X = 1, Y_{1:4} = 1)$.
  • Sum the probabilities of the paths leading to the same prediction (why can we do this?): $\gamma(1) = P(Y_{1:4} = 1) = P(X = 2, Y_{1:4} = 1) + P(X = 1, Y_{1:4} = 1) = 1/3 + 1/48 = 17/48 \approx 0.35$ (see the check below).
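
The corresponding check (a sketch; it assumes $X = 2$ labels the two-headed coin from last week's bag, so its heads probability is $1$):

```python
from fractions import Fraction

prior = Fraction(1, 3)  # uniform prior over the 3 coins (assumed)

# Two disjoint paths yield Y_{1:4} = 1:
#   X = 2: two-headed coin, heads with probability 1 (assumed labeling)
#   X = 1: standard fair coin
gamma_1 = prior * Fraction(1)**4 + prior * Fraction(1, 2)**4
print(gamma_1, float(gamma_1))  # 17/48 ≈ 0.354
```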

Question: compute the predictive probability $P(Y_4 = 1 \mid Y_{1:3} = 1)$.

  1. 0.94
  2. 0.52
  3. 0.35
  4. 0.11
  5. None of the above

Let: $\pi(i) := P(Y_4 = i \mid Y_{1:3} = 1)$.

Note: $\pi(i) \propto \gamma(i)$.

Hence: $(\pi(0), \pi(1)) = \frac{(\gamma(0), \gamma(1))}{\gamma(0) + \gamma(1)}$. Therefore we get: $P(Y_4 = 1 \mid Y_{1:3} = 1) = \frac{\gamma(1)}{\gamma(0) + \gamma(1)} = \frac{17/48}{1/48 + 17/48} = 17/18 \approx 0.94$.
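
Putting the pieces together: a minimal sketch that enumerates every path of the decision tree and normalizes. It assumes the bag from last week holds a two-tailed coin ($X = 0$), a standard fair coin ($X = 1$), and a two-headed coin ($X = 2$), each drawn with probability $1/3$; only the labels $X = 1$ and $X = 2$ appear above, so the $X = 0$ label is an assumption.

```python
from fractions import Fraction

# Heads probability of each coin; labeling assumed from last week's setup:
# X = 0: two-tailed, X = 1: standard (fair), X = 2: two-headed.
p_heads = {0: Fraction(0), 1: Fraction(1, 2), 2: Fraction(1)}
prior = Fraction(1, 3)  # uniform prior over the coins (assumed)

def gamma(i):
    """gamma(i) = P(Y_{1:3} = 1, Y_4 = i): sum, over coins, of the product
    of edge probabilities along each path of the decision tree (chain rule)."""
    total = Fraction(0)
    for x, p in p_heads.items():
        path = prior
        for y in (1, 1, 1, i):  # three observed heads, then Y_4 = i
            path *= p if y == 1 else 1 - p
        total += path
    return total

# Normalize to get the predictive: pi(i) = gamma(i) / (gamma(0) + gamma(1)).
g0, g1 = gamma(0), gamma(1)
print(g0, g1)                                  # 1/48, 17/48
print(g1 / (g0 + g1), float(g1 / (g0 + g1)))   # 17/18 ≈ 0.944
```

Using `Fraction` keeps the arithmetic exact, so the output matches $17/18$ rather than a floating-point approximation.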