Prior choice
- Strategies for constructing priors.
- Situations where choosing the prior matters.
- Informative vs non-informative priors.
Selection of priors is a necessary step in the construction of Bayesian models. Difficulties that can be encountered in prior choice motivate hierarchical models, introduced this week.
Consider the problem of selecting a prior for a parameter defined on \([0, 1]\), e.g. \(p\) in the basic rocket launch success probability example (iid, continuous): \[\begin{align*} p &\sim \;\text{???} \\ y | p &\sim {\mathrm{Binom}}(n, p). \end{align*}\]
General approach
Often, the choice of prior is approached in two stages:
- First, pick a distribution family
- in our example: let us pick say \({\color{red} {\mathrm{Beta}}}(\cdot, \cdot)\),
- a common starting point for distributions with support \([0, 1]\)
- but not the only choice, e.g., an alternative is the Kumaraswamy distribution.
- Then, pick one member of this family
- e.g. Beta\(({\color{red} 1}, {\color{red} 2})\), but how to pick these “magic” numbers (called hyper-parameters)?
When does the choice of prior matter?
- When the data set is large, the posterior tends to be less sensitive to the specification of the prior
- Theoretical reason: the “Bayesian central limit theorem” (Bernstein-von Mises theorem).
- Not always true (e.g. in partially identifiable models).
- When the data set is small, the posterior tends to be more sensitive to the specification of the prior
- Extreme example:
- consider a rocket maiden flight,
- i.e. there is no observation available yet,
- Question: what will be the posterior?
- A normal distribution.
- Approximately normal.
- The prior.
- There is not enough information to answer.
- None of the above.
When there are no observations, the posterior is equal to the prior.
This shows an extreme example of the posterior being sensitive to the choice of prior!
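A minimal sketch of this sensitivity, assuming a Beta prior so that the posterior is available in closed form by conjugacy: a Beta(\(\alpha, \beta\)) prior combined with \(k\) successes in \(n\) launches gives a Beta(\(\alpha + k, \beta + n - k\)) posterior. The two priors and the launch counts below are hypothetical, chosen only for illustration.

```python
def posterior_params(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior with a Binomial likelihood."""
    return alpha + successes, beta + (trials - successes)


def posterior_mean(alpha, beta, successes, trials):
    """Mean of the Beta posterior returned by posterior_params."""
    a, b = posterior_params(alpha, beta, successes, trials)
    return a / (a + b)


# Two hypothetical priors that disagree about p.
prior_a = (1.0, 2.0)  # favours low success probabilities
prior_b = (8.0, 2.0)  # favours high success probabilities

# Hypothetical data sets: (label, number of successes, number of launches).
datasets = [("no data", 0, 0), ("small", 3, 3), ("large", 900, 1000)]

for label, k, n in datasets:
    mean_a = posterior_mean(*prior_a, k, n)
    mean_b = posterior_mean(*prior_b, k, n)
    gap = abs(mean_a - mean_b)
    print(f"{label:>8}: mean under prior A = {mean_a:.4f}, "
          f"under prior B = {mean_b:.4f}, gap = {gap:.4f}")
```

With no data the posterior parameters are exactly the prior parameters, and in this sketch the gap between the two posterior means shrinks as the number of launches grows.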
Strategies for constructing priors
- “Informative priors”: use expert knowledge to determine the prior (“prior elicitation”)
- e.g.: when we build rockets, there is a lot of quality control, so it would be surprising to see very low values for the success probability parameter \(p\).
- Many prior elicitation techniques have been developed, see Petrus et al, 2021 for a recent review,
- but the state of that literature is not completely satisfactory.
- “Non-informative priors”: use properties of the likelihood to determine prior
- more advanced, see Robert, Section 3.5,
- not automated, case-by-case mathematical derivation often intractable.
- Automating this into PPLs is an open problem.
- Today: side-step these issues thanks to hierarchical models.
- We will not remove the need for prior choice, instead we will decrease sensitivity of the posterior to these choices.
Example, continued
Suppose we picked a Beta, so we now have to pick a member of the Beta family.
\[\begin{align*} p &\sim {\mathrm{Beta}}(?, ?) \\ y | p &\sim {\mathrm{Binom}}(n, p). \end{align*}\]
Again, the numbers to fill in the “?” are known as hyper-parameters, i.e., a hyper-parameter is a parameter of a prior.
Expert elicitation:
- find a rocket expert,
- ask the expert what they think is a reasonable range of values for \(p\), e.g. upper and lower quartiles \(Q_1, Q_3\).
- Use a numerical method to fit \(\alpha, \beta\) such that a Beta(\(\alpha, \beta\)) matches the reported quartiles \(Q_1, Q_3\).
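The last step can be sketched as follows. This is one possible numerical method, not necessarily the one intended in the course: it assumes SciPy is available and uses least squares on the quartile mismatch, optimizing over \(\log \alpha, \log \beta\) to keep both parameters positive. The function name `fit_beta_to_quartiles` and the quartile values are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def fit_beta_to_quartiles(q1, q3):
    """Find (alpha, beta) such that Beta(alpha, beta) has quartiles close to q1, q3."""
    target = np.array([q1, q3])

    def residuals(log_params):
        # Work on the log scale so that alpha and beta stay positive.
        alpha, beta = np.exp(log_params)
        fitted = stats.beta.ppf([0.25, 0.75], alpha, beta)
        return fitted - target

    # Start from Beta(1, 1), i.e. the uniform distribution on [0, 1].
    result = optimize.least_squares(residuals, x0=np.zeros(2))
    alpha, beta = np.exp(result.x)
    return float(alpha), float(beta)


# Hypothetical expert answer: the middle 50% of plausible values for p
# lies between 0.85 and 0.95.
alpha_hat, beta_hat = fit_beta_to_quartiles(0.85, 0.95)
fitted_q1, fitted_q3 = stats.beta.ppf([0.25, 0.75], alpha_hat, beta_hat)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")
print(f"fitted quartiles: {fitted_q1:.4f}, {fitted_q3:.4f}")
```

It is worth checking the fitted quartiles against the expert's answer, as done in the last lines, since a local optimizer may fail to match quartiles that are extreme or very close together.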