Prior choice
Outline
Topics
- Strategies for constructing priors.
- Situations where choosing the prior matters.
- Informative vs non-informative priors.
Rationale
Selecting a prior is a necessary step in the construction of Bayesian models. Difficulties encountered in prior choice motivate hierarchical models, introduced this week.
Example
Consider the problem of selecting a prior for a parameter defined on \([0, 1]\), e.g. \(p\) in the basic rocket launch success probability example (iid launches, continuous parameter): \[\begin{align*} p &\sim \;\text{???} \\ y | p &\sim {\mathrm{Binom}}(n, p). \end{align*}\]
General approach
Often, the choice of prior is approached in two stages:
- First, pick a distribution family
- in our example: let us pick, say, \({\color{red} {\mathrm{Beta}}}(\cdot, \cdot)\),
- a common starting point for distributions with support \([0, 1]\),
- but not the only choice; e.g., an alternative is the Kumaraswamy distribution (compared in the sketch after this list).
- Then, pick one member of this family
- e.g. Beta\(({\color{red} 1}, {\color{red} 2})\), but how to pick these “magic” numbers (called hyper-parameters)?
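As a quick way to compare members of these two families, here is a minimal sketch, assuming Python with numpy and scipy (these notes do not prescribe a language); the helper kumaraswamy_pdf and the shape values are illustrative, using the closed-form Kumaraswamy density \(a b x^{a-1}(1-x^a)^{b-1}\).

```python
import numpy as np
from scipy.stats import beta

def kumaraswamy_pdf(x, a, b):
    # Kumaraswamy(a, b) density on [0, 1]: a*b*x^(a-1)*(1 - x^a)^(b-1).
    return a * b * x ** (a - 1) * (1.0 - x ** a) ** (b - 1)

grid = np.linspace(0.05, 0.95, 10)
print(beta.pdf(grid, a=2.0, b=2.0))     # Beta(2, 2) density on the grid
print(kumaraswamy_pdf(grid, 2.0, 2.0))  # Kumaraswamy(2, 2): close, but not equal
```

For these shapes the two densities are similar but not identical; in practice the family is often chosen for convenience (e.g. conjugacy of the Beta with the binomial likelihood) rather than fit.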
When does the choice of prior matter?
- When the dataset is large, the posterior tends to be less sensitive to the specification of the prior.
- Theoretical reason: the “Bayesian central limit theorem” (Bernstein-von Mises theorem).
- Not always true (e.g., in partially identifiable models).
- When the dataset is small, the posterior tends to be more sensitive to the specification of the prior (both regimes are illustrated in the sketch after this list).
- Extreme example:
- consider a rocket's maiden flight,
- i.e., no observations are available yet.
- Question: what will be the posterior?
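A minimal sketch of this sensitivity, assuming Python with scipy (the sample sizes and the two priors are illustrative choices). It relies on conjugacy: a \({\mathrm{Beta}}(\alpha, \beta)\) prior combined with \(k\) successes out of \(n\) binomial trials yields a \({\mathrm{Beta}}(\alpha + k, \beta + n - k)\) posterior. Note that the \(n = 0\) row answers the maiden flight question: with no observations, the posterior is just the prior.

```python
from scipy.stats import beta

def posterior(a, b, k, n):
    # Conjugate update: Beta(a, b) prior + k successes in n Binomial
    # trials gives a Beta(a + k, b + n - k) posterior.
    return beta(a + k, b + n - k)

for n in (0, 5, 500):                 # maiden flight, small data, large data
    k = round(0.8 * n)                # 80% observed successes (illustrative)
    uniform = posterior(1, 1, k, n)   # Beta(1, 1) prior (uniform)
    skeptic = posterior(1, 9, k, n)   # Beta(1, 9) prior (pessimistic)
    print(f"n={n:3d}  posterior mean, uniform prior: {uniform.mean():.3f}  "
          f"pessimistic prior: {skeptic.mean():.3f}")
```

With \(n = 500\) the two posterior means nearly agree; with \(n = 5\) (and especially \(n = 0\)) they are far apart.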
Strategies for constructing priors
- “Informative priors”: use expert knowledge to determine the prior (“prior elicitation”)
- e.g.: when we build rockets, there is a lot of quality control, so it would be surprising to see very low values for the success probability parameter \(p\).
- Many prior elicitation techniques have been developed; see Petrus et al., 2021 for a recent review,
- but the state of that literature is not completely satisfactory.
- “Non-informative priors”: use properties of the likelihood to determine the prior
- more advanced; see Robert, Section 3.5,
- not automated: the case-by-case mathematical derivation is often intractable (a tractable case is worked out after this list).
- Automating this into PPLs is an open problem.
- Today: side-step these issues thanks to hierarchical models.
- We will not remove the need for prior choice; instead, we will decrease the sensitivity of the posterior to these choices.
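To see why these derivations are case-by-case, here is the standard worked example for our binomial model: the Jeffreys prior, one common non-informative construction, is proportional to the square root of the Fisher information, \[\begin{align*} I(p) = \frac{n}{p(1-p)}, \qquad \pi(p) \propto \sqrt{I(p)} \propto p^{-1/2} (1 - p)^{-1/2}, \end{align*}\] i.e., a \({\mathrm{Beta}}(1/2, 1/2)\) distribution. For most models, no such closed form is available.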
Example, continued
Suppose we picked a Beta, so we now have to pick a member of the Beta family.
\[\begin{align*} p &\sim {\mathrm{Beta}}(?, ?) \\ y | p &\sim {\mathrm{Binom}}(n, p). \end{align*}\]
Again, the numbers to fill in the “?” are known as hyper-parameters, i.e., a hyper-parameter is a parameter of a prior.
Expert elicitation:
- find a rocket expert,
- ask the expert what they think is a reasonable range of values for \(p\), e.g. lower and upper quartiles \(Q_1, Q_3\),
- use a numerical method to fit \(\alpha, \beta\) such that a Beta(\(\alpha, \beta\)) matches the reported quartiles \(Q_1, Q_3\) (sketched below).
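A minimal sketch of this last step, assuming Python with numpy and scipy; the quartile values q1, q3 and the helper quartile_mismatch are hypothetical, not taken from a real elicitation.

```python
import numpy as np
from scipy.stats import beta
from scipy.optimize import minimize

q1, q3 = 0.7, 0.9  # hypothetical elicited lower/upper quartiles for p

def quartile_mismatch(log_params):
    # Work on the log scale so alpha, beta stay positive.
    a, b = np.exp(log_params)
    return (beta.ppf(0.25, a, b) - q1) ** 2 + (beta.ppf(0.75, a, b) - q3) ** 2

result = minimize(quartile_mismatch, x0=np.log([2.0, 2.0]), method="Nelder-Mead")
alpha_hat, beta_hat = np.exp(result.x)
print(alpha_hat, beta_hat)
print(beta.ppf([0.25, 0.75], alpha_hat, beta_hat))  # should be close to (q1, q3)
```

Since two quartiles give two constraints for two hyper-parameters, the fit is typically near-exact; with more elicited quantiles than hyper-parameters, it becomes a least-squares compromise.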