Prior choice

Outline

Topics

  • Strategies for constructing priors.
  • Situations where choosing the prior matters.
  • Informative vs non-informative priors.

Rationale

Selection of priors is a necessary step in the construction of Bayesian models. Difficulties that can be encountered in prior choice motivate hierarchical models, introduced this week.

Example

Consider the problem of selecting a prior for a parameter defined on \([0, 1]\), e.g. \(p\) in the basic rocket launch success probability example (iid, continuous): \[\begin{align*} p &\sim \;\text{???} \\ y | p &\sim {\mathrm{Binom}}(n, p). \end{align*}\]

General approach

Often, the choice of prior is approached in two stages:

  • First, pick a distribution family
    • in our example: let us pick say \({\color{red} {\mathrm{Beta}}}(\cdot, \cdot)\),
      • a common starting point for distributions with support \([0, 1]\),
      • but not the only choice, e.g., an alternative is the Kumaraswamy distribution.
  • Then, pick one member of this family
    • e.g. Beta\(({\color{red} 1}, {\color{red} 2})\), but how to pick these “magic” numbers (called hyper-parameters)?
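
To make the role of the two hyper-parameters concrete, the standard Beta density is recalled below. The symbols \(\alpha\) and \(\beta\) for the hyper-parameters are an assumption of this note at this point in the text; the formula itself is the usual textbook one.

\[\begin{align*} f(p; \alpha, \beta) &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1}, \qquad p \in [0, 1], \\ \mathbb{E}[p] &= \frac{\alpha}{\alpha + \beta}. \end{align*}\]

For instance, Beta\((1, 2)\) has density \(2(1 - p)\) and mean \(1/3\), so this particular member of the family puts more mass on low success probabilities.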

When does the choice of prior matter?

  • When the dataset is large, the posterior tends to be less sensitive to the specification of the prior
    • Theoretical reason: the “Bayesian central limit theorem” (Bernstein-von Mises theorem).
    • Not always true (e.g. partially identifiable model).
  • When the dataset is small, the posterior tends to be more sensitive to the specification of the prior
    • Extreme example:
      • consider a rocket maiden flight,
      • i.e. there is no observation available yet,
      • Question: what will be the posterior?
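
The sensitivity claims above can be checked numerically with a minimal sketch. It relies on the standard conjugacy fact that a Beta\((\alpha, \beta)\) prior with a binomial likelihood gives a Beta\((\alpha + y, \beta + n - y)\) posterior. The second prior, Beta\((5, 1)\), and the three datasets are hypothetical values chosen only for illustration.

```python
def beta_binomial_posterior(alpha, beta, n, y):
    """Posterior hyper-parameters for p ~ Beta(alpha, beta), y | p ~ Binom(n, p)."""
    return alpha + y, beta + (n - y)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Two candidate priors (Beta(5, 1) is a hypothetical "optimistic" alternative).
priors = {"Beta(1, 2)": (1.0, 2.0), "Beta(5, 1)": (5.0, 1.0)}

# Hypothetical datasets, given as (number of launches n, number of successes y).
datasets = {"no data": (0, 0), "small": (3, 3), "large": (3000, 2700)}

posterior_means = {}
for data_name, (n, y) in datasets.items():
    for prior_name, (a, b) in priors.items():
        post_a, post_b = beta_binomial_posterior(a, b, n, y)
        posterior_means[(data_name, prior_name)] = beta_mean(post_a, post_b)
        print(f"{data_name:>8} | prior {prior_name}: "
              f"posterior Beta({post_a:g}, {post_b:g}), "
              f"mean {posterior_means[(data_name, prior_name)]:.4f}")
```

With no observations, the update leaves the hyper-parameters unchanged, so the posterior coincides with the prior. With three launches the two posterior means still differ noticeably, while with 3000 launches they nearly agree.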

Strategies for constructing priors

  • “Informative priors”: use expert knowledge to determine the prior (“prior elicitation”)
    • e.g.: when we build rockets, there is a lot of quality control, so it would be surprising to see very low values for the success probability parameter \(p\).
    • Many prior elicitation techniques have been developed; see Petrus et al., 2021 for a recent review,
    • but the state of that literature is not completely satisfactory.
  • “Non-informative priors”: use properties of the likelihood to determine the prior
    • more advanced, see Robert, Section 3.5,
    • not automated: the case-by-case mathematical derivation is often intractable.
    • Automating this into PPLs is an open problem.
  • Today: side-step these issues thanks to hierarchical models.
    • We will not remove the need for prior choice; instead, we will decrease the sensitivity of the posterior to these choices.

Example, continued

Suppose we picked a Beta, so we now have to pick a member of the Beta family.

\[\begin{align*} p &\sim {\mathrm{Beta}}(?, ?) \\ y | p &\sim {\mathrm{Binom}}(n, p). \end{align*}\]

Again, the numbers that fill in the “?” are known as hyper-parameters, i.e., a hyper-parameter is a parameter of a prior.

Expert elicitation:

  • find a rocket expert,
  • ask the expert what they think is a reasonable range of values for \(p\), e.g. upper and lower quartiles \(Q_1, Q_3\).
  • Use a numerical method to fit \(\alpha, \beta\) such that a Beta(\(\alpha, \beta\)) matches the reported quartiles \(Q_1, Q_3\).
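
The last step can be sketched as follows. This is one possible numerical method, not necessarily the one intended in the course: it assumes SciPy is available and solves for \(\alpha, \beta\) by least squares on the two quartile conditions \(F(Q_1) = 0.25\) and \(F(Q_3) = 0.75\), where \(F\) is the Beta CDF. The function name `fit_beta_to_quartiles` and the expert's answers \(Q_1 = 0.85\), \(Q_3 = 0.95\) are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def fit_beta_to_quartiles(q1, q3):
    """Find (alpha, beta) so that Beta(alpha, beta) has quartiles close to (q1, q3)."""
    if not 0.0 < q1 < q3 < 1.0:
        raise ValueError("need 0 < q1 < q3 < 1")

    def residuals(log_params):
        # Optimise on the log scale so that both hyper-parameters stay positive.
        a, b = np.exp(log_params)
        fitted_probs = stats.beta.cdf([q1, q3], a, b)
        return fitted_probs - np.array([0.25, 0.75])

    # Start from Beta(1, 1), i.e. the uniform distribution on [0, 1].
    result = optimize.least_squares(residuals, x0=np.zeros(2))
    a_hat, b_hat = np.exp(result.x)
    return float(a_hat), float(b_hat)


# Hypothetical expert answer: "p is between 0.85 and 0.95 half of the time".
a_hat, b_hat = fit_beta_to_quartiles(0.85, 0.95)
print(f"fitted prior: Beta({a_hat:.2f}, {b_hat:.2f})")
print("fitted quartiles:", stats.beta.ppf([0.25, 0.75], a_hat, b_hat))
```

Printing the fitted quartiles is a quick sanity check that the solver actually matched the expert's statement. Matching two quantiles with two hyper-parameters is a design choice; asking the expert for more quantiles would turn this into an over-determined fit.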