Custom bricks

Outline

Topics

  • What to do when you need a distribution not in the Stan built-in library.
  • Understanding why Stan (and most MCMC methods) represents the density in log-scale.

Rationale

Even though Stan has a long list of built-in distribution, you will eventually run into a situation where you need a distribution not in the list.

While learning how to do so, we will need to look more deeply at how Stan works under the hood.

Example

We will show how to implement the Kumaraswamy distribution in Stan. As of March 2024, it is not in Stan’s built-in library.

The Kumaraswamy distribution has density: \[f(x) = ab x^{a-1} (1 - x^a)^{b-1},\] where \(x \in (0, 1)\) and \(a, b\) are positive parameters.

Executive version

  • In the model block:
    • Use target += [logarithm (base e) of the log density]
    • Applying this to our Kumaraswamy example: \[\log(f(x)) = \log(a) + \log(b) + (a-1) \log(x) + (b-1) \log(1 - x^a).\]
  • In the parameters block:
    • Make sure you enforce the support (here \(x \in (0, 1)\))
suppressPackageStartupMessages(require(rstan))
data {
  real<lower=0> a;
  real<lower=0> b;  
}

parameters {
1  real<lower=0, upper=1> x;
}

model {
2  target += log(a) + log(b) + (a-1) * log(x) + (b-1) * log1p(-x^a);
                                                    // ^ log1p(z) = log(1+z)
}
1
Support enforced.
2
Contribution of the Kumaraswamy log density added the log joint density.
fit = sampling(
  Kumaraswamy,
  seed = 1,
  refresh = 0,
  data = list(a = 2, b = 5),       
  iter = 10000                   
)

Checking that the histogram roughly matches the one from the wikipedia article (see top of page Figure, purple line).

suppressPackageStartupMessages(require(bayesplot))
suppressPackageStartupMessages(require(ggplot2))
mcmc_hist(fit, pars = c("x")) + theme_minimal()

Explanation

Log-scale computation

Question: why is Stan computing the logarithm of \(\gamma(x)\)?

Key fact: Stan, like most numerical methods, use double-precision floating point approximation of real numbers.

  • Suppose you have 500 observations.
  • Each have a likelihood of about \(1/10\).
  • What is the joint likelihood?
likelihood = 1.0
for (i in 1:500) {
  likelihood = likelihood * (1/10)
}
likelihood
[1] 0

Question: Why R returns zero?

  • Because \((1/10)^{500}\) is smaller than the smallest positive number representeable in double precision (\(\approx 10^{-300}\))
    • So it gets rounded to zero.
    • This is called an underflow

Log-scale computation: let us say we store instead the log-likelihood:

loglikelihood = log(1.0)
for (i in 1:500) {
  loglikelihood = loglikelihood + log(1/10)
}
loglikelihood
[1] -1151.293
500*log(1/10)
[1] -1151.293
  • We have avoided the underflow issue!
  • That’s why a Stan program computes \(\log \gamma(x)\) instead of \(\gamma(x)\).

Encapsulation into a function

If you are using a custom density for several variables, encapsulate it into a function:

1functions {
2  real Kumaraswamy_lpdf(real x, real a, real b) {
    return log(a) + log(b) + (a-1) * log(x) + (b-1) * log1p(-x^a);
  }
}

data {
  real<lower=0> a;
  real<lower=0> b;  
}

parameters {
  real<lower=0, upper=1> x; 
}

model {
3  x ~ Kumaraswamy(a, b);
}
1
The function block allows you to create custom functions. See the Stan documentation for more details.
2
For Stan to pick up your function in its ~ syntax, you must use the special suffix _lpdf for your function name, which stands for “log probability density function.”
3
Then you can use your distribution name using ~ as usual (notice we do not include the _lpdf when we do such call). From this it is apparent that Stan’s ~ notation is just a shortcut for target += distributionName_lpdf(...).