Rao-Blackwellization is an important technique for two reasons:
It can speed-up MCMC considerably.
In languages that do not support discrete latent variables (for example, Stan), it is the only way to implement certain models (e.g., mixture models, coming next).
This time we implement the model differently in Stan to demonstrate Rao-Blackwellization.
Code
set.seed(1)

# detection limit: values higher than that stay at the limit
limit = 1.1
n_measurements = 10
true_mean = 5.6

# data, if we were able to observe it perfectly
y = rexp(n_measurements, 1.0/true_mean)

# number of measurements higher than the detection limit
n_above_limit = sum(y >= limit)
n_below_limit = sum(y < limit)

# subset of the measurements that are below the limit:
data_below_limit = y[y < limit]

# measurements: those higher than the limit stay at the limit
measurements = ifelse(y < limit, y, limit)
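For readers following along in Python rather than R, an equivalent simulation might look like this (a sketch, not part of the original; variable names mirror the R code, but the random draws will differ from R's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Detection limit: values higher than that stay at the limit.
limit = 1.1
n_measurements = 10
true_mean = 5.6

# Data, if we were able to observe it perfectly.
y = rng.exponential(scale=true_mean, size=n_measurements)

# Number of measurements higher than the detection limit.
n_above_limit = int(np.sum(y >= limit))
n_below_limit = int(np.sum(y < limit))

# Subset of the measurements that are below the limit.
data_below_limit = y[y < limit]

# Measurements: those higher than the limit stay at the limit.
measurements = np.where(y < limit, y, limit)
```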
Inference for Stan model: anon_model.
1 chains, each with iter=1e+05; warmup=50000; thin=1;
post-warmup draws per chain=50000, total post-warmup draws=50000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
rate 0.39 0.00 0.20 0.11 0.25 0.36 0.50 0.86 17497 1
mean 3.39 0.02 2.35 1.16 1.99 2.78 4.02 9.25 13206 1
lp__ -8.24 0.01 0.73 -10.30 -8.40 -7.96 -7.77 -7.72 18171 1
Samples were drawn using NUTS(diag_e) at Tue Mar 19 07:17:10 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Conclusion: Essentially the same result (i.e., within MCSE), but about 4x faster! How is this possible?
Mathematical underpinnings
Consider a simplified example where there is only one observation:
\[\begin{align*}
X &\sim {\mathrm{Exp}}(1/100) \\
H &\sim {\mathrm{Exp}}(X) \\
C &= \mathbb{1}[H \ge L], \\
Y &= C L + (1 - C) H.
\end{align*} \tag{1}\]
Suppose our one observation is censored (\(C = 1\)).
Reduce the problem to a target over \(x\) only, \(\gamma(x)\).
This is because we do not care much about \(h\): it is a nuisance variable.
But we would like to do so in such a way that the result of inference on \(x\) is the same with \(\gamma(x, h)\) as with \(\gamma(x)\) (just faster with Rao-Blackwellization).
Question: how to define \(\gamma(x)\) from \(\gamma(x, h)\) so that the result on \(x\) stays the same?
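One way to see the answer (a standard marginalization argument, sketched here rather than taken verbatim from the source): integrate the nuisance variable \(h\) out of \(\gamma(x, h)\). Since the observation is censored, \(h\) is only constrained to lie in \([L, \infty)\), so the integral reduces to a survival probability:

\[\begin{align*}
\gamma(x) = \int_L^\infty \gamma(x, h) \,\mathrm{d}h
&= p(x) \int_L^\infty p(h \mid x) \,\mathrm{d}h \\
&= p(x) \, \big(1 - F(L; x)\big) = p(x) \, e^{-x L},
\end{align*}\]

where \(p(x)\) is the \({\mathrm{Exp}}(1/100)\) prior density and \(F(\cdot\,; x)\) is the CDF of \({\mathrm{Exp}}(x)\).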
Note that in the Stan code above, \(1 - F(L; x)\) is implemented with exponential_lccdf(limit | rate) (“lccdf” stands for “log complement of cumulative distribution function”).
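As a quick sanity check on the survival-function identity (a Python sketch, not from the original; `rate` and `L` are arbitrary illustrative values), we can compare the closed form \(1 - F(L; x) = e^{-xL}\) against a Monte Carlo estimate:

```python
import math
import numpy as np

rate = 0.4   # an arbitrary value of x for the check
L = 1.1      # detection limit

# Closed form: for H ~ Exp(rate), P(H >= L) = 1 - F(L; rate) = exp(-rate * L).
closed_form = math.exp(-rate * L)

# Monte Carlo estimate of the same probability.
rng = np.random.default_rng(0)
h = rng.exponential(scale=1.0 / rate, size=1_000_000)
mc_estimate = float(np.mean(h >= L))

print(closed_form, mc_estimate)
```

The two numbers should agree to about two decimal places, which is exactly the quantity `exponential_lccdf(limit | rate)` computes on the log scale.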