Rao-Blackwellization is an important technique for two reasons:

- It can speed up MCMC considerably.
- In languages that do not support discrete latent variables (for example, Stan), it is the only way to implement certain models (e.g., mixture models, coming next).
This time we implement the model in Stan differently, to demonstrate Rao-Blackwellization.
Code
```r
set.seed(1)

# detection limit: values higher than that stay at the limit
limit = 1.1
n_measurements = 10
true_mean = 5.6

# data, if we were able to observe perfectly
y = rexp(n_measurements, 1.0/true_mean)

# number of measurements higher than the detection limit
n_above_limit = sum(y >= limit)
n_below_limit = sum(y < limit)

# subset of the measurements that are below the limit:
data_below_limit = y[y < limit]

# measurements: those higher than the limit stay at the limit
measurements = ifelse(y < limit, y, limit)
```
...
Chain 1 finished in 1.6 seconds.
...
variable mean mcse_mean
...
10 mean 3.41 0.0178
Conclusion: Essentially the same result (i.e., within MCSE), but the new version is faster! How is this possible?
Mathematical underpinnings
Consider a simplified example where there is only one observation:
\[\begin{align*}
X &\sim {\mathrm{Exp}}(1/100) \\
H &\sim {\mathrm{Exp}}(X) \\
C &= \mathbb{1}[H \ge L], \\
Y &= C L + (1 - C) H.
\end{align*} \tag{1}\]
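To make the generative process in (1) concrete, here is a minimal Python sketch of one forward simulation (the detection limit \(L\) is set to an arbitrary illustrative value):

```python
import random

random.seed(1)

L = 1.1  # detection limit (illustrative value)

# X ~ Exp(1/100): exponential with rate 1/100, i.e. mean 100
x = random.expovariate(1.0 / 100.0)

# H ~ Exp(X): exponential with rate x
h = random.expovariate(x)

# C = 1[H >= L]: censoring indicator
c = 1 if h >= L else 0

# Y = C*L + (1 - C)*H: the recorded measurement
y = c * L + (1 - c) * h

# the recorded value never exceeds the detection limit
assert y <= L
```

Note that only \(y\) (and hence \(c\)) would be observed in practice; \(h\) is latent whenever the observation is censored.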
Suppose our one observation is censored (\(C = 1\)).
We would like to reduce the problem to a target over \(x\) only, \(\gamma(x)\), since \(h\) is a nuisance variable: we do not care much about it. But we would like to do so in such a way that inference on \(x\) gives the same result under \(\gamma(x, h)\) and \(\gamma(x)\) (just faster with Rao-Blackwellization).
Question: how to define \(\gamma(x)\) from \(\gamma(x, h)\) so that the result on \(x\) stays the same?
Note that in the Stan code above, \(1 - F(L; x)\) is implemented with exponential_lccdf(limit | rate) (“lccdf” stands for “log complementary cumulative distribution function”).
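As a sanity check on this tail-probability identity, here is a small Python sketch verifying numerically that \(\int_L^\infty x e^{-x h}\,\mathrm{d}h = e^{-xL} = 1 - F(L; x)\); the values of \(x\) and \(L\) below are arbitrary illustrative choices:

```python
import math

x = 0.7   # example exponential rate (arbitrary)
L = 1.1   # detection limit

# P(H >= L | X = x) for H ~ Exp(x) should equal 1 - F(L; x) = exp(-x * L)
target = math.exp(-x * L)

# trapezoidal integration of the density x * exp(-x * h) over [L, upper]
upper = 60.0   # truncation point; the tail beyond it is negligible here
n = 20000
dh = (upper - L) / n
integral = 0.0
for i in range(n + 1):
    h = L + i * dh
    weight = 0.5 if i in (0, n) else 1.0
    integral += weight * x * math.exp(-x * h) * dh

assert abs(integral - target) < 1e-5
```

The agreement is what lets the censored likelihood term be written in closed form, with no numerical integration needed in the Stan model itself.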