Random variables
Outline
Topics
- Random variable as mathematical objects.
- Notation convention for observation/latent
Rationale
Random variables are used as building blocks for two key uses in Bayesian stats: modelling “knowns” (observations) and “unknowns” (latent variables/parameters/prediction).
Definition
A (real) random variable is a function from a sample space \(S\) to the reals, \(X : S \to \mathbb{R}\).
Example:
- Continuing the example with \(S = \{1, 2, 3, 4\}\).
- Consider \(X(s) = 1\) if \(s\) is odd, and \(X(s) = 0\) otherwise.
Probabilist’s notation
- Let \(X\) denote a random variable.
- The notation \((X = 1)\) or \((X \in E)\) is invalid in set theory.
- Therefore, probabilists “gave it a meaning” as follows:
\[(X = 1) = \{s : X(s) = 1\}.\]
Example: Consider \(X(s) = 1\) if \(s\) is odd, and \(X(s) = 0\) otherwise. Then \((X = 1)\) corresponds to the red circle.
Conventions: probability vs Bayesian
Probability convention:
- Random variables are denoted with capitals in probability theory
- The same letter in small cap is used for a dummy variable holding the output of the random variable.
- Note: “A dummy variable holding the output of the random variable” is called a realization.
- Example: \(X\) for the random variable and \(x\) for its realization.
- We will start off using this convention in the first few weeks.
Bayesian statistics convention:
- Often the capitalization convention is not used in the Bayesian statistics literature.
- Hence we will eventually drop the probability theory capitalization convention.
More conventions
- \(X\): unobserved random variable (synonym of “unobserved”: latent)
- \(Y\): observed random variable
More precisely:
- \(Y\) is the “mechanism of observation”..
- whereas the actual observation is a realization \(y\) of \(Y\).
Extension
A random vector is a function from a sample space to \(\mathbb{R}^n\).
Example in Bayesian statistics: the vector \((X, Y)\) containing both the unobserved and observed quantities.