Random variables

Outline

Topics

Random variables as mathematical objects.
Notation conventions for observed and latent random variables.

Rationale

Random variables are the building blocks for two key tasks in Bayesian statistics: modelling “knowns” (observations) and “unknowns” (latent variables, parameters, predictions).

Definition

A (real) random variable is a function from a sample space \(S\) to the reals, \(X : S \to \mathbb{R}\).

Example:

Continuing the earlier example, let \(S = \{1, 2, 3, 4\}\).
Consider \(X(s) = 1\) if \(s\) is odd, and \(X(s) = 0\) otherwise.
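
The example can be sketched in Python, assuming the finite sample space is stored as a set and the random variable is written as an ordinary function. The names `S` and `X` mirror the notation above; no probabilities are involved at this stage.

```python
# Sample space from the example: S = {1, 2, 3, 4}.
S = {1, 2, 3, 4}


def X(s: int) -> float:
    """Random variable X : S -> R, equal to 1 on odd outcomes and 0 otherwise."""
    return 1.0 if s % 2 == 1 else 0.0


# A random variable is a deterministic map from outcomes to real numbers.
values = {s: X(s) for s in sorted(S)}
print(values)  # {1: 1.0, 2: 0.0, 3: 1.0, 4: 0.0}
```

The randomness comes entirely from which outcome \(s\) occurs; the function \(X\) itself is fixed.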

Probabilist’s notation

Let \(X\) denote a random variable.
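Read literally, the notation \((X = 1)\) or \((X \in E)\) does not denote a set, because \(X\) is a function rather than a number.
Therefore, probabilists “gave it a meaning” as shorthand for an event, that is, a subset of \(S\):

\[(X = 1) = \{s \in S : X(s) = 1\}, \qquad (X \in E) = \{s \in S : X(s) \in E\}.\]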

Example: Consider \(X(s) = 1\) if \(s\) is odd, and \(X(s) = 0\) otherwise. Then, with \(S = \{1, 2, 3, 4\}\), \((X = 1) = \{1, 3\}\), which corresponds to the red circle.
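
The probabilist's notation can be read as "collect every outcome whose image under \(X\) satisfies the condition". A minimal sketch, assuming \(S = \{1, 2, 3, 4\}\) as in the earlier example; the helper `event` is a hypothetical name introduced only for this illustration.

```python
S = {1, 2, 3, 4}


def X(s):
    """X(s) = 1 if s is odd, 0 otherwise."""
    return 1 if s % 2 == 1 else 0


def event(rv, sample_space, E):
    """Return the event (rv in E) = {s in sample_space : rv(s) in E}."""
    return {s for s in sample_space if rv(s) in E}


# (X = 1) is the special case E = {1}.
X_equals_1 = event(X, S, {1})
print(X_equals_1)  # {1, 3}
```

Here `event(X, S, {0, 1})` returns all of `S`, since every outcome is mapped to 0 or 1.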

Conventions: probability vs Bayesian

Probability convention:

Random variables are denoted by capital letters in probability theory.
The same letter in lowercase is used for a dummy variable holding the output of the random variable.
- Note: “A dummy variable holding the output of the random variable” is called a realization.
- Example: \(X\) for the random variable and \(x\) for its realization.
We will start off using this convention in the first few weeks.
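
The distinction between \(X\) and \(x\) can be sketched as follows. The code assumes, purely for illustration, that outcomes are drawn uniformly from \(S = \{1, 2, 3, 4\}\); the notes above do not specify a probability on \(S\).

```python
import random

S = [1, 2, 3, 4]


def X(s):
    """The random variable (capital X): a function on the sample space."""
    return 1 if s % 2 == 1 else 0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
s = rng.choice(S)       # assumption: the outcome is drawn uniformly from S
x = X(s)                # the realization (lowercase x): a plain number

print(f"outcome s = {s}, realization x = {x}")
```

`X` is a function and stays the same across runs, whereas `x` is the single number obtained once an outcome has been drawn.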

Bayesian statistics convention:

Often the capitalization convention is not used in the Bayesian statistics literature.
Hence we will eventually drop the probability theory capitalization convention.

More conventions

\(X\): unobserved random variable (synonym of “unobserved”: latent)
\(Y\): observed random variable

More precisely:

\(Y\) is the “mechanism of observation”,
whereas the actual observation is a realization \(y\) of \(Y\).

Extension

A random vector is a function from a sample space to \(\mathbb{R}^n\).

Example in Bayesian statistics: the vector \((X, Y)\) containing both the unobserved and observed quantities.
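
A random vector can be sketched as a function returning a tuple. In the code below, `X` reuses the odd/even example, while `Y` is a hypothetical observed variable invented for this illustration (it equals 1 when \(s \le 2\)); neither the choice of `Y` nor the sample space comes from a real model.

```python
S = [1, 2, 3, 4]


def X(s):
    """Latent (unobserved) random variable: 1 if s is odd, 0 otherwise."""
    return 1 if s % 2 == 1 else 0


def Y(s):
    """Hypothetical observed random variable: 1 if s <= 2, 0 otherwise."""
    return 1 if s <= 2 else 0


def XY(s):
    """Random vector (X, Y) : S -> R^2, built from the two random variables."""
    return (X(s), Y(s))


table = {s: XY(s) for s in S}
print(table)  # {1: (1, 1), 2: (0, 1), 3: (1, 0), 4: (0, 0)}
```

Both components are evaluated at the same outcome \(s\), which is what ties the latent value to the observation.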