# Random variables

## Outline

### Topics

- Random variable as mathematical objects.
- Notation convention for observation/latent

### Rationale

Random variables are used as building blocks for two key uses in Bayesian stats: modelling “knowns” (observations) and “unknowns” (latent variables/parameters/prediction).

## Definition

A (real) **random variable** is a function from a sample space \(S\) to the reals, \(X : S \to \mathbb{R}\).

**Example:**

- Continuing the example with \(S = \{1, 2, 3, 4\}\).
- Consider \(X(s) = 1\) if \(s\) is odd, and \(X(s) = 0\) otherwise.

## Probabilist’s notation

- Let \(X\) denote a random variable.
- The notation \((X = 1)\) or \((X \in E)\) is invalid in set theory.
- Therefore, probabilists “gave it a meaning” as follows:

\[(X = 1) = \{s : X(s) = 1\}.\]

**Example:** Consider \(X(s) = 1\) if \(s\) is odd, and \(X(s) = 0\) otherwise. Then \((X = 1)\) corresponds to the red circle.

## Conventions: probability vs Bayesian

**Probability convention:**

- Random variables are denoted with capitals in probability theory
- The same letter in small cap is used for a dummy variable holding the output of the random variable.
- Note: “A dummy variable holding the output of the random variable” is called a
**realization**. - Example: \(X\) for the random variable and \(x\) for its realization.

- Note: “A dummy variable holding the output of the random variable” is called a
- We will start off using this convention in the first few weeks.

**Bayesian statistics convention:**

- Often the capitalization convention is not used in the Bayesian statistics literature.
- Hence we will eventually drop the probability theory capitalization convention.

## More conventions

- \(X\): unobserved random variable (synonym of “unobserved”:
**latent**) - \(Y\): observed random variable

More precisely:

- \(Y\) is the “mechanism of observation”..
- whereas the actual observation is a realization \(y\) of \(Y\).

## Extension

A **random vector** is a function from a sample space to \(\mathbb{R}^n\).

**Example in Bayesian statistics:** the vector \((X, Y)\) containing both the unobserved and observed quantities.