# `3. Random Variables`

1. ~~Random variables and random elements: motivation and examples.~~
2. Transformations of random variables. 
3. ~~Distribution of a random variable.~~
4. Characteristic function and moment-generating function. 
5. Moments and cumulants. 
6. Families of distributions. 
7. Statistics. 
8. Entropy. 
9. Independent random variables, marginals.


### `1. Random variables and random elements: motivation and examples`

### Motivation: why do we even need random variables?
Dealing directly with a huge outcome space $\Omega$ is messy and exhausting. Also, we **don’t measure $\Omega$ directly** — the probability measure $P$ is defined on the **event space** $\mathcal F$, not on individual outcomes.


**`Example 01 (repeated fair die)`:**

If we roll a fair 6-sided die $n$ times, the outcomes are length-$n$ sequences, so:
$$
\Omega = \{\omega_1,\omega_2,\omega_3,\omega_4,\omega_5,\omega_6\}^n,
\qquad |\Omega|=6^n,
$$
and with the biggest possible $\sigma$-algebra:
$$
\mathcal F = 2^\Omega
$$
(which is the power set).

Then the number of measurable events explodes:
$$
|\mathcal F| = 2^{|\Omega|} = 2^{6^n}.
$$
For $n=2$ that is $2^{36}\approx 68$ billion events — not cool.

Even though for a concrete outcome (a concrete sequence) $\omega$ we have:
$$
P(\{\omega\}) = \left(\frac{1}{6}\right)^n = \frac{1}{|\Omega|},
$$
working with such gigantic structures becomes a nightmare for actual calculations.
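
To get a feeling for how fast these sizes grow, here is a minimal sketch that just evaluates the two counting formulas above. The helper names `num_outcomes` and `num_events` are made up for this illustration.

```python
def num_outcomes(n: int, sides: int = 6) -> int:
    """Size of the outcome space Omega for n rolls of a die."""
    return sides ** n


def num_events(n: int, sides: int = 6) -> int:
    """Size of the power-set sigma-algebra 2^Omega."""
    return 2 ** num_outcomes(n, sides)


for n in (1, 2, 3):
    # Print the number of decimal digits of |F| instead of |F| itself,
    # because the full number becomes unreadable very quickly.
    print(n, num_outcomes(n), len(str(num_events(n))))
```

Python integers have arbitrary precision, so the counts are exact; only the printing is shortened.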


### “Adjusting the scope”: focusing on what you need
Instead of tracking every tiny detail in $\Omega$, we often want **coarser questions**, like:
1) “Did we roll only even numbers?”  
2) “Did we never roll more than 4?”  
3) “Was the $(n-42)$-th roll a 4 or a 5?”

So we introduce a **helping tool**: a function that extracts only the information we care about.

**Even/odd detector example (the notes’ motivating construction).**  
Set $n=3$ and define a function $h$ that maps each roll to Even/Odd, for example:
$$
h:\Omega\to\{E,O\}^3,\qquad h(2,5,4)=(E,O,E).
$$
For any “detected pattern” $B\subset\{E,O\}^3$, we can look at the **preimage**:
$$
h^{-1}(B)=\{\text{all original sequences in }\Omega \text{ that produce pattern }B\}.
$$
Then we define the $\sigma$-algebra generated by this detector:
$$
\sigma(h)=\{h^{-1}(B)\mid B\subset\{E,O\}^3\}.
$$
Now we only care about 8 patterns (since $2^3=8$), and for “all even” we get:
$$
P\big(h^{-1}(\{(E,E,E)\})\big)=\frac{1}{8}.
$$

This is the key idea: **choose the right function / viewpoint → shrink the complexity.**
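
The detector construction can be checked by brute force for $n=3$. The sketch below assumes a fair die (uniform probability on $\Omega$); the helper names `h`, `preimage` and `prob` are chosen for this illustration only.

```python
from fractions import Fraction
from itertools import product

n = 3
omega = list(product(range(1, 7), repeat=n))  # all 6^3 = 216 roll sequences


def h(outcome):
    """Even/odd detector: maps each roll to 'E' or 'O'."""
    return tuple("E" if roll % 2 == 0 else "O" for roll in outcome)


def preimage(pattern_set):
    """h^{-1}(B): all outcomes whose detected pattern lies in B."""
    return [w for w in omega if h(w) in pattern_set]


def prob(event):
    """Uniform probability on omega (assumption: fair die)."""
    return Fraction(len(event), len(omega))


patterns = sorted({h(w) for w in omega})
print(len(patterns))                      # 8 distinct even/odd patterns
print(h((2, 5, 4)))                       # ('E', 'O', 'E')
print(prob(preimage({("E", "E", "E")})))  # 1/8
```

Of the 216 sequences, 27 consist of even numbers only, which gives $27/216 = 1/8$.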


### Measurability via preimages (why the trick works)
The notes recall the “continuity via preimages” idea and reuse it for measurability: we call a function measurable when the preimage of a measurable set stays measurable.


### **`Def. 2.2`: Random Variable**
Given an outcome (sample) space $(\Omega,\mathcal F)$ and a measurable space $(Y,\mathcal Y)$, an $\mathcal F$-measurable function
$$
X:(\Omega,\mathcal F)\to (Y,\mathcal Y)
$$
is called a **random variable**.

Equivalently (what “$\mathcal F$-measurable” means here):
$$
\forall B\in\mathcal Y:\quad X^{-1}(B)\in\mathcal F.
$$
So events of the form “$X\in B$” are measurable events in the original probability space.

**Common shorthand used in the notes.**  
Often we restrict to $(\mathbb R,\mathcal B(\mathbb R))$ and write lazily:
$$
X:\Omega\to\mathbb R,
$$
while measurability is still the important hidden requirement.
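
On a finite space the measurability condition can be tested directly. The sketch below is a toy setup, not taken from the notes: one die roll, a coarse $\sigma$-algebra that can only tell even from odd, and the power set of the range of $X$ as the codomain $\sigma$-algebra.

```python
from itertools import chain, combinations

omega = frozenset(range(1, 7))
evens = frozenset({2, 4, 6})
odds = omega - evens
# A coarse sigma-algebra: it can only distinguish even from odd.
F = {frozenset(), evens, odds, omega}


def subsets(values):
    """All subsets of a finite collection, as tuples."""
    values = list(values)
    return chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1)
    )


def is_measurable(X, sigma_algebra):
    """Check X^{-1}(B) in F for every subset B of the (finite) range of X."""
    image = {X(w) for w in omega}
    for B in subsets(image):
        pre = frozenset(w for w in omega if X(w) in B)
        if pre not in sigma_algebra:
            return False
    return True


def parity(w):
    return w % 2


def identity(w):
    return w


print(is_measurable(parity, F))    # True
print(is_measurable(identity, F))  # False: {1} is not an event in F
```

The identity map fails because it asks about events such as $\{1\}$ that the coarse $\sigma$-algebra does not contain.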

> * **Note**: Random variables help us “organize” the outcome (sample) space by mapping it to some other space, such as the real line.

### **`Def. 2.2.X`: Random elements** (same definition, more general codomain)
A “random variable” is just a **random element** whose codomain is usually $\mathbb R$.
The notes emphasize that different codomains change what kind of data we model:

- $(\mathbb R,\mathcal B(\mathbb R))$ — one measurement (a single number)
- $(\mathbb R^n,\mathcal B(\mathbb R^n))$ — multiple measurements at once (a vector)
- $(\mathbb R^{m\times n},\mathcal B(\mathbb R^{m\cdot n}))$ — tabular data (rectangles)
- $(S^T,\mathcal C(T,S))$ — function-valued objects with a cylinder $\sigma$-algebra (“borderline stochastic stuff”)  

So: **random variable** = special case; **random element** = same idea for general $Y$.



### “Not so random variables”
The notes point out the classic joke: they are neither truly “random” nor “variables”.
They are **functions** that map outcomes (which can be anything) into something measurable (often numbers), so we can exploit the structure of $\mathbb R$ for calculations.


## `Examples`


### `Discrete Random Variables`:


#### `Example 1: Bernoulli RV`
$$
X:\Omega\to\{0,1\}
$$
with:
- $\Omega=\{\text{Success},\text{Fail}\}$
- $\mathcal F=\{\varnothing,\{\text{Success}\},\{\text{Fail}\},\Omega\}$
- $X(\text{Success})=1,\; X(\text{Fail})=0$

* **Interpretation**: any “hit or miss” / binary classification situation.


#### `Example 2: Binomial RV (count successes)`:

For $n=3$ trials:
$$
X:\Omega\to\{0,1,\dots,n\},\qquad \Omega=\{S,F\}^3,\quad \mathcal F=2^\Omega,
$$
and e.g.
$$
X(SFF)=1,\quad X(SSF)=2,\quad X(SFS)=2.
$$
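
As a small sketch, the counting variable can be written as an ordinary function on the 8 outcomes. Encoding outcomes as strings such as `"SFF"` is a choice made for this illustration.

```python
from itertools import product

n = 3
omega = ["".join(seq) for seq in product("SF", repeat=n)]  # 'SSS', 'SSF', ...


def X(outcome: str) -> int:
    """Number of successes in the sequence of trials."""
    return outcome.count("S")


for w in omega:
    print(w, X(w))
```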


#### `Example 3: Discrete rank RV`:

Ranks player $A$ in a 4-player match:
$$
X:\Omega\to\{1,2,3,4\},
$$
where $\Omega$ is all $4!$ permutations of $\{A,B,C,D\}$, and $\mathcal F=2^\Omega$.  
Examples:
$$
X(B,A,C,D)=2,\quad X(B,C,A,D)=3,\quad X(B,D,A,C)=3.
$$
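
The rank variable is equally easy to sketch: it reads off the position of `A` in a finishing order. Representing outcomes as tuples is an assumption of this example.

```python
from itertools import permutations

omega = list(permutations("ABCD"))  # all 4! = 24 finishing orders


def X(order) -> int:
    """Rank of player A: 1-based position of 'A' in the finishing order."""
    return order.index("A") + 1


print(X(("B", "A", "C", "D")))  # 2
print(X(("B", "D", "A", "C")))  # 3
```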


### `Continuous Random Variables`:

#### `Example 1: Uniform RV`:
$$
X:\Omega\to[0,1],
\qquad \mathcal F=\{\text{Borel subsets of }[0,1]\},
$$
with example values like $X(0.3)=0.3$, $X(0.734)=0.734$.


#### `Example 2: Exponential RV (waiting time)`:

$$
X:\Omega\to[0,\infty),
\qquad \mathcal F=\{\text{Borel subsets of }[0,\infty)\},
$$
example values like $X(\omega_1)=4.20$, $X(\omega_3)=304$.


#### `Example 3: Normal RV (deviation from mean)`:

$$
X:\Omega\to\mathbb R,
\qquad \mathcal F=\{\text{Borel subsets of }\mathbb R\},
$$
example values like $X(\omega_1)=-5.1$, $X(\omega_2)=0.1$, $X(\omega_3)=2.4$.


## `3. Distribution of a random variable`

### Motivation: we want probabilities of events like `X ∈ B`
Given a probability space $(\Omega,\mathcal F,P)$ and a random variable
$$
X:\Omega \to \mathbb R,
$$
we often care about events of the form:
$$
\{\omega\in\Omega: X(\omega)\in B\} = X^{-1}(B),
$$
where $B$ is a (Borel) set in $\mathbb R$.
This event is measurable (belongs to $\mathcal F$), so it has a probability.

> * **`Def.`: Borel Set**: A Borel set is any set in the Borel $\sigma$-algebra $\mathcal B(\mathbb R)$, which is generated by all open intervals in $\mathbb R$ through countable unions, intersections, and complements.



### **`Def. 2.3`: Distribution (law) of a Random Variable**
Given a probability space $(\Omega,\mathcal F,P)$, a measurable space $(\mathbb Y,\mathcal Y)$, and a random variable $X:\Omega\to\mathbb Y$, its distribution (law) is the **pushforward probability measure** $P_X$ on $(\mathbb Y,\mathcal Y)$.

**Meaning**:

$$
\forall B\in\mathcal Y:\quad P_X(B) = P(X\in B)=P(X^{-1}(B)).
$$


> * **Important note**: having the same distribution does **not** mean two random variables are equal as functions.
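
The pushforward and the remark about equal distributions can both be seen in a tiny example. The sketch assumes three fair, independent trials (uniform $P$ on $\{S,F\}^3$); `pushforward`, `successes` and `failures` are illustrative names.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

omega = ["".join(s) for s in product("SF", repeat=3)]
# Assumption: fair, independent trials, so every outcome has probability 1/8.
P = {w: Fraction(1, len(omega)) for w in omega}


def pushforward(X):
    """P_X({x}) = P(X^{-1}({x})) for each value x taken by X."""
    law = defaultdict(Fraction)
    for w, p in P.items():
        law[X(w)] += p
    return dict(law)


def successes(w):
    return w.count("S")


def failures(w):
    return w.count("F")


law_s = pushforward(successes)
law_f = pushforward(failures)
print(law_s == law_f)                         # True: same distribution
print(successes("SSS") == failures("SSS"))    # False: different functions
```

Both variables put mass $1/8, 3/8, 3/8, 1/8$ on $0,1,2,3$, yet they disagree on every single outcome.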


## `Distributions and Probability Mass Functions (PMFs)`

* There are 2 main types of distributions: ***discrete*** and ***continuous***.


## `3.1 Discrete distributions`

#### **`Def. 3.2.1`: Discrete random variable**

A random variable $X$ is said to be **discrete** if it takes values in a finite or countably infinite set
$$
\{a_1,a_2,\dots\}
\quad\text{such that}\quad
P(X=a_j\text{ for some }j)=1.
$$

If $X$ is a discrete random variable, then the finite or countably infinite set of values $x$ with $P(X = x) > 0$ is called the **support** of $X$.


> * **Note**: ***Continuous Random Variables*** can take any value in an interval.


---

* The *distribution* of a **Random Variable** specifies the probabilities of all events associated with it. For a discrete random variable, this is captured by the **Probability Mass Function (PMF)**.

---


#### **`Def. 3.2.2`: Probability Mass Function**

The **probability mass function (PMF)** of a discrete random variable $X$ is the function $p_X$ given by:
$$
p_X(x)=P(X=x).
$$
It is $>0$ on the support of $X$ and $0$ otherwise.

>* **Notes**: In writing $P(X=x)$, we mean the probability of the **event** $\{\omega\in\Omega: X(\omega)=x\}$.

---

#### **`Thm. 3.2.7`: Valid PMFs**
Let $X$ be a discrete random variable with support $\{x_1,x_2,\dots\}$. The *Probability Mass Function* (PMF) $p_X$ of $X$ must satisfy:

1. **Non-negativity**: 
$$p_X(x) > 0 \text{ if } x = x_j \text{ for some } j, \quad\text{and}\quad p_X(x) = 0 \text{ otherwise.}$$

2. **Normalization**: 
$$
\sum_{j=1}^\infty p_X(x_j)=1
$$
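
For a finite support, both conditions are easy to check mechanically. In this sketch a PMF is stored as a dict from support points to probabilities, and values missing from the dict are understood to have probability 0; this representation and the name `is_valid_pmf` are assumptions of the example.

```python
from fractions import Fraction


def is_valid_pmf(pmf: dict) -> bool:
    """Check positivity on the support and normalization (finite support)."""
    positive = all(p > 0 for p in pmf.values())
    normalized = sum(pmf.values()) == 1
    return positive and normalized


fair_die = {k: Fraction(1, 6) for k in range(1, 7)}
broken = {0: Fraction(1, 2), 1: Fraction(1, 3)}  # sums to 5/6, not 1

print(is_valid_pmf(fair_die))  # True
print(is_valid_pmf(broken))    # False
```

Exact fractions are used so that the normalization test is not affected by floating-point rounding.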

---


## `3.2 Cumulative Distribution Functions (CDF)` 

* Works for all random variables.

---

#### **`Def. 3.6.1`: Cumulative distribution function (CDF)**

The **Cumulative distribution function** (CDF) of a random variable $X$ is the function $F_X$ given by:
$$
F_X(x)=P(X\le x).
$$

>* **Note**: Only *discrete random variables* have **Probability Mass Functions** (PMFs), but **all random variables have CDFs**.

---

#### **`Thm. 3.6.3`: Valid CDFs**
Any Cumulative Distribution Function (CDF) $F$ has these properties:

1) **Increasing** (in the non-strict sense, i.e. non-decreasing):
$$
\text{If } x_1\le x_2 \text{ then } F(x_1)\le F(x_2)
$$

2) **Right-continuous**: The CDF is continuous except possibly for some jumps. At a jump, the CDF is continuous from the **right**; that is, for all $a\in\mathbb R$:
$$
F(a)=\lim_{x\to a^+}F(x)
$$

3) **Convergence to $0$ and $1$**:
$$
\lim_{x\to-\infty}F(x)=0 \text{   and   } \lim_{x\to\infty}F(x)=1
$$

4) **Normalization**:
$$
\forall x\in\mathbb R:\quad 0\le F(x)\le 1 \quad\text{(the range is bounded)}
$$
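
These properties can be spot-checked numerically on a concrete CDF. The sketch below uses the CDF of a Bernoulli variable with an arbitrarily chosen success probability $p=0.3$; it is only a sanity check on a grid, not a proof.

```python
def F(x: float, p: float = 0.3) -> float:
    """CDF of a Bernoulli(p) variable: jumps at 0 and at 1."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1.0 - p
    return 1.0


grid = [k / 10 for k in range(-20, 21)]  # -2.0, -1.9, ..., 2.0
values = [F(x) for x in grid]

# Property 1: values never decrease along the grid.
is_monotone = all(a <= b for a, b in zip(values, values[1:]))
# Property 4: all values stay in [0, 1].
in_unit_interval = all(0.0 <= v <= 1.0 for v in values)
# Property 2 at the jump x = 0: the right limit matches F(0), the left one does not.
right_limit_at_0 = F(0 + 1e-12)
left_limit_at_0 = F(0 - 1e-12)

print(is_monotone, in_unit_interval)
print(right_limit_at_0 == F(0), left_limit_at_0 == F(0))
```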

---

## `3.3 Relationship between PMFs and CDFs`:

For **discrete random variables**, we can easily convert between PMFs and CDFs:

1. **From PMF to CDF**:
- To find $P(X\le x)$ for a real number $x$, we sum the PMF values over all support points $x_j$ with $x_j \le x$:

$$
F_X(x) = P(X\le x) = \sum_{x_j \le x} p_X(x_j)
$$

2. **From CDF to PMF**:
- The CDF of a discrete random variable consists of jumps and flat regions. The ***height of the jump at $x_j$*** = ***value of the PMF at $x_j$***:
$$
p_X(x_j) = F_X(x_j) - \lim_{x\to x_j^-} F_X(x)
$$
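
Both conversions fit in a few lines for a finite support. The PMF below is a made-up example, and the left limit is computed as the CDF value at the previous support point, which is valid because the CDF is flat between support points.

```python
from fractions import Fraction

# Hypothetical PMF on the support {1, 2, 4}.
pmf = {1: Fraction(1, 6), 2: Fraction(1, 2), 4: Fraction(1, 3)}


def cdf(x):
    """F_X(x): sum of p_X(x_j) over all support points x_j <= x."""
    return sum((p for xj, p in pmf.items() if xj <= x), Fraction(0))


def jump(xj):
    """p_X(x_j) = F_X(x_j) - F_X(x_j^-), the height of the jump at x_j."""
    below = [s for s in pmf if s < xj]
    left_limit = cdf(max(below)) if below else Fraction(0)
    return cdf(xj) - left_limit


print(cdf(3))   # 2/3
print(jump(2))  # 1/2
```

Applying `jump` to every support point returns the original PMF, so no information is lost in either direction.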





# `Second Lector`:

---

### **`Def. 2.2` Random Variable**:

Given a sample space $(\Omega,\mathcal{F})$ and a measurable space $(Y,\mathcal{Y})$, an $\mathcal{F}$-measurable function $X:(\Omega,\mathcal{F})\to(Y,\mathcal{Y})$ is called a **random variable**.

---

### **`Def. 2.3` Distribution of a Random Variable**:

Given a probability space $(\Omega,\mathcal{F},P)$, a measurable space $(\mathbb{Y},\mathcal{Y})$, and a random variable $X$, its distribution (law) is defined as a **pushforward probability measure** $P_X$ on $(\mathbb{Y},\mathcal{Y})$ by:
$$\forall B\in\mathcal{Y}:\quad P_X(B) = P(X\in B) = P(X^{-1}(B)).$$

---

### **`Def. 2.4` Cumulative Distribution Function (CDF)**

Given a probability space $(\Omega,\mathcal{F},P)$, a real-valued random variable $X$ with distribution $P_X$, and a number $x\in\mathbb{R}$, the **Cumulative Distribution Function** $F_X :\mathbb{R}\to[0,1]$ is defined as:

$$
F_X(x) = P(X^{-1}((-\infty,x])) = P_X((-\infty,x])
$$

It satisfies:

1. **Non-decreasing**: 
$$
\text{if } x_1\le x_2 \text{ then } F_X(x_1)\le F_X(x_2)
$$

2. **Right-continuous**:
$$
F_X(a) = \lim_{x\to a^+} F_X(x)
$$

3. **Limits at infinity**:
$$
\lim_{x\to -\infty} F_X(x) = 0 \quad\text{and}\quad \lim_{x\to +\infty} F_X(x) = 1
$$

4. **Normalization**:
$$
0 \le F_X(x) \le 1 \text{ for all } x\in\mathbb{R}
$$

---

### **`Def. 2.5` Probability Mass Function (PMF)**:

Given a probability space $(\Omega,\mathcal{F},P)$ and a discrete random variable $X$ taking values in a countable set $S=\{x_1,x_2,\dots\}$, the **Probability Mass Function (PMF)** of $X$ is the function $p_X:\mathbb{R}\to[0,1]$ defined as:

$$
p_X(x) = P(X=x) \text{ for all } x\in S
$$

With the properties:

1. **Non-negativity**: 
$$
p_X(x) \ge 0 \text{ for all } x\in S
$$

2. **Normalization**:
$$
\sum_{x\in S} p_X(x) = 1
$$

3. **Zero outside support**:
$$
p_X(x) = 0 \text{ if } x\notin S
$$

> **Note**: Recovering the CDF: $$F_X(x) = \sum_{x_j\le x} p_X(x_j)$$

---

### **`Def. 2.6` Probability Density Function (PDF)**:

Given a probability space $(\Omega,\mathcal{F},P)$ and a continuous real-valued random variable $X$ with distribution $P_X \ll \lambda$ (absolutely continuous with respect to the Lebesgue measure $\lambda$) and CDF $F_X$, a **Probability Density Function (PDF)** of $X$ is a function $f_X:\mathbb{R}\to[0,\infty)$ such that:

1. **Non-negativity**:
$$
f_X(x) \ge 0 \text{ for all } x\in\mathbb{R}
$$

2. **Normalization**:
$$
\int_{-\infty}^{\infty} f_X(x)\,dx = 1
$$

3. **Probabilities as integrals**: for every Borel set $B\in\mathcal B(\mathbb R)$,
$$
P(X\in B) = \int_B f_X(x)\,dx
$$
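
The density properties can be checked numerically on a concrete example. The sketch below uses an exponential density with an arbitrarily chosen rate of 2 and a plain midpoint rule; the rate, the cutoff at 50 and the helper `integrate` are assumptions made for illustration.

```python
import math

RATE = 2.0  # hypothetical rate parameter, chosen only for illustration


def f(x: float) -> float:
    """Density of an Exponential(RATE) variable; zero on the negative half-line."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def integrate(g, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / steps
    return width * sum(g(a + (i + 0.5) * width) for i in range(steps))


total = integrate(f, 0.0, 50.0)         # the mass beyond 50 is negligible
prob_interval = integrate(f, 0.5, 1.5)  # P(0.5 <= X <= 1.5) with B = [0.5, 1.5]
exact = math.exp(-RATE * 0.5) - math.exp(-RATE * 1.5)

print(abs(total - 1.0) < 1e-6)
print(abs(prob_interval - exact) < 1e-6)
```

The value `exact` comes from the CDF $F_X(x)=1-e^{-2x}$ of this particular density, so the second check compares the integral in property 3 with $F_X(1.5)-F_X(0.5)$.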

