# `2. Integrals and Expectation`


1. ~~Indicator functions.~~
2. ~~Simple functions.~~
3. ~~Measurable functions.~~
4. Lebesgue integral and its properties. 
5. L^p spaces. 
6. Integration with respect to a kernel. 
7. Expectation as a functional (discrete and continuous cases). 
8. Connection with distribution moments.


## `1. Indicator functions`

### **What is it / why do we care?**

* **What is it**: An **indicator function** is the **simplest possible measurable function**. It only answers a **yes/no** question:
> ***"Did the outcome ω belong to the event A?"***

* **Why we care**: once we can convert **sets (events)** into **functions**, we can build more complicated random variables and then integrate them (compute averages/expectations). 


### **`Def. 3.1`: An indicator function**
Given a probability space $(\Omega,\mathcal F,P)$ and an event $A\in\mathcal F$, the indicator
$$
\mathbf 1_A:\Omega\to\{0,1\}
$$
is defined by
$$
\mathbf 1_A(\omega)=
\begin{cases}
1,&\omega\in A\\
0,&\omega\notin A.
\end{cases}
$$

* This makes $\mathbf 1_A$ a random variable that only distinguishes “in A” vs “not in A”.

### `Properties of indicator functions`:

1) Complements:
$$
\mathbf 1_{A^c} = 1 - \mathbf 1_A
$$

2) Intersections:
$$
\mathbf 1_{A\cap B} = \mathbf 1_A\cdot \mathbf 1_B
$$

3) Unions:
$$
\mathbf 1_{A\cup B} = \mathbf 1_A + \mathbf 1_B - \mathbf 1_A\mathbf 1_B
$$ 

4) Monotonicity:
$$
\text{If } A\subseteq B, \text{ then for all } \omega \in \Omega:
$$
$$
\mathbf 1_A(\omega)\le \mathbf 1_B(\omega).
$$
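
The four identities above can be sanity-checked pointwise on a small finite sample space. This is only an illustrative sketch: the sample space (two rolls of a four-sided die) and the events `A`, `B`, `C` are assumptions chosen for the check, not taken from the slides.

```python
from itertools import product

# Hypothetical finite sample space: two rolls of a four-sided die.
omega = list(product(range(1, 5), repeat=2))

def indicator(event):
    """Return the indicator function 1_A of the set `event`."""
    return lambda w: 1 if w in event else 0

A = {w for w in omega if w[0] + w[1] >= 6}   # "sum is at least 6"
B = {w for w in omega if w[0] == w[1]}       # "doubles"
C = {w for w in omega if w[0] + w[1] >= 4}   # a superset of A

one_A, one_B, one_C = indicator(A), indicator(B), indicator(C)

def holds_everywhere(lhs, rhs):
    """Check lhs(w) == rhs(w) for every outcome w."""
    return all(lhs(w) == rhs(w) for w in omega)

complement_ok = holds_everywhere(indicator(set(omega) - A),
                                 lambda w: 1 - one_A(w))
intersection_ok = holds_everywhere(indicator(A & B),
                                   lambda w: one_A(w) * one_B(w))
union_ok = holds_everywhere(indicator(A | B),
                            lambda w: one_A(w) + one_B(w) - one_A(w) * one_B(w))
monotone_ok = A <= C and all(one_A(w) <= one_C(w) for w in omega)

print(complement_ok, intersection_ok, union_ok, monotone_ok)
```

A check on one sample space is of course not a proof; it only confirms that the formulas are stated with the right signs.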

## `2. Simple functions`

### **What is it / why do we care?**

* **What is it**: A **simple function** is basically a “categorizer”: it splits outcomes into finitely many buckets, and assigns a constant value to each bucket.
This is the next building block after indicators.

* **Why we care**: simple functions are **measurable by construction** and are the starting point for defining the Lebesgue integral.


### **`Def. 3.2`: Simple function**
Given:
1. A measure space $(\Omega,\mathcal F,\mu)$;
2. A measurable function $\varphi:\Omega\to\mathbb R$ with **finite range**.

Then $\varphi$ is called a simple function. Equivalently, there exist real numbers $a_1,\dots,a_n \in \mathbb R$ and sets
$A_1,\dots,A_n\in\mathcal F$ such that:

$$
\Omega=\bigcup_{i=1}^n A_i,\qquad A_i\cap A_j=\varnothing\ \text{ if } (i\ne j),
$$
and
$$
\varphi(\omega)=\sum_{i=1}^n a_i\,\mathbf 1_{A_i}(\omega).
$$

> * **Note**: The slides emphasize: this is a “sophisticated way of saying” *finite linear combinations of indicators*, and we do it to preserve measurability.


## `3. Measurable functions`

### What is it / why do we care?

* **What is it**:
A **measurable function** is a function that **preserves measurability** when mapping from one measurable space to another.

* **Why we care**:
“Measurable” means: **your function doesn’t create non-measurable weirdness**.
It guarantees that events like “$f$ lands in a measurable set” are still measurable back in the domain.


### **`Def. 1.5`: Measurable function**
A function $f$ between measurable spaces $(X,\mathcal F)$ and $(Y,\mathcal G)$ is measurable if:
$$
\forall\,G\in\mathcal G:\quad f^{-1}(G)\in\mathcal F.
$$
(Preimages of measurable sets are measurable.)
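
On a finite space the definition can be checked by brute force: compute every preimage and test whether it belongs to $\mathcal F$. The sketch below assumes a hypothetical four-point space with the σ-algebra generated by the partition $\{1,2\},\{3,4\}$; the helper names `preimage` and `is_measurable` are illustrative.

```python
def preimage(f, domain, target_set):
    """f^{-1}(G): all points of the domain that f maps into target_set."""
    return frozenset(x for x in domain if f(x) in target_set)

def is_measurable(f, domain, sigma_F, sigma_G):
    """True if the preimage of every set in sigma_G lies in sigma_F."""
    return all(preimage(f, domain, G) in sigma_F for G in sigma_G)

X = frozenset({1, 2, 3, 4})
# sigma-algebra generated by the partition {1,2}, {3,4} (assumed for illustration)
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}
# Power set of {0, 1} on the target side
G = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}

f = lambda x: 1 if x <= 2 else 0   # indicator of {1,2}: constant on each block
g = lambda x: 1 if x == 1 else 0   # indicator of {1}: splits the block {1,2}

f_measurable = is_measurable(f, X, F, G)
g_measurable = is_measurable(g, X, F, G)
print(f_measurable, g_measurable)
```

Here `g` fails because $g^{-1}(\{1\})=\{1\}$ is not in $\mathcal F$, which matches the idea that an indicator is measurable exactly when its set is.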

### **“Preserving measurability” (constructive checklist)**:

1) Linear combinations:
$$
x\,\mathbf 1_A + y\,\mathbf 1_B
$$

2) Sums/products (including countable sums/products in general settings, when they converge):
$$
\mathbf 1_A \pm \mathbf 1_B,\qquad \mathbf 1_A\cdot \mathbf 1_B
$$

3) Max/min, sup/inf:
$$
\max(\mathbf 1_A,\mathbf 1_B),\quad \sup_{n\ge 1}\mathbf 1_{A_n}
$$

4) Pointwise limits:
$$
\lim_{n\to\infty}\mathbf 1_{A_n}
$$

5) Compositions:
$$
(\mathbf 1_B\circ \mathbf 1_A)(\omega)
$$

**Why this matters (super practical):**
- If you start with measurable “building blocks” and only use these operations, you never accidentally leave the measurable world.




## `4. Lebesgue integral and its properties`

### What is it / why do we care?
The slides’ motivation: probability gets “wild” (mixed distributions, non-standard measures), so we want **one integral that behaves well** and doesn’t rely on handwavy assumptions.  
Key picture: an integral is a **weighted sum** of measured pieces.



### Start with indicators (the first building block)
The slides “postulate”:
$$
\int_X \mathbf 1_A\,d\mu = \mu(A).
$$
So integrating an indicator just measures the set directly.

### Integrate simple functions
For a nonnegative simple function $\varphi=\sum_{i=1}^n a_i\mathbf 1_{A_i}$:
$$
\int_X \varphi\,d\mu = \sum_{i=1}^n a_i\,\mu(A_i).
$$

### **`Thm. 3.1`: Approximation by Simple Functions**

For any nonnegative measurable $f:X\to[0,\infty]$, there exists a sequence of nonnegative simple functions $(\varphi_n)$ such that:
1) $0\le \varphi_1\le \varphi_2\le \cdots \le f$  
2) $\varphi_n(x)\to f(x)$ pointwise.


### **`Def. 2.3`: Lebesgue integral (nonnegative case)**

For nonnegative measurable $f$:
$$
\int_X f\,d\mu := \sup\left\{\int_X \varphi\,d\mu:\ 0\le \varphi\le f,\ \varphi\ \text{simple}\right\}.
$$
Equivalently, if $\varphi_n\uparrow f$ pointwise:
$$
\int_X f\,d\mu = \lim_{n\to\infty}\int_X \varphi_n\,d\mu.
$$


### Extend to general (signed) functions
For measurable $f:X\to\mathbb R$:
$$
f=f^+-f^-,
\qquad
f^+(x)=\max(f(x),0),\quad f^-(x)=\max(-f(x),0),
$$
and, provided at least one of the two integrals on the right is finite, define
$$
\int_X f\,d\mu := \int_X f^+\,d\mu - \int_X f^-\,d\mu.
$$

### Integrability criterion (when is it finite?)
The slides state that a measurable $f$ is integrable when:
$$
\int_X |f|\,d\mu = \int_X f^+\,d\mu + \int_X f^-\,d\mu < \infty.
$$

### Properties (slides list)
For integrable $f,g$, sets $A,B\in\mathcal F$, scalars $a,b\in\mathbb R$:
1) Linearity  
2) Monotonicity  
3) σ-additivity over disjoint sets  
4) Absolute value inequality  
5) Zero-measure sets integrate to 0

### Tiny example (why this is useful)
If $f=\mathbf 1_A$, then
$$
\int_X f\,d\mu=\mu(A).
$$
So “probability of an event” is literally an integral once $P$ is your measure.




## `5. L^p spaces`

### What is it / why do we care?
We want a **single “home”** for measurable/integrable functions, where we can do vector-space style reasoning (norms, distances, convergence).

### **`Def. 3.3`: Lebesgue space**
Given $(X,\mathcal F,\mu)$ and $1\le p\le\infty$:
$$
\mathcal L^p(X,\mu)=\{f:X\to\mathbb R(\mathbb C)\ \text{measurable}\mid \|f\|_p<\infty\},
$$
with
$$
\|f\|_p=
\begin{cases}
\left(\int_X |f(x)|^p\,d\mu(x)\right)^{1/p},&1\le p<\infty\\[6pt]
\operatorname*{ess\,sup}_{x\in X}|f(x)|,&p=\infty.
\end{cases}
$$
To make it a proper vector space, identify functions equal a.e.:
$$
L^p(X,\mu)=\mathcal L^p(X,\mu)/\sim.
$$

### Tiny example

- $L^1$ = finite “area under |f|” (integrable)
- $L^2$ = finite “energy” (variance and distances live here — the slides connect variance to an $L^2$ distance)


## `6. Integration with respect to a kernel`

### What is it / why do we care?
A kernel integral is like **matrix multiplication for functions**:
a matrix takes a vector $f$ and outputs a vector $g$.  
A kernel takes a function $f(y)$ and outputs a new function $g(x)$ by “mixing” values of $f$ with weights $K(x,y)$.

### **`Def. 4.1`: Integration Against a Kernel**
Given measure spaces $(Y,\mathcal F,\mu)$ and $(X,\mathcal G,\nu)$ and a measurable kernel
$$
K:X\times Y\to\mathbb R(\mathbb C),
$$
define an integral operator
$$
T:L^p(Y,\mu)\to L^q(X,\nu),
\qquad
(Tf)(x)=\int_Y K(x,y)f(y)\,d\mu(y).
$$



### Tiny example (Fourier kernel from slides)
With
$$
K(\xi,y)=e^{-2\pi i \xi y},
$$
the operator becomes the Fourier transform:
$$
\hat f(\xi)=\int_{\mathbb R} e^{-2\pi i \xi y}\,f(y)\,dy.
$$
Meaning: you “project” a function onto waves to read its frequency content.


## `7. Expectation as a functional (discrete and continuous cases)`

### What is it / why do we care?
Probabilities tell you “how likely events are”. Expectation tells you “what’s the average outcome if you weigh all outcomes by probability”.  
Slides’ punchline: **Expectation is just an integral**.


### **`Def. 3.6`: Expectation**
Given a probability space $(\Omega,\mathcal F,P)$ and a random variable $X$,
expectation is a bounded continuous linear functional $\mathbb E:L^1\to\mathbb R$:
$$
\mathbb E[X]=\int_\Omega X(\omega)\,dP(\omega)=\int X\,dP.
$$


### Properties (slides list)
- Linearity: $\mathbb E[\alpha X+\beta Y]=\alpha\mathbb E[X]+\beta\mathbb E[Y]$
- Non-negativity for $X\ge 0$
- Monotonicity
- Constants: $\mathbb E[c]=c$
- Triangle inequality: $|\mathbb E[X]|\le \mathbb E[|X|]$


### Tiny example (discrete)
Fair die, $P(X=x)=1/6$:
$$
\mathbb E[X]=\sum_{x=1}^6 x\cdot\frac16=3.5.
$$


## `8. Connection with distribution moments`

### What is it / why do we care?
“Moments” are expectations of powers:
- mean = first moment
- energy/second moment helps define variance

They summarize shape: center, spread, tail heaviness.


### **`Thm. 3.6`: LOTUS (Law of the Unconscious Statistician)**
Given a random variable $X$, a measurable function $g:\mathbb R\to\mathbb R$ with $g(X)\in L^1(P)$, and the pushforward measure $P_X$ on $\mathbb R$:
$$
\mathbb E[g(X)] = \int_{\mathbb R} g(x)\,dP_X(x).
$$
Slides’ meaning: you can compute expectations using the distribution of $X$, without finding the distribution of $g(X)$.

**Discrete version (slides):**
$$
\mathbb E[g(X)] = \sum_i g(x_i)\,P(X=x_i)=\sum_i g(x_i)p_X(x_i).
$$

**Continuous version (slides):**
$$
\mathbb E[g(X)] = \int_{\mathbb R} g(x)\,f_X(x)\,dx.
$$
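
For the discrete version, a small check shows that LOTUS agrees with the longer route of first deriving the distribution of $Y=g(X)$. The distribution (uniform on $\{-2,\dots,2\}$) and $g(x)=x^2$ are assumptions picked so that $g$ is not one-to-one.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical pmf: X uniform on {-2, -1, 0, 1, 2}.
p_X = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}
g = lambda x: x * x

# LOTUS: weight g(x) by the pmf of X directly.
lotus = sum(g(x) * p for x, p in p_X.items())

# Long route: build the pmf of Y = g(X) first, then take E[Y].
p_Y = defaultdict(Fraction)          # Fraction() is 0
for x, p in p_X.items():
    p_Y[g(x)] += p
direct = sum(y * p for y, p in p_Y.items())

print(lotus, direct, dict(p_Y))
```

Both routes give the same number; LOTUS simply skips the bookkeeping step of merging the outcomes that $g$ maps to the same value.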


### Variance connection (slides)
If $\mathbb E[X^2]<\infty$:
$$
\operatorname{Var}(X):=\mathbb E[(X-\mathbb E[X])^2]
=\mathbb E[X^2]-(\mathbb E[X])^2,
$$
and the slides interpret it as a “squared distance in $L^2$”.

### Tiny example (continuous, from slides)
If $X\sim\text{Exp}(\lambda=1)$ with $f_X(x)=e^{-x}$ for $x\ge 0$:
$$
\mathbb E[X]=\int_0^\infty x e^{-x}\,dx = 1,\qquad
\mathbb E[X^2]=\int_0^\infty x^2 e^{-x}\,dx = 2,
$$
so
$$
\operatorname{Var}(X)=2-1^2=1.
$$
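
The two integrals can also be approximated numerically as a cross-check. The sketch below uses a plain midpoint rule on a truncated interval $[0,60]$; the truncation point and step count are assumptions, and the neglected tail is of order $e^{-60}$.

```python
import math

def midpoint_integral(func, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))

density = lambda x: math.exp(-x)   # Exp(1) density on x >= 0
upper = 60.0                       # truncation point (assumed)

m1 = midpoint_integral(lambda x: x * density(x), 0.0, upper)
m2 = midpoint_integral(lambda x: x * x * density(x), 0.0, upper)
variance = m2 - m1 ** 2

print(m1, m2, variance)
```

The printed values should be close to the exact moments 1, 2 and the variance 1 derived above, up to discretization error.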

### Extra link (also in your later slides): moments inside transforms

Your “CF and moments” slides show that the characteristic function expands as
$$
\varphi_X(t)=\mathbb E[e^{itX}]
=\sum_{n=0}^\infty \frac{(it)^n}{n!}\mathbb E[X^n],
$$
provided all moments exist and the series converges, so the moments are encoded in the derivatives at $t=0$: $\varphi_X^{(n)}(0)=i^n\,\mathbb E[X^n]$.



# `2. Integrals and Expectation`

*(Slides roadmap: constructing functions → Lebesgue integral → L^p spaces → expectation & variance → LOTUS.)*


## `1. Indicator functions`

### What this topic is about (intuition)
An **indicator function** is the simplest “measurement device”: it only answers a yes/no question:
> “Did the outcome land inside the event A?”

Why we care: once we can convert **sets (events)** into **functions**, we can build more complicated random variables and then integrate them (compute averages/expectations). 


### **`Def. 3.1`: Indicator function**
Given a probability space $(\Omega,\mathcal F,P)$ and an event $A\in\mathcal F$, the indicator function
$$
\mathbf 1_A:\Omega\to\{0,1\}
$$
is defined as
$$
\mathbf 1_A(\omega)=
\begin{cases}
1, & \omega\in A,\\
0, & \omega\notin A.
\end{cases}
$$
The slides note: this is a random variable that only distinguishes “in A” vs “not in A”.



### Indicator algebra (set-operations become formulas)
The slides list useful identities:

1) Complements:
$$
\mathbf 1_{A^c} = 1 - \mathbf 1_A
$$

2) Intersections:
$$
\mathbf 1_{A\cap B} = \mathbf 1_A\cdot \mathbf 1_B
$$

3) Unions:
$$
\mathbf 1_{A\cup B} = \mathbf 1_A + \mathbf 1_B - \mathbf 1_A\mathbf 1_B
$$ 

### Monotonicity (a very “measure-like” behavior)
If $A\subseteq B$, then for all $\omega$:
$$
\mathbf 1_A(\omega)\le \mathbf 1_B(\omega).
$$

### Tiny dummy example
Coin toss, $\Omega=\{H,T\}$. Let $A=\{H\}$. Then:
- $\mathbf 1_A(H)=1$, $\mathbf 1_A(T)=0$.
- If $B=\Omega$, then $A\subseteq B$ and indeed $\mathbf 1_A(\omega)\le \mathbf 1_B(\omega)=1$ always.

## `2. Simple functions`

### What this topic is about (intuition)
A **simple function** is basically a “categorizer”: it splits outcomes into finitely many buckets, and assigns a constant value to each bucket.
This is the next building block after indicators.

Why we care: simple functions are **measurable by construction** and are the starting point for defining the Lebesgue integral.


### **`Def. 3.2`: Simple function**
Given a measure space $(\Omega,\mathcal F,\mu)$, a measurable function $\varphi:\Omega\to\mathbb R$ with a **finite range** is called a simple function.

Equivalently, there exist finitely many real numbers $a_1,\dots,a_n$ and measurable sets $A_1,\dots,A_n\in\mathcal F$ such that:

1) $(A_i)$ form a partition:
$$
\Omega=\bigcup_{i=1}^n A_i,\qquad A_i\cap A_j=\varnothing\ (i\ne j)
$$

2) and for every $\omega\in\Omega$:
$$
\varphi(\omega)=\sum_{i=1}^n a_i\,\mathbf 1_{A_i}(\omega).
$$

The slides emphasize: this is a “sophisticated way of saying” *finite linear combinations of indicators*, and we do it to preserve measurability.

### Dummy example (from the slides): “# of heads in 2 coin flips”
$\Omega=\{HH,HT,TH,TT\}$, uniform probability.

Define $X(\omega)$ = number of heads:
$$
X(\omega)=
\begin{cases}
2,& \omega=HH\\
1,& \omega\in\{HT,TH\}\\
0,& \omega=TT
\end{cases}
$$

Indicator form:
$$
X(\omega)=2\,\mathbf 1_{\{HH\}}(\omega)+1\,\mathbf 1_{\{HT,TH\}}(\omega)+0\,\mathbf 1_{\{TT\}}(\omega).
$$
The range of $X$ is $\{0,1,2\}$, a finite set, so $X$ is a simple function.
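
The indicator form of the two-coin example can be written out directly. This is a minimal sketch; the list `blocks` of (value, set) pairs is just one illustrative encoding of $\sum_i a_i\mathbf 1_{A_i}$.

```python
omega = ["HH", "HT", "TH", "TT"]

# (value a_i, block A_i) pairs from the indicator form of X
blocks = [(2, {"HH"}), (1, {"HT", "TH"}), (0, {"TT"})]

def X(w):
    """Evaluate sum_i a_i * 1_{A_i}(w)."""
    return sum(a * (1 if w in A else 0) for a, A in blocks)

# The blocks are disjoint and cover omega exactly once.
is_partition = sorted(w for _, A in blocks for w in A) == sorted(omega)
# The indicator form agrees with "count the heads".
matches_head_count = all(X(w) == w.count("H") for w in omega)
value_range = {X(w) for w in omega}

print(is_partition, matches_head_count, value_range)
```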


## `3. Measurable functions`

### What this topic is about (intuition)
A function is “legal” for measure-theoretic probability only if it is **measurable**.
Measurability is what guarantees that statements like
> “$X$ lands in some set $B$”
are actual events in $\mathcal F$ and therefore have probabilities.

The slides stress the “constructive realm”: in engineering/numerical work, we want rules that **guarantee** measurability when we build objects.  

### Measurability via indicator functions
The slides note:
- $\mathbf 1_A$ is measurable if $A\in\mathcal F$.
- Also, you can recover the set from the indicator:
$$
\mathbf 1_A^{-1}(\{1\}) = A.
$$
So the measurability of $\mathbf 1_A$ is tied directly to $A$ being measurable.  

### Preserving measurability (construction rules)
The slides list operations that preserve measurability (shown using indicators for clarity, but the idea is general):

1) Linear combinations:
$$
x\,\mathbf 1_A + y\,\mathbf 1_B
$$

2) Sums/products (including countable sums/products in general settings, when they converge):
$$
\mathbf 1_A \pm \mathbf 1_B,\qquad \mathbf 1_A\cdot \mathbf 1_B
$$

3) Max/min, sup/inf:
$$
\max(\mathbf 1_A,\mathbf 1_B),\quad \sup_{n\ge 1}\mathbf 1_{A_n}
$$

4) Pointwise limits:
$$
\lim_{n\to\infty}\mathbf 1_{A_n}
$$

5) Compositions:
$$
(\mathbf 1_B\circ \mathbf 1_A)(\omega)
$$

**Why this matters (super practical):**
- If you start with measurable “building blocks” and only use these operations, you never accidentally leave the measurable world.


## `4. Lebesgue integral and its properties`

### What this topic is about (intuition)
The slides motivate Lebesgue integration as the “tool that never fails”:
- Probability introduces weird objects (mixed distributions, non-standard measures).
- We want one unified approach that stays consistent and usable.  

Key mental model from the slides:
> An integral is a **weighted sum**: measure pieces, multiply by values, add up.


### Start from indicators (the first postulate)
**Integrating indicators** is defined by:
$$
\int_X \mathbf 1_A\,d\mu = \mu(A).
$$
So integrating “membership” just returns “how big the set is”.  

### Integrating simple functions
For a non-negative simple function
$$
\varphi=\sum_{i=1}^n a_i\mathbf 1_{A_i},\qquad a_i\ge 0,
$$
linearity gives:
$$
\int_X \varphi\,d\mu = \sum_{i=1}^n a_i\,\mu(A_i).
$$

### **`Thm. 3.1`: Approximation by Simple Functions**
Given a measure space $(X,\mathcal F,\mu)$ and a non-negative measurable function $f:X\to[0,\infty]$, there exists a sequence of non-negative simple functions $(\varphi_n)$ such that:

1) Monotone increase:
$$
0\le \varphi_1\le \varphi_2\le \cdots \le f
$$

2) Pointwise convergence:
$$
\lim_{n\to\infty}\varphi_n(x)=f(x)\quad \forall x\in X.
$$

**Intuition:** approximate complicated $f$ from below by step-functions, then define the integral as the limit of their integrals.
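
One standard way to build such a sequence (not necessarily the construction used in the slides) is dyadic rounding: $\varphi_n(x)=\min\big(\lfloor 2^n f(x)\rfloor/2^n,\ n\big)$. The sketch below checks monotonicity and convergence at a few sample points; the test function $f(x)=x^2$ and the points are assumptions.

```python
import math

def dyadic_approx(f, n):
    """n-th dyadic simple approximation: round f down to a multiple of 2**-n, cap at n."""
    def phi_n(x):
        return min(math.floor(2**n * f(x)) / 2**n, n)
    return phi_n

f = lambda x: x * x                   # hypothetical nonnegative test function
points = [0.0, 0.3, 0.7, 1.5, 2.9]
levels = range(1, 12)                 # n = 1, ..., 11

table = {x: [dyadic_approx(f, n)(x) for n in levels] for x in points}

monotone = all(a <= b for vals in table.values() for a, b in zip(vals, vals[1:]))
below_f = all(v <= f(x) for x, vals in table.items() for v in vals)
max_gap = max(f(x) - vals[-1] for x, vals in table.items())

print(monotone, below_f, max_gap)
```

Each $\varphi_n$ takes only finitely many values (multiples of $2^{-n}$ up to $n$), so it is simple, and once $n$ exceeds $f(x)$ the gap at $x$ is below $2^{-n}$.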


### **`Def. 2.3`: Lebesgue integral (non-negative case)**
For $f:X\to[0,\infty]$ measurable, the Lebesgue integral is defined by:
$$
\int_X f\,d\mu := \sup\left\{\int_X \varphi\,d\mu:\ 0\le \varphi\le f,\ \varphi\ \text{simple}\right\}.
$$

Equivalently, if $(\varphi_n)$ is an increasing sequence of non-negative simple functions converging pointwise to $f$:
$$
\int_X f\,d\mu = \lim_{n\to\infty}\int_X \varphi_n\,d\mu.
$$
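
As a worked instance of the limit formula, take $f(x)=x$ on $[0,1]$ with Lebesgue measure and the dyadic approximations $\varphi_n=\lfloor 2^n f\rfloor/2^n$ (an assumed choice of sequence). Each level set $\{k/2^n\le f<(k+1)/2^n\}$ is an interval of length $2^{-n}$, so the integrals can be computed exactly.

```python
from fractions import Fraction

def integral_of_phi_n(n):
    """Integral of the n-th dyadic simple function below f(x) = x on [0, 1].

    phi_n equals k / 2**n on [k / 2**n, (k + 1) / 2**n), and each such
    interval has Lebesgue measure 1 / 2**n.
    """
    width = Fraction(1, 2**n)
    return sum(Fraction(k, 2**n) * width for k in range(2**n))

integrals = [integral_of_phi_n(n) for n in range(1, 11)]
gaps = [Fraction(1, 2) - value for value in integrals]

print([str(v) for v in integrals[:4]], str(gaps[-1]))
```

The integrals increase towards $1/2$, which is the familiar value of $\int_0^1 x\,dx$, and the gap halves at each level.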

### Extending to general real-valued measurable functions
For measurable $f:X\to\mathbb R$, split it into:
$$
f = f^+ - f^-,
\qquad
f^+(x)=\max(f(x),0),
\qquad
f^-(x)=\max(-f(x),0).
$$
Then, provided at least one of the two integrals on the right is finite, define:
$$
\int_X f\,d\mu := \int_X f^+\,d\mu - \int_X f^-\,d\mu.
$$

### Integrability criterion (when the integral is finite)
A measurable function $f$ is Lebesgue integrable if:
$$
\int_X |f|\,d\mu
=
\int_X f^+\,d\mu + \int_X f^-\,d\mu
<\infty.
$$
The slides stress: yes, it’s a “plus”, so both $\int_X f^+\,d\mu$ and $\int_X f^-\,d\mu$ must be finite.
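
On a finite measure space the decomposition is easy to compute by hand or in code. The three-point space, its weights `mu`, and the values of `f` below are assumptions made up for the illustration.

```python
from fractions import Fraction

# Hypothetical finite measure space: weights mu({x}) for each point x.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
f = {"a": 4, "b": -3, "c": -6}

def integrate(g):
    """Integral of g with respect to mu: sum of value times point mass."""
    return sum(g[x] * mu[x] for x in mu)

f_plus = {x: max(v, 0) for x, v in f.items()}     # positive part
f_minus = {x: max(-v, 0) for x, v in f.items()}   # negative part (non-negative)
abs_f = {x: abs(v) for x, v in f.items()}

int_plus = integrate(f_plus)
int_minus = integrate(f_minus)
int_f = int_plus - int_minus      # the signed integral
int_abs = integrate(abs_f)        # the quantity in the integrability criterion

print(int_plus, int_minus, int_f, int_abs)
```

The signed integral uses the difference of the two parts, while the integrability criterion uses their sum, which is why cancellation in $\int f$ says nothing about integrability.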



### Properties of the Lebesgue integral (listed)
For integrable real-valued $f,g$ on $(X,\mathcal F,\mu)$ and scalars $a,b\in\mathbb R$:

1) Linearity:
$$
\int_X(af+bg)\,d\mu = a\int_X f\,d\mu + b\int_X g\,d\mu
$$

2) Monotonicity:
$$
f\le g \implies \int_X f\,d\mu \le \int_X g\,d\mu
$$

3) Additivity over disjoint sets:
if $A\cap B=\varnothing$ then
$$
\int_{A\cup B} f\,d\mu = \int_A f\,d\mu + \int_B f\,d\mu
$$

4) Absolute value inequality:
$$
\left|\int_X f\,d\mu\right|\le \int_X |f|\,d\mu
$$

5) Zero-measure sets:
if $\mu(A)=0$, then
$$
\int_A f\,d\mu = 0.
$$

---

### Tiny dummy example (“weighted sum” feel)
If $f = 3\mathbf 1_A + 10\mathbf 1_B$ with $A\cap B=\varnothing$, then:
$$
\int f\,d\mu = 3\mu(A)+10\mu(B).
$$
You are literally summing “value × size-of-region”.
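
The same “value × size-of-region” computation, sketched with Lebesgue length as the measure. The intervals chosen for $A$ and $B$ and the helper names are illustrative assumptions.

```python
def lebesgue_length(interval):
    """Lebesgue measure of a bounded interval given as (left, right)."""
    left, right = interval
    return right - left

def integrate_simple(terms, measure):
    """Integral of sum_i a_i * 1_{A_i}: add up value times measure of its set."""
    return sum(a * measure(A) for a, A in terms)

A = (0.0, 2.0)   # hypothetical region of length 2
B = (5.0, 5.5)   # hypothetical region of length 0.5, disjoint from A

value = integrate_simple([(3, A), (10, B)], lebesgue_length)
print(value)     # 3 * 2 + 10 * 0.5
```

Swapping `lebesgue_length` for a probability measure on the same sets would turn the same sum into an expectation.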

---






## `5. L^p spaces`

### What this topic is about (intuition)
Once you can integrate $|f|^p$, you can measure the “size” of a function in a stable way.
That creates a geometry of functions:
- $L^1$ controls absolute integrability (expectations exist),
- $L^2$ gives a Euclidean-like structure (variance is distance squared),
- $L^\infty$ is “bounded almost everywhere”.

The slides’ motivation: get one unifying structure where functions behave like vectors (so we can use linear algebra intuition).

---

### **`Def. 3.3`: Lebesgue space**
Given a measure space $(X,\mathcal F,\mu)$ and $1\le p\le \infty$, define:
$$
\mathcal L^p(X,\mu) := \{f:X\to\mathbb R\ (\text{or }\mathbb C)\ \text{measurable}\mid \|f\|_p<\infty\}
$$
with the $L^p$ seminorm (it becomes a genuine norm only after the quotient below):
$$
\|f\|_p :=
\begin{cases}
\left(\int_X |f(x)|^p\,d\mu(x)\right)^{1/p}, & 1\le p<\infty,\\[6pt]
\operatorname{ess\,sup}_{x\in X}|f(x)|, & p=\infty.
\end{cases}
$$
Then define the actual Lebesgue space by factorizing by “equal a.e.”:
$$
L^p(X,\mu):=\mathcal L^p(X,\mu)/\sim,
\quad
f\sim g \iff f(x)=g(x)\ \text{a.e.}
$$

---

### Why do we quotient by “almost everywhere equal”?
The slides say: mostly for algebraic reasons. Changing a function on a set of measure zero does not change $\|f\|_p$, so without the quotient there would be nonzero functions with $\|f\|_p=0$; identifying functions equal a.e. makes $\|\cdot\|_p$ a genuine norm on a proper vector space.

---

### Vector space viewpoint
In $L^p$:
- $f+g$ is still in $L^p$ (closure),
- $\alpha f$ is still in $L^p$,
- you get the full list of vector space axioms (slides show a table).

---

### Tiny dummy example
On $[0,1]$ with Lebesgue measure:
- $f(x)=1$ is in every $L^p$.
- $f(x)=\frac{1}{\sqrt{x}}$ is in $L^1$, since $\int_0^1 x^{-1/2}\,dx=2<\infty$, but it is not in $L^2$, since $\int_0^1 x^{-1}\,dx$ diverges. In general $f\in L^p$ exactly when $p<2$; this is the kind of “membership question” $L^p$ is designed for.
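
The membership question for $f(x)=x^{-1/2}$ can be explored numerically by integrating $|f|^p=x^{-p/2}$ over $[\varepsilon,1]$ and letting $\varepsilon\to 0$. The sketch uses the closed-form antiderivative; the helper name `tail_integral` and the chosen values of $p$ and $\varepsilon$ are assumptions.

```python
import math

def tail_integral(p, eps):
    """Closed form of the integral of x**(-p/2) over [eps, 1]."""
    s = p / 2
    if s == 1:
        return -math.log(eps)               # antiderivative of 1/x
    return (1 - eps ** (1 - s)) / (1 - s)   # antiderivative of x**(-s)

for p in (1, 1.5, 2, 3):
    values = [tail_integral(p, 10.0 ** -k) for k in (2, 4, 8)]
    print(p, [round(v, 3) for v in values])
```

For $p=1$ and $p=1.5$ the values settle as $\varepsilon$ shrinks, while for $p=2$ and $p=3$ they keep growing, consistent with $\int_0^1 x^{-p/2}\,dx<\infty$ exactly when $p<2$.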

---

## `6. Integration with respect to a kernel`

### What this topic is about (intuition)
Sometimes integration is not just “add up $f$ over $x$”.
Sometimes you want to **transform** a function into another function:
- smoothing,
- extracting frequencies,
- modeling time evolution,
- “projecting” onto basis functions.

A kernel is the rule that says:
> “How much does the value at $y$ contribute to the output at $x$?”

---

### **`Def. 4.1`: Integration Against a Kernel**
Given measure spaces $(Y,\mathcal F,\mu)$ and $(X,\mathcal G,\nu)$ and a measurable function
$$
K:X\times Y \to \mathbb R\ (\mathbb C)
$$
called a kernel, define the integral operator:
$$
T:L^p(Y,\mu)\to L^q(X,\nu),
\qquad
(Tf)(x)=\int_Y K(x,y)f(y)\,d\mu(y).
$$

---

### Kernel as “weighting machine” (slides wording + extra clarity)
- Fix $x$ (output location).
- Look at all input points $y$.
- Multiply $f(y)$ by $K(x,y)$ and add/integrate.
So the kernel shapes the output: not just weights, but structure (oscillation, smoothness, etc.).
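
When $X$ and $Y$ are finite, the three steps above reduce to a weighted matrix-vector product. The sketch below assumes a hypothetical four-point grid, counting measure on $Y$, and a made-up smoothing kernel.

```python
def apply_kernel(K, f, mu):
    """(Tf)(x) = sum_y K[x][y] * f[y] * mu[y], a discrete kernel integral."""
    return [sum(k_xy * f_y * w for k_xy, f_y, w in zip(row, f, mu)) for row in K]

# Hypothetical smoothing kernel: each output mixes a point with its neighbours.
K = [
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.5, 0.25, 0.0],
    [0.0, 0.25, 0.5, 0.25],
    [0.0, 0.0, 0.5, 0.5],
]
f = [0.0, 4.0, 0.0, 8.0]       # input function on Y = {0, 1, 2, 3}
mu = [1.0, 1.0, 1.0, 1.0]      # counting measure on Y

g = apply_kernel(K, f, mu)
print(g)
```

Row $x$ of `K` plays the role of the function $y\mapsto K(x,y)$; changing `mu` reweights how much each input point counts.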

---

### Example: Fourier kernel (frequency extraction)
Slides define:
$$
K(\xi,y)=e^{-2\pi i \xi y}.
$$
Then
$$
\hat f(\xi) = \int_{\mathbb R} e^{-2\pi i\xi y}f(y)\,dy
= \int_{\mathbb R} K(\xi,y)f(y)\,dy,
$$
so the Fourier transform is an integral operator with a kernel.

**Tiny intuition:** if your $f$ is a signal in time, $\hat f(\xi)$ tells how much “wave of frequency $\xi$” is inside it.
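
A numerical sketch of the Fourier kernel in action: with this convention the Gaussian $f(y)=e^{-\pi y^2}$ is its own transform, $\hat f(\xi)=e^{-\pi\xi^2}$, which gives a known value to compare against. The integration window $[-10,10]$ and the midpoint rule are assumptions of the sketch.

```python
import cmath
import math

def fourier_transform(f, xi, lo=-10.0, hi=10.0, steps=4000):
    """Midpoint-rule approximation of the integral of exp(-2*pi*i*xi*y) * f(y) dy."""
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += cmath.exp(-2j * math.pi * xi * y) * f(y)
    return total * h

gaussian = lambda y: math.exp(-math.pi * y * y)

approx = fourier_transform(gaussian, 0.5)
exact = math.exp(-math.pi * 0.25)     # e^{-pi * xi^2} at xi = 0.5

print(approx, exact)
```

The imaginary part of `approx` is essentially zero because the Gaussian is even, so only the cosine part of the kernel contributes.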

---

### Example: Laplace kernel (damped / growth-aware transform)
The slides show a Laplace kernel of the form $e^{-sx}$ (with $s\in\mathbb C$), which damps the tail when $\operatorname{Re}s>0$ and so helps convergence.

---

## `7. Expectation as a functional (discrete and continuous cases)`

### What this topic is about (intuition)
Probability tells you “how likely events are”.
Expectation tells you “what happens on average when you combine *all* outcomes”.

The slides phrase this as:
> “probabilities are awesome, but what if we need to know something about all outcomes at once?”

---

### **`Def. 3.6`: Expectation**
Given a probability space $(\Omega,\mathcal F,P)$ and a random variable $X$, expectation is a bounded continuous linear functional
$$
\mathbb E:L^1\to\mathbb R
$$
defined by:
$$
\mathbb E[X] = \int_\Omega X(\omega)\,dP(\omega)=\int X\,dP.
$$

**Key sentence (slides):** Expectation is “just an integral!”

---

### Expectation properties (listed)
1) Linearity:
$$
\mathbb E[\alpha X+\beta Y]=\alpha\mathbb E[X]+\beta\mathbb E[Y]
$$
2) Non-negativity: if $X\ge0$ a.s., then $\mathbb E[X]\ge 0$  
3) Monotonicity: if $X\ge Y$ a.s., then $\mathbb E[X]\ge \mathbb E[Y]$  
4) Constants: $\mathbb E[c]=c$  
5) Triangle inequality:
$$
|\mathbb E[X]|\le \mathbb E[|X|]
$$

---

### Discrete vs continuous formulas (LOTUS idea)
The slides emphasize “we keep the old probability weights” but apply them to new values.

#### Discrete case
If $X$ takes values $x_k$ with probabilities $p_k$:
$$
\mathbb E[g(X)] = \sum_k g(x_k)\,p_k.
$$

#### Continuous case
If $X$ has density $f_X(x)$:
$$
\mathbb E[g(X)] = \int_{\mathbb R} g(x)\,f_X(x)\,dx.
$$

---

### Dummy example 1 (from slides): die + transform
Fair die $X\in\{1,\dots,6\}$, $P(X=x)=1/6$.
Take $g(x)=x^2$:
$$
\mathbb E[X^2] = \sum_{x=1}^6 x^2\cdot \frac{1}{6}
= \frac{1^2+2^2+3^2+4^2+5^2+6^2}{6}\approx 15.17.
$$

**Why this matters:** you can study a new variable $g(X)$ without re-deriving a new distribution from scratch.
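
The die computation, written as a generic “weight $g(x)$ by the pmf” helper with exact fractions. The function name `expectation` is an illustrative choice.

```python
from fractions import Fraction

# Fair die: P(X = x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(g, pmf):
    """E[g(X)] for a discrete X: sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x, pmf)
second_moment = expectation(lambda x: x ** 2, pmf)

print(mean, second_moment, round(float(second_moment), 2))
```

Only the function `g` changes between the two calls; the probability weights stay the same, which is the point of the discrete formula.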

---

### Dummy example 2 (from slides): exponential lifetime
$X\sim \text{Exp}(\lambda=1)$, density $f_X(x)=e^{-x}$ for $x\ge0$:
$$
\mathbb E[X]=\int_0^\infty x e^{-x}\,dx=1,
\qquad
\mathbb E[X^2]=\int_0^\infty x^2 e^{-x}\,dx=2.
$$

---

### Variance (as preview in the slides)
If $\mathbb E[X^2]<\infty$, variance is:
$$
\operatorname{Var}(X):=\mathbb E\big[(X-\mathbb E[X])^2\big].
$$
Slides also give the useful identity:
$$
\operatorname{Var}(X)=\mathbb E[X^2]-(\mathbb E[X])^2,
$$
and interpret it as a “squared distance in $L^2$”, since $X\in L^2$.
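
Both variance formulas can be compared on the fair die from the earlier example. This is a small exact check, not part of the slides.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

mean = sum(x * p for x, p in pmf.items())

# Definition: E[(X - E[X])^2], the squared L^2 distance from X to its mean.
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Shortcut: E[X^2] - (E[X])^2.
var_shortcut = sum(x * x * p for x, p in pmf.items()) - mean ** 2

print(var_definition, var_shortcut)
```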

---

## `8. Connection with distribution moments`

### What this topic is about (intuition)
A **moment** is just an expectation of a power:
- first moment = mean,
- second moment helps with variance,
- higher moments describe tails and shape.

So “moments” are a direct bridge between:
- the **distribution** of $X$,
- and measurable summaries you can compute via integrals.

---

### Moments as expectations (raw and central)
The slides define moments through the MGF derivatives:

### **`Def. 4.3`: Moment Generating Function (MGF)**
Given $X$ with induced distribution $P_X$, the MGF is:
$$
M_X(t)=\mathbb E[e^{tX}]
= \int_{\mathbb R} e^{tx}\,dP_X(x).
$$

---

### **`Def. 4.3`: Moments of a Distribution (via derivatives at 0)**
For $n\in\mathbb N$, the **n-th raw moment** is (the derivative formula requires $M_X$ to be finite in a neighborhood of $0$):
$$
m_n := \mathbb E[X^n] = M_X^{(n)}(0).
$$
The **n-th central moment** around the mean $\mu$ is:
$$
\mu_n := \mathbb E[(X-\mu)^n]
= \sum_{k=0}^n \binom{n}{k}(-\mu)^{n-k}m_k.
$$
Slides interpret this as “changing the reference point” (origin → mean).
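
The binomial formula for central moments can be checked against the direct definition. The sketch again uses the fair die as an assumed test distribution, with `raw[k]` standing for $m_k$ and `raw[0] = 1`.

```python
from fractions import Fraction
from math import comb

def central_from_raw(raw, n):
    """mu_n = sum_k C(n, k) * (-mu)^(n-k) * m_k, with raw[k] = m_k and raw[0] = 1."""
    mu = raw[1]
    return sum(comb(n, k) * (-mu) ** (n - k) * raw[k] for k in range(n + 1))

pmf = {x: Fraction(1, 6) for x in range(1, 7)}                  # fair die
raw = [sum(x ** k * p for x, p in pmf.items()) for k in range(5)]

mu1 = central_from_raw(raw, 1)
mu2 = central_from_raw(raw, 2)
mu3 = central_from_raw(raw, 3)
direct_mu3 = sum((x - raw[1]) ** 3 * p for x, p in pmf.items())

print(mu1, mu2, mu3, direct_mu3)
```

The first central moment is always zero, the second is the variance, and the third vanishes here because the die is symmetric about its mean.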

---

### First moment = mean (slides)
$$
\mu = m_1 = \mathbb E[X] = M_X'(0).
$$
Slides give an example with Pareto showing how the mean depends heavily on tail weight.

---

### Why MGFs/CFs are useful for moments (big picture)
- The kernel $e^{tX}$ is **unbounded**, so the MGF “sees growth” and tail behavior strongly (slides emphasize this).
- Derivatives at $0$ extract moments cleanly:
$$
M_X^{(n)}(0)=\mathbb E[X^n].
$$
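
A numerical illustration of “derivatives at 0 extract moments”, using the MGF of a fair die (an assumed example, finite for every $t$) and central finite differences with an assumed step size.

```python
import math

def mgf_die(t):
    """MGF of a fair die: E[e^{tX}] = (1/6) * sum_{k=1}^{6} e^{tk}."""
    return sum(math.exp(t * k) for k in range(1, 7)) / 6

def first_derivative(func, t, h=1e-3):
    """Central difference approximation of func'(t)."""
    return (func(t + h) - func(t - h)) / (2 * h)

def second_derivative(func, t, h=1e-3):
    """Central difference approximation of func''(t)."""
    return (func(t + h) - 2 * func(t) + func(t - h)) / (h * h)

m1 = first_derivative(mgf_die, 0.0)    # approximates E[X] = 3.5
m2 = second_derivative(mgf_die, 0.0)   # approximates E[X^2] = 91/6

print(m1, m2)
```

Finite differences are only approximate; for exact moments one would differentiate the MGF symbolically or compute the sums directly.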

---

### Tiny dummy example (moment as “summary of distribution”)
If $X$ is “daily delivery delay (minutes)”:
- $\mathbb E[X]$ tells average delay.
- $\mathbb E[(X-\mu)^2]$ tells how inconsistent the delays are (spread).
- higher moments detect “how often extreme delays happen”.

---

## `End-of-block summary (one sentence each)`

- **Indicators:** turn sets into 0/1 random variables (membership detectors).
- **Simple functions:** finite combinations of indicators (step functions / categories).
- **Measurable functions:** functions built so that “preimages of measurable sets are events”; construction rules preserve this.
- **Lebesgue integral:** define integrals via “weighted sums” starting from indicators → simple → limits; it behaves well for probability.
- **L^p spaces:** classify functions by finiteness of $\int |f|^p$, turning functions into a vector space geometry.
- **Kernel integration:** integral operators that transform functions (Fourier/Laplace are key examples).
- **Expectation:** an integral w.r.t. probability; linear functional on $L^1$; variance lives in $L^2$.
- **Moments:** expectations of powers; MGFs encode them via derivatives at 0.