<img src='https://theaiengineer.dev/tae_logo_gw_flat.png' alt='The Python Quants' width='35%' align='right'>


# Python & Mathematics for Data Science and Machine Learning

**© Dr. Yves J. Hilpisch | The Python Quants GmbH**<br>
AI-powered by GPT-5.



# Chapter 1 — Why Math With Code (and Code With Math)

This notebook mirrors the first chapter. It demonstrates the ‘Math ↔ Code’ loop: state a claim, test it with a small experiment, and refine intuition from the results.

You’ll Learn

- Set up tiny numerical experiments to verify a claim
- Use seeds and sanity checks for reproducibility
- Translate a statement into code and back into math

We configure plotting and a reproducible RNG for consistent results.

In [None]:
# Imports, plotting style, and reproducible RNG
%config InlineBackend.figure_format = 'retina'
import numpy as np  # numerical arrays and linear algebra
import matplotlib.pyplot as plt  # plotting library
plt.style.use('seaborn-v0_8')  # consistent plot style
# Fixed seed for reproducible random numbers across runs
rs = np.random.default_rng(42)  # reproducible random generator


## Law of Large Numbers (LLN): quick numerical check

We expect the sample mean to approach the true mean as the sample size grows, provided the draws are independent and identically distributed with a finite mean. We’ll probe this behavior numerically.

In [None]:
def lln_demo(dist, n_values=(100, 1_000, 10_000), rng=None):  # LLN demo
    """Check how the sample mean approaches the expectation.

    dist: dict with 'mean' (true mean) and 'gen', a callable
        gen(rng, size) -> array of samples.
    rng: NumPy Generator; defaults to the notebook-level `rs`.
    """
    rng = rs if rng is None else rng  # fall back to the shared generator
    mu = dist['mean']  # theoretical mean of distribution
    for n in n_values:  # iterate over sample sizes
        x = dist['gen'](rng, size=n)  # draw samples via provided generator
        m = x.mean()  # sample mean
        err = abs(m - mu)  # absolute error to true mean
        print(f"n={n:>6d}  mean={m:+.6f}  |mean-mu|={err:.6f}")  # report results
        # Sanity: the sample mean must be finite
        assert np.isfinite(m)

# Standard Normal: E[X] = 0
normal = {  # specify standard normal distribution
    'mean': 0.0,  # true mean
    'gen': lambda rng, size: rng.standard_normal(size=size).astype(
        np.float64
    ),  # generator for N(0,1)
}

lln_demo(normal)
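
The same check can be repeated for other distributions with a known mean. The following is a minimal, self-contained sketch rather than part of the chapter's code: the helper name `sample_mean_errors`, the seed `7`, and the choice of the uniform and exponential distributions are assumptions made for illustration.

```python
import numpy as np


def sample_mean_errors(gen, mu, n_values=(100, 1_000, 10_000), rng=None):
    """Return |sample mean - mu| for each sample size in n_values.

    gen: callable gen(rng, size) -> array of samples
    mu:  known theoretical mean of the distribution
    """
    rng = np.random.default_rng() if rng is None else rng
    errors = []
    for n in n_values:
        x = gen(rng, n)  # draw n samples
        errors.append(abs(x.mean() - mu))  # absolute error of the sample mean
    return errors


rng = np.random.default_rng(7)  # illustrative seed

# Uniform on [0, 1): E[X] = 0.5
uniform_errors = sample_mean_errors(
    lambda g, size: g.uniform(0.0, 1.0, size=size), mu=0.5, rng=rng
)
# Exponential with scale 1: E[X] = 1
exponential_errors = sample_mean_errors(
    lambda g, size: g.exponential(scale=1.0, size=size), mu=1.0, rng=rng
)

for name, errs in (("uniform", uniform_errors), ("exponential", exponential_errors)):
    print(name, [f"{e:.6f}" for e in errs])
```

The errors need not decrease at every step for a single run; the LLN only says they tend to zero as n grows.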


## Variance non-negativity as a falsifiable check

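We test the identity $\operatorname{Var}(X)=\mathbb{E}[X^2] - (\mathbb{E}[X])^2 \ge 0$ numerically by replacing the expectations with sample averages. In exact arithmetic the resulting estimate is non-negative for any sample, so a slightly negative value can only come from `float64` rounding; the check tolerates such tiny negatives.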
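
A short derivation shows why the sample version of this quantity cannot be negative in exact arithmetic. Here $\bar{x}$ denotes the sample mean of $x_1,\dots,x_n$ (notation introduced for this sketch):

$$
\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
= \frac{1}{n}\sum_{i=1}^{n}x_i^2 - 2\bar{x}\cdot\frac{1}{n}\sum_{i=1}^{n}x_i + \bar{x}^2
= \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2 \;\ge\; 0.
$$

The left-hand side is an average of squares, so any negative value computed from the right-hand side must be a floating-point artifact.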

In [None]:
def variance_nonneg_demo(rng, n=50_000):  # demo: Var(X) ≥ 0 numerically
    # Draw samples and estimate E[X], E[X^2], and Var~ = E[X^2] - (E[X])^2
    x = rng.standard_normal(size=n).astype(np.float64)  # draw normal samples
    ex = x.mean()  # estimate E[X]
    ex2 = (x * x).mean()  # estimate E[X^2]
    var_est = ex2 - ex * ex  # Var~ = E[X^2] - (E[X])^2
    print(
        f"E[X]={ex:+.4f}, E[X^2]={ex2:+.4f}, Var~={var_est:+.6f}"
    )  # report results
    assert var_est > -1e-12  # allow tiny negatives from rounding

variance_nonneg_demo(rs)
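
Rounding becomes visible when the data sit far from zero, because the formula $\mathbb{E}[X^2] - (\mathbb{E}[X])^2$ then subtracts two nearly equal large numbers (catastrophic cancellation). The sketch below is an illustration under assumed values: the offset `1e8`, the seed `0`, and the variable names are not from the chapter. It compares the one-pass formula with a two-pass estimate that centers the data first.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
x = rng.standard_normal(size=50_000)  # N(0, 1) samples, variance close to 1
shifted = x + 1e8  # same spread, but values far from zero (assumed offset)

# One-pass formula: difference of two numbers near 1e16
naive = (shifted * shifted).mean() - shifted.mean() ** 2

# Two-pass estimate: center first, then average the squared deviations
two_pass = ((shifted - shifted.mean()) ** 2).mean()

print(f"reference (unshifted) variance: {x.var():.6f}")
print(f"one-pass on shifted data:       {naive:.6f}")
print(f"two-pass on shifted data:       {two_pass:.6f}")
```

Near `1e16` adjacent `float64` values are 2 apart, so the one-pass result cannot resolve a variance of about 1 and may even come out negative, while the two-pass estimate stays close to the reference. This is why a tolerance such as `-1e-12` is only appropriate for well-scaled data.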


## Visualization: running mean stabilizes with n

We plot the running sample mean $\bar{X}_n$ against sample size on a logarithmic x-axis. Expect large swings for tiny n and gradual stabilization around 0.

In [None]:
N = 20_000  # number of samples
x = rs.standard_normal(size=N).astype(np.float64)  # draw samples for visualization
running_mean = np.cumsum(x) / np.arange(1, N + 1)  # cumulative average $\bar X_n$

fig, ax = plt.subplots(figsize=(6.8, 3.6), dpi=140)  # create figure and axes
ax.plot(  # plot running mean vs. n
    np.arange(1, N + 1),  # x: sample index
    running_mean,          # y: running mean
    color='C0', lw=1.6, label='running mean'  # style
)
ax.axhline(  # reference line at true mean 0
    0.0, color='k', lw=1.0, ls='--',  # style
    label=r'true mean $\mu=0$'
)
ax.set_xscale('log')  # logarithmic x-axis
ax.set_xlabel('n (log scale)')  # label x-axis
ax.set_ylabel(r'sample mean $\bar X_n$')  # label y-axis
ax.set_title('LLN in action: running mean vs. sample size')  # add plot title
ax.legend(loc='best')  # show legend
ax.grid(alpha=0.25)  # light grid for readability
plt.show()  # render the figure


Notes

- Early samples dominate: small n yields large swings.
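- Averaging tames noise: when the variance is finite (light tails), the typical error of the sample mean shrinks like σ/√n, so 100 times more samples buy roughly one extra decimal digit.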
- Randomness persists: different seeds produce different paths with the same pattern.
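
The 1/√n scaling in the notes can be checked with repeated trials. The following is a rough sketch, not the chapter's figure code: the number of trials (`400`), the seed `123`, and the names `scaled` and `rmse` are assumptions chosen for illustration. For standard normal draws (σ = 1), the root-mean-square error of the sample mean times √n should stay close to 1 for every n.

```python
import numpy as np

rng = np.random.default_rng(123)  # illustrative seed
trials = 400  # independent repetitions per sample size (assumed)
n_values = (100, 1_000, 10_000)

scaled = {}  # n -> RMSE * sqrt(n), expected to be near sigma = 1
for n in n_values:
    # Each row is one independent experiment of n standard normal draws
    means = rng.standard_normal(size=(trials, n)).mean(axis=1)
    rmse = np.sqrt(np.mean(means ** 2))  # true mean is 0, so error = mean
    scaled[n] = rmse * np.sqrt(n)
    print(f"n={n:>6d}  rmse={rmse:.5f}  1/sqrt(n)={1 / np.sqrt(n):.5f}")
```

Each trial plays the role of a different seed: individual sample means differ, but their typical size follows the same 1/√n pattern.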


## Figure Generators (for reproducibility)

- `code/figures/ch01_lln_running_mean.py` — running mean (LLN).


