# Machine Learning: Overview

## Purpose
- Learn patterns from data to make predictions or discover structure.
- Balance model complexity against generalization to unseen data.
- Build reproducible pipelines for experimentation.

## Key questions this section answers
- Is the task supervised, unsupervised, or semi-supervised?
- Which metrics reflect the real-world objective?
- How do we choose models and tune hyperparameters?

## Topics
- Supervised learning (classification, regression)
- Unsupervised learning (clustering, dimensionality reduction)
- Model selection, cross-validation, and tuning
- Bias-variance tradeoff and regularization
- Feature engineering and preprocessing pipelines
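
The model selection, cross-validation, and preprocessing topics above can be sketched together in one short example. This is a minimal illustration, not a recommended setup: the synthetic dataset, the choice of `LogisticRegression`, and the grid of `C` values are all assumptions made for the example. Putting the scaler inside the `Pipeline` means it is refit on each training fold, so the validation folds do not influence preprocessing.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing and model live in one pipeline so both are fit per fold.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Hypothetical grid over the inverse regularization strength C.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# 5-fold cross-validation on the training split selects C.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_C = search.best_params_["clf__C"]
cv_score = search.best_score_
# The test split is touched once, after tuning is finished.
test_score = search.score(X_test, y_test)

print(f"best C: {best_C}, CV accuracy: {cv_score:.3f}, test accuracy: {test_score:.3f}")
```

Smaller `C` means stronger regularization, which ties the tuning step back to the bias-variance topic: the grid search is picking a point on that tradeoff using validation folds rather than training error.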

## References
- scikit-learn, XGBoost, LightGBM; "The Elements of Statistical Learning" (Hastie, Tibshirani, Friedman)


In [None]:
import numpy as np
import pandas as pd
import plotly.express as px

# Seeded generator so the illustrative curves are reproducible.
rng = np.random.default_rng(3)
complexity = np.arange(1, 11)

# Synthetic curves, not measured results: training error keeps falling with
# complexity, while validation error is U-shaped with its minimum at 5.
train_error = 0.9 / complexity + 0.02 * rng.normal(size=complexity.size)
val_error = 0.2 + 0.03 * (complexity - 5) ** 2 + 0.02 * rng.normal(size=complexity.size)

frame = pd.DataFrame(
    {
        "complexity": complexity,
        "train_error": train_error,
        "validation_error": val_error,
    }
)

# Passing a list to y plots both columns as separate lines on shared axes.
fig = px.line(
    frame,
    x="complexity",
    y=["train_error", "validation_error"],
    markers=True,
    title="Bias-variance tradeoff (conceptual)",
)
fig.update_layout(xaxis_title="model complexity", yaxis_title="error")
fig
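
The curves in the plot above are hand-built to show the expected shape. As a complement, here is a rough sketch of how similar curves might be measured on data with scikit-learn's `validation_curve`. The noisy sine target, the polynomial regression model, and the degree range are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical 1-D regression problem: a sine wave plus Gaussian noise.
rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X).ravel() + 0.3 * rng.normal(size=80)

# Polynomial degree plays the role of "model complexity".
model = make_pipeline(PolynomialFeatures(), LinearRegression())
degrees = list(range(1, 11))

train_scores, val_scores = validation_curve(
    model,
    X,
    y,
    param_name="polynomialfeatures__degree",
    param_range=degrees,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)

# Scores are negated MSE, so flip the sign and average over folds.
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)
best_degree = degrees[int(np.argmin(val_mse))]

for d, tr, va in zip(degrees, train_mse, val_mse):
    print(f"degree={d:2d}  train MSE={tr:.3f}  validation MSE={va:.3f}")
print("degree with lowest validation MSE:", best_degree)
```

Training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. Validation error is what indicates where added complexity stops helping.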

## Takeaway
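Always compare against a simple baseline, and validate on a held-out split that respects time order or entity boundaries, so that leakage between training and validation data does not inflate the score.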
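
A minimal sketch of that takeaway, assuming scikit-learn is available: a `DummyClassifier` serves as the simple baseline, `GroupKFold` keeps each entity on one side of the split, and `TimeSeriesSplit` keeps validation rows strictly after training rows. The data, the `groups` entity ids, and the assumption that rows are already in time order are all hypothetical.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, TimeSeriesSplit, cross_val_score

# Hypothetical data: 300 rows, 4 features, label driven mostly by feature 0.
rng = np.random.default_rng(0)
n_samples = 300
X = rng.normal(size=(n_samples, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=n_samples) > 0).astype(int)
groups = rng.integers(0, 30, size=n_samples)  # hypothetical entity ids

# Entity split: no entity appears in both training and validation folds.
group_cv = GroupKFold(n_splits=5)
baseline_scores = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, groups=groups, cv=group_cv
)
model_scores = cross_val_score(
    LogisticRegression(), X, y, groups=groups, cv=group_cv
)
print(f"baseline accuracy: {baseline_scores.mean():.3f}")
print(f"model accuracy:    {model_scores.mean():.3f}")

# Check the entity split directly.
entity_split_ok = all(
    set(groups[train_idx]).isdisjoint(groups[val_idx])
    for train_idx, val_idx in group_cv.split(X, y, groups)
)

# Time split: assumes rows are sorted by time, so every validation index
# comes after every training index.
time_cv = TimeSeriesSplit(n_splits=5)
time_split_ok = all(
    train_idx.max() < val_idx.min() for train_idx, val_idx in time_cv.split(X)
)
print("entity split ok:", entity_split_ok, "| time split ok:", time_split_ok)
```

If the model does not clearly beat the baseline under the leakage-safe split, that usually says more about the features or the split than about the choice of algorithm.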

