# Deep Learning: Overview

## Purpose
- Learn hierarchical representations with neural networks.
- Scale modeling capacity with data and compute.
- Reduce manual feature engineering for complex signals.

## Key questions this section answers
- Which architecture fits the data (CNN, RNN, Transformer)?
- How do we optimize and regularize large models?
- How do we monitor training and avoid overfitting?
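One common way to act on the monitoring question is early stopping: track validation loss each epoch and halt once it stops improving. The sketch below is a minimal illustration in plain Python. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are assumptions made for this example, not part of any particular framework.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index where training would stop, or None if it never stops.

    Hypothetical helper: stops after `patience` consecutive epochs in which the
    validation loss fails to improve on the best value by more than `min_delta`.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best validation loss: reset the counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Illustrative (made-up) validation losses: improving, then rising again.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.64, 0.70]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch)
```

In this toy run the best loss occurs at epoch 3, so in practice one would usually restore the weights saved at that epoch rather than keep the final ones.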

## Topics
- MLPs, CNNs, RNNs/LSTMs/GRUs, Transformers
- Embeddings and representation learning
- Optimization (SGD, Adam) and learning-rate schedules
- Regularization (dropout, weight decay, augmentation)
- Loss functions, metrics, and debugging
- Deployment and inference efficiency
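As a concrete instance of the learning-rate schedules listed above, the sketch below combines linear warmup with cosine decay, a pairing often used with SGD and Adam. It is an illustration only: the function name `warmup_cosine_lr` and the default values (`base_lr`, `warmup_steps`, `min_lr`) are assumptions chosen for the example, and frameworks such as PyTorch ship their own scheduler classes.

```python
import math


def warmup_cosine_lr(step, total_steps, base_lr=1e-3, warmup_steps=100, min_lr=0.0):
    """Learning rate at `step` (0-indexed) for linear warmup followed by cosine decay.

    Hypothetical helper for illustration; all default values are assumptions.
    """
    if step < warmup_steps:
        # Linear ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clipped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine curve from base_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Sample the schedule at a few steps of a 1000-step run.
for step in (0, 99, 100, 550, 1000):
    print(step, warmup_cosine_lr(step, total_steps=1000))
```

Warmup keeps early updates small while optimizer statistics are still noisy; the cosine phase then lowers the rate smoothly toward `min_lr`.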

## References
- Frameworks: PyTorch, TensorFlow, Keras
- Textbook: Goodfellow, Bengio, and Courville, "Deep Learning" (MIT Press, 2016)


```python
import numpy as np
import plotly.graph_objects as go

# Evaluate each activation on a shared grid of inputs.
x = np.linspace(-5, 5, 400)
sigmoid = 1 / (1 + np.exp(-x))
tanh = np.tanh(x)
relu = np.maximum(0, x)
# GELU computed with the common tanh approximation.
gelu = 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Plot one trace per activation on a single set of axes.
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=sigmoid, name="sigmoid"))
fig.add_trace(go.Scatter(x=x, y=tanh, name="tanh"))
fig.add_trace(go.Scatter(x=x, y=relu, name="ReLU"))
fig.add_trace(go.Scatter(x=x, y=gelu, name="GELU"))
fig.update_layout(
    title="Common activation functions",
    xaxis_title="x",
    yaxis_title="activation(x)",
)
fig.show()
```
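
The shapes of these curves matter for training because gradients flow through the slope of each activation. The sketch below, written in plain Python with only the standard library, evaluates the analytic derivatives of sigmoid, tanh, and ReLU at a few points. The helper names (`sigmoid_grad`, `tanh_grad`, `relu_grad`) and the sample inputs are assumptions made for this illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); its maximum is 0.25 at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2; its maximum is 1 at x = 0.
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    # Slope is 1 for positive inputs and 0 otherwise (0 is used at x = 0 by convention here).
    return 1.0 if x > 0 else 0.0


# Compare slopes near the origin and in the saturated region.
for x in (0.0, 2.0, 5.0):
    print(
        f"x={x:>4}: sigmoid'={sigmoid_grad(x):.4f}  "
        f"tanh'={tanh_grad(x):.4f}  relu'={relu_grad(x):.1f}"
    )
```

Sigmoid and tanh slopes shrink toward zero for large inputs, while ReLU keeps a slope of 1 for any positive input. This is one reason saturating activations can slow learning in deep stacks.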

## Takeaway
Depth adds expressive power, but deeper models are harder to train stably and need more data to generalize well.

