# Fundamentals of Deep Learning - Part 1

## **1. Why Deep Learning?**

Deep Learning (DL) is a subfield of Machine Learning (ML) that uses **multi-layered artificial neural networks** to model complex patterns in data.

### **Key Reasons for Popularity**

* **Handles raw, unstructured data**: Images, audio, text — without heavy manual feature engineering.
* **Automatic feature extraction**: Learns hierarchical features directly from data.
* **Scales with data and compute**: Performance improves significantly with more data and GPU/TPU power.
* **Breakthrough results**: Has outperformed traditional ML in computer vision, NLP, speech recognition, and large-scale recommendation systems.
* **End-to-end learning**: Goes from raw input → prediction without intermediate handcrafted features.

📌 **Real-world examples:**

* Face recognition in smartphones
* Google Translate’s neural machine translation
* ChatGPT and generative AI models
* Autonomous vehicle perception systems

---

## **2. DL vs ML: Technical Differences and Use Cases**

| **Aspect**              | **Machine Learning (ML)**                                | **Deep Learning (DL)**                                           |
| ----------------------- | -------------------------------------------------------- | ---------------------------------------------------------------- |
| **Data requirement**    | Works well with small/medium datasets                    | Requires large datasets for good performance                     |
| **Feature engineering** | Manual feature extraction is crucial                     | Learns features automatically                                    |
| **Model complexity**    | Simpler models (e.g., linear regression, decision trees) | Multi-layer neural networks with millions/billions of parameters |
| **Computation**         | Can run on CPUs easily                                   | Often requires GPUs/TPUs                                         |
| **Interpretability**    | Easier to interpret                                      | More of a “black box”                                            |
| **Performance**         | Can saturate on complex tasks                            | Can scale performance with data and depth                        |

**Use Cases**

* **ML**: Credit scoring, churn prediction, time series forecasting (small data), recommendation with tabular data
* **DL**: Image classification, NLP (chatbots, translation), audio transcription, large-scale recommender systems

---

## **3. Steps Involved in a Deep Learning Workflow**

1. **Define the Problem**

   * Is it classification, regression, segmentation, generation, etc.?

2. **Gather & Prepare Data**

   * Collect datasets (images, text, audio, etc.).
   * Split into training, validation, test sets.
   * Preprocess (normalization, tokenization, data augmentation).

3. **Choose a Model Architecture**

   * CNNs for images, RNN/LSTM/Transformers for sequences, GANs for generative tasks.

4. **Define the Loss Function**

   * E.g., Cross-Entropy Loss for classification, MSE for regression.

5. **Select the Optimizer**

   * SGD, Adam, or RMSprop: each updates the weights to reduce the loss.

6. **Train the Model**

   * Forward pass → compute loss → backward pass (backpropagation) → update weights.

7. **Validate & Tune Hyperparameters**

   * Learning rate, batch size, number of layers, dropout rate.

8. **Test the Model**

   * Measure generalization on unseen data.

9. **Deploy**

   * Package the model into an application, API, or edge device.
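
The data-preparation step (step 2) is where subtle mistakes often creep in, so here is a minimal sketch in plain Python. The helper names (`split_dataset`, `fit_normalizer`, `normalize`), the 70/15/15 split, and the toy data are all assumptions made for illustration; real projects usually rely on a library for this. The point it shows is that normalization statistics are computed on the training split only and then reused for validation and test data.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle a copy of the samples and split into train/val/test lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]          # whatever remains
    return train, val, test

def fit_normalizer(values):
    """Return (mean, std) computed from the given values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var ** 0.5

def normalize(values, mean, std):
    """Standardize values with previously fitted statistics."""
    return [(v - mean) / std for v in values]

# Hypothetical 1-D feature values, purely for illustration
data = [float(i) for i in range(100)]

train, val, test = split_dataset(data)
mean, std = fit_normalizer(train)              # statistics from training data only
train_n = normalize(train, mean, std)
val_n = normalize(val, mean, std)              # reuse the training statistics
test_n = normalize(test, mean, std)

print(len(train), len(val), len(test))
```

Fitting the statistics on the training split alone keeps information about the validation and test sets from leaking into training.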

---

## **4. Popular Frameworks for Deep Learning**

| Framework      | Language      | Key Features                                                 | Popular Use                          |
| -------------- | ------------- | ------------------------------------------------------------ | ------------------------------------ |
| **TensorFlow** | Python, C++   | Large ecosystem, production-ready, integrates with Keras     | Google-scale deployments             |
| **PyTorch**    | Python        | Dynamic computation graphs, easy to debug, research-friendly | Academic research, production (Meta) |
| **Keras**      | Python        | High-level API (Keras 3 runs on TensorFlow, JAX, or PyTorch; earlier versions supported Theano and CNTK) | Fast prototyping                     |
| **JAX**        | Python        | Autograd + XLA compilation for speed                         | High-performance research            |
| **MXNet**      | Python, Scala | Efficient distributed training (project retired by Apache in 2023) | Historically used on AWS SageMaker   |

📌 **Current trend**: PyTorch dominates research; TensorFlow/Keras still strong in production.

---

## **5. What are Neurons in Deep Learning?**

A **neuron** is the basic computational unit in a neural network — inspired by biological neurons.

### **Structure**

* **Inputs** ($x_1, x_2, ..., x_n$)
* **Weights** ($w_1, w_2, ..., w_n$) → determines importance of each input
* **Bias** ($b$) → shifts activation threshold
* **Summation** → $z = w_1x_1 + w_2x_2 + ... + w_nx_n + b$
* **Activation function** → non-linear transformation (ReLU, sigmoid, tanh, etc.)

📌 **Mathematical representation**:

$$
y = \phi \left( \sum_{i=1}^{n} w_i x_i + b \right)
$$

where $\phi$ is the activation function.
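
To make the formula concrete, the following is a minimal sketch of a single neuron in plain Python. The function names and the input, weight, and bias values are illustrative assumptions, not taken from any library.

```python
import math

def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)

def sigmoid(z):
    """Sigmoid activation: squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """Compute y = activation(sum_i w_i * x_i + b) for one neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Illustrative values (assumptions): z = 0.5*1 - 1*2 + 1*3 + 0.5 = 2.0
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 1.0]
b = 0.5

y_relu = neuron(x, w, b, relu)        # ReLU leaves a positive z unchanged
y_sigmoid = neuron(x, w, b, sigmoid)  # sigmoid maps z into (0, 1)
print(y_relu, y_sigmoid)
```

The same weighted sum feeds both calls; only the activation function $\phi$ changes the output.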

---

## **6. What Do Neurons Learn and How Do They Learn?**

### **What they learn**

* In early layers → **low-level patterns** (edges, curves in images; word associations in text).
* In deeper layers → **high-level concepts** (faces, objects, sentence meaning).

### **How they learn**

* **Forward pass**: Input flows through the network → prediction is made.
* **Loss computation**: Compare prediction to actual label using loss function.
* **Backward pass (Backpropagation)**:

  * Compute the gradients of the loss with respect to the weights ($\partial L / \partial w$).
  * Update weights using **gradient descent**:

    $$
    w_{\text{new}} = w_{\text{old}} - \eta \cdot \frac{\partial L}{\partial w}
    $$

    where $\eta$ is the learning rate.
* Repeat for many **epochs** until convergence.
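
The update rule can be traced by hand on a one-parameter model. The sketch below assumes a model $\hat{y} = w x$ with no bias, a mean-squared-error loss, toy data following $y = 2x$, and a learning rate of 0.05; all of these are illustrative choices, and the gradient is derived analytically rather than by automatic differentiation.

```python
# Toy data following y = 2x (illustrative assumption)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]

w = 0.0      # initial weight (arbitrary starting point)
eta = 0.05   # learning rate (assumed value)

def mse(w):
    """Loss L(w) = mean((w*x - y)^2) over the dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    """Analytic gradient dL/dw = mean(2 * (w*x - y) * x)."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

for epoch in range(100):
    w = w - eta * grad(w)   # w_new = w_old - eta * dL/dw

print(w, mse(w))
```

With these values each step shrinks the distance between `w` and 2 by a constant factor, so the weight settles very close to 2. A learning rate that is too large would instead make the updates overshoot and diverge.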

---

### **Mini Python Example**: A Tiny Neural Network in PyTorch

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Dummy dataset
X = torch.tensor([[0.0], [1.0], [2.0], [3.0]])
y = torch.tensor([[0.0], [2.0], [4.0], [6.0]])  # y = 2x

# Simple network
model = nn.Sequential(
    nn.Linear(1, 5),  # input layer -> hidden layer
    nn.ReLU(),
    nn.Linear(5, 1)   # hidden -> output layer
)

loss_fn = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(500):
    y_pred = model(X)             # Forward pass
    loss = loss_fn(y_pred, y)     # Compute loss
    optimizer.zero_grad()         # Reset gradients
    loss.backward()               # Backpropagation
    optimizer.step()              # Update weights

print("Predictions:", model(X).detach().numpy())
```

Example output (exact values depend on the random weight initialization):

```
Predictions: [[2.682209e-07]
 [2.000000e+00]
 [4.000000e+00]
 [6.000000e+00]]
```
