## 1. Introduction to PyTorch

PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment. It's known for its flexibility and ease of use, especially for deep learning tasks.

At its core, PyTorch provides two main features:

1.  **Tensor computation** (like NumPy) with strong GPU acceleration.
2.  **Automatic differentiation** for building and training neural networks.

## 2. PyTorch Tensors

Tensors are the fundamental data structure in PyTorch. They are similar to NumPy arrays but can run on GPUs for accelerated computing. Let's explore how to create and manipulate them.

### Creating Tensors

In [None]:
import numpy as np
np.array([1,2,3])

array([1, 2, 3])

In [None]:
import torch
import numpy as np

# From a Python list
data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)
print(f"Tensor from list:\n{x_data}")

# From a NumPy array (torch.from_numpy shares memory with the array)
np_array = np.array(data)
x_np = torch.from_numpy(np_array)
print(f"\nTensor from NumPy:\n{x_np}")

# With random or constant values
x_ones = torch.ones(3, 5) # All ones, shape (3, 5)
print(f"\nOnes Tensor:\n{x_ones}")

x_rand = torch.rand(4, 2) # Uniform random values in [0, 1), shape (4, 2)
print(f"\nRandom Tensor:\n{x_rand}")

Tensor from list:
tensor([[1, 2],
        [3, 4]])

Tensor from NumPy:
tensor([[1, 2],
        [3, 4]])

Ones Tensor:
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

Random Tensor:
tensor([[0.8182, 0.3496],
        [0.4742, 0.0993],
        [0.5138, 0.5519],
        [0.5365, 0.9828]])
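A detail worth seeing in action: `torch.from_numpy` returns a tensor that shares memory with the NumPy array, while `torch.tensor` makes an independent copy. The sketch below also shows a few other common constructors (`torch.zeros` with an explicit `dtype`, `torch.arange`, `torch.ones_like`); the variable names are illustrative only.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(arr)   # shares memory with arr (dtype float64 is kept)
copied = torch.tensor(arr)       # makes an independent copy

arr[0] = 100.0                   # modify the NumPy array in place
print(shared)                    # first element is now 100
print(copied)                    # still starts with 1

# A few other constructors; the dtype can be set explicitly
z = torch.zeros(2, 3, dtype=torch.int64)
r = torch.arange(0, 10, 2)       # even numbers 0, 2, 4, 6, 8
like = torch.ones_like(z)        # same shape and dtype as z, filled with ones
print(z.dtype, r, like.shape)
```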


### Tensor Attributes

Each tensor has attributes like its shape, datatype, and the device it's stored on (CPU or GPU).

In [None]:
tensor = torch.rand(2,4)
print(tensor)

tensor([[0.0658, 0.3159, 0.1946, 0.7230],
        [0.7692, 0.2115, 0.8430, 0.2698]])


In [None]:
#tensor = torch.rand(2,4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

Shape of tensor: torch.Size([2, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
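The `device` attribute above reports `cpu` because tensors are created on the CPU by default. A minimal sketch of moving tensors to a GPU, assuming one may or may not be present: it falls back to the CPU when `torch.cuda.is_available()` is `False`. Operations between tensors generally require both operands to be on the same device.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

t = torch.rand(2, 4)
t_on_device = t.to(device)       # returns a tensor on `device` (no copy if already there)
print(f"Device tensor is stored on: {t_on_device.device}")

# Tensors can also be created directly on a device, and .to() also casts dtypes
u = torch.ones(2, 4, device=device)
half = u.to(torch.float16)
print(half.dtype, half.device)
```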


### Operations on Tensors

Tensors support a wide range of operations, including arithmetic, slicing, joining, and reshaping.

#### Arithmetic Operations

In [None]:
tensor = torch.ones(10, 4)
print(f"Original Tensor:\n{tensor}")

# Element-wise multiplication
print(f"\nTensor * 10:\n{tensor * 10}")

# Matrix multiplication
tensor_2 = torch.rand(4, 16)
print(tensor_2)
print(f"\nMatrix multiplication (tensor @ tensor_2):\n{tensor @ tensor_2}")

# In-place operations (denoted by a `_` suffix)
print(f"\nOriginal tensor before in-place addition:\n{tensor}")
tensor.add_(5) # Adds 5 to every element in-place
print(f"Tensor after in-place add_:\n{tensor}")

Original Tensor:
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

Tensor * 10:
tensor([[10., 10., 10., 10.],
        [10., 10., 10., 10.],
        [10., 10., 10., 10.],
        [10., 10., 10., 10.],
        [10., 10., 10., 10.],
        [10., 10., 10., 10.],
        [10., 10., 10., 10.],
        [10., 10., 10., 10.],
        [10., 10., 10., 10.],
        [10., 10., 10., 10.]])
tensor([[0.1275, 0.2365, 0.1008, 0.2576, 0.4904, 0.1128, 0.4794, 0.5043, 0.8731,
         0.1078, 0.2949, 0.4313, 0.9936, 0.5067, 0.7579, 0.8405],
        [0.6679, 0.9721, 0.1432, 0.7426, 0.6462, 0.0576, 0.7308, 0.5206, 0.9982,
         0.0134, 0.1730, 0.2119, 0.1440, 0.1389, 0.9895, 0.3460],
        [0.7166, 0.6088, 0.3165, 0.5079, 0.5406, 0.8502, 0.9433, 0.7054, 0.1809,
         0.7977, 0.3480, 0.0017,
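The cell above mixes element-wise `*` and matrix multiplication `@`, which follow different shape rules. This small sketch checks them: for `@` the inner dimensions must match, so `(10, 4) @ (4, 16)` gives `(10, 16)`, while `*` needs equal or broadcastable shapes. The tensors here are illustrative.

```python
import torch

a = torch.ones(10, 4)
b = torch.rand(4, 16)

# Matrix multiplication: inner dimensions must match, (10, 4) @ (4, 16) -> (10, 16)
prod = a @ b                      # same as torch.matmul(a, b)
print(prod.shape)

# Element-wise multiplication with broadcasting: row (shape (4,)) is applied to all 10 rows
row = torch.arange(4.0)           # tensor([0., 1., 2., 3.])
scaled = a * row
print(scaled[0])                  # tensor([0., 1., 2., 3.])

# Because a is all ones, each row of prod equals the column sums of b
print(torch.allclose(prod[0], b.sum(dim=0)))
```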

#### Slicing and Indexing

In [None]:
tensor = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"Original Tensor:\n{tensor}")

print(f"\nFirst row: {tensor[0]}")
print(f"Second column: {tensor[:, 1]}")
print(f"Element at (0, 1): {tensor[0, 1]}")

Original Tensor:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

First row: tensor([1, 2, 3])
Second column: tensor([2, 5, 8])
Element at (0, 1): 2
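Joining, listed among the supported operations at the start of this section, is usually done with `torch.cat` (join along an existing dimension) or `torch.stack` (join along a new dimension). A short sketch with two illustrative 2x2 tensors:

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[5, 6], [7, 8]])

# torch.cat joins along an existing dimension
rows = torch.cat([x, y], dim=0)   # shape (4, 2): y placed below x
cols = torch.cat([x, y], dim=1)   # shape (2, 4): y placed to the right of x

# torch.stack creates a new leading dimension
stacked = torch.stack([x, y])     # shape (2, 2, 2)

print(rows.shape, cols.shape, stacked.shape)
print(cols)
```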


#### Reshaping Tensors

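`view` and `reshape` both return a tensor with a new shape and the same elements. `view` always shares storage with the original tensor and requires a compatible (for example, contiguous) memory layout; `reshape` returns a view when possible and otherwise copies the data.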

In [None]:
tensor = torch.zeros(4, 4)
print(tensor)
print(f"Original Tensor shape: {tensor.shape}")

reshaped_tensor = tensor.view(16) # Flatten to a 1D tensor
print(reshaped_tensor)
print(f"Reshaped Tensor (view) shape: {reshaped_tensor.shape}")

reshaped_tensor_2 = tensor.reshape(8, 2) # Reshape to (8, 2); reshape(-1, 2) would infer the 8
print(reshaped_tensor_2)
print(f"Reshaped Tensor (reshape) shape: {reshaped_tensor_2.shape}")

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
Original Tensor shape: torch.Size([4, 4])
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Reshaped Tensor (view) shape: torch.Size([16])
tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]])
Reshaped Tensor (reshape) shape: torch.Size([8, 2])
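The difference between `view` and `reshape` only shows up on non-contiguous tensors, such as a transpose. This sketch illustrates it, along with `-1` for dimension inference and the fact that a view shares storage with its source.

```python
import torch

t = torch.arange(6).reshape(2, 3)  # contiguous: [[0, 1, 2], [3, 4, 5]]
tt = t.T                           # transposed view, shape (3, 2), not contiguous
print(tt.is_contiguous())          # False

view_failed = False
try:
    tt.view(6)                     # view needs a memory layout compatible with the new shape
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

flat = tt.reshape(6)               # reshape copies the data when a view is impossible
print(flat)                        # tensor([0, 3, 1, 4, 2, 5])

inferred = t.reshape(-1, 2)        # -1 lets PyTorch infer this dimension: shape (3, 2)
print(inferred.shape)

v = t.view(6)                      # t is contiguous, so view works and shares storage
v[0] = 100
print(t[0, 0])                     # tensor(100): the change is visible through t
```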


## 3. Automatic Differentiation with `autograd`

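One of PyTorch's most powerful features is `autograd`, its automatic differentiation engine. When a tensor has `requires_grad=True`, PyTorch records the operations applied to it in a computation graph; calling `.backward()` on a scalar result then computes the gradient of that result with respect to every such tensor. This is what makes training neural networks through backpropagation practical.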

### Basic `autograd` Example

In [None]:
# Create a tensor with requires_grad=True to track computations
x = torch.tensor(2.0, requires_grad=True)

# Define a simple function
y = x**2 + 3*x + 1


# Compute gradients (backpropagation)
y.backward()

# Print the gradient of y with respect to x
# Mathematically, dy/dx for y = x^2 + 3x + 1 is 2x + 3.
# At x=2, dy/dx = 2*2 + 3 = 7.
print(f"Gradient of y with respect to x at x={x.item()}: {x.grad}")

Gradient of y with respect to x at x=2.0: 7.0
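One behavior that matters later for training loops: `backward()` adds to the existing `.grad` instead of overwriting it. This sketch repeats the computation above twice to show the accumulation, then clears the gradient with the in-place `zero_()`.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

y = x**2 + 3*x + 1
y.backward()
first = x.grad.item()          # 7.0, as computed above

# Building the graph again and calling backward() ADDS to x.grad
y = x**2 + 3*x + 1
y.backward()
second = x.grad.item()         # 14.0 = 7.0 + 7.0

x.grad.zero_()                 # reset in place before the next computation
print(first, second, x.grad)
```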


### Gradients for Multiple Variables

In [None]:
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)

z = x**2 + y**3 # z = x^2 + y^3

z.backward() # Computes dz/dx and dz/dy

# dz/dx = 2x. At x=1, dz/dx = 2.
# dz/dy = 3y^2. At y=2, dz/dy = 3 * (2^2) = 12.
print(f"Gradient of z with respect to x at x={x.item()}: {x.grad}")
print(f"Gradient of z with respect to y at y={y.item()}: {y.grad}")

Gradient of z with respect to x at x=1.0: 2.0
Gradient of z with respect to y at y=2.0: 12.0
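As an alternative to `z.backward()`, `torch.autograd.grad` returns the gradients directly instead of storing them in each tensor's `.grad` attribute. A sketch with the same function:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = x**2 + y**3

# Returns a tuple with one gradient per input; .grad is not populated
dz_dx, dz_dy = torch.autograd.grad(z, (x, y))
print(dz_dx, dz_dy)   # tensor(2.) tensor(12.)
print(x.grad)         # None
```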


### Mathematical Derivation of Gradients

The function defined was: $$z = x^2 + y^3$$

To find the gradients of `z` with respect to `x` and `y`, we need to compute the partial derivatives:

1.  **Partial derivative of `z` with respect to `x` (∂z/∂x)**:

    $$\frac{\partial z}{\partial x} = \frac{\partial}{\partial x}(x^2 + y^3)$$

    Since `y` is treated as a constant when differentiating with respect to `x`:

    $$\frac{\partial z}{\partial x} = 2x + 0 = 2x$$

    At the given value $x = 1.0$:

    $$\frac{\partial z}{\partial x} = 2 \times 1.0 = 2.0$$

    This matches the `x.grad` value of `2.0`.

2.  **Partial derivative of `z` with respect to `y` (∂z/∂y)**:

    $$\frac{\partial z}{\partial y} = \frac{\partial}{\partial y}(x^2 + y^3)$$

    Since `x` is treated as a constant when differentiating with respect to `y`:

    $$\frac{\partial z}{\partial y} = 0 + 3y^2 = 3y^2$$

    At the given value $y = 2.0$:

    $$\frac{\partial z}{\partial y} = 3 \times (2.0)^2 = 3 \times 4.0 = 12.0$$

    This matches the `y.grad` value of `12.0`.

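PyTorch's `autograd` does not differentiate symbolically. It records the operations performed in the forward pass and, when `z.backward()` is called, applies the chain rule to those recorded operations (reverse-mode automatic differentiation), producing the same numerical gradients without any manual calculation.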

### Freezing Parameters

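Sometimes you may want to freeze part of a model, meaning that no gradients are computed for certain parameters during training. For a model's parameters, this is done by setting `requires_grad=False`; for an intermediate tensor, `detach()` returns a version of it that is cut off from the computation graph, as the following example shows.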

In [None]:
a = torch.ones(2, 2, requires_grad=True)
b = torch.zeros(2, 2, requires_grad=True)
print(a,b)
# Detach b from the computation graph, so its gradients won't be computed
c = a + b.detach()

d = c * 2
print(d)
d.sum().backward() # Sum to get a scalar for backward

print(f"Gradient of a: {a.grad}")
print(f"Gradient of b: {b.grad}") # b.grad will be None as it was detached

tensor([[1., 1.],
        [1., 1.]], requires_grad=True) tensor([[0., 0.],
        [0., 0.]], requires_grad=True)
tensor([[2., 2.],
        [2., 2.]], grad_fn=<MulBackward0>)
Gradient of a: tensor([[2., 2.],
        [2., 2.]])
Gradient of b: None


### Explanation of Freezing Parameters with `detach()`

This example demonstrates how to control which tensors contribute to gradient computations by using the `detach()` method, effectively 'freezing' certain parts of the computation graph.

Let's analyze the steps:

1.  **`a = torch.ones(2, 2, requires_grad=True)`**
    *   A tensor `a` of ones is created. `requires_grad=True` ensures that PyTorch will track operations on `a` to compute its gradients later.

2.  **`b = torch.zeros(2, 2, requires_grad=True)`**
    *   A tensor `b` of zeros is created, also with `requires_grad=True` initially. This means that, by default, its operations would also be tracked for gradients.

3.  **`c = a + b.detach()`**
    *   This is the crucial step. `b.detach()` returns a *new tensor* that shares the same underlying data as `b` but is **not part of the computation graph** (its `requires_grad` is `False`); `b` itself is left unchanged. As a result, no gradients can flow back to `b` through this operation during backpropagation.
    *   `c` is then computed by adding `a` and the detached version of `b`. Since `a` still has `requires_grad=True` and `b.detach()` does not, the resulting tensor `c` will have `requires_grad=True` because its computation depends on `a`.

4.  **`d = c * 2`**
    *   `d` is computed by multiplying `c` by 2. Since `c` has `requires_grad=True`, `d` also inherits this property.

5.  **`d.sum().backward()`**
    *   To perform backpropagation, `d` (a 2x2 tensor) is summed to create a scalar value, and `.backward()` is called on this sum. This computes the gradients of `d.sum()` with respect to the leaf tensors in its computation graph that have `requires_grad=True` (here, only `a`) and stores them in their `.grad` attributes.

### Gradient Results

*   **`Gradient of a: {a.grad}`**
    *   The gradient for `a` is calculated. Let's trace it:
        *   `d = 2 * c`
        *   `c = a + b_detached`
        *   So, `d = 2 * (a + b_detached)`
        *   The derivative of `d` with respect to `a` (ignoring `b_detached` as it's a constant in this path) is `∂d/∂a = 2`. Since `d` is summed (`d.sum()`) for `backward()`, the gradient for each element of `a` will be `2.0`.
    *   Output: `tensor([[2., 2.], [2., 2.]])`

*   **`Gradient of b: {b.grad}`**
    *   The gradient for `b` is `None`. This is because `b` was explicitly `detach()`ed from the computation graph before it was used to compute `c`. Therefore, no path exists in the graph for gradients to flow back to `b` from `c` (and subsequently from `d` or `d.sum()`).

This technique is useful when you want to train only specific parts of a model, or when you are using pre-trained features and don't want to update the weights of the feature extractor.
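For a model built from `nn.Module` layers, freezing is usually done on the parameters themselves. The sketch below assumes a hypothetical two-part model (`feature_extractor` and `head` are illustrative names, not part of this notebook): it turns off `requires_grad` on the frozen part, gives the optimizer only the trainable parameters, and checks that one update leaves the frozen weights untouched.

```python
import torch
from torch import nn

# Hypothetical two-part model: a "feature extractor" followed by a trainable "head"
feature_extractor = nn.Linear(20, 50)
head = nn.Linear(50, 1)

# Freeze the feature extractor: its parameters stop receiving gradients
for p in feature_extractor.parameters():
    p.requires_grad_(False)

# Hand only the still-trainable parameters to the optimizer
all_params = list(feature_extractor.parameters()) + list(head.parameters())
head_optimizer = torch.optim.SGD([p for p in all_params if p.requires_grad], lr=0.01)

inputs = torch.randn(8, 20)
frozen_weight_before = feature_extractor.weight.clone()

out = head(torch.relu(feature_extractor(inputs)))
demo_loss = out.pow(2).mean()      # a stand-in loss, just to produce gradients
head_optimizer.zero_grad()
demo_loss.backward()
head_optimizer.step()

print(feature_extractor.weight.grad)   # None: frozen parameters get no gradient
print(head.weight.grad is not None)    # True: the head is still trained
print(torch.equal(feature_extractor.weight, frozen_weight_before))  # True
```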


## 4. Building Neural Networks with `torch.nn`

PyTorch's `torch.nn` module is a fundamental component for building and training neural networks. It provides all the necessary building blocks for defining neural network architectures, handling parameters, and performing computations. Key components of `torch.nn` include:

*   **`nn.Module`**: This is the base class for all neural network modules. Any custom network architecture or layer should inherit from `nn.Module`, which registers its parameters and submodules; subclasses implement the `forward()` method to define how inputs flow through the network.
*   **Layers**: `torch.nn` offers a rich library of pre-built layers like `nn.Linear` (for fully connected layers), `nn.Conv2d` (for convolutional layers), `nn.MaxPool2d` (for pooling layers), `nn.ReLU` (for activation functions), and many more.
*   **Activation Functions**: Various non-linear activation functions (e.g., ReLU, Sigmoid, Tanh) are available in `torch.nn.functional` or as `nn.Module` subclasses.
*   **Loss Functions**: `torch.nn` also provides common loss functions (e.g., `nn.CrossEntropyLoss` for classification, `nn.MSELoss` for regression) to calculate the error between predicted and target values during training.
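To illustrate the two forms of activation functions mentioned above, this sketch applies ReLU both as an `nn.ReLU` module (convenient inside `nn.Sequential`) and as the function `torch.nn.functional.relu` (convenient inside a custom `forward()`); both give the same result.

```python
import torch
from torch import nn
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])

relu_module = nn.ReLU()          # module form, usable as a layer in nn.Sequential
out_module = relu_module(x)
out_functional = F.relu(x)       # functional form, called directly

print(out_module)                # negative entries become 0
print(torch.equal(out_module, out_functional))
```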


In [None]:
import torch
from torch import nn

# 1. Define a Python class named SimpleNeuralNetwork that inherits from torch.nn.Module.
class SimpleNeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # 2. Build self.linear_relu_stack as a torch.nn.Sequential: a torch.nn.Linear layer mapping
        # 20 input features to 50 hidden features, a torch.nn.ReLU activation, and a second
        # torch.nn.Linear layer mapping the 50 hidden features to 1 output feature.
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(20, 50), # 20 input features to 50 hidden features
            nn.ReLU(),         # ReLU activation function
            nn.Linear(50, 1)   # 50 hidden features to 1 output feature
        )

    # 3. Implement the forward method which takes an input x and passes it through self.linear_relu_stack.
    def forward(self, x):
        logits = self.linear_relu_stack(x)
        return logits

# 4. Instantiate an object of SimpleNeuralNetwork and name it model.
model = SimpleNeuralNetwork()

# 5. Print the model object to display its architecture.
print("Simple Neural Network Model Architecture:")
print(model)


Simple Neural Network Model Architecture:
SimpleNeuralNetwork(
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=20, out_features=50, bias=True)
    (1): ReLU()
    (2): Linear(in_features=50, out_features=1, bias=True)
  )
)
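A quick sanity check after defining a model is to pass a dummy batch through it and count its parameters. The sketch rebuilds the same layer sizes with `nn.Sequential` (named `net` so it runs on its own without replacing `model`): the first `Linear` has 20 * 50 weights plus 50 biases, the second has 50 weights plus 1 bias, for 1050 + 51 = 1101 parameters.

```python
import torch
from torch import nn

# Same layer sizes as SimpleNeuralNetwork above
net = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 1))

batch = torch.randn(8, 20)       # 8 samples with 20 features each
out = net(batch)
print(out.shape)                 # torch.Size([8, 1]): one prediction per sample

# (20*50 + 50) + (50*1 + 1) = 1101 trainable parameters
num_params = sum(p.numel() for p in net.parameters())
print(num_params)

for name, p in net.named_parameters():
    print(name, tuple(p.shape))
```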



### Loss Functions

Loss functions (also known as cost functions or objective functions) are crucial components in training neural networks. They quantify the difference between the predicted output of the network and the actual target values. The goal of training a neural network is to minimize this loss, guiding the model to make more accurate predictions.

During backpropagation, the gradients of the loss function with respect to the model's parameters are calculated. These gradients are then used by optimizers to adjust the parameters, iteratively improving the model's performance.

PyTorch's `torch.nn` module provides various common loss functions:

*   **`torch.nn.MSELoss` (Mean Squared Error Loss)**: This loss function is typically used for regression tasks, where the goal is to predict a continuous value. It calculates the mean of the squared differences between predicted and true values. Mathematically, for $N$ elements it is $\frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2$.

*   **`torch.nn.CrossEntropyLoss`**: This is a widely used loss function for classification tasks, especially multi-class classification. It combines `nn.LogSoftmax` and `nn.NLLLoss` in a single class, so the model should output raw, unnormalized scores (logits) rather than probabilities; the targets are usually class indices. The loss is the negative log-probability assigned to the correct class, averaged over the batch, so lower values mean better predictions.
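A minimal sketch of `nn.CrossEntropyLoss`, assuming a 4-class problem with made-up logits: it takes raw scores of shape `(batch, classes)` and integer class indices, and matches the two-step computation `log_softmax` followed by `nll_loss`.

```python
import torch
from torch import nn
import torch.nn.functional as F

ce = nn.CrossEntropyLoss()

# Raw, unnormalized scores (logits) for 3 samples and 4 classes; no softmax is applied
logits = torch.tensor([[2.0, 0.5, -1.0, 0.1],
                       [0.2, 1.5, 0.3, -0.4],
                       [-0.3, 0.0, 0.8, 2.2]])
labels = torch.tensor([0, 1, 3])   # one class index per sample (int64)

ce_loss = ce(logits, labels)

# The same value computed in two steps: log-softmax, then negative log-likelihood
two_step = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(ce_loss.item(), torch.allclose(ce_loss, two_step))
```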


In [None]:
import torch
from torch import nn

# 1. Instantiate a loss function (e.g., Mean Squared Error for regression)
loss_fn = nn.MSELoss() # Or nn.CrossEntropyLoss() for classification
print(f"Instantiated Loss Function: {loss_fn}\n")

# 2. Create example model outputs (predictions) and target values
predictions = torch.randn(10, 1) # Example: 10 predictions, 1 output feature each
targets = torch.randn(10, 1)    # Example: 10 target values, 1 output feature each
print(targets)
print(f"Example Predictions:\n{predictions.T}\n")
print(f"Example Targets:\n{targets.T}\n")

# 3. Calculate the loss
loss = loss_fn(predictions, targets)

# 4. Print the calculated loss
print(f"Calculated Loss (MSE): {loss.item()}")

Instantiated Loss Function: MSELoss()

tensor([[ 1.4518e+00],
        [ 3.2571e-01],
        [-2.0965e+00],
        [ 4.2007e-01],
        [ 1.2847e+00],
        [ 1.7305e+00],
        [-4.7806e-01],
        [ 1.6695e-03],
        [-6.6902e-02],
        [ 2.2826e-01]])
Example Predictions:
tensor([[-1.8860,  0.0164, -0.6540, -0.0549,  0.9582,  1.8385,  1.8897, -0.3213,
          0.2447,  0.7658]])

Example Targets:
tensor([[ 1.4518e+00,  3.2571e-01, -2.0965e+00,  4.2007e-01,  1.2847e+00,
          1.7305e+00, -4.7806e-01,  1.6695e-03, -6.6902e-02,  2.2826e-01]])

Calculated Loss (MSE): 1.9758189916610718
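Because the predictions and targets above are random, the exact loss changes on every run. With small fixed tensors the result can be checked by hand: the squared errors below are 0.25, 0.0 and 1.0, so the sum is 1.25 and the mean is 1.25 / 3. Note that the prediction and target shapes should match exactly (here both are `(3, 1)`); if they differ, PyTorch broadcasts them and emits a warning.

```python
import torch
from torch import nn

pred = torch.tensor([[1.0], [2.0], [3.0]])
true = torch.tensor([[1.5], [2.0], [2.0]])

mse_mean = nn.MSELoss()(pred, true)                  # default reduction="mean"
mse_sum = nn.MSELoss(reduction="sum")(pred, true)     # sum of squared errors instead
manual = ((pred - true) ** 2).mean()                  # the definition, written out

print(mse_sum.item())                                 # 1.25
print(torch.allclose(mse_mean, manual))               # True
```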



### Optimizers

Optimizers are algorithms or methods used to adjust the weights and biases (parameters) of a neural network during training. Their primary purpose is to minimize the loss function, thereby improving the model's performance and accuracy.

**Role in Training:**
After the network's output is compared to the true labels by a loss function, `autograd` computes the gradients of this loss with respect to each model parameter. The optimizer then takes these gradients and uses a specific algorithm (e.g., gradient descent, Adam, etc.) to update the parameters. This iterative process of forward pass, loss calculation, backpropagation (gradient computation), and parameter update (by the optimizer) is what drives the learning in a neural network.

**Common PyTorch Optimizers:**
PyTorch's `torch.optim` module provides various optimization algorithms:

*   **`torch.optim.SGD` (Stochastic Gradient Descent)**: This fundamental optimizer moves each parameter a small step opposite to its gradient, `p ← p − lr · grad` (optionally with momentum or weight decay). It is simple but often effective. "Stochastic" refers to computing the gradients on small random subsets of the data (mini-batches) rather than on the entire dataset, which makes each update much cheaper.

*   **`torch.optim.Adam` (Adaptive Moment Estimation)**: Adam is an adaptive learning rate optimization algorithm that's become very popular due to its efficiency and good performance in practice. It computes adaptive learning rates for each parameter by maintaining an exponentially decaying average of past gradients (momentum) and past squared gradients.
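To make the SGD update rule concrete, this sketch takes one `optim.SGD` step (default settings: no momentum, no weight decay) on a toy loss `sum(w^2)`, whose gradient is `2w`, and compares it with the update `w - lr * grad` computed by hand.

```python
import torch

lr = 0.1
w = torch.tensor([1.0, -2.0], requires_grad=True)
sgd = torch.optim.SGD([w], lr=lr)    # plain SGD: no momentum, no weight decay

# Toy loss sum(w^2); its gradient is 2w = [2., -4.]
loss_val = (w ** 2).sum()
sgd.zero_grad()
loss_val.backward()

expected = w.detach() - lr * w.grad   # p <- p - lr * grad, computed before the step
sgd.step()

print(w)                              # approximately [0.8, -1.6]
print(torch.allclose(w.detach(), expected))
```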


In [None]:
import torch.optim as optim

# Instantiate an optimizer (e.g., Adam)
# We link the optimizer to the parameters of our 'model' (SimpleNeuralNetwork) defined earlier.
learning_rate = 0.001
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

print(f"Instantiated Optimizer: {optimizer}")

Instantiated Optimizer: Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    decoupled_weight_decay: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    weight_decay: 0
)



### Training Loop Outline

A neural network training loop is the iterative process through which a model learns from data. Each iteration processes one batch of data, and one full pass over the entire dataset is called an *epoch*. Every iteration typically involves the following core steps:

1.  **Forward Pass**: The model takes the input data and processes it through its layers to generate an output, which are the model's predictions. For example, if we have input `x` and our model is `model`, this step would look like `predictions = model(x)`.

2.  **Loss Calculation**: The generated predictions are compared against the actual target values (ground truth). A loss function quantifies the discrepancy or error between the predictions and the targets. The result is a single scalar value representing the model's performance on the current batch of data. For example, `loss = loss_fn(predictions, targets)`.

3.  **Backward Pass (Backpropagation)**: This is where `autograd` comes into play. After calculating the loss, `loss.backward()` is called. This computes the gradients of the loss with respect to all parameters that have `requires_grad=True`. These gradients indicate how much each parameter contributed to the error and in what direction it needs to be adjusted to reduce the loss.

4.  **Optimizer Step**: The optimizer uses the calculated gradients to update the model's parameters. Because PyTorch accumulates gradients in `.grad` by default, `optimizer.zero_grad()` must be called before `loss.backward()` in each iteration (usually at the start of the iteration) so that gradients from the previous batch are cleared. `optimizer.step()` then applies the update based on the chosen optimization algorithm (e.g., SGD, Adam) and the learning rate, moving the parameters in the direction that reduces the loss.
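The four steps above can be sketched in a few lines. This example uses a hypothetical tiny regression problem (a single `nn.Linear(3, 1)` and a target that is simply the sum of the inputs, so there is something learnable) with separately named objects so it does not replace the notebook's `model`, `loss_fn`, or `optimizer`. It runs 50 full-batch iterations and records the loss.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical tiny regression setup, separate from the notebook's model
step_model = nn.Linear(3, 1)
step_loss_fn = nn.MSELoss()
step_optimizer = torch.optim.SGD(step_model.parameters(), lr=0.1)

batch_inputs = torch.randn(16, 3)
batch_targets = batch_inputs.sum(dim=1, keepdim=True)  # learnable target: x1 + x2 + x3

losses = []
for _ in range(50):
    step_optimizer.zero_grad()                      # clear gradients from the previous step
    preds = step_model(batch_inputs)                # forward pass
    loss_value = step_loss_fn(preds, batch_targets) # loss calculation
    loss_value.backward()                           # backward pass
    step_optimizer.step()                           # optimizer step: update parameters
    losses.append(loss_value.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```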


In [None]:
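# Batching illustration: 1000 rows split into batches of 100 rows each
num_rows = 1000
batch_size = 100
for i in range(num_rows // batch_size):
    print(f"Batch {i + 1}: {batch_size} rows")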
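The training cell below uses `batch_size=64`, which does not divide 1000 evenly. This sketch (with illustrative names `demo_ds` and `demo_loader`) checks what `DataLoader` does: it yields ceil(1000 / 64) = 16 batches, and the last one holds the remaining 1000 - 15 * 64 = 40 rows. Passing `drop_last=True` would discard that partial batch instead.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# 1000 dummy rows, batched as in the training cell below
demo_ds = TensorDataset(torch.randn(1000, 20), torch.randn(1000, 1))
demo_loader = DataLoader(demo_ds, batch_size=64, shuffle=True)

sizes = [xb.shape[0] for xb, yb in demo_loader]
print(len(demo_loader))   # 16 batches
print(sizes[-1])          # 40 rows in the final, partial batch
```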


In [None]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# 1. Define a train_loop function
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    total_loss = 0

    model.train() # Set the model to training mode
    for batch, (X, y) in enumerate(dataloader):
        # a. Forward pass
        pred = model(X)

        # b. Loss calculation
        loss = loss_fn(pred, y)

        # c. Zero gradients
        optimizer.zero_grad()

        # d. Backward pass
        loss.backward()

        # e. Optimizer step
        optimizer.step()

        total_loss += loss.item()

        if batch % 100 == 0:
            current = batch * len(X)
            print(f"loss: {loss.item():>7f}  [{current:>5d}/{size:>5d}]")

    avg_loss = total_loss / num_batches
    return avg_loss

# 2. Create dummy input data X and target data y
X_dummy = torch.randn(1000, 20) # 1000 samples, 20 features (matching model input)
y_dummy = torch.randn(1000, 1)  # 1000 samples, 1 target (matching model output)

# 3. Create a TensorDataset and then a DataLoader
dataset = TensorDataset(X_dummy, y_dummy)
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

# 4. Set the number of training epochs
epochs = 10
print(f"\nStarting training for {epochs} epochs...")

# Instantiate model, loss_fn, and optimizer (if not already done in the current scope)
# This assumes 'model', 'loss_fn', and 'optimizer' from previous cells are available
# If running this cell independently, uncomment and run the following:
# from torch import nn
# import torch.optim as optim
# class SimpleNeuralNetwork(nn.Module):
#     def __init__(self):
#         super().__init__()
#         self.linear_relu_stack = nn.Sequential(
#             nn.Linear(20, 50),
#             nn.ReLU(),
#             nn.Linear(50, 1)
#         )
#     def forward(self, x):
#         return self.linear_relu_stack(x)
# model = SimpleNeuralNetwork()
# loss_fn = nn.MSELoss()
# optimizer = optim.Adam(model.parameters(), lr=0.001)

# 5. Run the training loop for the specified number of epochs
for t in range(epochs):
    print(f"\nEpoch {t+1}\n----------------------------------")
    epoch_loss = train_loop(dataloader, model, loss_fn, optimizer)
    print(f"Epoch {t+1} finished! Average Loss: {epoch_loss:>8f}")

print("Done!")


Starting training for 10 epochs...

Epoch 1
----------------------------------
loss: 0.924623  [    0/ 1000]
Epoch 1 finished! Average Loss: 1.049480

Epoch 2
----------------------------------
loss: 1.198885  [    0/ 1000]
Epoch 2 finished! Average Loss: 1.007161

Epoch 3
----------------------------------
loss: 1.217597  [    0/ 1000]
Epoch 3 finished! Average Loss: 0.996514

Epoch 4
----------------------------------
loss: 1.231169  [    0/ 1000]
Epoch 4 finished! Average Loss: 0.999027

Epoch 5
----------------------------------
loss: 1.020290  [    0/ 1000]
Epoch 5 finished! Average Loss: 0.981559

Epoch 6
----------------------------------
loss: 0.952016  [    0/ 1000]
Epoch 6 finished! Average Loss: 0.965705

Epoch 7
----------------------------------
loss: 0.768926  [    0/ 1000]
Epoch 7 finished! Average Loss: 0.985494

Epoch 8
----------------------------------
loss: 1.109284  [    0/ 1000]
Epoch 8 finished! Average Loss: 0.962578

Epoch 9
----------------------------------
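The training loop only reports training loss. A common companion is an evaluation loop that puts the model in evaluation mode with `model.eval()` and disables graph construction with `torch.no_grad()`. The sketch below is hypothetical: `eval_loop`, `eval_net`, and `val_loader` are illustrative names, and the "validation" data is random, so the resulting number carries no meaning beyond showing the mechanics. In this notebook it could be called as `eval_loop(val_loader, model, loss_fn)` after each epoch.

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

def eval_loop(dataloader, model, loss_fn):
    """Return the average loss over a dataloader without tracking gradients."""
    model.eval()                      # switch layers such as dropout to evaluation behavior
    total, batches = 0.0, 0
    with torch.no_grad():             # no computation graph is built
        for X, y in dataloader:
            total += loss_fn(model(X), y).item()
            batches += 1
    return total / batches

# Hypothetical stand-ins for a trained model and held-out data
eval_net = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 1))
val_loader = DataLoader(TensorDataset(torch.randn(200, 20), torch.randn(200, 1)), batch_size=64)

val_loss = eval_loop(val_loader, eval_net, nn.MSELoss())
print(f"Validation loss: {val_loss:.4f}")
```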



## Summary

### Key Points

*   The `torch.nn` module was introduced, highlighting its role in building neural networks through `nn.Module` for architecture definition, various layers (e.g., `nn.Linear`, `nn.Conv2d`), activation functions, and loss functions.
*   A `SimpleNeuralNetwork` class inheriting from `torch.nn.Module` was defined with two linear layers (20 input features to 50 hidden features, and 50 hidden features to 1 output feature) and a `nn.ReLU` activation function. An instance of this model was successfully created.
*   Loss functions were explained as quantifying the difference between predicted and actual values, with `torch.nn.MSELoss` (for regression) and `torch.nn.CrossEntropyLoss` (for classification) introduced as common examples.
*   An `nn.MSELoss` function was instantiated and used to compute the loss between predictions and targets of shape (10, 1). Because both are drawn with `torch.randn`, the value changes on every run (1.9758 in the run shown above).
*   Optimizers were detailed as algorithms for adjusting neural network parameters to minimize loss, with `torch.optim.SGD` and `torch.optim.Adam` presented as key PyTorch examples.
*   An `Adam` optimizer was successfully instantiated with a learning rate of 0.001, linked to the parameters of the previously defined `SimpleNeuralNetwork` model.
*   The four core steps of a neural network training loop were outlined: Forward Pass, Loss Calculation, Backward Pass (backpropagation), and Optimizer Step (including `optimizer.zero_grad()` and `optimizer.step()`).
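*   A functional training loop was implemented using dummy data (1000 samples, 20 input features, 1 output target). Over 10 epochs, the average training loss decreased from approximately 1.049 in Epoch 1 to around 0.948 in Epoch 10. Because the targets are random noise unrelated to the inputs, this decrease reflects the network fitting noise in the training set rather than learning a generalizable mapping; it confirms that the model, loss function, optimizer, and data pipeline are wired together correctly.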

### Insights or Next Steps

*   The PyTorch demo now covers the fundamental building blocks of neural network development, from tensors and `autograd` to model definition, loss functions, optimizers, and a training loop. This provides a solid educational foundation.
*   To further enhance the demo, the next logical step would be to introduce actual datasets (e.g., from `torchvision.datasets`) and demonstrate the training loop on a more realistic problem, potentially including evaluation metrics and a separate validation set.
