# **Deep Learning Primer**
## Outline
- Fundamental concepts of deep learning.
- Mathematical model of a neuron.
- Overview of neural network structures and common architectures.
- Optimizers, gradient descent, and backpropagation algorithms.
- Introduction to TensorFlow and PyTorch for deep learning applications.
- **Hands-on Lab:** Image classification using TensorFlow or PyTorch.


<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

## **The Evolution of Artificial Intelligence**

#### The Birth of Artificial Intelligence

- Artificial Intelligence (AI) emerged as a field in the mid-20th century.
- Symbolic reasoning was the dominant approach initially.

#### Early Successes: **Expert Systems**

- Symbolic reasoning led to early successes, including the development of expert systems.
- Expert systems were computer programs capable of **mimicking human expertise** in specific problem domains.

#### The Challenge of Scaling Knowledge

- Symbolic reasoning faced limitations in scaling knowledge.
- Extracting, representing, and maintaining knowledge in computer-based systems proved complex and costly.

#### The AI Winter of the 1970s

- These challenges culminated in what became known as the **"AI Winter"** during the 1970s.
- The AI Winter represented a period of reduced optimism and funding for AI research.
- The **scalability** and **practicality** of symbolic reasoning-based approaches came into question.

<img src="./images/history-of-ai.png"  align="center"/>


## **Evolution of Artificial Intelligence Approaches**

- Over time, AI approaches have evolved due to **cheaper computing** resources and increased **data availability**.
- Neural network approaches have gained prominence.
  - On some benchmarks they match or outperform humans in areas like **computer vision** and **speech understanding**.

#### The Transformation of Chess Programs

- **Early Chess Programs:**
  - Early chess programs relied on **search algorithms** to evaluate possible moves.
  - The **alpha-beta pruning** search algorithm was a significant development
  - These programs performed well in the endgame but **struggled** at the beginning due to vast search spaces.

- **Case-Based Reasoning:**
  - To improve early-game performance, case-based reasoning was introduced.
  - Programs looked for cases in the knowledge base similar to the current game position.

- **Modern Chess Programs:**
  - Today's chess programs excel due to **neural networks** and **reinforcement learning**.
  - They learn by playing against themselves, adapting and improving rapidly.

#### Evolution of "Talking Programs"

- **Early "Talking Programs" (e.g., Eliza):**
  - Early conversational programs used simple grammatical rules and sentence re-formulation.
  
- **Modern Virtual Assistants (Cortana, Siri, Google Assistant):**
  - Modern virtual assistants employ hybrid systems.
  - They use neural networks for **speech-to-text** conversion and **intent recognition**.
  - Reasoning and explicit algorithms are applied to execute actions.

- **The Future of AI Dialogue Systems:**
  - Future developments may lead to entirely neural-based models handling dialogue.
  - Models like GPT and Turing-NLG demonstrate significant progress in natural language understanding and generation.


## **The Rise of Neural Networks**

#### Emergence of Large Public Datasets

- The significant growth in neural network research began around **2010**.
- The availability of large public datasets played a crucial role in this development
- **ImageNet**, a collection of around **14 million annotated images**, led to the ImageNet Large Scale Visual Recognition Challenge.

#### Convolutional Neural Networks (CNNs) Revolutionize Image Classification

- In 2012, **Convolutional Neural Networks (CNNs)** were applied to the ImageNet image classification challenge with great success for the first time.
- This breakthrough led to a substantial reduction in classification errors, from nearly **30%** to **16.4%**.

#### Achieving Human-Level Accuracy

- In 2015, the **ResNet** architecture from Microsoft Research achieved human-level accuracy in image classification.
- This marked a significant milestone in neural network research.

#### Neural Network Success Stories

- Over the years, neural networks have demonstrated remarkable success in various tasks:

| Year | Task Achieved Human Parity |
|------|---------------------------|
| 2015 | Image Classification       |
| 2016 | Conversational Speech Recognition |
| 2018 | Automatic Machine Translation (Chinese-to-English) |
| 2020 | Image Captioning           |

#### The Era of Large Language Models

- Recent years have witnessed tremendous success with large language models like **BERT** and **GPT-3**.
- The availability of **vast amounts of general text data** has enabled training models to understand text structure and meaning.
- These models are **pre-trained** on extensive text collections and then fine-tuned for specific tasks.
- Natural Language Processing (NLP) has benefited immensely from these advancements.

<img src="./images/ilsvrc.gif" width=800 align="center"/>


## Discussion

- Identify where AI is most effectively utilized.
- AI applications are prevalent across various domains, enhancing user experiences and enabling new functionalities.
  - Mapping Applications
  - Speech-to-Text Services
  - Video Games

<img src="./images/border.jpg" height="10" width="1500" align="center"/>


## **Mathematical Models of Intelligence: Neural Networks**

<img src="https://raw.githubusercontent.com/wsko/hands-on-gen-ai-2/main/images/neural.jpeg" width="500" align="center"/>


- Since the mid-20th century, researchers have experimented with mathematical models for intelligence.
  - In recent years, one approach has seen remarkable success: **neural networks**.

- Neural networks, often referred to as **Artificial Neural Networks (ANNs)**, are mathematical models inspired by the structure and function of the human brain.
- ANNs serve as models, not actual networks of biological neurons.

<img src="./images/border.jpg?raw=1" height="10" width="1500" align="center"/>



# Biological Basis of Neural Cells

<img src="./images/neuron1.png" width="600" align="center"/>

- In biology, the brain is composed of neural cells (neurons).
- Each neural cell has multiple "inputs" (dendrites) and an output (axon).
- Axons and dendrites are capable of conducting electrical signals.
- The connections between axons and dendrites (synapses) can vary in conductivity, regulated by neurotransmitters.

<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

## **Mathematical Abstraction of a Neuron**

- The simplest mathematical model of a neuron includes multiple inputs X_1, ..., X_N and an output Y, along with a series of weights W_1, ..., W_N.
- The output is calculated using the formula:

    <img src="./images/netout.png" align="center"/>


    - Here, f represents a non-linear activation function.
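The weighted-sum model above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the formula in the figure is Y = f(W_1 X_1 + ... + W_N X_N) with a sigmoid as f. The function name `neuron_output` and the sample values are hypothetical, and a bias term is left out because the text does not mention one.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, activation=sigmoid):
    """Weighted sum of the inputs followed by a non-linear activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

# Hypothetical inputs X_1..X_3 and weights W_1..W_3
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.5]
y = neuron_output(x, w)   # weighted sum is 0.5 - 2.0 + 1.5 = 0.0
print(y)                  # sigmoid(0.0) = 0.5
```

Swapping `activation` for another function changes the neuron's behavior without touching the weighted sum.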

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

#### Historical Perspective

- Early models of neurons were introduced in the classic paper titled "A Logical Calculus of the Ideas Immanent in Nervous Activity" by Warren McCulloch and Walter Pitts in 1943.
- Donald Hebb further contributed to the field with his book "The Organization of Behavior: A Neuropsychological Theory," where he proposed methods for training such neural networks.


<img src="./images/neuron.png" align="center"/>

<img src="./images/rumerlhart_hinton-01.png" width=800 height=auto align="center"/>

- In 1986, Rumelhart, Hinton, and Williams presented a technique known as backpropagation in a letter to Nature.
  - Backpropagation was shown to be useful for training multi-layer neural networks.
- Although the idea of backpropagation was not exclusive to or created by Rumelhart, Hinton, and Williams, their publication in Nature prompted a new era of research into neural networks.



Neural Networks:
  - Propagate signals forward from the input to the output layers

<img src="https://raw.githubusercontent.com/wsko/hands-on-gen-ai-2/main/images/forward.png" align="center" width = "600"/>

  - Propagate the error backwards from the output back into the network
    - Backpropagation

<img src="https://raw.githubusercontent.com/wsko/hands-on-gen-ai-2/main/images/backpropage.png" align="center" width = "600"/>

- Although Hinton made significant contributions to the development of modern deep learning in the 1980s, it would take some time for the advancements we see today to materialize.
- We can begin experimenting with some deep learning concepts now. However, to conduct proper experimentation, we require data.
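To make the forward and backward passes concrete, here is a minimal sketch for a single sigmoid neuron with a squared-error loss. The training example, starting weight, and helper names are illustrative assumptions. Deep learning frameworks automate the backward pass, but the chain-rule steps are the same in spirit.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_for(w, x, target):
    """Forward pass: input -> weighted sum -> activation -> squared error."""
    return (sigmoid(w * x) - target) ** 2

# Hypothetical single training example and starting weight
x, target, w = 2.0, 1.0, 0.5

# Forward pass, keeping intermediate values for the backward pass
z = w * x
y = sigmoid(z)
loss = (y - target) ** 2

# Backward pass: apply the chain rule from the loss back to the weight
dloss_dy = 2.0 * (y - target)   # d(loss)/dy
dy_dz = y * (1.0 - y)           # derivative of the sigmoid at z
dz_dw = x                       # d(w * x)/dw
grad_w = dloss_dy * dy_dz * dz_dw

# Sanity check against a numerical (finite-difference) gradient
eps = 1e-6
numeric_grad = (loss_for(w + eps, x, target) - loss_for(w - eps, x, target)) / (2 * eps)
print(grad_w, numeric_grad)
```

The gradient is negative here, which means increasing the weight would reduce the error for this example.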

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

## **Machine Learning vs Deep Learning (Deep Neural Network)**

- Terminology:
  - "Traditional" (not deep) machine learning typically works on structured data with a distinct algorithm such as logistic regression, random forest, or SVM.
  - Deep learning is typically applied to unstructured data. The algorithm is a network of simple learners (neurons, each similar to a logistic regression) arranged in multiple layers.
- Traditional machine learning approaches require feature representations that are designed by humans
- In deep learning, the focus is on **optimizing the weights of the model** to make the most accurate prediction without requiring explicit feature engineering


<img src="./images/dl1.png" width=500 height=auto align="center" />


- Deep learning learns data representation first.
  - One of the key strengths of deep learning is its ability to effectively **learn complex patterns in data**

<img src="./images/dl2.png" width=800 height=auto style="background-color:white;" align="center"/>

- Deep learning applies a multi-layer process for learning rich hierarchical features (i.e., data representations)
  - Input image pixels → Edges → Textures → Parts → Objects

<img src="./images/dl3.png" width=800 height=auto style="background-color:white;" align="center"/>


Slide credit: Param Vir Singh – Deep Learning

## **Why Is DL Useful?**

- Deep learning offers a versatile and adaptable framework for representing various types of data, including visual, text, and linguistic information.
- It can learn in both **supervised** (from labeled examples) and **unsupervised** (from unlabeled data) manners, and is recognized as an effective end-to-end learning system.
  - However, deep learning requires a significant amount of training data to achieve optimal results.
- Since around 2010, deep learning has consistently outperformed other machine learning techniques, initially in areas such as vision and speech, and later in natural language processing and other applications.


- NNs use nonlinear mapping of the inputs x to the outputs f(x) to compute complex decision boundaries
- But then, why use deeper NNs?
  - The fact that deep NNs work better is an empirical observation
  - Mathematically, a NN with a single hidden layer has, given enough neurons, the same representational power as a deep NN

<img src="./images/dl4.png" width=400 height=auto align="center"/>


## **Neural Network Example**

- The task is to recognize digits that are handwritten, using the MNIST dataset.
- Each pixel's intensity is taken as an input feature, and the goal is to determine the digit class as the output.

<img src="./images/dl5.png" width=800 height=auto style="background-color:white;" align="center"/>


<img src="./images/dl6.png" width=800 height=auto style="background-color:white;" align="center"/>
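The mapping from pixels to input features can be sketched without any libraries. MNIST images are 28x28 grayscale, so flattening one gives 784 input features, and there are 10 digit classes. The single bright pixel and the one-hot target encoding below are assumptions made for illustration.

```python
# A hypothetical 28x28 grayscale image (MNIST size): all black except one pixel
height, width = 28, 28
image = [[0.0] * width for _ in range(height)]
image[14][14] = 1.0

# Flatten row by row: every pixel intensity becomes one input feature
features = [pixel for row in image for pixel in row]

# One-hot target for the digit class, e.g. the digit 7 out of classes 0-9
digit = 7
target = [1.0 if k == digit else 0.0 for k in range(10)]

print(len(features))         # 28 * 28 = 784 input features
print(features.index(1.0))   # 14 * 28 + 14 = 406
print(target)
```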


## **Neural Networks**

- Neural networks are composed of hidden layers that contain neurons, which are computational units.
- A single neuron is responsible for mapping a set of inputs to a numerical output
  - Denoted as 𝑓:𝑅^𝐾→𝑅, where 𝑅^𝐾 represents a K-dimensional input space and 𝑅 represents the output space.

<img src="./images/dl7.png" width=800 height=auto style="background-color:white;" align="center"/>

- A NN with one hidden layer and one output layer

<img src="./images/dl8.png" width=800 height=auto style="background-color:white;" align="center"/>
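A forward pass through a network with one hidden layer and one output layer can be sketched in plain Python. The weights, biases, layer sizes, activation choices, and the `dense` helper below are hypothetical values chosen for illustration, not the ones shown in the figure.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical parameters: 2 inputs -> 3 hidden neurons -> 1 output
hidden_w = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
hidden_b = [0.0, 0.0, 0.0]
output_w = [[1.0, 1.0, 1.0]]
output_b = [0.0]

x = [2.0, 1.0]
hidden = dense(x, hidden_w, hidden_b, relu)          # [1.0, 1.5, 0.0]
output = dense(hidden, output_w, output_b, sigmoid)  # one value in (0, 1)
print(hidden, output)
```

Each hidden neuron is a function from the 2-dimensional input to a single number, matching the 𝑓:𝑅^𝐾→𝑅 notation with K = 2.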

# TensorFlow Playground

TensorFlow Playground is an interactive web-based tool that allows you to explore and experiment with neural networks. It provides a visual interface where you can adjust various parameters and observe how they affect the network's behavior.

### Key Features

1. **Architecture Design:** You can design and customize the architecture of your neural network by adding or removing layers, adjusting the number of neurons, and selecting different activation functions.

2. **Data Generation:** TensorFlow Playground provides several predefined datasets and patterns that you can use to train your network. You can vary the noise level and the ratio of training to test data.

3. **Training and Testing:** You can adjust the learning rate, batch size, and regularization type and rate. The dataset is split into training and testing sets so that you can evaluate the network's performance.

4. **Visualizations:** TensorFlow Playground offers real-time visualizations of the network's training progress, such as loss curves, decision boundaries, and neuron activations. These visualizations help you understand how the network learns and how it makes predictions.

<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

### How to Use TensorFlow Playground

1. Open the TensorFlow Playground website in your web browser.

2. Select one of the predefined datasets in the Data panel on the left, and set the noise level and the ratio of training to test data.

3. Choose the input features and customize the architecture by adding or removing hidden layers and neurons in the center of the page.

4. Configure the training settings in the top bar, such as the learning rate, activation function, and regularization. The batch size is set in the Data panel.

5. Start the training process by clicking the "Play" button.

6. Observe the decision boundary and the training and test loss in the Output panel on the right side of the screen.

7. Experiment with different settings and architectures to see how they affect the network's performance.

8. Once you're satisfied with the results, copy the page URL to share the configuration; the URL encodes the current settings.



[Open this example configuration in TensorFlow Playground](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.90335&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false)

<img src="./images/dl10.png" width=800 height=auto align="center"/>



<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

## **Deep Neural Network**

- Deep neural networks (DNNs) are characterized by having a large number of hidden layers.
- These hidden layers are typically fully-connected (also known as dense) layers; networks built from them are often referred to as Multi-Layer Perceptrons (MLPs).
- In such layers, each neuron is connected to every neuron in the next layer.
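A stack of fully-connected layers can be sketched in PyTorch with `nn.Sequential`. The layer sizes below (784 inputs, two hidden layers, 10 outputs) and the name `deep_model` are assumptions for illustration; practical deep networks usually have more layers.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 784 inputs (e.g. a flattened 28x28 image), 10 classes
deep_model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# Each Linear layer holds (inputs x outputs) weights plus one bias per output
num_params = sum(p.numel() for p in deep_model.parameters())
print(num_params)

batch = torch.randn(16, 784)   # minibatch of 16 random "images"
logits = deep_model(batch)     # one row of 10 scores per input
print(logits.shape)
```

Because every neuron connects to every neuron in the next layer, the parameter count grows with the product of adjacent layer sizes.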

<img src="./images/dl11.png" width=800 height=auto style="background-color:white;" align="center"/>

<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

## **Activation Functions**

- A function that takes a neuron's weighted input and applies some kind of threshold to it is called an activation function.
- Mathematically, there are many such activation functions that could achieve this effect

<img src="./images/activation.png" width=400 align="center"/>

<!--
- Non-linear activation functions are essential for neural networks to learn complex, non-linear data representations.
- Without these activation functions, neural networks would simply be a linear function such as 𝑊_1 𝑊_2 𝑥 = 𝑊𝑥.
  - However, by incorporating non-linear activation functions, neural networks with a large number of layers and neurons can approximate more complex functions.
- As the number of neurons increases, the representation improves, as shown in the figure, but there is a risk of overfitting.
 -->

<!-- <img src="./images/dl15.png" width=800 height=auto style="background-color:white;" align="center"/> -->



The most popular activation functions are:

<img src="./images/activationFunctions.pbm" align="center"/>
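As a rough illustration, three widely used activation functions (sigmoid, tanh, and ReLU) can be written in plain Python. The figure may list a different set, and the sample inputs are arbitrary.

```python
import math

def sigmoid(z):
    """Maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Maps any real number into (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Passes positive values through and clips negatives to zero."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):.3f}  relu={relu(z):.1f}")
```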

<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

Toy Example:

<img src="./images/dl12.png" width=800 height=auto style="background-color:white;" align="center"/>


<img src="./images/dl13.png" width=800 height=auto style="background-color:white;" align="center"/>

## **Loss Function in Deep Learning**

- In deep learning, a loss function is a mathematical function that quantifies the **difference** between the **predicted** output and the **actual** output.
  - It measures how well the neural network is performing its task

- Common Loss Functions
  -  Mean Squared Error (MSE)
     -  The mean squared error loss function is used for regression problems, where the goal is to predict a continuous value. It calculates the average of the squared differences between the predicted and actual values.
  -  Binary Cross-Entropy
     -  The binary cross-entropy loss function is used for binary classification problems, where the output can take only two values (0 or 1). It measures the difference between the predicted probability and the true label.
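Both losses are short enough to write out by hand. The sketch below uses plain Python and made-up predictions. The binary cross-entropy helper assumes every predicted probability lies strictly between 0 and 1, since the logarithm is undefined at the endpoints.

```python
import math

def mse(predictions, targets):
    """Average of squared differences (regression)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def binary_cross_entropy(probabilities, labels):
    """Average negative log-likelihood of the true labels (binary classification)."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# Hypothetical predictions and targets
regression_loss = mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])   # (0.25 + 0.25 + 0.0) / 3
good_loss = binary_cross_entropy([0.9, 0.2], [1, 0])       # confident and correct
bad_loss = binary_cross_entropy([0.1, 0.8], [1, 0])        # confident and wrong
print(regression_loss, good_loss, bad_loss)
```

The cross-entropy grows quickly when the model is confidently wrong, which is why `bad_loss` is much larger than `good_loss`.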

<img src="https://raw.githubusercontent.com/wsko/hands-on-gen-ai-2/main/images/dl22.png" width=800 height=auto style="background-color:white;" align="center"/>


<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

## **Goal of Training NNs**

<img src="./images/dl23.png" width=800 height=auto style="background-color:white;" align="center"/>


<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

## **Most Popular Frameworks**

Here are some of the most popular deep learning frameworks:

- **TensorFlow 1.x:** This was one of the first widely available frameworks, developed by Google. It allowed users to define a static computation graph, push it to the GPU, and explicitly evaluate it.

- **PyTorch:** Developed by Facebook, PyTorch has been growing in popularity. It offers a flexible and dynamic approach to building neural networks.

- **Keras:** Keras is a higher-level API that sits on top of backends such as TensorFlow and PyTorch. It was created by François Chollet to unify and simplify the process of using neural networks.

- **TensorFlow 2.x + Keras:** This is a newer version of TensorFlow that integrates Keras functionality. It supports dynamic (eager) execution, making tensor operations similar to those in NumPy and PyTorch.

In this notebook, we will focus on using PyTorch. Make sure you have the latest version of PyTorch installed by following the [instructions on their website](https://pytorch.org/get-started/locally/). Typically, installation is as simple as running one of the following commands:



In [None]:
# %pip install torch torchvision

In [None]:
# Or, with conda:
# %conda install pytorch -c pytorch
#
# Either command installs the packages needed to work with PyTorch.

In [None]:
import torch
torch.__version__

## Basic Concepts: Tensor

A **tensor** is a multi-dimensional array. Tensors are a convenient way to represent different types of data:
* 400x400 - black-and-white picture
* 400x400x3 - color picture
* 16x400x400x3 - minibatch of 16 color pictures
* 25x400x400x3 - one second of 25-fps video
* 8x25x400x400x3 - minibatch of 8 1-second videos
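The shapes in this list can be checked directly in PyTorch. This is a small sketch with zero-filled tensors and hypothetical variable names. The largest example is left out to keep memory use low.

```python
import torch

# Shapes follow the list above (batch or frames first, channels last)
gray_image = torch.zeros(400, 400)
color_image = torch.zeros(400, 400, 3)
image_batch = torch.zeros(16, 400, 400, 3)
video_second = torch.zeros(25, 400, 400, 3)

for name, t in [("gray_image", gray_image), ("color_image", color_image),
                ("image_batch", image_batch), ("video_second", video_second)]:
    print(name, tuple(t.shape), "dims:", t.ndim)
```

Note that many PyTorch image layers (for example `nn.Conv2d`) expect the channel dimension before height and width, so real pipelines often reorder these dimensions.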

### Simple Tensors

In [None]:
a = torch.tensor([[1,2],[3,4]])
print(a)
a = torch.randn(size=(10,3,2))
print(a)

In [None]:
print(a-a[0])
print(torch.exp(a)[0].numpy())

## In-place and out-of-place Operations

- Tensor operations such as `+`/`add` return new tensors.
- Sometimes, you need to modify the existing tensor in-place.
- Many operations have in-place counterparts, which end with `_`.

Example:

In [None]:
u = torch.tensor(5)
print("Result when adding out-of-place:",u.add(torch.tensor(3)))
u.add_(torch.tensor(3))
print("Result after adding in-place:", u)

## Computing the Sum of All Rows in a Matrix (Naive Approach)


In [None]:
s = torch.zeros_like(a[0])
for i in a:
  s.add_(i)

print(s)

In [None]:
#much better way:
torch.sum(a,axis=0)

See more in the [official documentation](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html)

In [None]:
%matplotlib inline

<img src="./images/border.jpg" height="10" width="1500" align="center"/>

# Create a Simple Neural Network using PyTorch

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Create a toy dataset
# Generate some synthetic data (X and y)
np.random.seed(42)
X_train = np.random.rand(100, 1)  # Training input features
y_train = 3 * X_train + 2 + 0.1 * np.random.randn(100, 1)  # Training labels with noise

X_test = np.random.rand(20, 1)  # Test input features
y_test = 3 * X_test + 2 + 0.1 * np.random.randn(20, 1)  # Test labels with noise

# Plot the training data
plt.scatter(X_train, y_train, label='Training Data', color='blue')

# Plot the test data
plt.scatter(X_test, y_test, label='Test Data', color='red')

# Add labels and a legend
plt.xlabel('X')
plt.ylabel('y')
plt.legend()

# Show the plot
plt.title('Scatter Plot of Toy Dataset')
plt.show()

In [None]:
# Convert data to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)


In [None]:
# Step 2: Define the neural network model
class SimpleLinearRegression(nn.Module):
    def __init__(self):
        # Constructor for the SimpleLinearRegression class.
        # It initializes the neural network architecture.
        super(SimpleLinearRegression, self).__init__()

        # Define a linear layer with input size 1 and output size 1.
        # This layer represents a simple linear regression model.
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        # This function represents the forward pass of the neural network.
        # It takes an input tensor 'x' and passes it through the linear layer.
        # The output of this linear layer is returned as the result.
        return self.linear(x)


In [None]:
# Create an instance of the model
model = SimpleLinearRegression()

# Step 3: Define loss function and optimizer
criterion = nn.MSELoss()  # Mean Squared Error loss
optimizer = optim.SGD(model.parameters(), lr=0.01)  # Stochastic Gradient Descent optimizer

<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

## **Given output error, how to update Weights?**

You’re on the side of a hill and you need to get to the bottom.

- It’s dark and you can’t see anything.
- You don’t have an accurate map.
- You do have a torch. What do you do?

<img src="./images/gradient1.png" width=800 height=auto style="background-color:white;" align="center"/>

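The hill is the function we’re trying to minimize: the neural network’s error. Gradient descent feels for the steepest downhill direction (the negative gradient) and takes a small step that way, over and over.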


<img src="./images/gradient2.png" width=800 height=auto style="background-color:white;" align="center"/>
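The downhill walk can be sketched on a one-parameter error surface. The quadratic `error` function, the starting point, and the learning rate below are hypothetical choices for illustration. A real network has many weights, but each one is updated with the same rule.

```python
def error(w):
    """A hypothetical one-parameter error surface with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Slope of the error surface at w."""
    return 2.0 * (w - 3.0)

w = 0.0                # starting point on the "hill"
learning_rate = 0.1    # size of each downhill step

for step in range(50):
    w = w - learning_rate * gradient(w)   # step against the slope

print(w, error(w))
```

A learning rate that is too large can overshoot the minimum, while one that is too small makes progress slow.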

<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

In [None]:
# Step 4: Training loop
num_epochs = 1000
for epoch in range(num_epochs):
    # Forward pass for training data
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Training Loss: {loss.item():.4f}')



In [None]:
# Step 5: Make predictions on the test set and calculate MSE
with torch.no_grad():
    test_outputs = model(X_test_tensor)
    test_loss = criterion(test_outputs, y_test_tensor)
    print(f'Test MSE: {test_loss.item():.4f}')


In [None]:
# Plot the raw data and model fit
plt.figure(figsize=(10, 5))

# Plot raw data
plt.subplot(1, 2, 1)
plt.scatter(X_train, y_train, label='Raw Data', color='blue')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Raw Data')
plt.legend()

# Plot model fit
plt.subplot(1, 2, 2)
plt.scatter(X_train, y_train, label='Raw Data', color='blue')
plt.plot(X_test, test_outputs.numpy(), label='Model Fit', color='red')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Model Fit')
plt.legend()

plt.tight_layout()
plt.show()


<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

## **Exercise**

Let's play with the code.
- Try another loss function.
    <!-- huber_loss = nn.SmoothL1Loss()  # Huber loss -->

- What if I want to use the Adam optimizer?
<!-- optimizer = optim.Adam(model.parameters(), lr=0.001)  # Use Adam optimizer with a smaller learning rate -->

- How might the number of epochs change the performance?
  <!-- 200? -->

<img src="https://github.com/wsko/Generative_AI/blob/main/Day-1/images/border.jpg?raw=1" height="10" width="1500" align="center"/>

## **Performing Computations on GPU with PyTorch**

Accelerating deep learning computations is often achieved by utilizing powerful Graphics Processing Units (GPUs). PyTorch simplifies GPU computing with straightforward steps.

1. Define the target device for computations at the beginning of your code. Choose either `"cpu"` for CPU or `"cuda"` for GPU. This sets the device for tensor operations.

2. Move tensors to the specified device using the `.to(device)` method. This can be done for existing tensors or during tensor creation.


In [None]:
# Check if a GPU is available and set the device accordingly
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# Convert data to PyTorch tensors and move them to the GPU
X_train_tensor = torch.tensor(X_train, dtype=torch.float32).to(device)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).to(device)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32).to(device)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).to(device)

# Step 2: Define a neural network model and move it to the GPU
class GPUModel(nn.Module):
    def __init__(self):
        super(GPUModel, self).__init__()
        self.linear = nn.Linear(1, 1)  # Input size: 1, Output size: 1

    def forward(self, x):
        return self.linear(x)

# Create an instance of the model and move it to the GPU
model = GPUModel().to(device)

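# Step 3: Define the loss and use a different optimizer with an adaptive learning rate
criterion = nn.MSELoss()  # Same Mean Squared Error loss as before, redefined so this cell does not rely on the earlier one
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Use Adam optimizer with a smaller learning rate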

# Step 4: Train the model with more epochs
num_epochs = 2000  # Increase the number of training epochs
for epoch in range(num_epochs):
    # Forward pass for training data
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 200 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Training Loss: {loss.item():.4f}')

# Step 5: Calculate the Huber loss on the test set
huber_loss = nn.SmoothL1Loss()  # Huber loss
with torch.no_grad():
    test_outputs = model(X_test_tensor)
    test_loss = huber_loss(test_outputs, y_test_tensor)
    print(f'Test Huber Loss: {test_loss.item():.4f}')

# Train a Simple Feedforward Neural Network

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D

# Step 1: Create a toy dataset
# Generate synthetic data for binary classification
np.random.seed(42)
X = np.random.rand(100, 2)  # Input features (100 samples with 2 features)
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # Binary labels based on a simple rule

# Plot the synthetic data
plt.figure(figsize=(8, 6))
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], label='Class 0', color='blue')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], label='Class 1', color='red')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Synthetic Data for Binary Classification')
plt.legend()
plt.grid(True)

# Convert data to PyTorch tensors
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.long)  # Use LongTensor for classification labels


In [None]:
# Step 2: Define a Feedforward Neural Network model
class FeedForwardNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(FeedForwardNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)  # Input layer to hidden layer
        self.relu = nn.ReLU()  # ReLU activation function
        self.fc2 = nn.Linear(hidden_size, num_classes)  # Hidden layer to output layer

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Create an instance of the Feedforward Neural Network model
input_size = 2  # Number of input features
hidden_size = 4  # Number of neurons in the hidden layer
num_classes = 2  # Number of output classes for binary classification
model = FeedForwardNN(input_size, hidden_size, num_classes)


In [None]:
# Step 3: Define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # Cross-Entropy loss for classification
optimizer = optim.SGD(model.parameters(), lr=0.1)  # Stochastic Gradient Descent optimizer


In [None]:
# Step 4: Training loop
num_epochs = 1000
losses = []  # Store loss values for plotting
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_tensor)
    loss = criterion(outputs, y_tensor)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

    losses.append(loss.item())  # Store loss for plotting

# Plot the loss curve
plt.figure(figsize=(8, 6))
plt.plot(range(1, num_epochs + 1), losses)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Curve')
plt.grid(True)
plt.show()

In [None]:
# Generate synthetic data for evaluation
np.random.seed(0)  # Use a different seed than training (42); reusing 42 would reproduce the first 50 training samples
X_test = np.random.rand(50, 2)  # Input features (50 samples with 2 features)
y_test = (X_test[:, 0] + X_test[:, 1] > 1).astype(int)  # Binary labels based on a simple rule
# Convert data to PyTorch tensors
X_testtensor = torch.tensor(X_test, dtype=torch.float32)
y_testtensor = torch.tensor(y_test, dtype=torch.long)  # Use LongTensor for classification labels


In [None]:
# Step 5: Evaluate the model
with torch.no_grad():
    # Generate predictions for the test set
    all_predictions = model(X_testtensor)
    predicted_classes = torch.argmax(all_predictions, dim=1).numpy()

    # Calculate performance metrics
    accuracy = accuracy_score(y_test, predicted_classes)
    precision = precision_score(y_test, predicted_classes)
    recall = recall_score(y_test, predicted_classes)
    f1 = f1_score(y_test, predicted_classes)

    # Calculate the confusion matrix
    cm = confusion_matrix(y_test, predicted_classes)

    print(f'Accuracy: {accuracy:.4f}')
    print(f'Precision: {precision:.4f}')
    print(f'Recall: {recall:.4f}')
    print(f'F1 Score: {f1:.4f}')

    # Plot the confusion matrix
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False,
                xticklabels=['Class 0', 'Class 1'], yticklabels=['Class 0', 'Class 1'])
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title('Confusion Matrix')
    plt.show()
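
The four scores printed above all derive from the counts in the confusion matrix. As a minimal sketch, they can be computed by hand for the binary case, treating class 1 as the positive class (the default for the scikit-learn functions used above). The labels and predictions below are hypothetical and chosen only for illustration.

```python
# Hypothetical ground-truth labels and predictions (not taken from the model above)
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)                   # fraction of correct predictions
precision = tp / (tp + fp)                          # how many predicted positives are right
recall = tp / (tp + fn)                             # how many actual positives are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f} F1={f1:.2f}")
```

The sketch does not guard against division by zero, which occurs when a model never predicts the positive class.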


# Train a Simple Feedforward Neural Network - using TensorFlow and Keras

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import seaborn as sns
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.losses import SparseCategoricalCrossentropy



# Step 2: Define a Feedforward Neural Network model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))  # Input layer to hidden layer
model.add(Dense(2, activation='softmax'))  # Hidden layer to output layer



In [None]:
# Step 3: Compile the model
model.compile(optimizer=SGD(learning_rate=0.1),
              loss=SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# Step 4: Training the model
num_epochs = 1000
history = model.fit(X, y, epochs=num_epochs, verbose=0)

# Plot the loss curve
plt.figure(figsize=(8, 6))
plt.plot(history.history['loss'])
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Curve')
plt.grid(True)
plt.show()


In [None]:

# Step 5: Generate synthetic data for evaluation
np.random.seed(42)
X_test = np.random.rand(50, 2)  # Input features (50 samples with 2 features)
y_test = (X_test[:, 0] + X_test[:, 1] > 1).astype(int)  # Binary labels based on a simple rule

# Evaluate the model
predicted_classes = np.argmax(model.predict(X_test), axis=1)

# Calculate performance metrics
accuracy = accuracy_score(y_test, predicted_classes)
precision = precision_score(y_test, predicted_classes)
recall = recall_score(y_test, predicted_classes)
f1 = f1_score(y_test, predicted_classes)

# Calculate the confusion matrix
cm = confusion_matrix(y_test, predicted_classes)

print(f'Accuracy: {accuracy:.4f}')
print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'F1 Score: {f1:.4f}')

# Plot the confusion matrix
plt.figure(figsize=(6, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=['Class 0', 'Class 1'], yticklabels=['Class 0', 'Class 1'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()


# Multi-Layered Neural Network: Deep Learning

In [None]:
# Define a Multi-Layered Neural Network model
class MultiLayerNN(nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, num_classes):
        super(MultiLayerNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)  # Input layer to hidden layer 1
        self.relu1 = nn.ReLU()  # ReLU activation for hidden layer 1
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)  # Hidden layer 1 to hidden layer 2
        self.relu2 = nn.ReLU()  # ReLU activation for hidden layer 2
        self.fc3 = nn.Linear(hidden_size2, num_classes)  # Hidden layer 2 to output layer

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

# Create an instance of the Multi-Layered Neural Network model
input_size = 2  # Number of input features
hidden_size1 = 8  # Number of neurons in hidden layer 1
hidden_size2 = 4  # Number of neurons in hidden layer 2
num_classes = 2  # Number of output classes for binary classification
model = MultiLayerNN(input_size, hidden_size1, hidden_size2, num_classes)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # Cross-Entropy loss for classification
optimizer = optim.SGD(model.parameters(), lr=0.1)  # Stochastic Gradient Descent optimizer

# Training loop
num_epochs = 1000
losses = []  # Store loss values for plotting
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_tensor)
    loss = criterion(outputs, y_tensor)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

    losses.append(loss.item())  # Store loss for plotting

# Plot the loss curve
plt.figure(figsize=(8, 6))
plt.plot(range(1, num_epochs + 1), losses)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Curve')
plt.grid(True)
plt.show()




In [None]:
# Evaluate the model (here on the training data, since X_tensor and y are reused)
with torch.no_grad():
    # Generate predictions for the entire dataset
    all_predictions = model(X_tensor)
    predicted_classes = torch.argmax(all_predictions, dim=1).numpy()

    # Calculate performance metrics
    accuracy = accuracy_score(y, predicted_classes)
    precision = precision_score(y, predicted_classes)
    recall = recall_score(y, predicted_classes)
    f1 = f1_score(y, predicted_classes)

    # Calculate the confusion matrix
    cm = confusion_matrix(y, predicted_classes)

    print(f'Accuracy: {accuracy:.4f}')
    print(f'Precision: {precision:.4f}')
    print(f'Recall: {recall:.4f}')
    print(f'F1 Score: {f1:.4f}')

    # Plot the confusion matrix
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False,
                xticklabels=['Class 0', 'Class 1'], yticklabels=['Class 0', 'Class 1'])
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.title('Confusion Matrix')
    plt.show()
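
To get a feel for how small the multi-layered network above is (2 inputs, hidden layers of 8 and 4 neurons, 2 outputs), its trainable parameters can be counted by hand: each fully connected layer contributes `n_in * n_out` weights plus `n_out` biases. The helper `count_dense_params` below is a hypothetical name introduced for this sketch, not part of PyTorch.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# Layer widths of the MultiLayerNN instance defined above
n_params = count_dense_params([2, 8, 4, 2])
print(f"Trainable parameters: {n_params}")
```

For a PyTorch model, the same figure can usually be obtained with `sum(p.numel() for p in model.parameters())`.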

<img src="./images/border.jpg" height="10" width="1500" align="center"/>


# **LAB**: Handwritten digit recognition


- https://en.wikipedia.org/wiki/MNIST_database


<img src="./images/MNIST_dataset.png"  width = "800" align="center"/>

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np  # needed by imshow() below



# Transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

# Load the dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Visualizing 10 training images in one row with labels
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.xticks([])  # Remove x tick marks
    plt.yticks([])  # Remove y tick marks
    plt.show()

# Get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# Select 10 images
images = images[:10]
labels = labels[:10]

# Show images
imshow(torchvision.utils.make_grid(images, nrow=10))

# Print labels
print(' '.join(f'{labels[j].item()}' for j in range(10)))
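
The `transforms.Normalize((0.5,), (0.5,))` step above computes `(x - mean) / std` per channel, so pixel values that `ToTensor()` scaled to [0, 1] end up in [-1, 1]. The `img / 2 + 0.5` line in `imshow` undoes that mapping before plotting. A minimal pure-Python sketch of the arithmetic follows; the helper names `normalize` and `unnormalize` are hypothetical.

```python
MEAN, STD = 0.5, 0.5  # the values passed to transforms.Normalize above

def normalize(x, mean=MEAN, std=STD):
    """Map a pixel value from [0, 1] to [-1, 1]."""
    return (x - mean) / std

def unnormalize(z, mean=MEAN, std=STD):
    """Inverse mapping; equivalent to z / 2 + 0.5 for mean = std = 0.5."""
    return z * std + mean

pixels = [0.0, 0.25, 0.5, 1.0]  # illustrative pixel intensities
normalized = [normalize(p) for p in pixels]
restored = [unnormalize(z) for z in normalized]

print(normalized)  # [-1.0, -0.5, 0.0, 1.0]
print(restored)    # [0.0, 0.25, 0.5, 1.0]
```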

In [None]:
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)

# Define the neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28*28, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)  # Flatten each 28x28 image into a 784-element vector
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Training the network
for epoch in range(5):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if i % 100 == 99:  # print every 100 mini-batches
            print(f'[Epoch {epoch + 1}, Batch {i + 1}] loss: {running_loss / 100:.3f}')
            running_loss = 0.0

print('Finished Training')



In [None]:
# Evaluating the network on the test set
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels)
        for i in range(labels.size(0)):  # actual batch size (the last batch may be smaller)
            label = labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print(f'Accuracy of class {i}: {100 * class_correct[i] / class_total[i]:.2f}%')


## TensorFlow solution

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
import numpy as np

# Load and preprocess the dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Visualize 10 random images from the training set
def plot_images(images, labels):
    plt.figure(figsize=(10, 1))
    for i in range(10):
        plt.subplot(1, 10, i+1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(images[i], cmap=plt.cm.binary)
        plt.xlabel(labels[i])
    plt.show()

# Select 10 random images
indices = np.random.choice(range(len(x_train)), 10)
selected_images = x_train[indices]
selected_labels = y_train[indices]

plot_images(selected_images, selected_labels)

# Build the neural network
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(512, activation='relu'),
    layers.Dense(256, activation='relu'),
    layers.Dense(10)
])

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train the network
model.fit(x_train, y_train, epochs=5)

# Evaluate the network
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print('\nTest accuracy:', test_acc)

# Predict and evaluate on test set
predictions = model.predict(x_test)
predicted_labels = np.argmax(predictions, axis=1)

# Calculate accuracy for each class
class_correct = [0] * 10
class_total = [0] * 10

for i in range(len(y_test)):
    label = y_test[i]
    if predicted_labels[i] == label:
        class_correct[label] += 1
    class_total[label] += 1

for i in range(10):
    print(f'Accuracy of class {i}: {100 * class_correct[i] / class_total[i]:.2f}%')
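
Because the final `layers.Dense(10)` has no activation, `model.predict` returns raw logits rather than probabilities, which is why the loss is created with `from_logits=True`. Taking `np.argmax` of the logits still yields the predicted class, since softmax preserves ordering. The sketch below shows the conversion and the per-sample cross-entropy in pure Python, using a hypothetical 3-class logit vector rather than real model output; `softmax` and `sparse_cross_entropy` are illustrative helpers, not Keras functions.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(logits, label):
    """Cross-entropy for one sample, given raw logits and an integer class label."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 1.0, 0.1]  # hypothetical logits, for illustration only
probs = softmax(logits)

print([round(p, 3) for p in probs])
print(f"Loss if the true class is 0: {sparse_cross_entropy(logits, 0):.3f}")
print(f"Loss if the true class is 2: {sparse_cross_entropy(logits, 2):.3f}")
```

The loss is small when the largest logit belongs to the true class and grows as the probability assigned to the true class shrinks.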
