
In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


### **Getting Data**
I used the FER-2013 dataset, downloaded from Kaggle at [this link](https://www.kaggle.com/datasets/deadskull7/fer2013?resource=download). It covers 7 different emotions across over 35,000 images, and it was available directly as a single CSV file.

In [2]:
import numpy as np 
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

df = pd.read_csv('gdrive/My Drive/fer2013.csv')

The emotion of disgust (label '1') had very few samples (about 500), which would be difficult to work with, so those rows are dropped.

In [3]:
df = df.drop(df[df.emotion == 1].index)

In [37]:
# Keep the first 35,000 labels (label 1 has already been dropped)
all_labels = np.array(df['emotion'][:35000])
# Label 1 is now unused, so relabel 6 as 1 to keep the class indices contiguous (0-5)
all_labels[all_labels == 6] = 1
all_labels = torch.tensor(all_labels)
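
Relabelling 6 as 1 leaves the class indices contiguous (0 to 5), which is what `nn.CrossEntropyLoss` expects as targets for a 6-output network. The sketch below just spells out the resulting mapping. The emotion names are the usual FER-2013 ones (an assumption, they are not stated in this notebook) and `remap_label` is a hypothetical helper.

```python
# Usual FER-2013 label names (assumed from the dataset description, not from this notebook).
FER_LABELS = {0: "Angry", 1: "Disgust", 2: "Fear", 3: "Happy",
              4: "Sad", 5: "Surprise", 6: "Neutral"}


def remap_label(label):
    """Map an original label (Disgust already dropped) onto the range 0..5."""
    if label == 1:
        raise ValueError("Disgust rows should be dropped before remapping")
    # Label 6 takes over the slot freed by dropping label 1.
    return 1 if label == 6 else label


# Emotion names indexed by the class id the network actually sees.
REMAPPED_NAMES = {remap_label(k): v for k, v in FER_LABELS.items() if k != 1}
print(sorted(REMAPPED_NAMES.items()))
```

With this mapping, a prediction of class 1 from the network means "Neutral", not "Disgust".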

In [36]:
len(all_labels)

35000

The two cells below convert the pixel data from string format into 48 $\times$ 48 matrices and stack them into a single tensor with one colour channel (greyscale), for a total of 35000 images. Hence the dimensions of 35000 $\times$ 1 $\times$ 48 $\times$ 48.

In [5]:
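ds = []
for s in df['pixels'][:35000]:
    # Each entry is a string of 48*48 space-separated pixel intensities
    pixels = np.array(s.split(), dtype=np.float32).reshape(48, 48)
    # Wrap in a list to add the single (greyscale) channel dimension
    ds.append([pixels])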

In [6]:
ds = np.array(ds, dtype=np.float32)
dataset = torch.tensor(ds)
dataset.shape

torch.Size([35000, 1, 48, 48])

In [7]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

### **Convolutional Neural network**
The one implemented below is a very basic one with two convolutional layers and two linear layers.  
I experimented a little with the number of output channels in the convolutional layers (very large numbers ate up lots of RAM); 20 and 30 seemed to work best.

In [69]:
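input_size = 2304      # 48 * 48 pixels per image (not used by the CNN below)
hidden_size = 500      # not used by the CNN below
num_classes = 6        # 6 emotions remain after dropping 'disgust'
num_epochs = 3
batch_size = 2000
learning_rate = 0.001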

class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(20, 30, 5)
        self.fc1 = nn.Linear(30 * 9 * 9, 90)
        self.fc3 = nn.Linear(90, 6)

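    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # [N, 1, 48, 48] -> [N, 20, 22, 22]
        x = self.pool(F.relu(self.conv2(x)))  # -> [N, 30, 9, 9]
        x = x.view(-1, 30 * 9 * 9)            # flatten to [N, 2430]
        x = F.relu(self.fc1(x))               # -> [N, 90]
        x = self.fc3(x)                       # -> [N, 6] raw class scores (logits)
        return x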
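
The `30 * 9 * 9` input size of `fc1` follows from how each layer shrinks the 48 $\times$ 48 image. Here is a small sketch of that arithmetic, assuming the PyTorch defaults used above (stride 1 and no padding for the convolutions, stride 2 for the pooling); `conv_out` is a hypothetical helper, not part of the notebook.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 48
size = conv_out(size, kernel=5)             # conv1, 5x5 kernel
size = conv_out(size, kernel=2, stride=2)   # 2x2 max pooling
size = conv_out(size, kernel=5)             # conv2, 5x5 kernel
size = conv_out(size, kernel=2, stride=2)   # 2x2 max pooling

out_channels = 30                           # output channels of conv2
flat_features = out_channels * size * size  # input size expected by fc1
print(size, flat_features)
```

Changing a kernel size or the number of output channels of `conv2` changes `flat_features`, so `fc1` and the `view` call would need updating to match.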

In [56]:
model = NeuralNet().to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

Ignore `num_epochs` below: I ran the cell repeatedly with different values of `num_epochs` and checked the accuracy after each run. In total, the training loop ran for 40 epochs.  
Accuracy on the test dataset increased from 11\% (which is worse than a random classifier!) to 45.7\% (still very bad, but I hope to increase that by tweaking the neural network).
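
For reference, a classifier that guesses uniformly at random over the 6 remaining classes is expected to be right one time in six. A quick sketch of that arithmetic, using the accuracies quoted above (a majority-class predictor could score higher, depending on the class balance, which is not measured here):

```python
num_classes = 6
chance_accuracy = 100.0 / num_classes   # expected accuracy of uniform random guessing, in percent

initial_accuracy = 11.0                 # accuracy quoted before training
final_accuracy = 45.7                   # accuracy quoted after 40 epochs

print(f'Chance level: {chance_accuracy:.1f} %')
print(f'Initial accuracy below chance: {initial_accuracy < chance_accuracy}')
print(f'Final accuracy relative to chance: {final_accuracy / chance_accuracy:.1f}x')
```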

In [74]:
batch_size = 5000
num_epochs = 4

for epoch in range(num_epochs):
    # The first 30,000 images are used for training
    for i in range(0, 30000, batch_size):
        # Move each batch to the same device as the model
        images = dataset[i:i + batch_size].to(device)   # shape: [batch_size, 1, 48, 48]
        labels = all_labels[i:i + batch_size].to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i // batch_size + 1}], Loss: {loss.item():.4f}')

Epoch [1/4], Step [1], Loss: 1.0757
Epoch [1/4], Step [2], Loss: 1.0265
Epoch [1/4], Step [3], Loss: 1.0685
Epoch [1/4], Step [4], Loss: 1.0394
Epoch [1/4], Step [5], Loss: 1.0217
Epoch [1/4], Step [6], Loss: 1.0150
Epoch [2/4], Step [1], Loss: 1.0652
Epoch [2/4], Step [2], Loss: 1.0299
Epoch [2/4], Step [3], Loss: 1.0576
Epoch [2/4], Step [4], Loss: 1.0353
Epoch [2/4], Step [5], Loss: 1.0062
Epoch [2/4], Step [6], Loss: 1.0135
Epoch [3/4], Step [1], Loss: 1.0633
Epoch [3/4], Step [2], Loss: 1.0153
Epoch [3/4], Step [3], Loss: 1.0562
Epoch [3/4], Step [4], Loss: 1.0331
Epoch [3/4], Step [5], Loss: 1.0273
Epoch [3/4], Step [6], Loss: 0.9879
Epoch [4/4], Step [1], Loss: 1.0377
Epoch [4/4], Step [2], Loss: 1.0176
Epoch [4/4], Step [3], Loss: 1.0499
Epoch [4/4], Step [4], Loss: 1.0167
Epoch [4/4], Step [5], Loss: 0.9961
Epoch [4/4], Step [6], Loss: 0.9983
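
The six steps per epoch in the log come from slicing the 30,000 training images into batches of 5,000. A small sketch of that bookkeeping (`batch_slices` is a hypothetical helper; the 40 epochs are the total reported for this notebook):

```python
def batch_slices(n_items, batch_size):
    """Return (start, stop) index pairs covering range(n_items) in order."""
    return [(start, min(start + batch_size, n_items))
            for start in range(0, n_items, batch_size)]


train_size = 30000
batch_size = 5000
slices = batch_slices(train_size, batch_size)

steps_per_epoch = len(slices)                   # optimizer updates per epoch
total_epochs = 40                               # total reported for this notebook
total_updates = steps_per_epoch * total_epochs  # optimizer updates over all runs
print(steps_per_epoch, total_updates)
```

Note that the batches are visited in the same fixed order every epoch, since the loop does no shuffling.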


In [78]:
with torch.no_grad():
    n_correct = 0
    n_samples = 0
    # The last 5,000 images are held out for testing
    for i in range(30000, 35000, batch_size):
        # Move each batch to the same device as the model
        images = dataset[i:i + batch_size].to(device)
        labels = all_labels[i:i + batch_size].to(device)
        outputs = model(images)
        # torch.max returns (values, indices); the indices are the predicted classes
        _, predicted = torch.max(outputs, 1)
        n_samples += labels.size(0)
        n_correct += (predicted == labels).sum().item()

    acc = 100.0 * n_correct / n_samples
    print(f'Accuracy of the network on the 5000 test images: {acc} %')

Accuracy of the network on the 5000 test images: 45.72 %
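
The evaluation above picks, for each image, the class with the highest score and counts how often it matches the label. Here is a torch-free sketch of the same computation; `accuracy_percent` is a hypothetical helper and the scores are made up for illustration, not model outputs.

```python
def accuracy_percent(scores, labels):
    """Percentage of rows whose highest-scoring index equals the label."""
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score, i.e. the predicted class
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return 100.0 * correct / len(labels)


# Illustrative scores for 4 samples and 3 classes (not real model outputs)
toy_scores = [
    [2.0, 0.1, 0.3],
    [0.2, 0.1, 1.5],
    [0.9, 1.1, 0.4],
    [0.3, 0.2, 0.1],
]
toy_labels = [0, 2, 0, 1]
print(accuracy_percent(toy_scores, toy_labels))
```

On the real test slice, 45.72 % of 5000 images corresponds to 2286 correct predictions.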


Total epochs done = 40  
Final accuracy = 45.72\%