# Notebook for Training the Model (Video Anomaly Detection)
---

In this notebook, we train our model on the embeddings produced in the previous notebook.

## Installing Necessary Libraries

Install the Simple Recurrent Units (SRU) package on Kaggle. Use the `sru==3.0.0.dev6` pre-release if you want to experiment with the SRUPP model. Otherwise, the current stable release, `sru==2.6.0`, also works.

**Make sure the Internet is ON**

In [2]:
!pip install sru==3.0.0.dev6

Collecting sru==3.0.0.dev6
  Downloading sru-3.0.0.dev6-py3-none-any.whl (30 kB)
Installing collected packages: sru
Successfully installed sru-3.0.0.dev6


## Importing necessary Libraries

In this section, we import the libraries we need. We use the PyTorch framework to train the model and the Simple Recurrent Unit (SRU) as the recurrent layer. We also import a few utilities (`tqdm`, `logging`) to follow training progress in real time.

In [3]:
import logging
import os

import numpy as np
import torch
from sklearn.model_selection import train_test_split
from sru import SRU
from torch.nn import CrossEntropyLoss
from torch.nn import Module, Dropout, Linear
from torch.optim import Adam
from torch.utils.data import TensorDataset, DataLoader
from tqdm import tqdm

## Selecting Computing Device

The cell below automatically selects the best hardware accelerator available on the system. It prefers MPS, then CUDA, and falls back to the CPU.

In [4]:
# Pick the device by availability: MPS first, then CUDA, otherwise CPU
device = torch.device("mps" if torch.backends.mps.is_available() else
                      ("cuda" if torch.cuda.is_available() else "cpu"))

# Report the selected device
print("Device selected:", device)

# If the device is CUDA, print the device capability
if device.type == "cuda":
    os.system("nvidia-smi")
    print()
    print("Device type:", device.type)
    print("Capability:", torch.cuda.get_device_capability(device))
else:
    print("Device capabilities are limited on MPSs and CPUs.")

Device selected: cuda
Wed Feb 21 14:54:33 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla P100-PCIE-16GB           Off | 00000000:00:04.0 Off |                    0 |
| N/A   30C    P0              26W / 250W |      2MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                              

## Prepare the Dataset

Here we prepare the training and testing datasets in three steps:

- Load the files from Kaggle Input and check that they have the expected shapes and sizes
- Split the dataset into training and testing sets
- Create the `DataLoader` objects for the model

### Loading and Checking the saved npy files

Our files are located at:

- video_embeddings - `/kaggle/input/embeddings-v1/embeddings.npy`
- labels - `/kaggle/input/embeddings-v1/labels.npy`

We load them with `np.load()` and check that their lengths match and that the embeddings have the expected number of dimensions.

In [46]:
file_embeddings = np.load('/kaggle/input/embeddings-v1/embeddings.npy')
file_labels = np.load('/kaggle/input/embeddings-v1/labels.npy')

# Check if the embeddings and labels are of the same length
if len(file_embeddings) != len(file_labels):
    raise ValueError("The length of the embeddings and labels should be the same")

# check if the embedding is a 4D array
if len(file_embeddings.shape) != 4:
    raise ValueError(f"The embeddings should be a 4D array [instances, windows, frames, features]."
                     f" Found {len(file_embeddings.shape)}D instead.")

print("Files Loaded Successfully")
print("Video Embeddings Shape:", file_embeddings.shape)
print("Video Labels Shape:", file_labels.shape)

Files Loaded Successfully
Video Embeddings Shape: (3636, 4, 24, 1024)
Video Labels Shape: (3636,)


### Train-Test Splitting

We use the `train_test_split()` function from scikit-learn to split the dataset into `x_train`, `x_test`, `y_train`, and `y_test`.

**Before splitting,** we reshape the video embeddings from a 4D array into a 2D array so that they match the input we give to the SRU model. The original shape is `[videos, windows, frames, features]`; the new shape is `[videos * windows, frames * features]`. Each window therefore becomes one sample whose feature vector concatenates all of its frames, and each video's label is repeated once per window so the labels stay aligned with the rows.
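
The reshape described above can be checked on a small dummy array. This is a minimal sketch with made-up sizes (`n_videos`, `n_windows`, `n_frames`, and `n_features` are illustrative names and values, not the real dataset dimensions). It shows that row `v * n_windows + w` of the 2D array is window `w` of video `v`, and that `np.repeat` keeps the labels aligned.

```python
import numpy as np

# Illustrative sizes only; the real array in this notebook is (3636, 4, 24, 1024)
n_videos, n_windows, n_frames, n_features = 3, 4, 2, 5

dummy_embeddings = np.arange(
    n_videos * n_windows * n_frames * n_features, dtype=np.float32
).reshape(n_videos, n_windows, n_frames, n_features)
dummy_labels = np.array([0, 1, 0])

# Merge (videos, windows) into rows and (frames, features) into columns
flat = dummy_embeddings.reshape(n_videos * n_windows, -1)
# Repeat each video's label once per window
flat_labels = np.repeat(dummy_labels, n_windows)

print(flat.shape)    # (12, 10)
print(flat_labels)   # [0 0 0 0 1 1 1 1 0 0 0 0]

# Row v * n_windows + w holds window w of video v, flattened
v, w = 1, 2
same_window = np.array_equal(flat[v * n_windows + w], dummy_embeddings[v, w].ravel())
print(same_window)   # True
```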



In [44]:
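test_size = 0.2

# Flatten to 2D: one row per window, with all frames * features concatenated
embeddings = file_embeddings.reshape(file_embeddings.shape[0] * file_embeddings.shape[1], -1)
# Repeat each video's label once per window so the labels stay aligned with the rows
labels = np.repeat(file_labels, file_embeddings.shape[1])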

# Split the data into training and testing
x_train, x_test, y_train, y_test = train_test_split(embeddings,
                                                    labels,
                                                    test_size=test_size,
                                                    random_state=42)

# convert to tensor
train_embeddings = torch.from_numpy(x_train).to(device)
train_labels = torch.from_numpy(y_train).to(device)
test_embeddings = torch.from_numpy(x_test).to(device)
test_labels = torch.from_numpy(y_test).to(device)

print('Shape of Train Embeddings:', train_embeddings.shape)
print('Shape of Train Labels:', train_labels.shape)
print('Shape of Test Embeddings:', test_embeddings.shape)
print('Shape of Test Labels:', test_labels.shape)

Shape of Train Embeddings: torch.Size([11635, 24576])
Shape of Train Labels: torch.Size([11635])
Shape of Test Embeddings: torch.Size([2909, 24576])
Shape of Test Labels: torch.Size([2909])
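
Because every video contributes four rows, a row-level split can place windows of the same video in both the training and the test set. If that is a concern, one option is to split by video first and flatten afterwards. The following is a hedged sketch of that alternative, not what this notebook does; the helper name `split_by_video` and the dummy sizes are assumptions for illustration.

```python
import numpy as np


def split_by_video(embeddings_4d, video_labels, test_size=0.2, seed=42):
    """Split at the video level, then flatten each side to 2D (hypothetical helper)."""
    n_videos, n_windows = embeddings_4d.shape[:2]
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_videos)
    n_test = int(round(n_videos * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]

    def flatten(idx):
        # One row per window, labels repeated once per window
        x = embeddings_4d[idx].reshape(len(idx) * n_windows, -1)
        y = np.repeat(video_labels[idx], n_windows)
        return x, y

    x_train, y_train = flatten(train_idx)
    x_test, y_test = flatten(test_idx)
    return x_train, x_test, y_train, y_test, train_idx, test_idx


# Dummy data with illustrative sizes (10 videos, 4 windows, 2 frames, 3 features)
dummy = np.random.default_rng(0).normal(size=(10, 4, 2, 3)).astype(np.float32)
dummy_y = np.array([0, 1] * 5)

x_tr, x_te, y_tr, y_te, tr_idx, te_idx = split_by_video(dummy, dummy_y)
print(x_tr.shape, x_te.shape)  # (32, 6) (8, 6)
```

scikit-learn's `GroupShuffleSplit` serves a similar purpose if you prefer a library routine over a hand-written helper.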


### Creating TensorDataset and Dataloader

To feed data into the model during training, we first wrap the tensors in `TensorDataset` objects and then build `DataLoader` objects from them.

In [17]:
batch_size = 8

# Create TensorDataset
train_data = TensorDataset(train_embeddings, train_labels)
test_data = TensorDataset(test_embeddings, test_labels)

# Shuffle only the training data; evaluation order does not affect accuracy
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=False, batch_size=batch_size)
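
With `batch_size = 8`, the number of batches per epoch is the sample count divided by the batch size, rounded up, because `DataLoader` keeps the last partial batch by default. A quick arithmetic check using the split sizes printed by the train-test split cell:

```python
import math

batch_size = 8
n_train, n_test = 11635, 2909  # sizes printed by the train-test split cell

train_batches = math.ceil(n_train / batch_size)
test_batches = math.ceil(n_test / batch_size)

print(train_batches)  # 1455
print(test_batches)   # 364
```

These values match the totals shown by the `tqdm` progress bars during training and evaluation.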

## Building the Model

We build the model from an SRU stack, a dropout layer, and a fully connected layer. The model structure is shown below:

    SRUModel(
      (sru_layers): SRU(
        (rnn_lst): ModuleList(
          (0): SRUCell(24576, 1024, rescale=True,
            transform_module=Linear(in_features=24576, out_features=4096, bias=False)
          )
          (1): SRUCell(1024, 1024, rescale=True,
            transform_module=Linear(in_features=1024, out_features=3072, bias=False)
          )
        )
      )
      (dropout): Dropout(p=0.2, inplace=False)
      (linear): Linear(in_features=1024, out_features=2, bias=True)
    )
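
The printed structure lets us estimate the model size. The sketch below counts only the weights visible in the printout (the two `transform_module` projections and the final linear layer). The SRU cells hold some additional per-unit parameters that the printout does not show, so the real total is somewhat higher. The first projection is 4 * 1024 wide and the second 3 * 1024; as far as we understand the SRU implementation, the extra block in the first layer serves the highway connection when the input and hidden sizes differ.

```python
# Sizes taken from the printed model structure
input_size, hidden_size, num_classes = 24576, 1024, 2

layer0_weights = input_size * 4096                       # Linear(24576 -> 4096, bias=False)
layer1_weights = hidden_size * 3072                      # Linear(1024 -> 3072, bias=False)
head_params = hidden_size * num_classes + num_classes    # Linear(1024 -> 2, bias=True)

visible_total = layer0_weights + layer1_weights + head_params
print(f"{layer0_weights:,}")  # 100,663,296
print(f"{layer1_weights:,}")  # 3,145,728
print(f"{head_params:,}")     # 2,050
print(f"{visible_total:,}")   # 103,811,074
```

Nearly all of these weights sit in the first projection, because each sample is a 24,576-dimensional vector.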

### Model Structure and Initialization

In [48]:
class SRUModel(Module):
    def __init__(self, input_size, hidden_size, **kwargs):
        super(SRUModel, self).__init__()
        # Main SRU layer
        self.sru_layers = SRU(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=kwargs.get('num_layers', 2),
            dropout=kwargs.get('dropout_prob', 0.0),
            bidirectional=kwargs.get('bidirectional', False),
            layer_norm=kwargs.get('layer_norm', False),
            highway_bias=kwargs.get('highway_bias', 0.0),
            rescale=kwargs.get('rescale', True),
            nn_rnn_compatible_return=kwargs.get('nn_rnn_compatible_return', False),
            proj_input_to_hidden_first=kwargs.get('proj_input_to_hidden_first', False),
            amp_recurrence_fp16=kwargs.get('amp_recurrence_fp16', False),
            normalize_after=kwargs.get('normalize_after', False),
        ).to(device)
        # Dropout layer
        self.dropout = Dropout(kwargs.get('dropout_layer_prob', 0.2)).to(device)
        # Linear layer (Fully connected layer)
        self.linear = Linear(
            in_features=hidden_size * 2 if kwargs.get('bidirectional', False) else hidden_size,
            out_features=kwargs.get('num_classes', 2)
        ).to(device)
        # L2 regularization
        self.l2_reg_lambda = kwargs.get('l2_reg_lambda', 1e-5)

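    def forward(self, x):
        # x: (seq_len, batch, input_size); output_states: (seq_len, batch, features)
        output_states, _ = self.sru_layers(x)
        # Classify from the hidden state of the last time step
        output = self.linear(self.dropout(output_states[-1]))
        return output

    def l2_regularization(self):
        # Penalty: lambda times the sum of the (unsquared) L2 norms of all parameters
        l2_reg = torch.tensor(0., device=device)
        for param in self.parameters():
            l2_reg += torch.norm(param, p=2)
        return self.l2_reg_lambda * l2_reg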


model = SRUModel(24576, 1024)
model = model.to(device)
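
Note what `l2_regularization()` computes: `l2_reg_lambda` times the sum of each parameter tensor's L2 norm, not the sum of squared weights that the term "L2 regularization" (or weight decay) often refers to. A small NumPy sketch with two made-up parameter arrays illustrates the difference:

```python
import numpy as np

# Two made-up "parameter tensors" standing in for model weights
params = [np.array([3.0, 4.0]), np.array([[1.0, 2.0], [2.0, 0.0]])]
l2_reg_lambda = 1e-5

# What the method above computes: lambda * sum of per-tensor L2 norms
sum_of_norms = sum(np.linalg.norm(p) for p in params)    # 5 + 3 = 8
penalty_norms = l2_reg_lambda * sum_of_norms

# The squared variant commonly called L2 regularization
sum_of_squares = sum(np.sum(p ** 2) for p in params)     # 25 + 9 = 34
penalty_squares = l2_reg_lambda * sum_of_squares

print(sum_of_norms, sum_of_squares)
print(penalty_norms, penalty_squares)
```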

### Define Loss function and Optimizer

In [36]:
# Define your loss function and optimizer
criterion = CrossEntropyLoss()
optimizer = Adam(model.parameters())
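
`CrossEntropyLoss` takes the raw logits from the linear layer together with integer class labels; it applies log-softmax internally and averages the negative log-likelihood over the batch. A NumPy sketch of that computation, with made-up logits (the function name `cross_entropy` is ours, for illustration):

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean softmax cross-entropy for integer class targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


logits = np.array([[0.0, 0.0],     # undecided between the two classes
                   [4.0, -4.0]])   # confident in class 0
targets = np.array([1, 0])

loss = cross_entropy(logits, targets)
print(loss)
```

The undecided sample contributes `ln 2` and the confident, correct sample contributes almost nothing, so the mean sits a little above `ln(2) / 2`. `Adam(model.parameters())` with no further arguments uses PyTorch's default learning rate of `1e-3`.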

## Training the model

In [40]:
epochs = 10

# Setup logging
logging.basicConfig(filename='training.log', level=logging.INFO)

for epoch in range(epochs):
    model.train()  # make sure dropout is active, e.g. after a previous evaluation run
    total_correct = 0
    total_samples = 0
    total_loss = 0.0

    # Create tqdm progress bar for training loader
    progress_bar = tqdm(enumerate(train_loader), desc=f"Epoch {epoch + 1}/{epochs}", total=len(train_loader))

    for i, (videos, labels) in progress_bar:
        # Add a sequence dimension of length 1: (batch, features) -> (1, batch, features)
        videos = videos.unsqueeze(0)
        # Forward pass
        outputs = model(videos)
        labels = labels.long()  # CrossEntropyLoss expects class indices of type Long
        loss = criterion(outputs, labels) + model.l2_regularization()  # cross-entropy plus L2 penalty
        total_loss += loss.item()
        # Backward pass and optimizer step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Update the running accuracy over the batches seen so far in this epoch
        _, predicted = torch.max(outputs.data, 1)
        total_samples += labels.size(0)
        total_correct += (predicted == labels).sum().item()
        batch_accuracy = 100 * total_correct / total_samples

        # Update progress bar
        progress_bar.set_postfix(loss=loss.item(), accuracy=batch_accuracy)

    # Log epoch statistics
    epoch_loss = total_loss / len(train_loader)
    epoch_accuracy = 100 * total_correct / total_samples
    logging.info(f'Epoch [{epoch + 1}/{epochs}], Loss: {epoch_loss}, Accuracy: {epoch_accuracy}%')

    print(f'Epoch [{epoch + 1}/{epochs}], Loss: {epoch_loss}, Accuracy: {epoch_accuracy}%')

# Close logging
logging.shutdown()

Epoch 1/10: 100%|██████████| 1455/1455 [00:39<00:00, 37.29it/s, accuracy=96.5, loss=0.00713]


Epoch [1/10], Loss: 0.10224267991349306, Accuracy: 96.46755479157714%


Epoch 2/10: 100%|██████████| 1455/1455 [00:39<00:00, 37.28it/s, accuracy=99.1, loss=0.00686]


Epoch [2/10], Loss: 0.037275684418163146, Accuracy: 99.07176622260421%


Epoch 3/10: 100%|██████████| 1455/1455 [00:39<00:00, 37.27it/s, accuracy=99.3, loss=0.00701]


Epoch [3/10], Loss: 0.03468061937019229, Accuracy: 99.26944563816072%


Epoch 4/10: 100%|██████████| 1455/1455 [00:38<00:00, 37.32it/s, accuracy=99.4, loss=0.00786]


Epoch [4/10], Loss: 0.025703284755682003, Accuracy: 99.41555651052857%


Epoch 5/10: 100%|██████████| 1455/1455 [00:39<00:00, 37.26it/s, accuracy=99.5, loss=0.00885]


Epoch [5/10], Loss: 0.02340372176776893, Accuracy: 99.47571981091534%


Epoch 6/10: 100%|██████████| 1455/1455 [00:39<00:00, 37.26it/s, accuracy=99.3, loss=0.0238] 


Epoch [6/10], Loss: 0.030376053315705757, Accuracy: 99.33820369574559%


Epoch 7/10: 100%|██████████| 1455/1455 [00:39<00:00, 37.28it/s, accuracy=99.4, loss=0.00769]


Epoch [7/10], Loss: 0.030124324103465296, Accuracy: 99.41555651052857%


Epoch 8/10: 100%|██████████| 1455/1455 [00:39<00:00, 37.30it/s, accuracy=99.8, loss=0.00766]


Epoch [8/10], Loss: 0.013608307863787278, Accuracy: 99.76794155565105%


Epoch 9/10: 100%|██████████| 1455/1455 [00:39<00:00, 37.27it/s, accuracy=99.3, loss=0.00777]


Epoch [9/10], Loss: 0.03598142198776615, Accuracy: 99.30382466695316%


Epoch 10/10: 100%|██████████| 1455/1455 [00:39<00:00, 37.24it/s, accuracy=99.6, loss=0.00783]


Epoch [10/10], Loss: 0.022069227149153187, Accuracy: 99.57885689729265%
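
The `unsqueeze(0)` call in the loop exists because SRU expects input shaped `(sequence_length, batch, input_size)`. Each batch from the loader is `(batch, 24576)`, so adding a leading axis presents every window to the model as a sequence of length 1, and `output_states[-1]` then selects that single step. The shapes can be traced with NumPy stand-ins (the zero array below only mimics the output shape of the SRU stack, not its computation):

```python
import numpy as np

batch_size, input_size, hidden_size = 8, 24576, 1024
rng = np.random.default_rng(0)

batch = rng.normal(size=(batch_size, input_size)).astype(np.float32)

# Equivalent of videos.unsqueeze(0): add a sequence axis of length 1
sequence = np.expand_dims(batch, axis=0)
print(sequence.shape)        # (1, 8, 24576)

# Placeholder with the shape the SRU stack returns: (seq_len, batch, hidden_size)
output_states = np.zeros((sequence.shape[0], batch_size, hidden_size), dtype=np.float32)
print(output_states.shape)   # (1, 8, 1024)

# Equivalent of output_states[-1]: the last (and only) time step
last_step = output_states[-1]
print(last_step.shape)       # (8, 1024)
```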


## Evaluating the model

In [42]:
# Switch to evaluation mode (disables dropout)
model.eval()

correct = 0
total = 0
with torch.no_grad():
    for videos, labels in tqdm(test_loader, desc="Evaluating the Model"):
        videos = videos.unsqueeze(0)
        outputs = model(videos)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

test_accuracy = 100 * correct / total
print(f'Test Accuracy of the model on the test dataset: {test_accuracy}%')

Evaluating the Model: 100%|██████████| 364/364 [00:00<00:00, 608.82it/s]

Test Accuracy of the model on the test dataset: 99.55311103471983%
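
Accuracy alone can hide how the errors are distributed between the two classes, which matters for anomaly detection if the classes are imbalanced. Below is a minimal sketch of per-class metrics, assuming label `1` marks the anomalous class (an assumption; the notebook does not state the label encoding) and using made-up predictions. The helper name `binary_metrics` is hypothetical.

```python
import numpy as np


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion counts plus precision and recall for the `positive` class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Made-up labels and predictions for illustration
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In the evaluation loop, the `predicted` and `labels` tensors of each batch could be collected (for example with `.cpu().numpy()`) and passed to such a function.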





## Saving the model

In [43]:
# Save only the learned parameters (the state dict), not the full model object
torch.save(model.state_dict(), 'anomaly_detection_model.pt')
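
Since only the `state_dict` is saved, loading requires re-creating the model with the same constructor arguments first. Here is a hedged sketch of the round trip, using a small `torch.nn.Linear` as a stand-in so that it runs without the SRU package; with the real model you would instantiate `SRUModel(24576, 1024)` instead. The file name `stand_in_model.pt` is hypothetical.

```python
import os
import tempfile

import torch
from torch.nn import Linear

# Stand-in for the trained model (the notebook's model would be SRUModel(24576, 1024))
trained = Linear(in_features=4, out_features=2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stand_in_model.pt")  # hypothetical file name
    torch.save(trained.state_dict(), path)

    # Re-create the same architecture, then load the saved parameters into it
    restored = Linear(in_features=4, out_features=2)
    restored.load_state_dict(torch.load(path, map_location="cpu"))

restored.eval()  # switch to inference mode before making predictions

weights_match = torch.equal(trained.weight, restored.weight)
print(weights_match)  # True
```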