# Action Recognition @ UCF101  
**Due date: 11:59 pm on Nov. 19, 2019 (Tuesday)**

## Description
---
In this homework, you will perform action recognition using a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory (LSTM) network. You will be given a dataset called UCF101, which consists of 101 different actions/classes, and for each action there will be 145 samples. We tagged each sample as either training or testing. Each sample is supposed to be a short video, but we sampled 25 frames from each video to reduce the amount of data. Consequently, a training sample is an image tuple that forms a 3D volume, with one dimension encoding *temporal correlation* between frames, together with a label indicating what action it is.

To tackle this problem, we aim to build a neural network that can capture not only the spatial information of each frame but also the temporal information between frames. Fortunately, you don't have to do this on your own. An RNN, a type of neural network designed to deal with time-series data, is right here for you to use. In particular, you will be using an LSTM for this task.

Instead of training an end-to-end neural network from scratch, which is prohibitively expensive computationally, we divide the task into two steps: feature extraction and modelling. Below are the things you need to implement for this homework:
- **{35 pts} Feature extraction**. Use any of the [pre-trained models](https://pytorch.org/docs/stable/torchvision/models.html) to extract features from each frame. We recommend against using the activations of the last layer, since features tend to become task-specific towards the end of the network. 
    **hints**: 
    - A good starting point is a pre-trained VGG16 network (`torchvision.models.vgg16`); we suggest using the output of its first fully connected layer (4096-dim) as the feature of each video frame. This results in a 4096x25 matrix for each video. 
    - Normalize your images using `torchvision.transforms` 
    ```
    # The mean and std below are specific to ImageNet data
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    prep = transforms.Compose([ transforms.ToTensor(), normalize ])
    prep(img)
    ```
    More details of image preprocessing in PyTorch can be found at http://pytorch.org/tutorials/beginner/data_loading_tutorial.html
    
- **{35 pts} Modelling**. With the extracted features, build an LSTM network which takes a **dx25** sample as input (where **d** is the dimension of the extracted feature for each frame), and outputs the action label of that sample.
- **{20 pts} Evaluation**. After training your network, you need to evaluate your model on the testing data by computing the prediction accuracy **(5 points)**. The baseline test accuracy for this data is 75%, and **10 points** out of 20 are for achieving a test accuracy greater than the baseline. Moreover, you need to compare **(5 points)** the result of your network with that of a support vector machine (SVM): stack the **dx25** feature matrix into a long vector and train an SVM on it.
- **{10 pts} Report**. Details regarding the report can be found in the submission section below.
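
The `Normalize` transform in the feature-extraction hint applies `(x - mean) / std` per channel, after `ToTensor` has scaled the 8-bit pixel values to [0, 1]. A minimal plain-Python sketch of that arithmetic for a single pixel (the pixel value and the helper name `normalize_pixel` are made up for illustration):

```python
# Per-channel ImageNet statistics, copied from the hint above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Mimic ToTensor (scale to [0, 1]) followed by Normalize for one RGB pixel."""
    scaled = [v / 255.0 for v in rgb_uint8]
    return [(s - m) / sd for s, m, sd in zip(scaled, MEAN, STD)]


# Hypothetical pixel, chosen only for illustration.
pixel = (128, 64, 255)
normalized = normalize_pixel(pixel)
print([round(v, 3) for v in normalized])
```

Channels darker than the ImageNet mean come out negative and brighter ones positive, which is the input range the pre-trained weights were trained on.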

Notice that the size of the raw images is 256x340, whereas your pre-trained model might take **nxn** images as inputs. Instead of resizing the images, which unfavorably changes the aspect ratio, we take a better approach: crop five **nxn** images, one at the image center and four at the corners, compute the **d**-dim features for each of them, and average these five **d**-dim features to get the final feature representation of the raw image.
For example, VGG takes 224x224 images as inputs, so we take the five 224x224 crops of the image, compute 4096-dim VGG features for each of them, and then take the mean of these five 4096-dim vectors as the representation of the image.
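
To make the five-crop geometry concrete, the sketch below computes the crop boxes for one raw frame. It assumes "256x340" means height 256 and width 340; `five_crop_boxes` is a hypothetical helper, and the ordering (corners first, then center) is meant to mirror `torchvision.transforms.FiveCrop`.

```python
def five_crop_boxes(height, width, size):
    """Return (left, top, right, bottom) boxes: four corners, then the center."""
    if size > height or size > width:
        raise ValueError("crop size exceeds image size")
    top_c = (height - size) // 2
    left_c = (width - size) // 2
    return [
        (0, 0, size, size),                              # top-left
        (width - size, 0, width, size),                  # top-right
        (0, height - size, size, height),                # bottom-left
        (width - size, height - size, width, height),    # bottom-right
        (left_c, top_c, left_c + size, top_c + size),    # center
    ]


# Assumption: the raw frame is 256 pixels high and 340 pixels wide.
boxes = five_crop_boxes(height=256, width=340, size=224)
for box in boxes:
    print(box)
```

The five crops overlap heavily, which is why averaging their features gives a stable per-frame representation without distorting the aspect ratio.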

In order to save you computational time, you need to do the classification task only for **the first 25** classes of the whole dataset. The same applies to those who have access to GPUs. **Bonus 10 points for running and reporting on the entire 101 classes.**


## Dataset
Download the **dataset** at [UCF101](http://vision.cs.stonybrook.edu/~yangwang/public/UCF101_images.tar) (image data for each video). The **annos folder**, which has the video labels and the label-to-class-name mapping, is included in the uploaded assignment folder. 


UCF101 dataset contains 101 actions and 13,320 videos in total.  

+ `annos/actions.txt`  
  + lists all the actions (`ApplyEyeMakeup`, .., `YoYo`)   
  
+ `annos/videos_labels_subsets.txt`  
  + lists all the videos (`v_000001`, .., `v_013320`)  
  + labels (`1`, .., `101`)  
  + subsets (`1` for train, `2` for test)  

+ `images/`  
  + each folder represents a video
  + the video/folder name to class mapping can be found using `annos/videos_labels_subsets.txt`, e.g. `v_000001` belongs to class 1, i.e. `ApplyEyeMakeup`
  + each video folder contains 25 frames  
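
As a sketch of how the annotation file can be used, the snippet below parses a few rows in the layout described above and splits them into train and test for the first 25 classes. The tab-separated, three-column layout is an assumption based on the field list; the sample rows and the helper `split_by_subset` are made up for illustration.

```python
import csv
import io

# Tiny stand-in for annos/videos_labels_subsets.txt (hypothetical rows).
# Assumed columns: video name, class label (1..101), subset (1 = train, 2 = test).
SAMPLE = (
    "v_000001\t1\t2\n"
    "v_000002\t1\t1\n"
    "v_000003\t25\t1\n"
    "v_000004\t26\t1\n"
    "v_000005\t101\t2\n"
)


def split_by_subset(text, max_class=25):
    """Return (train, test) lists of (video_name, label) for labels <= max_class."""
    train, test = [], []
    for name, label, subset in csv.reader(io.StringIO(text), delimiter="\t"):
        label, subset = int(label), int(subset)
        if label > max_class:
            continue  # keep only the first `max_class` classes
        (train if subset == 1 else test).append((name, label))
    return train, test


train, test = split_by_subset(SAMPLE)
print(train)
print(test)
```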



## Some Tutorials
- Good materials for understanding RNN and LSTM
    - http://blog.echen.me
    - http://karpathy.github.io/2015/05/21/rnn-effectiveness/
    - http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- Implementing RNN and LSTM with PyTorch
    - [LSTM with PyTorch](http://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html#sphx-glr-beginner-nlp-sequence-models-tutorial-py)
    - [RNN with PyTorch](http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html)

In [0]:
# import packages here
import os
import pickle
import random
import time
import warnings
from glob import glob

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as DD
import torchvision
import torchvision.transforms as T
from PIL import Image
from scipy import io
from scipy.io import savemat, loadmat
from torch.autograd import Variable

warnings.filterwarnings("ignore")

---
---
## **Problem 1.** Feature extraction

In [0]:
# Mount your google drive where you've saved your assignment folder
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

Mounted at /content/gdrive


In [0]:
cd '/content/gdrive/My Drive/CSE527 Computer Vision/Garg_Utkarsh_112672834_hw5'

/content/gdrive/My Drive/CSE527 Computer Vision/Garg_Utkarsh_112672834_hw5


In [0]:
# Write your code for feature extraction (you can use multiple cells, this is just a placeholder)
# !wget 'http://vision.cs.stonybrook.edu/~yangwang/public/UCF101_images.tar'


--2019-11-16 00:30:07--  http://vision.cs.stonybrook.edu/~yangwang/public/UCF101_images.tar
Resolving vision.cs.stonybrook.edu (vision.cs.stonybrook.edu)... 130.245.4.232
Connecting to vision.cs.stonybrook.edu (vision.cs.stonybrook.edu)|130.245.4.232|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8658247680 (8.1G) [application/x-tar]
Saving to: ‘UCF101_images.tar’


2019-11-16 00:33:16 (43.6 MB/s) - ‘UCF101_images.tar’ saved [8658247680/8658247680]



In [0]:
# !tar -xkf './UCF101_images.tar' 2>/dev/null


In [0]:
torch.cuda.is_available()

True

In [0]:
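import torchvision.models as models

# Load ImageNet-pretrained VGG16 and freeze all weights (feature extraction only)
vgg=models.vgg16(pretrained=True)
for param in vgg.parameters():
    param.requires_grad = False

# Keep only the first fully connected layer so the network outputs 4096-dim features
new_classifier = nn.Sequential(*list(vgg.classifier.children())[:1])
vgg.classifier = new_classifier
vgg.eval()  # inference mode
print(vgg)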

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/checkpoints/vgg16-397923af.pth
100%|██████████| 528M/528M [00:06<00:00, 80.1MB/s]


VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1

In [0]:
normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
prep = T.Compose([ T.ToTensor(), normalize ])

five_crop_transform = T.Compose([
   T.FiveCrop((224,224)),
   T.Lambda(lambda crops: torch.stack([prep(crop) for crop in crops])) 
])

In [0]:
# Feature extraction for the first 25 classes

path='images/'
video_folders=os.listdir(path)
video_folders.sort()
# Take the first 3625 video folders, which is enough to cover the first 25 classes
subset_folders = video_folders[:3625]
output_folder = 'feature_data/'
os.makedirs(output_folder, exist_ok=True)


for each_vid in subset_folders:
    frames = os.listdir(path+each_vid)
    frames.sort()
    vgg_all = []
    for img_name in frames:
        img = Image.open(path+each_vid+'/'+img_name)
        img_crops = five_crop_transform(img)      # (5, 3, 224, 224)
        with torch.no_grad():
            crop_features = vgg(img_crops)        # (5, 4096): all five crops in one forward pass
        vgg_all.append(crop_features.mean(0))     # average the five crops -> (4096,)
    fe = torch.stack(vgg_all).numpy()             # (25, 4096) for a 25-frame video
    # print(each_vid, fe.shape , '===========================')
    io.savemat(output_folder+each_vid+'.mat', {'Feature': fe})
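
The shape bookkeeping of the extraction step can be checked without running VGG. The sketch below uses random numbers as stand-ins for the per-crop network outputs (an assumption made purely to illustrate the shapes) and averages over the crop axis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_crops, d = 25, 5, 4096

# Stand-in for network outputs: one d-dim feature per crop per frame.
# Random values are used only to illustrate the shapes.
crop_features = rng.standard_normal((n_frames, n_crops, d)).astype(np.float32)

# Average over the crop axis to get one feature vector per frame.
video_features = crop_features.mean(axis=1)
print(video_features.shape)
```

Each saved `.mat` file therefore holds one matrix with a row per frame and a column per feature dimension.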


In [0]:
# Using Map file to separate train and test data:

mapping = pd.read_csv('annos/videos_labels_subsets.txt',sep="\t", header=None)
mapping.columns = ["video_name", "class_label", "set"]
print(mapping.head())
data = mapping[mapping.class_label <= 25]
train = data[data.set==1]
test = data[data.set==2]
print(mapping.shape, data.shape, train.shape, test.shape)
xtrain=[]
ytrain=[]
xtest=[]
ytest=[]

for index, row in train.iterrows():
    vid_name = row['video_name']
    # print(vid_name)
    mat_data = loadmat('feature_data/'+vid_name+'.mat')['Feature']
    xtrain.append(mat_data)
    ytrain.append(row['class_label'])

for index, row in test.iterrows():
    vid_name = row['video_name']
    mat_data = loadmat('feature_data/'+vid_name+'.mat')['Feature']
    xtest.append(mat_data)
    ytest.append(row['class_label'])
print(len(xtrain), len(ytrain), len(xtest), len(ytest))

pickle.dump(np.stack(xtrain, axis=0), open('xtrain.p','wb'))
pickle.dump(np.stack(ytrain, axis=0), open('ytrain.p','wb'))
pickle.dump(np.stack(xtest, axis=0), open('xtest.p','wb'))
pickle.dump(np.stack(ytest, axis=0), open('ytest.p','wb'))

  video_name  class_label  set
0   v_000001            1    2
1   v_000002            1    2
2   v_000003            1    2
3   v_000004            1    2
4   v_000005            1    2
(13320, 3) (3360, 3) (2409, 3) (951, 3)
2409 2409 951 951


In [0]:
##Loading data from Pickle file.

xtrain = pickle.load(open('xtrain.p', 'rb'))
xtest = pickle.load(open('xtest.p','rb'))
ytrain = pickle.load(open('ytrain.p', 'rb'))
ytest = pickle.load(open('ytest.p','rb'))


In [0]:
#shuffle train and test
bundle = list(zip(xtrain, ytrain))
random.shuffle(bundle)
trD, trLb = zip(*bundle)

bundleTest = list(zip(xtest, ytest))
random.shuffle(bundleTest)
tstD, tstLb = zip(*bundleTest)
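
The shuffle above uses the global `random` state, so the sample order changes between runs. If reproducibility matters, one option (not part of the original notebook) is a dedicated seeded generator. The helper name `shuffle_pairs`, the seed value, and the toy data below are all arbitrary.

```python
import random


def shuffle_pairs(data, labels, seed=0):
    """Shuffle samples and labels together, reproducibly for a given seed."""
    pairs = list(zip(data, labels))
    random.Random(seed).shuffle(pairs)  # local generator; global state untouched
    shuffled_data, shuffled_labels = zip(*pairs)
    return shuffled_data, shuffled_labels


# Toy data: each "sample" carries its label so alignment is easy to check.
data = ["a1", "b2", "c3", "d4"]
labels = [1, 2, 3, 4]
d1, l1 = shuffle_pairs(data, labels, seed=42)
d2, l2 = shuffle_pairs(data, labels, seed=42)
print(d1, l1)
```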

***
***
## **Problem 2.** Modelling

In [0]:
def get_batches(data,labels,batch_size=4):
  data_batches = []
  label_batches = []
  for i in (range(int(len(data)/batch_size))) :
    minibatch_d = np.zeros((0,25,4096))
    for each in data[i*batch_size: (i+1)*batch_size]:
      each=np.reshape(each,(25,4096))
      each=each[np.newaxis,:,:]
      minibatch_d = np.vstack((minibatch_d,each))
    
    data_batches.append(torch.from_numpy(minibatch_d))
    
    minibatch_l=np.array(labels[i*batch_size: (i+1)*batch_size],dtype=int)
    minibatch_l-=1
    label_batches.append(torch.LongTensor(minibatch_l))

  print(len(data_batches))
  print(data_batches[0].shape)

  print(len(label_batches))
  return data_batches,label_batches

In [0]:
trD_batches,trLb_batches=get_batches(trD,trLb)
tstD_batches,tstLb_batches=get_batches(tstD,tstLb)
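
One detail of `get_batches` is worth spelling out: it builds only `int(len(data) / batch_size)` full batches, so any remainder is silently dropped. The arithmetic below, using the sample counts printed earlier (2409 train, 951 test) and the default `batch_size=4`, shows why the LSTM test accuracy is reported over 948 samples rather than 951. `batch_summary` is a hypothetical helper.

```python
def batch_summary(n_samples, batch_size):
    """Return (full batches, samples kept, samples dropped) under floor division."""
    n_batches = n_samples // batch_size
    kept = n_batches * batch_size
    return n_batches, kept, n_samples - kept


# Sample counts are the train/test sizes printed earlier in the notebook.
train_summary = batch_summary(2409, 4)
test_summary = batch_summary(951, 4)
print(train_summary)  # (602, 2408, 1)
print(test_summary)   # (237, 948, 3)
```

The SVM comparison later in the notebook uses all 951 test samples, so the two accuracies are computed over slightly different test sets.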

In [0]:
!pip install torchnet

Collecting torchnet
  Downloading https://files.pythonhosted.org/packages/b7/b2/d7f70a85d3f6b0365517782632f150e3bbc2fb8e998cd69e27deba599aae/torchnet-0.0.4.tar.gz
Collecting visdom
  Downloading https://files.pythonhosted.org/packages/c9/75/e078f5a2e1df7e0d3044749089fc2823e62d029cc027ed8ae5d71fafcbdc/visdom-0.1.8.9.tar.gz (676kB)
     |████████████████████████████████| 686kB 7.8MB/s 
Collecting jsonpatch
  Downloading https://files.pythonhosted.org/packages/86/7e/035d19a73306278673039f0805b863be8798057cc1b4008b9c8c7d1d32a3/jsonpatch-1.24-py2.py3-none-any.whl
Collecting torchfile
  Downloading https://files.pythonhosted.org/packages/91/af/5b305f86f2d218091af657ddb53f984ecbd9518ca9fe8ef4103a007252c9/torchfile-0.1.0.tar.gz
Collecting websocket-client
  Downloading https://files.pythonhosted.org/packages/29/19/44753eab1fdb50770ac69605527e8859468f3c0fd7dc5a76dd9c4dbd7906/websocket_client-0.56.0-py2.py3-none-any.whl (200kB)
     |████████████████████████████████| 204kB 39.0

### Defining Helper Functions: (Accuracy / Classification Performance / Run_epoch)

In [0]:
def accuracy(model, dataset, labels):
    """Print classification accuracy over pre-batched data (returns None)."""
    num_correct = 0
    num_samples = 0
    model.eval()  # disable dropout for evaluation
    with torch.no_grad():  # no gradients needed at test time
        for i in range(len(dataset)):
            x_var = dataset[i].float()
            y_var = labels[i]
            scores = model(x_var)
            _, preds = scores.max(1)

            num_correct += (preds == y_var).sum().item()
            num_samples += preds.size(0)
    acc = float(num_correct) / num_samples
    print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

In [0]:
def classification_perf(model,dataloader):
    # Assumes a global `class_names` list (e.g. read from annos/actions.txt)
    # and that both the model and the data can be placed on the GPU.
    print('\nClass wise Accuracy Performance:\n')
    class_correct = list(0. for i in range(25))
    class_total = list(0. for i in range(25))
    with torch.no_grad():
        for i in range(len(dataloader)):
            images = dataloader[i][0].cuda()
            labels = dataloader[i][1].cuda().long()
            outputs=model(images)
            _, predicted = torch.max(outputs, 1)
            c = (predicted == labels)
            for j in range(labels.size(0)):   # iterate over the actual batch size
                label = labels[j].item()
                class_correct[label] += c[j].item()
                class_total[label] += 1


    for i in range(25):
        print('Accuracy of %5s : %2d %%' % (
            class_names[i], 100 * class_correct[i] / class_total[i]))

In [0]:
import torchnet as tnt
num_class=25
def run_epoch(dataset, labels , model, criterion, epoch, tstD,tstLb, optimizer=None):
    
    confusion_matrix = tnt.meter.ConfusionMeter(num_class)
    acc = tnt.meter.ClassErrorMeter(accuracy=True)
    meter_loss = tnt.meter.AverageValueMeter()
    model.train()
    for i in range(len(dataset)):
        sequence = dataset[i]
        label = labels[i]
        input_sequence_var = Variable(sequence.float())
        input_label_var = Variable(label)

        # compute output
        output_logits = model(input_sequence_var)
        loss = criterion(output_logits, input_label_var)

        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        meter_loss.add(loss.data)
        acc.add(output_logits.data, input_label_var.data)
        confusion_matrix.add(output_logits.data, input_label_var.data)


    print(' Epoch: %d  , Loss: %.4f,  Accuracy: %.2f'%(epoch, meter_loss.value()[0], acc.value()[0]))
    print("Test set accuracy is")
    print(accuracy(model,tstD,tstLb))
    return acc.value()[0]

### Our 1st LSTM Model: Simple LSTM with one hidden layer and a classification layer.

In [0]:
class Model1(nn.Module):
    def __init__(self):
        super(Model1, self).__init__()
        
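        # Note: nn.LSTM applies dropout only between stacked layers, so with
        # num_layers=1 the dropout argument has no effect (PyTorch warns about it).
        self.recurrent_layer1 = nn.LSTM(4096,500,num_layers=1,dropout=0.4,batch_first=True)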
        self.classify_layer1 = nn.Linear(500,25)
        
    
    def forward(self, input, h_t_1=None, c_t_1=None):
        
        rnn_outputs, (hn, cn) = self.recurrent_layer1(input)
        logits = self.classify_layer1(rnn_outputs[:,-1])
        
        return logits
        
model1 = Model1()
print(model1)

Model1(
  (recurrent_layer1): LSTM(4096, 500, batch_first=True, dropout=0.4)
  (classify_layer1): Linear(in_features=500, out_features=25, bias=True)
)
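
To see what `rnn_outputs[:,-1]` selects, the sketch below pushes a batch of random stand-in features through layers of the same sizes as `Model1` and prints the intermediate shapes. The random input and the variable names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer sizes as Model1; the batch of random features is a stand-in.
lstm = nn.LSTM(input_size=4096, hidden_size=500, num_layers=1, batch_first=True)
classifier = nn.Linear(500, 25)

x = torch.randn(4, 25, 4096)            # (batch, frames, feature dim)
with torch.no_grad():
    outputs, (hn, cn) = lstm(x)         # outputs: (4, 25, 500)
    last_step = outputs[:, -1]          # (4, 500): hidden state after the last frame
    logits = classifier(last_step)      # (4, 25): one score per class

print(outputs.shape, last_step.shape, logits.shape)
# For a single-layer, unidirectional LSTM the last output equals hn[-1].
print(torch.allclose(last_step, hn[-1]))
```

Classifying from the last time step means the prediction relies on the LSTM having summarized all 25 frames in its final hidden state.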


In [0]:
optimizer = torch.optim.Adam(model1.parameters(),lr=1e-3)
criterion = nn.CrossEntropyLoss()
num_epochs = 20
for e in range(num_epochs):
    run_epoch(trD_batches, trLb_batches, model1, criterion, e,tstD_batches,tstLb_batches,optimizer)
     

 Epoch: 0  , Loss: 1.8165,  Accuracy: 51.58
Test set accuracy is
Got 567 / 948 correct (59.81)
None
 Epoch: 1  , Loss: 0.8908,  Accuracy: 77.20
Test set accuracy is
Got 648 / 948 correct (68.35)
None
 Epoch: 2  , Loss: 0.5929,  Accuracy: 84.72
Test set accuracy is
Got 666 / 948 correct (70.25)
None
 Epoch: 3  , Loss: 0.4745,  Accuracy: 88.25
Test set accuracy is
Got 680 / 948 correct (71.73)
None
 Epoch: 4  , Loss: 0.3545,  Accuracy: 91.61
Test set accuracy is
Got 703 / 948 correct (74.16)
None
 Epoch: 5  , Loss: 0.3182,  Accuracy: 92.32
Test set accuracy is
Got 717 / 948 correct (75.63)
None
 Epoch: 6  , Loss: 0.2796,  Accuracy: 93.27
Test set accuracy is
Got 707 / 948 correct (74.58)
None
 Epoch: 7  , Loss: 0.2428,  Accuracy: 94.02
Test set accuracy is
Got 709 / 948 correct (74.79)
None
 Epoch: 8  , Loss: 0.2254,  Accuracy: 95.06
Test set accuracy is
Got 697 / 948 correct (73.52)
None
 Epoch: 9  , Loss: 0.2455,  Accuracy: 93.73
Test set accuracy is
Got 726 / 948 correct (76.58)
None


### Model 2: LSTM with added dropout after recurrent_layer

In [0]:
class Model2(nn.Module):
    def __init__(self):
        super(Model2, self).__init__()
        
        self.recurrent_layer1 = nn.LSTM(4096,500,num_layers=1,dropout=0.3,batch_first=True)
        self.d=nn.Dropout(0.4)
        self.classify_layer1 = nn.Linear(500,25)
        
    
    def forward(self, input, h_t_1=None, c_t_1=None):
        
        rnn_outputs, (hn, cn) = self.recurrent_layer1(input)
        rnn_outputs=self.d(rnn_outputs)
        logits = self.classify_layer1(rnn_outputs[:,-1])
        
        return logits

model2 = Model2()
print(model2)

Model2(
  (recurrent_layer1): LSTM(4096, 500, batch_first=True, dropout=0.3)
  (d): Dropout(p=0.4, inplace=False)
  (classify_layer1): Linear(in_features=500, out_features=25, bias=True)
)
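
The `nn.Dropout` layer in `Model2` behaves differently in the two modes the helper functions switch between: `run_epoch` calls `model.train()` and `accuracy` calls `model.eval()`. A small sketch of that difference, using an all-ones tensor as a stand-in for the LSTM outputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.4)
x = torch.ones(1000)

drop.train()
y_train = drop(x)   # roughly 40% of entries zeroed, survivors scaled by 1/(1-p)

drop.eval()
y_eval = drop(x)    # identity in evaluation mode

zero_fraction = (y_train == 0).float().mean().item()
print(zero_fraction)
print(torch.equal(y_eval, x))
```

Because of the rescaling during training, the expected activation magnitude is the same in both modes, so no adjustment is needed at test time.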


In [0]:
optimizer = torch.optim.Adam(model2.parameters(),lr=1e-3)
criterion = nn.CrossEntropyLoss()
num_epochs = 20
for e in range(num_epochs):
    run_epoch(trD_batches, trLb_batches, model2, criterion, e,tstD_batches,tstLb_batches,optimizer)
     

 Epoch: 0  , Loss: 2.3688,  Accuracy: 30.44
Test set accuracy is
Got 431 / 948 correct (45.46)
None
 Epoch: 1  , Loss: 1.5582,  Accuracy: 52.99
Test set accuracy is
Got 550 / 948 correct (58.02)
None
 Epoch: 2  , Loss: 1.2047,  Accuracy: 64.04
Test set accuracy is
Got 578 / 948 correct (60.97)
None
 Epoch: 3  , Loss: 0.9876,  Accuracy: 69.56
Test set accuracy is
Got 661 / 948 correct (69.73)
None
 Epoch: 4  , Loss: 0.8870,  Accuracy: 72.13
Test set accuracy is
Got 652 / 948 correct (68.78)
None
 Epoch: 5  , Loss: 0.7920,  Accuracy: 74.00
Test set accuracy is
Got 692 / 948 correct (73.00)
None
 Epoch: 6  , Loss: 0.7441,  Accuracy: 75.62
Test set accuracy is
Got 668 / 948 correct (70.46)
None
 Epoch: 7  , Loss: 0.7395,  Accuracy: 76.00
Test set accuracy is
Got 705 / 948 correct (74.37)
None
 Epoch: 8  , Loss: 0.6321,  Accuracy: 79.36
Test set accuracy is
Got 699 / 948 correct (73.73)
None
 Epoch: 9  , Loss: 0.6104,  Accuracy: 79.57
Test set accuracy is
Got 675 / 948 correct (71.20)
None


### Model 3: LSTM with ReLU activation on recurrent layer.

In [0]:
class Model3(nn.Module):
    def __init__(self):
        super(Model3, self).__init__()
        
        self.recurrent_layer1 = nn.LSTM(4096,500,num_layers=1,dropout=0.4,batch_first=True)
        self.act=nn.ReLU(inplace=True)
        self.classify_layer1 = nn.Linear(500,25)
        
    
    def forward(self, input, h_t_1=None, c_t_1=None):
        
        rnn_outputs, (hn, cn) = self.recurrent_layer1(input)
        rnn_outputs=self.act(rnn_outputs)
        logits = self.classify_layer1(rnn_outputs[:,-1])
        
        return logits

model3 = Model3()
print(model3)
optimizer = torch.optim.Adam(model3.parameters(),lr=1e-4)
criterion = nn.CrossEntropyLoss()
num_epochs = 20
for e in range(num_epochs):
    run_epoch(trD_batches, trLb_batches, model3, criterion, e,tstD_batches,tstLb_batches,optimizer)
     

Model3(
  (recurrent_layer1): LSTM(4096, 500, batch_first=True, dropout=0.4)
  (act): ReLU(inplace=True)
  (classify_layer1): Linear(in_features=500, out_features=25, bias=True)
)
 Epoch: 0  , Loss: 1.7229,  Accuracy: 69.19
Test set accuracy is
Got 757 / 948 correct (79.85)
None
 Epoch: 1  , Loss: 0.6340,  Accuracy: 93.40
Test set accuracy is
Got 776 / 948 correct (81.86)
None
 Epoch: 2  , Loss: 0.2923,  Accuracy: 97.88
Test set accuracy is
Got 787 / 948 correct (83.02)
None
 Epoch: 3  , Loss: 0.1482,  Accuracy: 99.29
Test set accuracy is
Got 795 / 948 correct (83.86)
None
 Epoch: 4  , Loss: 0.0807,  Accuracy: 99.83
Test set accuracy is
Got 802 / 948 correct (84.60)
None
 Epoch: 5  , Loss: 0.0466,  Accuracy: 100.00
Test set accuracy is
Got 802 / 948 correct (84.60)
None
 Epoch: 6  , Loss: 0.0276,  Accuracy: 100.00
Test set accuracy is
Got 803 / 948 correct (84.70)
None
 Epoch: 7  , Loss: 0.0172,  Accuracy: 100.00
Test set accuracy is
Got 809 / 948 correct (85.34)
None
 Epoch: 8  , Loss

### Model 4: LSTM with BatchNorm, ReLU activation, and dropout on the classification layer

In [0]:
class Flatten(nn.Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

class Model4(nn.Module):
    def __init__(self):
        super(Model4, self).__init__()
        self.recurrent_layer = torch.nn.LSTM(4096,100, num_layers = 1, dropout=0.5, batch_first=True)
        self.classify_layer = torch.nn.Sequential(
            torch.nn.Conv1d(25,4,3),
            torch.nn.BatchNorm1d(4),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.5),
            Flatten(),
            torch.nn.Linear(392,25))
    
    def forward(self, input, h_t_1=None, c_t_1=None):
        rnn_outputs, (hn, cn) = self.recurrent_layer(input)
        logits = self.classify_layer(rnn_outputs)
        return logits
      
model4 = Model4()
print(model4)

Model4(
  (recurrent_layer): LSTM(4096, 100, batch_first=True, dropout=0.5)
  (classify_layer): Sequential(
    (0): Conv1d(25, 4, kernel_size=(3,), stride=(1,))
    (1): BatchNorm1d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout(p=0.5, inplace=False)
    (4): Flatten()
    (5): Linear(in_features=392, out_features=25, bias=True)
  )
)
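
The `in_features=392` of the final `Linear` layer follows from the convolution arithmetic: `Conv1d(25, 4, 3)` treats the 25 time steps as input channels and slides a size-3 kernel along the 100-dim hidden vector. A short check using the standard output-length formula (`conv1d_out_len` is a hypothetical helper):

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (standard formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


hidden_size = 100      # LSTM hidden size in Model4
out_channels = 4       # Conv1d(25, 4, 3): the 25 time steps act as input channels
conv_len = conv1d_out_len(hidden_size, kernel_size=3)
flat_features = out_channels * conv_len
print(conv_len, flat_features)  # 98 392
```

Changing the LSTM hidden size or the kernel size therefore requires updating the `Linear` input size accordingly.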


In [0]:
optimizer = torch.optim.Adam(model4.parameters(),lr=1e-3)
criterion = nn.CrossEntropyLoss()
num_epochs = 20
for e in range(num_epochs):
    run_epoch(trD_batches, trLb_batches, model4, criterion, e,tstD_batches,tstLb_batches,optimizer)
     

Model3(
  (recurrent_layer): LSTM(4096, 100, batch_first=True, dropout=0.5)
  (classify_layer): Sequential(
    (0): Conv1d(25, 4, kernel_size=(3,), stride=(1,))
    (1): BatchNorm1d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout(p=0.5, inplace=False)
    (4): Flatten()
    (5): Linear(in_features=392, out_features=25, bias=True)
  )
)
 Epoch: 0  , Loss: 2.0573,  Accuracy: 37.00
Test set accuracy is
Got 520 / 948 correct (54.85)
None
 Epoch: 1  , Loss: 0.9883,  Accuracy: 67.52
Test set accuracy is
Got 683 / 948 correct (72.05)
None
 Epoch: 2  , Loss: 0.6441,  Accuracy: 78.65
Test set accuracy is
Got 689 / 948 correct (72.68)
None
 Epoch: 3  , Loss: 0.4468,  Accuracy: 84.55
Test set accuracy is
Got 690 / 948 correct (72.78)
None
 Epoch: 4  , Loss: 0.4211,  Accuracy: 85.96
Test set accuracy is
Got 714 / 948 correct (75.32)
None
 Epoch: 5  , Loss: 0.3475,  Accuracy: 88.79
Test set accuracy is
Got 689 / 948 correct (72.68)
None
 Epoch: 6

**RESULT: Model 3 (LSTM), trained on the first 25 classes, performed best, with a test set accuracy of 85.86%.**

---
---
## **Problem 3.** Evaluation

### SVM Classifier:

In [0]:
## Flatten each 25x4096 feature matrix into a single row vector
def get_flat_data(dataset,r=4096,c=25):
  # One row of length r*c per video
  return np.stack([np.reshape(each,r*c) for each in dataset])


In [0]:
training_data=get_flat_data(trD)
testing_data=get_flat_data(tstD)
  
print(training_data.shape)
print(testing_data.shape)
print(trLb)
print(tstLb)

(2409, 102400)
(951, 102400)
('11', '23', '15', '22', '3', '22', '1', '15', '4', '15', '18', '16', '9', '9', '8', '24', '16', '16', '12', '12', '25', '1', '13', '5', '9', '4', '20', '24', '12', '13', '10', '1', '5', '18', '23', '22', '16', '9', '23', '11', '22', '10', '17', '3', '6', '23', '10', '19', '3', '21', '5', '20', '21', '25', '1', '17', '16', '8', '23', '21', '6', '2', '13', '13', '22', '12', '9', '3', '22', '7', '6', '9', '20', '22', '3', '16', '2', '6', '2', '9', '1', '2', '4', '1', '6', '3', '20', '14', '5', '9', '10', '18', '3', '3', '3', '1', '16', '15', '16', '1', '17', '9', '24', '7', '3', '18', '15', '24', '4', '16', '8', '24', '9', '17', '18', '23', '15', '8', '5', '1', '21', '4', '9', '11', '6', '22', '23', '23', '12', '9', '15', '17', '5', '11', '14', '13', '23', '12', '19', '3', '16', '11', '1', '18', '13', '12', '19', '18', '11', '8', '2', '9', '24', '24', '11', '14', '1', '20', '19', '4', '2', '4', '20', '9', '6', '8', '2', '6', '11', '20', '17', '10', '24', '17'

**SVM TRAINING**

In [0]:
from sklearn.svm import SVC  
svm = SVC(kernel='linear', C = 1.0)  
svm.fit(training_data, np.array(trLb,dtype=int))

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='linear', max_iter=-1, probability=False, random_state=None,
    shrinking=True, tol=0.001, verbose=False)

**Making predictions using Trained SVM and Evaluation.**

In [0]:
pred_label=svm.predict(testing_data)
from sklearn.metrics import accuracy_score  
acc=accuracy_score(np.array(tstLb,dtype=int), pred_label)
print(acc*100,"%")

82.75499474237644 %


**SVM accuracy on the 25-class test set: 82.75%**

## **Bonus**


### Separating train and test data for entire UCF101 dataset.

In [0]:
################################
# using mapping file to separate train and test data
# using pickle to save separated train and test data
###############################
mapping = pd.read_csv('annos/videos_labels_subsets.txt',sep="\t", header=None)
mapping.columns = ["video_name", "class_label", "set"]
print(mapping.head())
data = mapping[mapping.class_label <= 101]
train = data[data.set==1]
test = data[data.set==2]
print(mapping.shape, data.shape, train.shape, test.shape)
xtrain=[]
ytrain=[]
xtest=[]
ytest=[]

for index, row in train.iterrows():
    vid_name = row['video_name']
    # print(vid_name)
    mat_data = loadmat('bonus_features/'+vid_name+'.mat')['Feature']
    xtrain.append(mat_data)
    ytrain.append(row['class_label'])

for index, row in test.iterrows():
    vid_name = row['video_name']
    mat_data = loadmat('bonus_features/'+vid_name+'.mat')['Feature']
    xtest.append(mat_data)
    ytest.append(row['class_label'])
print(len(xtrain), len(ytrain), len(xtest), len(ytest))

pickle.dump(np.stack(xtrain, axis=0), open('xtrain_bonus.p','wb'))
pickle.dump(np.stack(ytrain, axis=0), open('ytrain_bonus.p','wb'))
pickle.dump(np.stack(xtest, axis=0), open('xtest_bonus.p','wb'))
pickle.dump(np.stack(ytest, axis=0), open('ytest_bonus.p','wb'))

  video_name  class_label  set
0   v_000001            1    2
1   v_000002            1    2
2   v_000003            1    2
3   v_000004            1    2
4   v_000005            1    2
(13320, 3) (13320, 3) (9537, 3) (3783, 3)
9537 9537 3783 3783


In [0]:
##Loading train and test data/labels using Pickle.
xtrain = pickle.load(open('xtrain_bonus.p', 'rb'))
xtest = pickle.load(open('xtest_bonus.p','rb'))
ytrain = pickle.load(open('ytrain_bonus.p', 'rb'))
ytest = pickle.load(open('ytest_bonus.p','rb'))

In [0]:
##Checking size of UCF101 dataset:
import os
vid=(os.listdir("bonus_features/"))
print(len(vid))


13320


In [0]:
#shuffle train and test
bundle_train = list(zip(xtrain, ytrain))
random.shuffle(bundle_train)
trData, trLb = zip(*bundle_train)

bundle_test = list(zip(xtest, ytest))
random.shuffle(bundle_test)
testData, testLb = zip(*bundle_test)

In [0]:
trainD_batches,trainLb_batches=get_batches(trData, trLb)
testD_batches,testLb_batches=get_batches(testData, testLb)

1192
torch.Size([8, 25, 4096])
1192
472
torch.Size([8, 25, 4096])
472


In [0]:
import torchnet as tnt
num_class=101
def run_epoch2(dataset, labels , model, criterion, epoch, tstD,tstLb, optimizer=None):
    
    confusion_matrix = tnt.meter.ConfusionMeter(num_class)
    acc = tnt.meter.ClassErrorMeter(accuracy=True)
    meter_loss = tnt.meter.AverageValueMeter()
    model.train()
    for i in range(len(dataset)):
        sequence = dataset[i]
        label = labels[i]
        input_sequence_var = Variable(sequence.float())
        input_label_var = Variable(label)

        # compute output
        output_logits = model(input_sequence_var)
        loss = criterion(output_logits, input_label_var)

        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        meter_loss.add(loss.data)
        acc.add(output_logits.data, input_label_var.data)
        confusion_matrix.add(output_logits.data, input_label_var.data)


    print(' Epoch: %d  , Loss: %.4f,  Accuracy: %.2f'%(epoch, meter_loss.value()[0], acc.value()[0]))
    print("Test set accuracy is")
    print(accuracy(model,tstD,tstLb))
    return acc.value()[0]

In [0]:
class BestModel(nn.Module):
    def __init__(self):
        super(BestModel, self).__init__()
        
        self.recurrent_layer1 = nn.LSTM(4096,500,num_layers=1,dropout=0.4,batch_first=True)
        self.act=nn.ReLU(inplace=True)
        self.classify_layer1 = nn.Linear(500,101)
        
    
    def forward(self, input, h_t_1=None, c_t_1=None):
        
        rnn_outputs, (hn, cn) = self.recurrent_layer1(input)
        rnn_outputs=self.act(rnn_outputs)
        logits = self.classify_layer1(rnn_outputs[:,-1])
        
        return logits

model = BestModel()
print(model)
optimizer = torch.optim.Adam(model.parameters(),lr=1e-4)
criterion = nn.CrossEntropyLoss()
num_epochs = 20
for e in range(num_epochs):
    run_epoch2(trainD_batches, trainLb_batches, model, criterion, e,testD_batches,testLb_batches,optimizer)
     

BestModel(
  (recurrent_layer1): LSTM(4096, 500, batch_first=True, dropout=0.4)
  (act): ReLU(inplace=True)
  (classify_layer1): Linear(in_features=500, out_features=101, bias=True)
)
 Epoch: 0  , Loss: 3.0723,  Accuracy: 44.97
Test set accuracy is
Got 2164 / 3780 correct (57.25)
None
 Epoch: 1  , Loss: 1.5666,  Accuracy: 76.15
Test set accuracy is
Got 2568 / 3780 correct (67.94)
None
 Epoch: 2  , Loss: 0.9286,  Accuracy: 87.21
Test set accuracy is
Got 2686 / 3780 correct (71.06)
None
 Epoch: 3  , Loss: 0.5776,  Accuracy: 93.20
Test set accuracy is
Got 2759 / 3780 correct (72.99)
None
 Epoch: 4  , Loss: 0.3666,  Accuracy: 96.61
Test set accuracy is
Got 2785 / 3780 correct (73.68)
None
 Epoch: 5  , Loss: 0.2335,  Accuracy: 98.31
Test set accuracy is
Got 2779 / 3780 correct (73.52)
None
 Epoch: 6  , Loss: 0.1500,  Accuracy: 99.37
Test set accuracy is
Got 2801 / 3780 correct (74.10)
None
 Epoch: 7  , Loss: 0.0913,  Accuracy: 99.80
Test set accuracy is
Got 2806 / 3780 correct (74.23)
None


**RESULT FOR FULL UCF101 DATASET:** LSTM model accuracy on the test set for all 101 classes is 76.67%

## **Problem 4.** Report

**Report can be found in the uploaded .zip file to Blackboard.**



## Submission
---
**Runnable source code in ipynb file and a pdf report are required**.

The report should be 3 to 4 pages long, describing what you have done and learned in this homework and reporting the performance of your model. If you have tried multiple methods, please compare your results. If you are using any external code, please cite it in your report. Note that this homework is designed to help you explore and get familiar with the techniques. The final grading will be largely based on your prediction accuracy and the different methods you tried (different architectures and parameters).

Please indicate clearly in your report what model you have tried, what techniques you applied to improve the performance and report their accuracies. The report should be concise and include the highlights of your efforts.
The naming convention for report is **Surname_Givenname_SBUID_report*.pdf**

When submitting your .zip file through Blackboard, please name your .zip file **Surname_Givenname_SBUID_hw*.zip**.

This zip file should include:
```
Surname_Givenname_SBUID_hw*
        |---Surname_Givenname_SBUID_hw*.ipynb
        |---Surname_Givenname_SBUID_hw*.pdf
        |---Surname_Givenname_SBUID_report*.pdf
```

For instance, student Michael Jordan should submit a zip file named "Jordan_Michael_111134567_hw5.zip" for homework5 in this structure:
```
Jordan_Michael_111134567_hw5
        |---Jordan_Michael_111134567_hw5.ipynb
        |---Jordan_Michael_111134567_hw5.pdf
        |---Jordan_Michael_111134567_report*.pdf
```

The **Surname_Givenname_SBUID_hw*.pdf** should include a **google shared link**. To generate the **google shared link**, first create a folder named **Surname_Givenname_SBUID_hw*** in your Google Drive with your Stony Brook account. 

Then right-click this folder, click ***Get shareable link***, and in the People text field enter the two TAs' emails: ***bo.cao.1@stonybrook.edu*** and ***sayontan.ghosh@stonybrook.edu***. Make sure that TAs who have the link **can edit**, ***not just*** **can view**, and also **uncheck** the **Notify people** box.

Colab has a good version control feature, and you should take advantage of it to save your work properly. However, the timestamp of the submission made in Blackboard is the only one that we consider for grading. To be more specific, we will only grade the version of your code right before the timestamp of the submission made in Blackboard. 

You are encouraged to post and answer questions on Piazza. Based on the amount of email that we have received in past years, there might be delays in replying to personal emails. Please ask questions on Piazza and send emails only for personal issues.

Be aware that your code will undergo a plagiarism check both vertically and horizontally. Please do your own work.