---

# University of Stirling - Spring 2023

## CSCU9M6 - Natural Language Processing and Computer Vision (2022/3)

---

# Assignment Summary

In this activity, you are required to apply the knowledge acquired in this module through the design and development of a complete project for image classification in an application to be defined by yourself. For this, you will need to perform the following **mandatory** steps:

1. [Problem definition](#scrollTo=hglJVRRslqMn)
2. [GitHub repository](#scrollTo=ecxDhkV9qmUf)
3. [Dataset](#scrollTo=qEgFzxmWrGA9)
4. [Dataloader](#scrollTo=EDd6lLwlx4un)
5. [Proposed solution](#scrollTo=ScTrpUW8zOp4)
6. [Experimental tests and evaluations](#scrollTo=3RBW58of0ZDo)
7. [Quiz and Report](#scrollTo=ws14iV4Dp_vf)

**Deadlines** and other details can be seen on Canvas [\[link\]](https://canvas.stir.ac.uk/courses/12587/assignments/102373).

---

# 1. **Problem definition** 


In this assignment, you are required to apply the knowledge acquired in the module to solve a classification problem from images collected in the context of two different cities (A and B).
 - If the work is being carried out in pairs, **cities A and B must be the hometowns of each student**. In the case of individual work, city A must be your hometown and city B must be Stirling (or Edinburgh, if needed).
 - The standard recommendation is that the project focuses on classifying image scenes of cars or trees, which are easier to identify and annotate. Other objects or phenomena can be adopted, but are subject to prior approval by the module instructor (Jefersson A. dos Santos). **You are not allowed to assemble datasets containing people. Other sensitive patterns, such as license plates, must be properly hidden.**
 - Don't panic! We are aware that acquiring images _in situ_ is an impediment for most students. The dataset can be assembled with images collected remotely or from public repositories. Just be careful with rights and permissions for using images found on the internet. Anyway, these factors must be taken into account for the problem definition.
 - While we encourage you to do interesting and engaging work, it shouldn't be too complex or time-consuming. Try to appropriately scale the time required for this step. Ask the instructors for advice, if necessary. **GA students:** you are encouraged to link the project with your work activities, but keep in mind you still need to construct two datasets (A and B). 

[top](#scrollTo=4i5afvUbhmGo)
 

---
# 2. **GitHub repository**

Give your project a name, create a private [GitHub repository](https://github.com/) with the name [Module Code] + [Project Name] and give access to the module instructors. Create a cover page with a description of your project. This empty notebook must be uploaded to the repository, together with the created dataset. The deadline for this task is 10 days after the publication of this notebook.
This notebook should be updated and committed to the repository according to the deadlines.
The repository's update history will be used as a criterion for monitoring and evaluating the work.
**Check the videos provided in the extra section on Canvas for more details on how to create your GitHub repository** [\[link\]](https://canvas.stir.ac.uk/courses/12587/pages/extra-session-cnn-hyperparameters-and-github).

[top](#scrollTo=4i5afvUbhmGo)

---
# 3. **Dataset creation**

You must collect a minimum of **200 positive samples** from the study objects for each city (A and B). 
Note that, depending on the task being solved, it will also be necessary to collect more samples - negative ones, for instance.

Your dataset can be assembled from one or more of the following ways:

  - *M1* - Pictures taken by yourself on site (street view from cities A and B), with attention to anonymization issues (where applicable). You are not allowed to assemble datasets containing people. Other sensitive patterns, such as license plates, must be properly hidden.

  - *M2* - Aerial satellite/drone images obtained from GIS and remote sensing platforms or public repositories. Be careful with unusual file formats that may be challenging to manipulate using basic image processing libraries. We recommend keeping or converting the images to jpg or png.

  - *M3* - Pictures taken from other publicly available datasets. Remember you are not allowed to use datasets containing people or other sensitive patterns/objects.

  - *M4* - Images crawled from the internet as a whole (social networks, webpages, etc), with special attention to usage rights and copyright.

  - *M5* - Textual data and metadata you may need in your project, with special attention to usage rights and copyright (as always!).

**Important:** If you collect the images on your own or from aerial imagery repositories, it will be necessary to keep the geographic coordinates. If you collect from specific websites, please retain the source links. This information should be placed in a .csv file and made available along with the final dataset.
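A minimal sketch of how such a metadata file could be written with Python's standard `csv` module. The column names, the `write_metadata` helper and the example records are assumptions made for illustration; the assignment does not prescribe a particular layout.

```python
import csv
import io

# Hypothetical column names; adapt them to your own dataset.
FIELDS = ["filename", "city", "label", "latitude", "longitude", "source_url"]

def write_metadata(rows, handle):
    """Write one CSV row per image; fields missing from a row are left empty."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# Made-up example records: one image photographed on site (coordinates kept)
# and one image taken from a website (source link kept).
example_rows = [
    {"filename": "ara/0001.jpg", "city": "A", "label": "ara",
     "latitude": -19.92, "longitude": -43.94},
    {"filename": "oak/0001.jpg", "city": "B", "label": "oak",
     "source_url": "https://example.org/oak.jpg"},
]

# An in-memory buffer is used here so the sketch has no side effects.
buffer = io.StringIO()
write_metadata(example_rows, buffer)
metadata_csv = buffer.getvalue()
```

To write a real file instead, the same function can be given a handle opened with `open("metadata.csv", "w", newline="")`.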

[top](#scrollTo=4i5afvUbhmGo)

---

# 4. **Dataloader**

Here you are required to implement all the code related to pre-processing, cleaning, de-noising and preparing the input images and metadata according to the necessary data structures as input to your pattern recognition module. We recommend using [PyTorch](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) or [Tensorflow (with Keras)](https://keras.io/getting_started/intro_to_keras_for_engineers/) as a base, but you are free to use any library or platform as long as it is well justified in the [final report](#scrollTo=ws14iV4Dp_vf).

[top](#scrollTo=4i5afvUbhmGo)

In [1]:
# library imports
import time, os, sys, numpy as np
import torch
import torchvision
import torch.nn as nn
import torch.nn.functional as F
import torch._utils_internal

import PIL
from PIL import Image
from torch import optim
from torchsummary import summary

# test if GPU is available
# otherwise use CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
n = torch.cuda.device_count()
devices_ids = list(range(n))

In [2]:
# compute accuracy

# data_iter: iterator, provides data batches from given data set
# net: neural network model to evaluate
# loss: function, evaluates model's predictions

def evaluate_accuracy(data_iter, net, loss):
    """Evaluate model accuracy on a given data set."""
    
    # acc_sum: single element tensor, stores sum of batches' accuracies
    # n: total no. of data set samples
    # l: loss sum from all batches
    acc_sum, n, l = torch.Tensor([0]), 0, 0
    net.eval()
    with torch.no_grad():
        # iterate over each batch, consisting of
        # x: input data batch
        # y: output label batch
        for x, y in data_iter:
            x, y = x.to(device), y.to(device)
            y_hat = net(x)
            # accumulate the batch loss and the number of correct predictions
            l += loss(y_hat, y).sum()
            acc_sum += (y_hat.argmax(axis=1) == y).sum().item()
            n += y.size()[0]
        
        # return model accuracy and average loss per batch
        return acc_sum.item() / n, l.item() / len(data_iter)

    
# train and validate neural network model 

# net: neural network model to train and evaluate
# train_iter: iterator, provides data batches of training data
# test_iter: iterator, provides data batches of test data
# batch_size: no. of samples per batch
# trainer: optimizer, updates model parameters
# loss: function, evaluates model's predictions
# num_epochs: no. of times model will be trained on the entire training set

def train_validate(net, train_iter, test_iter, batch_size, trainer, 
                   loss, num_epochs):
    print('Training on', device)
    # iterate through each epoch in range 0 to num_epochs-1
    # epoch: complete iteration over the entire training set
    for epoch in range(num_epochs):
        net.train()
        # initialize vars used to track training progress
        # train_l_sum: sum of losses of all batches in current epoch
        # train_acc_sum: sum of accuracies of all batches in current epoch
        # n: total no. of samples in current epoch
        # start: current time in seconds
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        # iterate over each training data batch in train_iter, consists of
        # x: input data batch
        # y: output label batch
        for x, y in train_iter:
            x, y = x.to(device), y.to(device)
            y_hat = net(x)
            trainer.zero_grad()
            l = loss(y_hat, y).sum()
            l.backward()
            trainer.step()
            # update progress tracking vars
            # train_l_sum: updated with current batch loss
            # train_acc_sum: updated with current batch no. of correct predictions
            # n: updated with current batch size
            train_l_sum += l.item()
            train_acc_sum += (y_hat.argmax(axis=1) == y).sum().item()
            n += y.size()[0]
        test_acc, test_loss = evaluate_accuracy(test_iter, net, loss)
        # print summarized training process for current epoch including
        # epoch no., average training loss, training accuracy, test loss,
        # test accuracy, time taken to complete epoch
        print('epoch %d, train loss %.4f, train acc %.3f, test loss %.4f, '
              'test acc %.3f, time %.1f sec' 
              % (epoch + 1, train_l_sum / len(train_iter), train_acc_sum / n, 
               test_loss, test_acc, time.time() - start))
        # function returns nothing since all progress
        # is printed to console

### Labels correspondence

|**#**|**City**|**Species**|**Label**|
|:-:|:-:|:--:|:--:|
|1|A & B|no tree in image|not|
|2|A|araucaria|ara|
|3|A|banana tree|ban|
|4|A|chestnut|che|
|5|A|palm tree|pal|
|6|A|plane tree|pla|
|7|A|Australian rubber tree|art|
|8|A|cedar|ced|
|9|B|ash tree|ash|
|10|B|sycamore|syc|
|11|B|oak|oak|
|12|B|apple tree|app|
|13|B|Scots pine|pin|
|14|B|beech|bee|
|15|B|bird cherry tree|bct|

In [3]:
# implement TreesDataset class

class TreesDataset(torch.utils.data.Dataset):

    def __init__(self, root, transform, train=False, calc_norm=False, has_norm=True):
        self.root = root
        self.train = train
        self.calc_norm = calc_norm
        self.has_norm = has_norm
        self.le = {'not': 0, 'ara': 1, 'ban': 2, 'che': 3, 'pal': 4, 
                   'pla': 5, 'art': 6, 'ced': 7, 'ash': 8, 'syc': 9, 
                  'oak': 10, 'app': 11, 'pin': 12, 'bee': 13, 'bct': 14}
        self.transform = transform
        self.load_images()
        
    def load_images(self):
        self.img_list, self.labels = self.read_images(root = self.root)
        
    def read_images(self, root):
        img_list, labels = [], []
        # train=True loads the train and validation splits, otherwise the test split
        splits = ['train1.txt', 'val1.txt'] if self.train else ['test1.txt']
        for split in splits:
            # read the split file relative to root (not a hard-coded folder)
            # and close it as soon as the listing has been read
            with open(os.path.join(root, 'labels', split), "r") as data_file:
                data_list = [i.strip() for i in data_file if i.strip()]
            for img_path in data_list:
                img_list.append(os.path.join(root, 'images', img_path))
                # the first path component must be a class label (e.g. 'ara/...');
                # any other prefix raises a KeyError here
                labels.append(self.le[img_path.split('/')[0]])

        return img_list, labels
                
    def __getitem__(self, item):
        if self.has_norm is True:
            # normalize the image if has_norm is set to True
            cur_img = self.normalize_image(self.transform(Image.open(self.img_list[item])))
        else:
            # just convert the image to tensor, without normalizing
            cur_img = self.transform(Image.open(self.img_list[item]))
        cur_label = self.labels[item]
        return cur_img, cur_label
        
    def __len__(self):
        return len(self.img_list)
    
    # image normalization
    # calc_norm True: normalize each image channel by subtracting its own
    #                 mean and dividing by its own standard deviation
    # calc_norm False: normalize by predefined mean and
    #                  standard deviation values
    
    def normalize_image(self, img):
        if self.calc_norm is True:
            for i in range(img.shape[0]):
                mu = img[i, :, :].mean()
                std = img[i, :, :].std()
                img[i, :, :] = ((img[i, :, :] - mu) / std)
        else:
            img = torchvision.transforms.functional.normalize(img,
                                                mean=torch.Tensor([0.485, 0.456, 0.406]),
                                                std=torch.Tensor([0.229, 0.224, 0.225]))
        return img
    
# build the sequence of image transformations and the train/test dataloaders
def load_data(dataset, root, batch_size, resize=None):
        
    transformer = []
    
    if resize is not None:
        transformer += [torchvision.transforms.Resize(size=(resize,resize))]
        
    else:
        transformer += [torchvision.transforms.RandomCrop(224, padding=4)]
    transformer += [torchvision.transforms.ToTensor()]
    transformer = torchvision.transforms.Compose(transformer)
        
    # get training data
    train = dataset(root = root,
                    transform = transformer,
                    train = True)
    # get test data
    test = dataset(root = root,
                    transform = transformer,
                    train = False)
        
    num_workers = 0 if sys.platform.startswith('win32') else 4
        
    # create training dataloader
    train_iter = torch.utils.data.DataLoader(train,
                                batch_size, shuffle=True,
                                num_workers=num_workers)
        
    # create testing dataloader
    test_iter = torch.utils.data.DataLoader(test,
                                batch_size, shuffle=False,
                                num_workers=num_workers)
        
    return train_iter, test_iter

# load data: load_data returns both iterators in a single call
batch_size = 32
train_iter, test_iter = load_data(TreesDataset,
                                  os.path.join('dtd'),
                                  batch_size,
                                  resize=None)

KeyError: 'images'
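The only dictionary lookup in the cell above is `self.le[img_path.split('/')[0]]`, so a `KeyError: 'images'` is consistent with the split files listing paths that begin with `images/` rather than with a class folder. This is a guess about the file layout, not something the notebook confirms. Under that assumption, a more tolerant label lookup could be sketched as follows; `label_from_path` is a hypothetical helper, and `LABELS` simply repeats the mapping used in `TreesDataset`.

```python
# Same mapping as TreesDataset.le, repeated so the sketch is self-contained.
LABELS = {'not': 0, 'ara': 1, 'ban': 2, 'che': 3, 'pal': 4,
          'pla': 5, 'art': 6, 'ced': 7, 'ash': 8, 'syc': 9,
          'oak': 10, 'app': 11, 'pin': 12, 'bee': 13, 'bct': 14}

def label_from_path(img_path, labels=LABELS):
    """Return the id of the first path component that is a known label.

    Hypothetical helper: it accepts 'ara/0001.jpg' as well as
    'images/ara/0001.jpg', and raises a descriptive KeyError otherwise.
    """
    for part in img_path.replace('\\', '/').split('/'):
        if part in labels:
            return labels[part]
    raise KeyError(f"no known label in path: {img_path!r}")
```

Inspecting the first few lines of `train1.txt` before changing any code is the quickest way to check whether this assumption holds.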

In [4]:
NUM_CLASSES = 15
num_epochs = 20
lr = 0.001

# use pre-trained ResNet-34 from torchvision
model = torchvision.models.resnet34(pretrained=True)

# change output of last classification layer to no. of classes
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs,NUM_CLASSES)
model = model.to(device)

# define loss
loss = nn.CrossEntropyLoss().to(device)

# define optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

# train and validate
train_validate(model, train_iter, test_iter, batch_size, optimizer, loss, num_epochs)



NameError: name 'train_iter' is not defined

---

# 5. **Proposed solution** 

This is where you should implement most of your code for your solution. Write the routines for training and predicting the models and any necessary intermediate steps. Post-processing functions must also be implemented here.

  - Use good programming practices, modularizing and adequately commenting on your code. Code quality will be considered in the final assessment.

  - You can use pre-trained models as backbones or any code available on the web as a basis, but they must be correctly credited and referenced both in this notebook and in the final report. Cite the source repository link and explicitly credit its authors.
If you changed existing code, make it clear what the changes were.
Make it clear where your own code starts and where it ends. Note that the originality percentage of the code will be considered in the evaluation, so use external code wisely and sparingly. **Misconduct alert:** remember that there are many tools that compare existing source code and that it is relatively easy to identify authorship. So, be careful and fair by always properly crediting the authors if you use external code.

[top](#scrollTo=4i5afvUbhmGo)

---

# 6. **Experimental tests and evaluations** 


Here you must implement your code for training, testing and evaluating your solution. For this, the following code blocks (*E1*, *E2*, and *E3*) are mandatory:

  - *E1* - Training the models. Implement code to call the dataloaders implemented for training your models. Write routines to test different parameters of your models. Plot graphs that illustrate how parameters impact model training, and compare them. Train and select a model for each city (A and B) and justify your choice. You should use half (50%) of the samples from each dataset for training and leave the other half (50%) for testing.

[top](#scrollTo=4i5afvUbhmGo)
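The 50/50 split required in *E1* could be sketched as below. Splitting class by class (a stratified split) and fixing the random seed are assumptions made here so that both halves contain every class and the split is reproducible; `split_half` and the file names are hypothetical.

```python
import random
from collections import defaultdict

def split_half(samples, seed=0):
    """Split (path, label) pairs into train/test halves, class by class."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        paths = sorted(by_label[label])
        rng.shuffle(paths)
        half = len(paths) // 2  # an odd-sized class puts the extra sample in test
        train += [(p, label) for p in paths[:half]]
        test += [(p, label) for p in paths[half:]]
    return train, test

# Made-up file names, for illustration only.
samples = [(f"oak/{i:03d}.jpg", "oak") for i in range(10)]
samples += [(f"ash/{i:03d}.jpg", "ash") for i in range(6)]
train_split, test_split = split_half(samples)
```

The two lists can then be written to split files, one per city, so that every experiment reuses exactly the same halves.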

In [None]:
# Write your code for E1 here. Create more code cells if needed





  - *E2* - Testing the models in the dataset. You must implement code routines to test the predictive ability of your models using half of each dataset intended for testing. **The model trained in city A must be tested in city A. The model trained in city B must be tested in city B.** Use the evaluation metrics (accuracy, F1-score, AUC, etc) that are most appropriate for your problem. Plot graphs that illustrate the results obtained for each city (A and B). Plot visual examples of correctly (true positive) and incorrectly (false positive) classified samples. 

[top](#scrollTo=4i5afvUbhmGo)
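As an illustration of the metrics mentioned in *E2*, the sketch below computes accuracy and macro-averaged F1 from a confusion matrix in plain Python. The function names and the toy labels are assumptions; in practice the predictions would be collected from the test dataloader, and a library implementation may be preferable.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def macro_f1(matrix):
    """Unweighted mean of per-class F1; a class with no samples and no
    predictions is scored 0 (a convention chosen for this sketch)."""
    scores = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(len(matrix))) - tp
        fn = sum(matrix[c]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels for three classes, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
```

Macro averaging weights every class equally, which matters when some species have far fewer samples than others.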


In [None]:
# Write your code for E2 here. Create more code cells if needed





  - *E3* - Testing the models crossing datasets. Here you must do exactly the same as in *E2*, but now training in one city and testing in the other. **The model trained in city A must be tested in city B. The model trained in city B must be tested in city A.** Use the same metrics and plot the same types of graphs so that results are comparable.

[top](#scrollTo=4i5afvUbhmGo)
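One way to keep the *E2* and *E3* results comparable is to run a single evaluation routine over every (training city, test city) pair. The sketch below uses stand-in models and data so that it runs on its own; `evaluate_grid` and `toy_accuracy` are hypothetical names, and in the notebook the `evaluate` argument could instead wrap a function such as `evaluate_accuracy`.

```python
def evaluate_grid(models, test_sets, evaluate):
    """Return {(train_city, test_city): score} for every pairing."""
    results = {}
    for train_city, model in models.items():
        for test_city, data in test_sets.items():
            results[(train_city, test_city)] = evaluate(model, data)
    return results

def toy_accuracy(model, data):
    """Fraction of (x, y) pairs for which the stand-in model predicts y."""
    return sum(model(x) == y for x, y in data) / len(data)

# Stand-ins: each "model" is a plain function, each "test set" a list of pairs.
models = {"A": lambda x: x % 2, "B": lambda x: 0}
test_sets = {"A": [(1, 1), (2, 0), (3, 1), (4, 0)],
             "B": [(1, 0), (2, 0), (3, 1), (4, 0)]}
grid = evaluate_grid(models, test_sets, toy_accuracy)
```

Entries where both cities match correspond to *E2*, and the remaining entries correspond to the cross-city tests of *E3*.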

In [None]:
# Write your code for E3 here. Create more code cells if needed





---

# 7. **Quiz and Report**

Answer the assessment quiz that will be made available on Canvas one week before the final deadline. Write a 2-page report using the [IEEE template](https://www.overleaf.com/read/rdqwshtvyjdn) with a maximum of 1000 words. LaTeX is recommended, but you can deliver the report in MS Word if you prefer. Your report should contain five sections: introduction, description of the proposed solution with justifications, results (here you can include the same graphs and pictures generated in this Jupyter notebook), discussion of the results, and conclusion. Properly cite references to articles, tutorials, and sources used. A PDF version of your report should be made available in the project's GitHub repository under the name "[project name] + _final_report.pdf".


[top](#scrollTo=4i5afvUbhmGo)