# UT Austin CS 370 Undergraduate Research Project
### Researcher: Tejas Saboo <br> Supervisor: Professor Angela Beasley

Emerging technologies are disrupting the art industry and changing the way we create and experience art. As a computer science, mathematics, and statistics student with an appreciation for arts and culture, I chose for my research project to explore the intersection of art, design, and technology and to look for opportunities to augment creativity through technology. The project entailed architecting a convolutional neural network model that classifies visual artwork by artistic category and style.

The final machine learning model presented is a deep convolutional neural network that classifies the artistic style of visual artwork. In the configuration shown in this notebook, it distinguishes two styles (Pointillism and Realism) and reaches about 96% accuracy on the training set and 69% accuracy on the validation set. The gap between the two figures suggests that the model still overfits the training data.

I experimented with a variety of technical approaches and model architectures throughout this project. First, I attempted to build a model that could distinguish between flower and marina paintings. Next, I discovered the Wikiart dataset and switched to its images. Initially, I performed a manual train-test split by moving 80 images from each category into the training set and 20 images from each category into the test set. Keeping the initial train and test sets small let me train models quickly and iterate on the architecture. Later, I used shutil.move to automate moving files from one directory to another, which gave a scalable train-test split that also selects the images at random.

Regarding the model architecture, I experimented with linear and non-linear neural networks, fully convolutional neural networks, convolutional neural networks, and deep convolutional neural networks. After evaluating the different models, I found the deep convolutional neural network most effective at classifying images of artwork.

My future plans to build on this project include experimenting with computer vision to extract visual features and elements of the artwork, such as line, shape, texture, form, space, color, value, composition, perspective, and subject matter, to create algorithms that suggest tags for artwork. Next, I will create an algorithm that generates original and relevant titles for the artwork using the tags and extracted visual features. I will also investigate recent artificial intelligence and machine learning advancements to create an algorithm that measures the creativity and craftsmanship exemplified in the artwork and provides actionable feedback to help artists improve.

### Imports for ML Functionality

In [1]:
import torch
import torchvision
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

In [2]:
import os
import random
import shutil

### Functions to Automate the Train-Test Dataset Split

The move_shuffled_files function randomly selects num_files images from the src directory and moves them to the dest directory. 

The train_test_split function automates the process of moving files from the original wikiart directory to the train and test directories for every relevant artistic style category under consideration in an 80:20 split.

Justification for 80:20 Split - https://scholarworks.utep.edu/cs_techrep/1209/
TODO: Cite research

In [3]:
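def move_shuffled_files(src, dest, num_files):
    # Create the destination folder first; otherwise shutil.move would treat
    # dest as the new file name (or fail) instead of moving files into it.
    os.makedirs(dest, exist_ok=True)
    files = os.listdir(src)
    # random.sample picks num_files distinct files uniformly at random.
    for file_name in random.sample(files, num_files):
        shutil.move(os.path.join(src, file_name), dest)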

In [4]:
def train_test_split():
    wiki_dirs = ['Pointillism', 'Realism']
    base_src = 'wikiart/'
    base_train_dest = 'data/train/'
    base_test_dest = 'data/validation/'
    for wiki_dir in wiki_dirs:
        src = base_src + wiki_dir
        train_dest = base_train_dest + wiki_dir
        test_dest = base_test_dest + wiki_dir
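        # 400 training + 100 validation images per style gives the 80:20 split.
        # The second call samples only from files the first call left behind,
        # so the two sets never overlap.
        move_shuffled_files(src, train_dest, 400)
        move_shuffled_files(src, test_dest, 100)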

In [None]:
train_test_split()
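
Because shutil.move removes images from the wikiart folder, the split above cannot be rerun from the same starting point. The sketch below shows one possible non-destructive alternative. The helper name copy_split, the use of copying instead of moving, and the seed argument are all assumptions made for illustration and are not part of the original notebook. Unlike the cell above, it splits every file in the folder by fraction rather than taking a fixed 400 and 100.

```python
import os
import random
import shutil


def copy_split(src, train_dest, test_dest, train_frac=0.8, seed=0):
    """Copy files from src into train/test folders, leaving src unchanged.

    Hypothetical helper: name, signature, and defaults are illustrative only.
    """
    # Keep regular files only, sorted so the seed fully determines the split.
    files = sorted(
        name for name in os.listdir(src)
        if os.path.isfile(os.path.join(src, name))
    )
    random.Random(seed).shuffle(files)  # seeded shuffle -> reproducible split
    n_train = int(len(files) * train_frac)

    os.makedirs(train_dest, exist_ok=True)
    os.makedirs(test_dest, exist_ok=True)
    for name in files[:n_train]:
        shutil.copy2(os.path.join(src, name), train_dest)
    for name in files[n_train:]:
        shutil.copy2(os.path.join(src, name), test_dest)
    return n_train, len(files) - n_train
```

A call such as copy_split('wikiart/Realism', 'data/train/Realism', 'data/validation/Realism') would mirror the folder layout used above, at the cost of storing each image twice.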

### Train and Test Transform Functions To Set Up Images for the Model

The train transform resizes the images to 256 x 256 (TODO: find older ipynb with research reference to why 256x256 or 64x64 is the ideal ratio in terms of performance versus time) and employs a variety of data augmentation techniques to reduce overfitting.

TODO: Improve the explanation of why each transform was selected and how they all reduce overfitting. Explain that the test transform does not apply these augmentations because the test data should not be altered, while the resizing and ToTensor preprocessing are still required for model evaluation.


In [5]:
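train_transform = transforms.Compose([
    transforms.Resize([256, 256]),
    transforms.RandomHorizontalFlip(p=0.5),
    # Arguments are brightness, contrast, saturation, hue.
    transforms.ColorJitter(0.5, 0.5, 0.5, 0.5),
    transforms.RandomGrayscale(0.05),
    transforms.ToTensor(),
    # NOTE: CNNClassifier.forward applies the same Normalize again, so training
    # images are normalized twice while test images are normalized only once.
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])

test_transform = transforms.Compose([
    transforms.Resize([256, 256]),
    transforms.ToTensor(),
])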
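
The train transform ends with Normalize(mean=0.5, std=0.5), and CNNClassifier.forward applies the same normalization to every input, whereas the test transform does not normalize. The plain-Python sketch below works through the arithmetic on the two extreme pixel values to show what value range each data path feeds into the network. The normalize function is a stand-in written for this illustration, not torchvision's implementation.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-value operation performed by a Normalize transform: (x - mean) / std."""
    return (x - mean) / std


# ToTensor() produces values in [0, 1], so checking the endpoints is enough.
endpoints = (0.0, 1.0)

# Test path: normalized once, inside the model's forward().
once = [normalize(v) for v in endpoints]

# Train path: normalized in train_transform and again in forward().
twice = [normalize(normalize(v)) for v in endpoints]

print(once)   # [-1.0, 1.0]
print(twice)  # [-3.0, 1.0]
```

Training inputs therefore span [-3, 1] while validation inputs span [-1, 1]. Whether this mismatch contributes to the gap between training and validation accuracy has not been measured here; normalizing in only one place would remove the inconsistency.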

### Generate Train and Test Sets

The following two cells generate the train and test sets for the model using the train and test transforms above. The datasets.ImageFolder class in torchvision loads images from the given folders and derives each image's class label from the name of its subfolder.

TODO: Explain shuffling and batch sizes

In [6]:
train_data = datasets.ImageFolder(root="data/train", transform=train_transform)
train = DataLoader(train_data, batch_size = 4, shuffle = True)

In [7]:
test_data = datasets.ImageFolder(root="data/validation", transform=test_transform)
test = DataLoader(test_data, batch_size = 4, shuffle = False)
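
As a rough illustration of what batch_size and shuffle control, the sketch below mimics the index grouping of a data loader in plain Python. The iter_batches function and its seed argument are hypothetical stand-ins, not PyTorch's DataLoader. The dataset sizes (800 training and 200 validation images) follow from the split of 400 and 100 images for each of the two styles.

```python
import random


def iter_batches(num_items, batch_size, shuffle, seed=None):
    """Yield lists of sample indices, one list per batch (illustrative only)."""
    indices = list(range(num_items))
    if shuffle:
        # A new random order each epoch mixes the classes within a batch.
        random.Random(seed).shuffle(indices)
    for start in range(0, num_items, batch_size):
        yield indices[start:start + batch_size]


train_batches = list(iter_batches(800, batch_size=4, shuffle=True, seed=0))
val_batches = list(iter_batches(200, batch_size=4, shuffle=False))

print(len(train_batches), len(val_batches))  # 200 50
print(val_batches[0])                        # [0, 1, 2, 3]
```

ImageFolder lists samples one class folder after another, so without shuffling every training batch of four would contain a single style. Shuffling is unnecessary for evaluation because the order of the images does not change the accuracy.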

### Deep Convolutional Neural Network

TODO: Explain in more detail all of the model architectures that were designed and tested throughout this project, as well as all the variations to the DCNN model architecture that resulted in what is ultimately below. Explain methods used to avoid overfitting such as data augmentation in the train transform functions and adding dropout layers. Explain the benefits of the block structure and why it was used. Explain the various activation functions used. Summarize the overall design of the network in terms of how many blocks and how many layers, how each block is structured, and what modifications need to be made if the model is used for a multi-class classification problem for more classes than currently examined. Additionally, discuss the hyperparameter tuning process.

In [8]:
class CNNBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels, conv_kernel, conv_stride, conv_padding, pool_kernel, pool_stride):
        super().__init__()
        L = [
            torch.nn.BatchNorm2d(in_channels),
            torch.nn.ReLU(),
            torch.nn.Conv2d(in_channels, out_channels, conv_kernel, conv_stride, conv_padding, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None),
            torch.nn.BatchNorm2d(out_channels),
            torch.nn.SiLU(),
            torch.nn.MaxPool2d(pool_kernel, pool_stride),
        ]
        self.network = torch.nn.Sequential(*L)
    
    def forward(self, x):
        return self.network(x)

class CNNClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        in_channels = 3
        out_channels = 32
        num_classes = 2
        kernel_size = 7
        stride = 1
        padding = 3
        L = [
            CNNBlock(in_channels, out_channels, kernel_size, stride, padding, 2, 2),
            torch.nn.Dropout(p=0.2),
            CNNBlock(32, 64, 3, 1, 1, 2, 2),
            torch.nn.Dropout(p=0.25),
            CNNBlock(64, 128, 3, 1, 1, 2, 2),
            torch.nn.MaxPool2d(2),
            torch.nn.Flatten(),
            # 128 channels * 16 * 16 spatial positions for 256 x 256 inputs
            torch.nn.Linear(32768, num_classes)
        ]
        self.network = torch.nn.Sequential(*L)
        self.transforms = torchvision.transforms.Compose([torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    def forward(self, x):
        x = self.transforms(x)
        return self.network(x)
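
The input size of the final Linear layer (32768) depends on the image resolution, so it has to change if the images are resized to something other than 256 x 256. The sketch below traces one spatial dimension through the layers of CNNClassifier using the standard output-size formula for convolution and pooling layers with dilation 1. The helper names conv_out and flattened_features are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a conv or pooling layer with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(image_size):
    """Number of values reaching the Linear layer for a square input image."""
    size = image_size
    # (conv_kernel, conv_padding, out_channels) of the three CNNBlocks.
    blocks = [(7, 3, 32), (3, 1, 64), (3, 1, 128)]
    for kernel, padding, channels in blocks:
        size = conv_out(size, kernel, stride=1, padding=padding)  # conv keeps the size
        size = conv_out(size, kernel=2, stride=2)                 # MaxPool2d(2, 2) halves it
    size = conv_out(size, kernel=2, stride=2)                     # final MaxPool2d(2)
    return channels * size * size


print(flattened_features(256))  # 32768
print(flattened_features(64))   # 2048
```

Under these assumptions, switching to 64 x 64 images would require Linear(2048, num_classes), and classifying more styles only requires changing num_classes together with the list of class folders.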

### Set Up the Model

TODO: Explain why cross-entropy loss and Adam are used and cite any research publications referenced (primarily from intuition from neural networks course)

In [9]:
model = CNNClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),lr = 0.001)
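
For reference, nn.CrossEntropyLoss takes the raw scores (logits) produced by the final Linear layer together with the integer class labels, and by default averages the per-sample losses over the batch. The plain-Python sketch below computes the per-sample quantity, the negative log of the softmax probability assigned to the correct class. The function is written for illustration and is not PyTorch's implementation.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Equal scores for two classes: the model is guessing.
print(round(cross_entropy([0.0, 0.0], 0), 4))  # 0.6931

# A confident, correct prediction gives a loss close to zero.
print(cross_entropy([5.0, -5.0], 0) < 0.001)   # True
```

The value ln 2 ≈ 0.693 is a useful baseline when reading the training log for two classes: an average loss above it means the model is doing worse than assigning equal probability to both styles.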

### Set Up Training Functionality

The train_iter function trains the model for one epoch. The train_model function runs the train_iter function for the desired number of epochs in a loop.

TODO: Explain how each part of the training function works, and possibly add some in-line comments

In [10]:
def train_iter(epoch, net, trainDataLoader, optimizer, criterion):
    net.train()
    total = 0
    train_loss = 0
    correct = 0

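    for inputs, targets in trainDataLoader:
        optimizer.zero_grad()                       # clear gradients from the previous batch
        outputs = net(inputs)                       # forward pass: one row of logits per image
        _, predicted = torch.max(outputs.data, 1)   # predicted class = index of the largest logit
        loss = criterion(outputs,targets)
        loss.backward()                             # backpropagate the batch loss
        optimizer.step()                            # update the weights
        train_loss += loss.item()
        total += targets.size(0)
        correct += (predicted == targets).sum().item()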

    avg_loss = train_loss / len(trainDataLoader)
    accuracy = 100 * correct / total
    print('Epoch: {} \tTraining Loss: {:.25f} \tTraining Accuracy: {:.4f} %'.format(epoch, avg_loss, accuracy))
    return avg_loss

In [11]:
def train_model(num_epochs, net, trainDataLoader, optimizer, criterion):
    for i in range(num_epochs):
        train_iter(i, net, trainDataLoader, optimizer, criterion)

### Train the Model

Below, training is started for up to 150 epochs and was interrupted manually from the keyboard once the loss was low and the training accuracy was high, which is why the output ends with a KeyboardInterrupt.

TODO: Implement early stopping in the training loop by returning the avg_loss or accuracy at every iteration and breaking from the train_model loop if certain thresholds are met. Then, retrain and rerun the model to avoid using keyboard interrupts with error messages.
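
One way the early stopping described in this TODO could look is sketched below. It relies only on the fact that train_iter returns the average loss of an epoch. The function name train_with_early_stopping and the loss_threshold and patience values are hypothetical choices for illustration, not tuned settings.

```python
def train_with_early_stopping(num_epochs, run_epoch, loss_threshold=0.1, patience=3):
    """Run epochs until the loss is low enough or has stopped improving.

    run_epoch(epoch) must return that epoch's average loss.
    Returns (last_epoch_run, reason_for_stopping).
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(num_epochs):
        avg_loss = run_epoch(epoch)
        if avg_loss < loss_threshold:
            return epoch, "threshold"        # loss is low enough: stop
        if avg_loss < best_loss:
            best_loss = avg_loss             # new best: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, "patience"     # no improvement for too long
    return num_epochs - 1, "max_epochs"


# Stand-in for train_iter that replays a fixed list of losses.
fake_losses = [0.9, 0.5, 0.3, 0.08, 0.07]
print(train_with_early_stopping(150, lambda epoch: fake_losses[epoch]))  # (3, 'threshold')
```

With the functions in this notebook, run_epoch could be lambda e: train_iter(e, model, train, optimizer, criterion). Monitoring a validation loss instead of the training loss would probably be a better stopping signal, given the gap between training and validation accuracy.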

In [12]:
train_model(150, model, train, optimizer, criterion)

Epoch: 0 	Training Loss: 5.3174726386373372122307046 	Training Accuracy: 55.6250 %
Epoch: 1 	Training Loss: 3.8052916486581671584588094 	Training Accuracy: 56.7500 %
Epoch: 2 	Training Loss: 3.0269927306758472873582377 	Training Accuracy: 55.5000 %
Epoch: 3 	Training Loss: 1.6995186791010201687868175 	Training Accuracy: 57.7500 %
Epoch: 4 	Training Loss: 1.0324230798892677363198800 	Training Accuracy: 60.8750 %
Epoch: 5 	Training Loss: 0.9121752030495554608435782 	Training Accuracy: 63.5000 %
Epoch: 6 	Training Loss: 0.7669148640334606525925665 	Training Accuracy: 67.7500 %
Epoch: 7 	Training Loss: 0.7878732044436037584844712 	Training Accuracy: 65.8750 %
Epoch: 8 	Training Loss: 0.7406022893218323632780198 	Training Accuracy: 69.1250 %
Epoch: 9 	Training Loss: 0.8545966454222798658335591 	Training Accuracy: 60.5000 %
Epoch: 10 	Training Loss: 0.7010076530091464697136416 	Training Accuracy: 69.5000 %
Epoch: 11 	Training Loss: 0.6448295157030224933336626 	Training Accuracy: 70.5000 %
...

Epoch: 98 	Training Loss: 0.0879046634193468845452912 	Training Accuracy: 96.6250 %


KeyboardInterrupt: 

### Evaluate the Model on the Test Set

TODO: Be consistent with the test/validation terminology used throughout this notebook

In [13]:
def eval_model(net, testDataLoader):
    net.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, targets in testDataLoader:
            outputs = net(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += len(targets)
            correct += (predicted == targets).sum().item()
            
    print('Accuracy of the network on the set of %d test images: %d %%' % (total,
        100 * correct / total))

In [14]:
eval_model(model, test)

Accuracy of the network on the set of 200 test images: 69 %
