# LELA32052 Coursework Assignment

This document contains instructions, guidance and code for the coursework assignment for this module.

The assignment focuses on the task of intent classification. You heard about this in the lecture on dialogue systems, and read about it in this week's reading. It is an important step in modern task-based dialogue systems - given a particular piece of input from the speaker, the system tries to determine what goal the speaker is trying to achieve, in order that it can then produce an appropriate response.

Your task is to build a system that takes a transcribed user utterance as input and outputs one of seven different intents:

'PlayMusic', e.g. "play easy listening" <br>
'AddToPlaylist' e.g. "please add this song to road trip" <br>
'RateBook' e.g. "give this novel 5 stars"  <br>
'SearchScreeningEvent' e.g. "give me a list of local movie times"  <br>
'BookRestaurant' e.g. "i'd like a table for four at 7pm at Asti"   <br>
'GetWeather' e.g. "what's it like outside"  <br>
'SearchCreativeWork' e.g. "show me the new James Bond trailer"  <br>

You are going to evaluate the performance of this system using a test set of 700 user utterances.

In order to create the system you have a training set of 700 utterances and a validation/development set of 700 utterances.

This notebook contains code that you can use in the development of your system. Once you have created and evaluated your system, you are going to write a report of no more than 2000 words that describes and evaluates the task, the system and the experiments.

A guide as to what should go in your report can be found [here](https://www.dropbox.com/s/zlmbk60a4ei1jdh/Writing%20your%20Computational%20Linguistics%20Research%20Report.docx).

Your report should be submitted via turnitin using the link on Blackboard. You should create a PDF of your notebook and add it as an appendix to your report (not included in word count).  This is just so that I can check what you have done if it isn't clear from your report. You will not be marked on the quality of any code you might include.

Please feel free to ask any questions about any part of the coursework. So that my response can benefit everyone please do so using the Coursework Discussion board on Blackboard. You can find a link to this down the left of the module Blackboard page.

## Preparation

### Link drive

Before you begin to build a system, there are a few steps necessary to set things up. The first of these is to link Colab to your Google Drive so that you can save files there.

In [1]:
from google.colab import drive
drive.mount("/content/gdrive")
!mkdir -p /content/gdrive/My\ Drive/Intent_Classification/

Mounted at /content/gdrive


### Download data and some utilities

The next step is to download the data and some supporting code for the project and move it over to the Google Drive.

In [2]:
!wget https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/intent_classification_with_splits.csv
!wget https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/model.pth
!wget https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/nn_tools.py
!wget https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/nn_tools2.py
!wget https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/vectorizer.json

--2025-03-10 15:24:55--  https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/intent_classification_with_splits.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 139084 (136K) [text/plain]
Saving to: ‘intent_classification_with_splits.csv’


2025-03-10 15:24:56 (1.16 MB/s) - ‘intent_classification_with_splits.csv’ saved [139084/139084]

--2025-03-10 15:24:56--  https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/model.pth
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response

### Import packages

Finally, we need to import some packages to use later.

In [3]:
from argparse import Namespace
from collections import Counter
import json
import os
import re
import string
import random

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm_notebook
from nn_tools import Vocabulary, IntentVectorizer, IntentDataset, IntentClassifier
from nn_tools2 import *
from sklearn.metrics import confusion_matrix



## Rule-based approach

Your task here is to create a classifier using rules, in the form of regular expressions. I have provided you with the basic code for doing this. You will just need to edit the regular expressions in order to improve performance.

### Loading and inspecting project data
A valuable first step towards understanding the task is to inspect the data and see what kinds of utterance belong to each of the intents being detected.

First you need to load the data:


In [4]:
intent_data=pd.read_csv('intent_classification_with_splits.csv')

You can then examine example utterances for each of the intent types as follows.

##### Play Music examples


In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "PlayMusic"]["text"].head(10).tolist()

##### AddToPlaylist examples


In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "AddToPlaylist"]["text"].head(10).tolist()

##### RateBook examples

In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "RateBook"]["text"].head(10).tolist()

##### SearchScreeningEvent examples

In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "SearchScreeningEvent"]["text"].head(10).tolist()

##### BookRestaurant examples

In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "BookRestaurant"]["text"].head(10).tolist()

##### GetWeather examples

In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "GetWeather"]["text"].head(10).tolist()

##### SearchCreativeWork examples

In [None]:
pd.set_option('display.max_colwidth', None)
intent_data[intent_data["split"] == "train"][intent_data["intent"] == "SearchCreativeWork"]["text"].head(10).tolist()
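
If reading examples by eye becomes tedious, counting the most frequent words for each intent can suggest candidate keywords for your patterns. The following is a minimal sketch and not part of the provided code: the helper `top_words` and the toy utterances are made up for illustration. In the notebook you would take the texts and labels from the training split of `intent_data` instead.

```python
from collections import Counter

# Toy (intent, utterance) pairs invented for illustration only.
toy_examples = [
    ("PlayMusic", "play easy listening"),
    ("PlayMusic", "play some jazz"),
    ("RateBook", "give this novel 5 stars"),
    ("RateBook", "rate this book 3 stars"),
]

def top_words(examples, intent, n=3):
    """Return the n most frequent lower-cased words among utterances of one intent."""
    counts = Counter()
    for label, text in examples:
        if label == intent:
            counts.update(text.lower().split())
    return counts.most_common(n)

print(top_words(toy_examples, "PlayMusic", n=1))  # [('play', 2)]
print(top_words(toy_examples, "RateBook", n=2))   # [('this', 2), ('stars', 2)]
```

Words such as "this" are likely to be frequent in every intent, so a frequent word is only a useful keyword if it is rare in the other intents.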

## Building a rule based classifier

### Preprocessing

To increase the generalisability of your system you can preprocess each utterance to, for example, convert morphological variants to a single "lemma". You can do this by adding different substitution (re.sub) calls to the function below. The function as it stands doesn't do anything - you need to update the re.sub statements. This isn't an essential step, as you can deal with variants in your patterns, but it will help to reduce redundancy in your classifier regular expressions, which you may find helpful.


In [5]:
def preprocess_utterance(utt):
  # Note: $ anchors each pattern to the end of the whole utterance, so only the final word is affected
  utt = re.sub("(?<!s)s$","", utt) # Remove a final 's' unless it is part of 'ss'
  utt = re.sub("ed$","", utt)      # Remove a final 'ed' (past tense)
  utt = re.sub("ing$","", utt)     # Remove a final 'ing' (present progressive)
  utt = re.sub("er$","", utt)      # Remove a final 'er' (comparative adjectives)
  utt = re.sub("est$","", utt)     # Remove a final 'est' (superlative adjectives)
  return utt

You can test whether your patterns are working as intended using the following code.

In [None]:
test_input = input("Enter a utterance to test your preprocessing on: ")
print(preprocess_utterance(test_input))
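
One thing worth checking when you test: a pattern ending in `$` only matches at the very end of the utterance, so suffixes on earlier words are left alone. The sketch below shows one possible way of applying a substitution to every word by using the word boundary `\b` instead. The function name `strip_suffixes_per_word` and the minimum-length lookbehinds are illustrative assumptions, not part of the provided code.

```python
import re

def strip_suffixes_per_word(utt):
    """Hypothetical variant of preprocess_utterance that works on every word."""
    # \b marks a word boundary, so each substitution applies at the end of any word.
    # The lookbehinds (?<=\w{4}) and (?<=\w{3}) leave short words such as
    # "this" or "sing" intact.
    utt = re.sub(r"(?<=\w{4})(?<!s)s\b", "", utt)  # songs -> song, but not "this" or "glass"
    utt = re.sub(r"(?<=\w{3})ing\b", "", utt)      # playing -> play, but not "sing"
    return utt

# With $ the pattern only looks at the end of the whole utterance.
end_only = re.sub("ing$", "", "playing songs")
per_word = strip_suffixes_per_word("i was playing songs")

print(end_only)  # playing songs
print(per_word)  # i was play song
```

Suffix stripping of this kind is crude (for example, "always" would become "alway"), so it is worth checking the effect on real utterances before relying on it.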

### Define patterns

The function below takes an utterance as input and applies a series of regular expressions to identify the intent of the speaker. The regular expressions currently just look for keywords taken from the intent name. You should update these patterns to be more appropriate and capture a wider range of utterances for each intent.

Each time you update the code you will need to run the code cell in order to then use the function.

The assign_intent function uses the re.findall function (see week 2) in order to make as many matches as possible with each pattern. The number of matches is then counted (using the len function) for each pattern. The intent with the largest number of matches is then returned as the predicted intent. Imagine for example that your patterns for PlayMusic and GetWeather were as follows: <br>
PlayMusic_Pattern = re.compile("play|music") <br>
GetWeather_Pattern = re.compile("get|weather") <br>
while the input utterance was "play some music by the weather girls".
In this case the PlayMusic pattern would match twice (for play and music) while the GetWeather pattern would only match once. PlayMusic would be returned as the predicted intent. Where there is a tie (as would happen if, for example, the input was simply "play the weather girls") a prediction is randomly sampled from among the tied intents.
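
The counting described above can be reproduced in a few lines. This is a minimal sketch using only the two example patterns from the text; the helper `count_matches` is an illustrative name and not part of the provided code.

```python
import re

PlayMusic_Pattern = re.compile("play|music")
GetWeather_Pattern = re.compile("get|weather")

def count_matches(utt):
    """Count the non-overlapping matches of each example pattern in the utterance."""
    return {
        "PlayMusic": len(re.findall(PlayMusic_Pattern, utt)),
        "GetWeather": len(re.findall(GetWeather_Pattern, utt)),
    }

clear_case = count_matches("play some music by the weather girls")
tied_case = count_matches("play the weather girls")

print(clear_case)  # {'PlayMusic': 2, 'GetWeather': 1}
print(tied_case)   # {'PlayMusic': 1, 'GetWeather': 1}
```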

In [6]:
def assign_intent(utt, verbose=False):
  PlayMusic_Pattern = re.compile("play|music|by|something|hear|fm|spotify|youtube|zvooq|radio|album")
  AddToPlaylist_Pattern = re.compile("add|playlist|song|album|artist")
  RateBook_Pattern = re.compile("rate|give|star|point|novel|[1-9]|10")
  SearchScreeningEvent_Pattern = re.compile(r"screen|cinema|theatre|movie|film|show|time|schedule|close")
  BookRestaurant_Pattern = re.compile("book|restaurant|reservation|reserve|table|people|spot")
  GetWeather_Pattern = re.compile("get|weather|forecast|hot|warm|rain|rainy|near|tomorrow|today|wind|windy|frost|frosty")
  SearchCreativeWork_Pattern = re.compile("creative|search|look|find|work|tv")

  weights = {}
  weights['PlayMusic'] = len(re.findall(PlayMusic_Pattern,  utt))
  weights['AddToPlaylist'] = len(re.findall(AddToPlaylist_Pattern,  utt))
  weights['RateBook'] = len(re.findall(RateBook_Pattern,  utt))
  weights['SearchScreeningEvent'] = len(re.findall(SearchScreeningEvent_Pattern,  utt))
  weights['BookRestaurant'] = len(re.findall(BookRestaurant_Pattern,  utt))
  weights['GetWeather'] = len(re.findall(GetWeather_Pattern,  utt))
  weights['SearchCreativeWork'] = len(re.findall(SearchCreativeWork_Pattern,  utt))
  if verbose:
      print(weights)
  if max(weights.values()) == 0:
      return random.choice(list(weights.keys()))
  else:
      weights_as_list = list(weights.items())
      random.shuffle(weights_as_list)
      weights=dict(weights_as_list)
      return max(weights, key=lambda key: weights[key])
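
When you revise the patterns, bear in mind that a bare keyword matches anywhere in the utterance, including inside longer words. The sketch below compares a loose pattern with a stricter one that uses word boundaries (`\b`). Both patterns and the example utterances are made up for illustration and are not a recommended final pattern.

```python
import re

loose = re.compile("rate|star")
# (?:...) groups the alternatives without capturing, s? allows an optional plural,
# and \b requires the match to begin and end at a word boundary.
strict = re.compile(r"\b(?:rate|star)s?\b")

distractor = "celebrate with a song to start the party"
target = "rate this novel 5 stars"

print(re.findall(loose, distractor))   # ['rate', 'star']
print(re.findall(strict, distractor))  # []
print(re.findall(strict, target))      # ['rate', 'stars']
```

Stricter patterns reduce false matches but can also miss variants you did not anticipate, so it is worth comparing both styles on the validation data.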

### Evaluation
When you run this cell you will be asked to enter an utterance. When you press return the scores for each classification of your input will be printed. You can use this to check whether your preprocess_utterance and/or assign_intent functions are working as intended.

In [None]:
new_input = input("Enter a utterance to classify: ")
prediction = assign_intent(preprocess_utterance(new_input),verbose=True)
print(prediction)

In order to perform a stricter assessment of the performance of your classifier, you should examine its performance on the validation dataset. If you run the cell below you will be told the accuracy score on those 700 utterances.

In [7]:
predicted = [assign_intent(preprocess_utterance(item)) for item in intent_data[intent_data['split'] == "val"]['text']]
true = intent_data[intent_data['split'] == "val"]['intent']
accuracy = (predicted == true).sum()/len(true)
print(accuracy)

0.8057142857142857
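
Because assign_intent resolves ties (and utterances with no matches at all) by random choice, the accuracy can differ slightly from one run to the next. If you want repeatable numbers, one option is to fix the random seed before evaluating. The sketch below shows the idea on a toy tie; `break_ties` is an illustrative stand-in that mirrors the shuffle-then-max logic of assign_intent and is not part of the provided code.

```python
import random

def break_ties(weights):
    """Shuffle the intents, then return the first one with the highest count."""
    items = list(weights.items())
    random.shuffle(items)
    shuffled = dict(items)
    return max(shuffled, key=lambda key: shuffled[key])

tied = {"PlayMusic": 1, "GetWeather": 1, "RateBook": 0}

random.seed(0)  # fixing the seed makes the random choices repeatable
first_run = [break_ties(tied) for _ in range(5)]

random.seed(0)
second_run = [break_ties(tied) for _ in range(5)]

print(first_run == second_run)  # True
```

In the notebook, calling `random.seed(...)` with a value of your choice immediately before the evaluation cell would have the same effect.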


Running the next cell will give you a "confusion matrix". Because the predictions are passed to confusion_matrix as its first argument, each row corresponds to a predicted intent and each column to a true intent: a cell gives the number of utterances whose true intent is that of the column and which the classifier labelled (correctly or otherwise) with the intent of the row. The columns are displayed as numbers, but you can check what each of these numbers stands for by looking at the parentheses after each intent in the rows.

Studying this will give you an idea of where the classifier might be going wrong, and therefore of how you might update your patterns.

In [8]:
print(pd.DataFrame(confusion_matrix(predicted, true), index=["AddToPlaylist (0)", "BookRestaurant (1)", "GetWeather (2)", "PlayMusic (3)", "RateBook (4)", "SearchCreativeWork (5)", "SearchScreeningEvent (6)"]))

                           0   1   2   3   4   5   6
AddToPlaylist (0)         88   0   3   3   1   2   0
BookRestaurant (1)         0  85   2   0   1   3   1
GetWeather (2)             0   1  58   1   1  10   2
PlayMusic (3)              8   1   0  87   1   5  13
RateBook (4)               3  12  33   9  96   4   4
SearchCreativeWork (5)     1   1   3   0   0  70   0
SearchScreeningEvent (6)   0   0   1   0   0   6  80
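
The matrix can also be summarised numerically. The sketch below copies the validation matrix printed above and computes the overall accuracy together with a precision and recall for each intent. It assumes the orientation produced by the call as written (predictions passed first), so rows are predicted intents and columns are true intents. The variable names are illustrative.

```python
labels = ["AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic",
          "RateBook", "SearchCreativeWork", "SearchScreeningEvent"]

# Validation confusion matrix copied from the output above
# (rows = predicted intent, columns = true intent).
matrix = [
    [88,  0,  3,  3,  1,  2,  0],
    [ 0, 85,  2,  0,  1,  3,  1],
    [ 0,  1, 58,  1,  1, 10,  2],
    [ 8,  1,  0, 87,  1,  5, 13],
    [ 3, 12, 33,  9, 96,  4,  4],
    [ 1,  1,  3,  0,  0, 70,  0],
    [ 0,  0,  1,  0,  0,  6, 80],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / total

precision = {}
recall = {}
for i, label in enumerate(labels):
    times_predicted = sum(matrix[i])             # row sum: how often this intent was predicted
    times_true = sum(row[i] for row in matrix)   # column sum: how often it was the true intent
    precision[label] = matrix[i][i] / times_predicted
    recall[label] = matrix[i][i] / times_true

print(round(accuracy, 4))               # 0.8057
print(round(precision["RateBook"], 3))  # 0.596
print(round(recall["RateBook"], 3))     # 0.96
```

Here RateBook has high recall but low precision: it is predicted 161 times although only 100 utterances truly have that intent, and 33 of the extra predictions are GetWeather utterances. A pattern that matches too broadly would produce this kind of result.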


Once you are happy that you have defined the best patterns that you can, you should evaluate the performance of your classifier on the test data (a set of 700 utterances that you haven't looked at). The accuracy printed is what you should include in your report.

In [9]:
predicted = [assign_intent(preprocess_utterance(item)) for item in intent_data[intent_data['split'] == "test"]['text']]
true = intent_data[intent_data['split'] == "test"]['intent']
accuracy = (predicted == true).sum()/len(true)
print(accuracy)

0.7


You can also generate a confusion matrix for the test data and use this in the discussion of the results in your write up.

In [10]:
print(pd.DataFrame(confusion_matrix(predicted, true), index=["AddToPlaylist (0)", "BookRestaurant (1)", "GetWeather (2)", "PlayMusic (3)", "RateBook (4)", "SearchCreativeWork (5)", "SearchScreeningEvent (6)"]))

                           0   1   2   3   4   5   6
AddToPlaylist (0)         95   0   4   3   1   6   0
BookRestaurant (1)         0  68   4   0   4   1   2
GetWeather (2)             1   2  55   0   2   8   4
PlayMusic (3)             16   0   7  71   0  15  11
RateBook (4)               9  18  26  12  73   4   7
SearchCreativeWork (5)     1   2   3   0   0  52   7
SearchScreeningEvent (6)   2   2   5   0   0  21  76


## Single Layer Perceptron Classifier

The next classifier you can build is a single-layer perceptron. The specification of this model is in the next cell. You shouldn't make any changes to this.

In [11]:
class IntentClassifierPerceptron(nn.Module):
    """ a simple perceptron based classifier """
    def __init__(self, input_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input feature vector
            output_dim (int): the number of output classes (intents)
        """
        super(IntentClassifierPerceptron, self).__init__()
        self.fc1 = nn.Linear(input_dim, output_dim)




    def forward(self, x_in, apply_softmax=True):
        """The forward pass of the classifier

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, num_features)
            apply_softmax (bool): a flag for the softmax activation
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """

        y_out = self.fc1(x_in)
        if apply_softmax:
            y_out = F.softmax(y_out,dim=1)
        return y_out
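
To see what this model computes, the sketch below reproduces the two steps of the forward pass (a linear layer followed by softmax) in plain Python for a single input vector. The weights, biases and the three-word input vector are invented for illustration. In the real model the weights are learned during training and the input is the feature vector produced by the vectorizer.

```python
import math

def linear(x, weights, bias):
    """One score per output class: the dot product of x with that class's weights, plus a bias."""
    return [sum(w * x_i for w, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

x = [1.0, 0.0, 1.0]            # toy feature vector: words 0 and 2 are present
weights = [[2.0, 0.0, 1.0],    # hypothetical weights for class 0
           [0.0, 1.0, 0.0]]    # hypothetical weights for class 1
bias = [0.0, 0.0]

scores = linear(x, weights, bias)
probabilities = softmax(scores)

print(scores)  # [3.0, 0.0]
print(probabilities[0] > probabilities[1])  # True
```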


### Preprocessing
As with the rule-based model, to increase the generalisability of your system you can preprocess the text to, for example, convert morphological variants to a single "lemma". You can do this by adding different substitution (re.sub) calls to the function below. The function as it stands doesn't do anything - you need to update the re.sub statements.


In [12]:
def preprocess_utterance(utt):
  # Note: $ anchors each pattern to the end of the whole utterance, so only the final word is affected
  utt = re.sub("(?<!s)s$","", utt) # Remove a final 's' unless it is part of 'ss'
  utt = re.sub("ed$","", utt)      # Remove a final 'ed' (past tense)
  utt = re.sub("ing$","", utt)     # Remove a final 'ing' (present progressive)
  utt = re.sub("er$","", utt)      # Remove a final 'er' (comparative adjectives)
  utt = re.sub("est$","", utt)     # Remove a final 'est' (superlative adjectives)
  return utt

For this machine-learning-based model we are going to make the preprocessing changes to the data used, and this will involve making changes to the file we have saved. When we need to revert to the original unaltered file, we can then run the following cell.

In [13]:
!wget https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/intent_classification_with_splits.csv -O intent_classification_with_splits.csv
!cp intent_classification_with_splits.csv /content/gdrive/My\ Drive/Intent_Classification/
intent_data=pd.read_csv('intent_classification_with_splits.csv')

--2025-03-10 15:26:31--  https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/intent_classification_with_splits.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 139084 (136K) [text/plain]
Saving to: ‘intent_classification_with_splits.csv’


2025-03-10 15:26:31 (1.17 MB/s) - ‘intent_classification_with_splits.csv’ saved [139084/139084]



You can check that your preprocessing patterns are doing what you want by testing them on examples using the following cell.

In [None]:
test_input = input("Enter a utterance to test your preprocessing on: ")
print(preprocess_utterance(test_input))

Once you are happy with your preprocessing you can then transform the text in the training, validation and test data using the following cell. This alters the data on the disk so if you want to undo any changes later you will have to revert to the original data (as described above).

In [14]:
intent_data=pd.read_csv('intent_classification_with_splits.csv')
intent_data['text'] = [preprocess_utterance(item) for item in intent_data['text']]
intent_data.to_csv('intent_classification_with_splits.csv', index=False)
!cp intent_classification_with_splits.csv /content/gdrive/My\ Drive/Intent_Classification/
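
One reason to revert to the original file before re-running this cell is that the preprocessing is not guaranteed to be idempotent: applying it to text that has already been preprocessed can strip further material. The sketch below illustrates this using a copy of the preprocess_utterance function defined above and a made-up example utterance.

```python
import re

def preprocess_utterance(utt):
    # Copy of the function defined above; every pattern is anchored
    # to the end of the utterance by $.
    utt = re.sub("(?<!s)s$", "", utt)
    utt = re.sub("ed$", "", utt)
    utt = re.sub("ing$", "", utt)
    utt = re.sub("er$", "", utt)
    utt = re.sub("est$", "", utt)
    return utt

once = preprocess_utterance("play the singers")
twice = preprocess_utterance(once)  # what happens if the cell is run a second time

print(repr(once))   # 'play the sing'
print(repr(twice))  # 'play the s'
```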

### Training the model
In order to first initialise your model and then train it, you should run the following cell. The training will take a minute or two to complete - the progress bars will tell you how far along it is.


In [15]:
params = initialise()
classifier = IntentClassifierPerceptron(input_dim=len(params.vectorizer.text_vocab),output_dim=len(params.vectorizer.intent_vocab))
train_state = trainModel(params, params.dataset, classifier)

Expanded filepaths: 
	/content/gdrive/My Drive/Intent_Classification/vectorizer.json
	/content/gdrive/My Drive/Intent_Classification/model.pth
Using CUDA: False


training routine:   0%|          | 0/100 [00:00<?, ?it/s]

split=train:   0%|          | 0/5 [00:00<?, ?it/s]

split=val:   0%|          | 0/5 [00:00<?, ?it/s]

### Test on individual utterance

As with the rule-based classifier you can look at the performance of the system on a single example utterance. This can be used to see whether your preprocessing is helping to capture the cases you hoped.

In [None]:
torch.manual_seed(0)
new_utterance = input("Enter a utterance to classify: ")
classifier = classifier.to("cpu")
prediction = predict_intent(preprocess_utterance(new_utterance), classifier, params.vectorizer)
print("{} -> {} (p={:0.2f})".format(new_utterance,
                                    prediction['intent'],
                                    prediction['probability']))

### Evaluate performance on validation data
In order to evaluate the performance of your system while you tweak your preprocessing you can evaluate performance on the validation data, and look at a confusion matrix.

In [16]:
torch.manual_seed(0)
predicted, true = evaluate(params, classifier, train_state, 'val')
print("Validation Accuracy: {:.2f}".format(train_state['val_acc']))
print(pd.DataFrame(confusion_matrix(predicted, true), index=["AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic", "RateBook", "SearchCreativeWork", "SearchScreeningEvent"]))

Validation Accuracy: 85.16
                       0   1   2   3   4   5   6
AddToPlaylist         85   0   0   2   1   3   0
BookRestaurant         0  96   4   2   0   8  14
GetWeather             0   0  90   1   0   7  10
PlayMusic              1   1   1  79   1   6   6
RateBook               1   0   0   0  93   7   3
SearchCreativeWork     0   0   0   0   0  54   5
SearchScreeningEvent   0   0   2   0   0   9  48



### Evaluate performance on test data

Once you are happy that model performance is as good as you can achieve with this model type you should evaluate its accuracy on the test data. You can also use the confusion matrix for error analysis in your write up.

In [17]:
torch.manual_seed(0)
predicted, true = evaluate(params, classifier, train_state, 'test')
print("Test Accuracy: {:.2f}".format(train_state['test_acc']))
print(pd.DataFrame(confusion_matrix(predicted, true), index=["AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic", "RateBook", "SearchCreativeWork", "SearchScreeningEvent"]))

Test Accuracy: 83.28
                        0   1   2   3   4   5   6
AddToPlaylist         103   1   1   2   1   3   1
BookRestaurant          2  83   5   0   0   6   6
GetWeather              1   0  86   1   1   2  11
PlayMusic               4   2   0  74   1  14   5
RateBook                1   0   0   2  67   8   4
SearchCreativeWork      0   0   1   0   0  59  11
SearchScreeningEvent    0   1   1   0   0   8  61



## Multilayer neural network classifier

The third kind of classifier that you should build is a two-layer neural network. The model is defined below. Again you don't need to change this code.

In [18]:
class IntentClassifierMLP(nn.Module):
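    """ a simple two-layer (multilayer perceptron) classifier """
    def __init__(self, input_dim, hidden_dim, output_dim):
        """
        Args:
            input_dim (int): the size of the input feature vector
            hidden_dim (int): the number of nodes in the hidden layer
            output_dim (int): the number of output classes (intents)
        """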
        super(IntentClassifierMLP, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)



    def forward(self, x_in, apply_softmax=True):
        """The forward pass of the classifier

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, num_features)
            apply_softmax (bool): a flag for the softmax activation
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        intermediate_vector = F.relu(self.fc1(x_in))
        prediction_vector = self.fc2(intermediate_vector)

        if apply_softmax:
            prediction_vector = F.softmax(prediction_vector,dim=1)
        return prediction_vector
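
The difference from the perceptron is the hidden layer with its ReLU activation, which sets negative values to zero before the second linear layer is applied. The sketch below traces one input vector through both layers in plain Python. All weights and inputs are invented for illustration and the helper names are not part of the provided code.

```python
import math

def linear(x, weights, bias):
    """Dot product of x with each row of weights, plus the matching bias."""
    return [sum(w * x_i for w, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    """Replace negative values with zero."""
    return [max(0.0, v) for v in values]

def softmax(scores):
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

x = [1.0, 1.0]                                   # toy input with two features
w1 = [[1.0, -2.0], [0.5, 0.5], [-1.0, 0.0]]      # hypothetical weights: 3 hidden nodes
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]          # hypothetical weights: 2 output classes
b2 = [0.0, 0.0]

hidden = relu(linear(x, w1, b1))    # intermediate vector
scores = linear(hidden, w2, b2)     # prediction vector before softmax
probabilities = softmax(scores)

print(hidden)  # [0.0, 1.0, 0.0]
print(scores)  # [0.0, 1.0]
```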


### Preprocessing
As with the previous models, to increase the generalisability of your system you can preprocess the text to, for example, convert morphological variants to a single "lemma". You can do this by adding different substitution (re.sub) calls to the function below. The function as it stands doesn't do anything - you need to update the re.sub statements.


In [42]:
def preprocess_utterance(utt):
  # Note: $ anchors each pattern to the end of the whole utterance, so only the final word is affected
  utt = re.sub("(?<!s)s$","", utt) # Remove a final 's' unless it is part of 'ss'
  utt = re.sub("ed$","", utt)      # Remove a final 'ed' (past tense)
  utt = re.sub("ing$","", utt)     # Remove a final 'ing' (present progressive)
  utt = re.sub("er$","", utt)      # Remove a final 'er' (comparative adjectives)
  utt = re.sub("est$","", utt)     # Remove a final 'est' (superlative adjectives)
  return utt

For this machine-learning-based model we are going to make the preprocessing changes to the data used, and this will involve making changes to the file we have saved. When we need to revert to the original unaltered file, we can then run the following cell.

In [19]:
!wget https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/intent_classification_with_splits.csv -O intent_classification_with_splits.csv
!cp intent_classification_with_splits.csv /content/gdrive/My\ Drive/Intent_Classification/
intent_data=pd.read_csv('intent_classification_with_splits.csv')

--2025-03-10 15:28:21--  https://raw.githubusercontent.com/cbannard/compling24/refs/heads/main/Intent_Classification/intent_classification_with_splits.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 139084 (136K) [text/plain]
Saving to: ‘intent_classification_with_splits.csv’


2025-03-10 15:28:21 (1.33 MB/s) - ‘intent_classification_with_splits.csv’ saved [139084/139084]



You can check that your preprocessing patterns are doing what you want by testing them on examples using the following cell.

In [None]:
test_input = input("Enter a utterance to test your preprocessing on: ")
print(preprocess_utterance(test_input))


Once you are happy with your preprocessing you can then transform the text in the training, validation and test data using the following cell. This alters the data on the disk so if you want to undo any changes later you will have to revert to the original data (as described above).

In [20]:
intent_data=pd.read_csv('intent_classification_with_splits.csv')
intent_data['text'] = [preprocess_utterance(item) for item in intent_data['text']]
intent_data.to_csv('intent_classification_with_splits.csv', index=False)
!cp intent_classification_with_splits.csv /content/gdrive/My\ Drive/Intent_Classification/

### Training model
You can now move on to training the model using your preprocessed data.

One important variable that you can change is the number of nodes to include in your hidden layer. You can set this parameter in the cell below. The default is 100.

In [47]:
n_hidden_dims = 175
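
When choosing this value it can help to know how it affects the size of the model. Each linear layer has one weight per input-output pair plus one bias per output, so the number of trainable parameters grows in proportion to the hidden dimension. The sketch below computes the count for a few settings. The vocabulary size of 1000 is a made-up figure (the real value is `len(params.vectorizer.text_vocab)`), and `count_mlp_parameters` is an illustrative helper.

```python
def count_mlp_parameters(input_dim, hidden_dim, output_dim):
    """Weights and biases of two fully connected layers."""
    first_layer = input_dim * hidden_dim + hidden_dim
    second_layer = hidden_dim * output_dim + output_dim
    return first_layer + second_layer

vocab_size = 1000   # hypothetical vocabulary size, for illustration only
n_intents = 7

for hidden in (50, 100, 175):
    print(hidden, count_mlp_parameters(vocab_size, hidden, n_intents))
```

A larger hidden layer gives the model more capacity, but with a small training set it may also make overfitting more likely, so the validation accuracy is the thing to watch when you vary it.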

You can then initialise and train the model by running the following cell.

In [48]:
params = initialise()
classifier = IntentClassifierMLP(input_dim=len(params.vectorizer.text_vocab),hidden_dim=n_hidden_dims,output_dim=len(params.vectorizer.intent_vocab))
train_state = trainModel(params, params.dataset, classifier)

Expanded filepaths: 
	/content/gdrive/My Drive/Intent_Classification/vectorizer.json
	/content/gdrive/My Drive/Intent_Classification/model.pth
Using CUDA: False


training routine:   0%|          | 0/100 [00:00<?, ?it/s]

split=train:   0%|          | 0/5 [00:00<?, ?it/s]

split=val:   0%|          | 0/5 [00:00<?, ?it/s]

The following cell allows you to look at the performance of the system on a single example utterance. This can be used to see whether your preprocessing (and/or your choice of hidden dimensions) is capturing the cases you hoped.

In [56]:
torch.manual_seed(0)
new_utterance = input("Enter a utterance to classify: ")
classifier = classifier.to("cpu")
prediction = predict_intent(preprocess_utterance(new_utterance), classifier, params.vectorizer)
print("{} -> {} (p={:0.2f})".format(new_utterance,
                                    prediction['intent'],
                                    prediction['probability']))

Enter a utterance to classify: what's on nearby
what's on nearby -> PlayMusic (p=0.66)


### Evaluate model on validation data

In order to evaluate the performance of your system while you tweak it you can evaluate performance on the validation data, and look at a confusion matrix.

In [49]:
torch.manual_seed(0)
predicted, true = evaluate(params, classifier, train_state, 'val')
print("Validation Accuracy: {:.2f}".format(train_state['val_acc']))
print(pd.DataFrame(confusion_matrix(predicted, true), index=["AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic", "RateBook", "SearchCreativeWork", "SearchScreeningEvent"]))

Validation Accuracy: 90.78
                       0   1   2   3   4   5   6
AddToPlaylist         84   0   0   0   0   0   0
BookRestaurant         0  95   1   2   1   1   6
GetWeather             1   1  86   0   0   3   6
PlayMusic              1   0   1  80   0   3   0
RateBook               0   0   0   0  94   0   1
SearchCreativeWork     1   0   1   1   0  82  13
SearchScreeningEvent   0   1   8   1   0   5  60



### Evaluate model on test data

Once you are happy that model performance is as good as you can achieve with this model type you should evaluate its accuracy on the test data. You can also use the confusion matrix for error analysis in your write up.

In [57]:
torch.manual_seed(0)
predicted, true = evaluate(params, classifier, train_state, 'test')
print("Test Accuracy: {:.2f}".format(train_state['test_acc']))
print(pd.DataFrame(confusion_matrix(predicted, true), index=["AddToPlaylist", "BookRestaurant", "GetWeather", "PlayMusic", "RateBook", "SearchCreativeWork", "SearchScreeningEvent"]))

Test Accuracy: 89.06
                        0   1   2   3   4   5   6
AddToPlaylist         104   0   0   1   0   0   1
BookRestaurant          0  80   1   0   0   2   0
GetWeather              1   3  89   0   1   0   1
PlayMusic               3   1   0  75   2  11   0
RateBook                0   0   0   0  67   1   1
SearchCreativeWork      3   2   1   2   0  77  18
SearchScreeningEvent    0   1   3   1   0   9  78



## That's it!

Once you have run through all of this notebook, made all of the additions and changes you want, and run all of the experiments you want, you should write up the results following the guidance [here](https://www.dropbox.com/s/zlmbk60a4ei1jdh/Writing%20your%20Computational%20Linguistics%20Research%20Report.docx).

Your report should be submitted via turnitin using the link on Blackboard.

To repeat a couple of things from the top of the sheet:

- Once you have completed your work in this notebook please create a PDF including all of your changes and include it in your submission. This is just so that I can check what you have done if it isn't clear from your report. You will not be marked on the quality of any code you might include.

- Please feel free to ask any questions about any part of the coursework. So that my response can benefit everyone please do so using the Coursework Discussion board on Blackboard. You can find a link to this down the left of the module Blackboard page.

I hope you find it an enjoyable and worthwhile exercise.
