# Semantic Parsing

This notebook builds a simple virtual assistant with two modules: an intent classifier and a slot filler.

Go to https://drive.google.com/drive/folders/1JqAnRSkJqAWlHQRR8tN9is3vKZ-4VKWM?usp=sharing and click "Add shortcut to Drive". This adds the data required for this problem set to your Google Drive.

<img src="https://drive.google.com/uc?id=1LqHisiziX8Ri94Xs6Cv8mhx6vivFM3kS" alt="Drawing" height="300"/>


Run the code snippet below. It generates a URL that provides an authorization code.* Enter that code at the prompt to give Colab access to your Google Drive.

*The copy button may not work. If so, copy the authorization code manually.

In [None]:
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)

Mounted at /content/drive/


Load the training questions and answers:

In [None]:
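import json

# NOTE: `parser_files` must already hold the path to the shared data folder
# on the mounted drive; it is not defined in the cells shown here.
train_data = []
with open(f'{parser_files}/train_questions_answers.txt') as f:
    for line in f:  # one JSON object per line
        train_data.append(json.loads(line))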

# print a few examples
for i in range(5):
    print(train_data[i])
    print("-"*80)

{'question': 'Add an album to my Sylvia Plath playlist.', 'intent': 'AddToPlaylist', 'slots': {'music_item': 'album', 'playlist_owner': 'my', 'playlist': 'Sylvia Plath'}}
--------------------------------------------------------------------------------
{'question': 'add Diarios de Bicicleta to my la la playlist', 'intent': 'AddToPlaylist', 'slots': {'playlist': 'Diarios de Bicicleta', 'playlist_owner': 'my', 'entity_name': 'la la'}}
--------------------------------------------------------------------------------
{'question': 'book a table at a restaurant in Lucerne Valley that serves chicken nugget', 'intent': 'BookRestaurant', 'slots': {'restaurant_type': 'restaurant', 'city': 'Lucerne Valley', 'served_dish': 'chicken nugget'}}
--------------------------------------------------------------------------------
{'question': 'add iemand als jij to my playlist named In The Name Of Blues', 'intent': 'AddToPlaylist', 'slots': {'entity_name': 'iemand als jij', 'playlist_owner': 'my', 'playlist'

In [None]:
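test_questions = []
with open(f'{parser_files}/test_questions.txt') as f:
    for line in f:  # one JSON-encoded question per line
        test_questions.append(json.loads(line))

test_answers = []
with open(f'{parser_files}/test_answers.txt') as f:
    for line in f:  # one JSON-encoded answer per line, aligned with test_questions
        test_answers.append(json.loads(line))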

# print a few examples
for i in range(5):
    print(test_questions[i])
    print(test_answers[i])
    print("-"*80)

Add an artist to Jukebox Boogie Rhythm & Blues
{'intent': 'AddToPlaylist', 'slots': {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}}
--------------------------------------------------------------------------------
Will it be rainy at Sunrise in Ramey Saudi Arabia?
{'intent': 'GetWeather', 'slots': {'condition_description': 'rainy', 'timeRange': 'Sunrise', 'city': 'Ramey', 'country': 'Saudi Arabia'}}
--------------------------------------------------------------------------------
Weather in two hours  in Uzbekistan
{'intent': 'GetWeather', 'slots': {'timeRange': 'in two hours', 'country': 'Uzbekistan'}}
--------------------------------------------------------------------------------
Will there be a cloud in VI in 14 minutes ?
{'intent': 'GetWeather', 'slots': {'condition_description': 'cloud', 'state': 'VI', 'timeRange': 'in 14 minutes'}}
--------------------------------------------------------------------------------
add nuba to my Metal Party playlist
{'intent': 

## 1: Keyword-based intent classifier

This section builds a keyword-based intent classifier. Each intent is assigned a list of keywords, and a question is classified by checking which lists it matches. If a question matches keywords for several intents, the first match in a fixed priority order (`BookRestaurant`, `GetWeather`, `AddToPlaylist`) wins. If it matches no keyword, the classifier returns `None`.

In [None]:
# List of all intents
intents = set()
for example in train_data:
    intents.add(example['intent'])
print(intents)

{'BookRestaurant', 'GetWeather', 'AddToPlaylist'}


In [None]:
def predict_intent_using_keywords(question):
  # Case-insensitive substring matching against per-intent keyword lists.
  q = question.lower()
  resto = ['restaurant', 'table', 'food', 'book']
  weather = ['cold', 'hot', 'warm', 'humid', 'weather', 'rain', 'snow', 'blizzard', 'wind', 'storm', 'temperature', 'cloud', 'sunny', 'forecast', 'fog', 'smog']
  playlist = ['playlist', 'artist', 'music', 'album', 'song', 'tune', 'guitar', 'track']

  # Intents are checked in a fixed order, so the first matching list wins.
  if any(x in q for x in resto):
    return 'BookRestaurant'
  elif any(x in q for x in weather):
    return 'GetWeather'
  elif any(x in q for x in playlist):
    return 'AddToPlaylist'
  # No keyword matched.
  return None
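
The function above returns the first intent whose keyword list matches, so a playlist request that happens to contain `table` or `book` is labeled `BookRestaurant`. One possible alternative, sketched here, counts keyword hits per intent and returns the intent with the most hits. The name `predict_intent_by_keyword_count` and the shortened keyword lists are hypothetical and chosen only for illustration; ties fall back to dictionary order.

```python
# Hypothetical, shortened keyword lists used only for this illustration.
KEYWORDS = {
    "BookRestaurant": ["restaurant", "table", "food", "book"],
    "GetWeather": ["weather", "rain", "snow", "cloud", "forecast"],
    "AddToPlaylist": ["playlist", "artist", "album", "song", "track"],
}


def predict_intent_by_keyword_count(question, keywords=KEYWORDS):
    """Return the intent with the most keyword hits, or None if nothing matches."""
    q = question.lower()
    # Count how many keywords of each intent occur as substrings of the question.
    scores = {
        intent: sum(kw in q for kw in kws) for intent, kws in keywords.items()
    }
    # max() returns the first best key, so ties follow dictionary order.
    best_intent = max(scores, key=scores.get)
    return best_intent if scores[best_intent] > 0 else None


print(predict_intent_by_keyword_count(
    "Add the album Table for Two to my dinner playlist"))  # AddToPlaylist
```

Whether counting hits actually improves test accuracy would need to be checked with `evaluate_intent_accuracy`; this sketch makes no claim about that.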

Evaluate the accuracy of the keyword-based intent classifier:

In [None]:
from collections import Counter

def evaluate_intent_accuracy(predict_intent):
  '''Prints the accuracy and the number of test questions for each intent.'''
  correct = Counter()
  total = Counter()
  for q, answer in zip(test_questions, test_answers):
    gold_intent = answer['intent']
    if predict_intent(q) == gold_intent:
      correct[gold_intent] += 1
    total[gold_intent] += 1
  for intent in intents:
    print(intent, correct[intent]/total[intent], total[intent])
    
# Evaluating the intent classifier. 
evaluate_intent_accuracy(predict_intent_using_keywords)

BookRestaurant 0.97 100
GetWeather 0.91 100
AddToPlaylist 0.92 100
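
The per-intent figures above can be combined into a single overall accuracy by weighting each intent by its number of test questions. The short sketch below does this arithmetic on the printed values; the helper name `overall_accuracy` is hypothetical.

```python
# (accuracy, number of test questions) per intent, copied from the output above.
per_intent = {
    "BookRestaurant": (0.97, 100),
    "GetWeather": (0.91, 100),
    "AddToPlaylist": (0.92, 100),
}


def overall_accuracy(results):
    """Micro-average: total correct predictions divided by total questions."""
    correct = sum(round(acc * n) for acc, n in results.values())
    total = sum(n for _, n in results.values())
    return correct / total


print(f"{overall_accuracy(per_intent):.4f}")  # 0.9333
```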


## 2: Statistical intent classifier

Instead of relying on keywords, this classifier extracts features from the input question. Each sentence is represented by the average of the word2vec embeddings of its words, and a logistic regression model is trained on these sentence vectors.
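
To make the averaging step concrete, here is a minimal sketch in plain Python. The two-dimensional `toy_embeddings` are made up for illustration (the notebook uses word2vec vectors and NumPy), and the zero-vector fallback for questions with no known words is an assumption added here; the notebook cells do not handle that case.

```python
def average_embedding(tokens, embeddings, dim):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # Assumed fallback, not part of the original notebook.
        return [0.0] * dim
    # zip(*known) groups the i-th component of every vector together.
    return [sum(component) / len(known) for component in zip(*known)]


# Hypothetical 2-d embeddings; real word2vec vectors are much larger.
toy_embeddings = {
    "add": [1.0, 0.0],
    "song": [0.0, 1.0],
    "playlist": [0.5, 0.5],
}

# "this" is out of vocabulary and is skipped.
sentence_vector = average_embedding(["add", "this", "song"], toy_embeddings, dim=2)
print(sentence_vector)  # [0.5, 0.5]
```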

In [None]:
import nltk
nltk.download('word2vec_sample')

[nltk_data] Downloading package word2vec_sample to /root/nltk_data...
[nltk_data]   Unzipping models/word2vec_sample.zip.


True

In [None]:
from nltk.data import find
import gensim

word2vec_sample = str(find('models/word2vec_sample/pruned.word2vec.txt'))
word2vec_model = gensim.models.KeyedVectors.load_word2vec_format(word2vec_sample, binary=False)

In [None]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
from nltk.tokenize import word_tokenize
from sklearn.linear_model import LogisticRegression
import numpy as np

def train_logistic_regression_intent_classifier():
  '''Trains a logistic regression model on the entire training data. For an input question (x), the model learns to predict an intent (y).'''
  # Integer label for each intent; any other intent maps to 0.
  label_ids = {'GetWeather': 1, 'AddToPlaylist': 2, 'BookRestaurant': 3}
  labels = []
  questions = []
  for example in train_data:
    tokenized = word_tokenize(example['question'])
    # Keep only in-vocabulary tokens. `word in word2vec_model` works on both
    # older and newer gensim releases; the `.vocab` attribute was removed in gensim 4.
    in_vocab = [word for word in tokenized if word in word2vec_model]
    # Sentence feature: the mean of the word vectors.
    questions.append(np.mean(word2vec_model[in_vocab], axis=0))
    labels.append(label_ids.get(example['intent'], 0))

  questions = np.array(questions)
  return LogisticRegression(random_state=0).fit(questions, labels)

logisticReg = train_logistic_regression_intent_classifier()

In [None]:
def predict_intent_using_logistic_regression(question):
    '''For an input question, the model predicts an intent.'''
    tokens = word_tokenize(question)
    # Keep only tokens that have a word2vec embedding.
    in_vocab = [word for word in tokens if word in word2vec_model]
    features = np.mean(word2vec_model[in_vocab], axis=0)
    label = int(logisticReg.predict(features.reshape(1, -1))[0])
    # Decode the integer labels used during training.
    id_to_intent = {1: 'GetWeather', 2: 'AddToPlaylist', 3: 'BookRestaurant'}
    return id_to_intent.get(label)

In [None]:
# Evaluate the intent classifier
evaluate_intent_accuracy(predict_intent_using_logistic_regression)

BookRestaurant 1.0 100
GetWeather 1.0 100
AddToPlaylist 1.0 100


## 3: Slot filling

This section builds a slot-filling model for the `AddToPlaylist` intent.

In [None]:
# Let's stick to one target intent.
target_intent = "AddToPlaylist"

# This intent has the following slots
target_intent_slot_names = set()
for sample in train_data:
    if sample['intent'] == target_intent:
        for slot_name in sample['slots']:
            target_intent_slot_names.add(slot_name)
print(target_intent_slot_names)


# Extract all the relevant questions of this target intent from the test examples.
target_intent_questions = [] 
for i, question in enumerate(test_questions):
    if test_answers[i]['intent'] == target_intent:
        target_intent_questions.append(question)
print(len(target_intent_questions))

{'playlist_owner', 'entity_name', 'artist', 'playlist', 'music_item'}
100


In [None]:
import re
# Collect the distinct (lowercased) values seen in training for each slot
slot_values = {}
for sample in train_data:
    if sample['intent'] == target_intent:
        for slot_name, value in sample['slots'].items():
            slot_values.setdefault(slot_name, set()).add(value.lower())
slot_values = {name: list(values) for name, values in slot_values.items()}

In [None]:
def initialize_slots():
    slots = {}
    for slot_name in target_intent_slot_names:
        slots[slot_name] = None
    return slots

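def predict_slot_values(question):
    question = question.lower()
    slots = initialize_slots()

    # Lookup against training values: a slot is filled with a training-set
    # value that occurs as a substring of the question. If several values
    # match, the last one checked wins.
    for slot_name in target_intent_slot_names:
        for value in slot_values[slot_name]:
            if value in question:
                slots[slot_name] = value
    return slots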


def evaluate_slot_prediction_recall(slot_prediction_function):
    correct = Counter()
    total = Counter()
    # predict slots for each test question of the target intent
    for question, answer in zip(test_questions, test_answers):
        if answer['intent'] != target_intent:
            continue
        gold_slots = answer['slots']
        predicted_slots = slot_prediction_function(question)
        for name in target_intent_slot_names:
            if name in gold_slots:
                total[name] += 1.0
                predicted = predicted_slots.get(name)
                if predicted is not None and predicted.lower() == gold_slots[name].lower():
                    correct[name] += 1.0
    for name in target_intent_slot_names:
        print(f"{name}: {correct[name] / total[name]}")

# Evaluate the slot prediction model      
print("Slot accuracy for the slot prediction model")
evaluate_slot_prediction_recall(predict_slot_values)


Slot accuracy for the slot prediction model
playlist_owner: 0.9444444444444444
entity_name: 0.05555555555555555
artist: 0.10869565217391304
playlist: 0.71
music_item: 1.0
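
The figures above come from `evaluate_slot_prediction_recall`, so they measure recall: of the slots present in the gold answer, how many were predicted with the right value. They say nothing about slots that were filled when they should have been left empty. A minimal sketch of computing precision alongside recall follows; the helper name `slot_precision_recall` and the two toy examples are hypothetical and are not taken from the evaluation data.

```python
from collections import Counter


def slot_precision_recall(gold_list, predicted_list):
    """Return {slot: (precision, recall)}; None where a denominator is zero."""
    correct, n_gold, n_pred = Counter(), Counter(), Counter()
    for gold, predicted in zip(gold_list, predicted_list):
        for name in gold:
            n_gold[name] += 1
        for name, value in predicted.items():
            if value is None:
                continue  # slot left empty: not a prediction
            n_pred[name] += 1
            if name in gold and value.lower() == gold[name].lower():
                correct[name] += 1
    return {
        name: (
            correct[name] / n_pred[name] if n_pred[name] else None,
            correct[name] / n_gold[name] if n_gold[name] else None,
        )
        for name in set(n_gold) | set(n_pred)
    }


# Toy gold answers and predictions, for illustration only.
toy_gold = [
    {"playlist_owner": "my", "playlist": "Metal Party"},
    {"playlist": "The MetalSucks Playlist"},
]
toy_pred = [
    {"playlist_owner": "my", "playlist": None},
    {"playlist_owner": "my", "playlist": "the metalsucks playlist"},
]
toy_scores = slot_precision_recall(toy_gold, toy_pred)
print(toy_scores["playlist_owner"])  # (0.5, 1.0)
print(toy_scores["playlist"])        # (1.0, 0.5)
```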


In [None]:
# Find a true positive prediction for each slot
print('If the slot has no value then there is no true positive prediction for that slot\n')
for slot_name in target_intent_slot_names:
  for question, answer in zip(test_questions, test_answers):
    if answer['intent'] != target_intent:
      continue
    predicted = predict_slot_values(question)[slot_name]
    # Gold has the slot and the model filled it (the predicted value itself is not compared).
    if slot_name in answer['slots'] and predicted is not None:
      print("Question: ", question)
      print("True slots: ", answer['slots'])
      print("Predicted: ", slot_name, ":", predicted)
      print('-'*80)
      break  # show only the first example per slot

If the slot has no value then there is no true positive prediction for that slot

Question:  add nuba to my Metal Party playlist
True slots:  {'entity_name': 'nuba', 'playlist_owner': 'my', 'playlist': 'Metal Party'}
Predicted:  playlist_owner : my
--------------------------------------------------------------------------------
Question:  Add give us rest to my 70s Smash Hits playlist.
True slots:  {'entity_name': 'give us rest', 'playlist_owner': 'my', 'playlist': '70s Smash Hits'}
Predicted:  entity_name : give us rest
--------------------------------------------------------------------------------
Question:  Add Roel van Velzen to my party of the century playlist.
True slots:  {'artist': 'Roel van Velzen', 'playlist_owner': 'my', 'playlist': 'party of the century'}
Predicted:  artist : roel van velzen
--------------------------------------------------------------------------------
Question:  Add an artist to Jukebox Boogie Rhythm & Blues
True slots:  {'music_item': 'artist', 'playli

In [None]:
# Find a false positive prediction for each slot
print('If the slot has no value then there is no false positive prediction for that slot\n')
for slot_name in target_intent_slot_names:
  for question, answer in zip(test_questions, test_answers):
    if answer['intent'] != target_intent:
      continue
    predicted = predict_slot_values(question)[slot_name]
    # Gold lacks the slot but the model filled it.
    if slot_name not in answer['slots'] and predicted is not None:
      print("Question: ", question)
      print("True slots: ", answer['slots'])
      print("Predicted: ", slot_name, ":", predicted)
      print('-'*80)
      break  # show only the first example per slot

If the slot has no value then there is no false positive prediction for that slot

Question:  add tommy johnson to The MetalSucks Playlist
True slots:  {'artist': 'tommy johnson', 'playlist': 'The MetalSucks Playlist'}
Predicted:  playlist_owner : my
--------------------------------------------------------------------------------
Question:  Can you put this song from Yutaka Ozaki onto my this is miles davis playlist?
True slots:  {'music_item': 'song', 'artist': 'Yutaka Ozaki', 'playlist_owner': 'my', 'playlist': 'this is miles davis'}
Predicted:  entity_name : om
--------------------------------------------------------------------------------
Question:  add ireland in the junior eurovision song contest 2015 to my Jazzy Dinner playlist
True slots:  {'entity_name': 'ireland in the junior eurovision song contest 2015', 'playlist_owner': 'my', 'playlist': 'Jazzy Dinner'}
Predicted:  music_item : song
--------------------------------------------------------------------------------
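
The `entity_name : om` prediction above is a side effect of plain substring matching: the training value `om` occurs inside the word `from`. One possible mitigation, sketched here under stated assumptions, is to require matches to start and end at word boundaries and to prefer the longest matching value. The function name `find_slot_value` and the candidate lists are hypothetical, and the longest-match rule is a design choice of this sketch, not something the notebook does.

```python
import re


def find_slot_value(question, candidate_values):
    """Return the longest candidate that occurs as whole words, else None."""
    q = question.lower()
    # Try longer candidates first so that more specific values win.
    for value in sorted(candidate_values, key=len, reverse=True):
        # Lookarounds instead of \b so values ending in punctuation still work.
        pattern = r"(?<!\w)" + re.escape(value.lower()) + r"(?!\w)"
        if re.search(pattern, q):
            return value
    return None


question = ("Can you put this song from Yutaka Ozaki onto my "
            "this is miles davis playlist?")
print("om" in question.lower())                            # True (substring match)
print(find_slot_value(question, ["om", "give us rest"]))   # None
print(find_slot_value(question, ["my", "jerry's"]))        # my
```

This only removes partial-word matches; a value such as `my` that legitimately appears as a word but is not the gold slot would still be predicted.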


In [None]:
# Find a true negative prediction for each slot
print('If the slot has no value then there is no true negative prediction for that slot\n')
for slot_name in target_intent_slot_names:
  for question, answer in zip(test_questions, test_answers):
    if answer['intent'] != target_intent:
      continue
    predicted = predict_slot_values(question)[slot_name]
    # Gold lacks the slot and the model left it empty.
    if slot_name not in answer['slots'] and predicted is None:
      print("Question: ", question)
      print("True slots: ", answer['slots'])
      print("Predicted: ", slot_name, ":", predicted)
      print('-'*80)
      break  # show only the first example per slot

If the slot has no value then there is no true negative prediction for that slot

Question:  Add an artist to Jukebox Boogie Rhythm & Blues
True slots:  {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}
Predicted:  playlist_owner : None
--------------------------------------------------------------------------------
Question:  Add an artist to Jukebox Boogie Rhythm & Blues
True slots:  {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}
Predicted:  entity_name : None
--------------------------------------------------------------------------------
Question:  Add an artist to Jukebox Boogie Rhythm & Blues
True slots:  {'music_item': 'artist', 'playlist': 'Jukebox Boogie Rhythm & Blues'}
Predicted:  artist : None
--------------------------------------------------------------------------------
Question:  add nuba to my Metal Party playlist
True slots:  {'entity_name': 'nuba', 'playlist_owner': 'my', 'playlist': 'Metal Party'}
Predicted:  music_item : N

In [None]:
# Find a false negative prediction for each slot
print('If the slot has no value then there is no false negative prediction for that slot\n')
for slot_name in target_intent_slot_names:
  for question, answer in zip(test_questions, test_answers):
    if answer['intent'] != target_intent:
      continue
    predicted = predict_slot_values(question)[slot_name]
    # Gold has the slot but the model left it empty.
    if slot_name in answer['slots'] and predicted is None:
      print("Question: ", question)
      print("True slots: ", answer['slots'])
      print("Predicted: ", slot_name, ":", predicted)
      print('-'*80)
      break  # show only the first example per slot

If the slot has no value then there is no false negative prediction for that slot

Question:  Onto jerry's Classical Moments in Movies, please add the album.
True slots:  {'playlist_owner': "jerry's", 'playlist': 'Classical Moments in Movies', 'music_item': 'album'}
Predicted:  playlist_owner : None
--------------------------------------------------------------------------------
Question:  add nuba to my Metal Party playlist
True slots:  {'entity_name': 'nuba', 'playlist_owner': 'my', 'playlist': 'Metal Party'}
Predicted:  entity_name : None
--------------------------------------------------------------------------------
Question:  Can you put this song from Yutaka Ozaki onto my this is miles davis playlist?
True slots:  {'music_item': 'song', 'artist': 'Yutaka Ozaki', 'playlist_owner': 'my', 'playlist': 'this is miles davis'}
Predicted:  artist : None
--------------------------------------------------------------------------------
Question:  Add the album to the The Sweet Suite playli