## Part B: Chat Bot

**Problem Description:**

Great Learning has an academic support department that receives numerous support requests every day throughout the
year. Teams are spread across geographies and try to provide support all year round. Sometimes, due to heavy workload,
the resolution of certain requests is delayed, which impacts the company's business. Some of the requests are very generic, and
delivering a proper resolution procedure to the user can solve the problem. The company wants to design an automation that can
interact with the user, understand the problem, and display the resolution procedure (if the request is found to be generic) or redirect the request
to a human support executive if the request is complex or not in its database.

In [1]:
import json
import pandas as pd
import numpy as np
import pickle

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
with open('BluBot.json') as file:
  data = json.load(file)
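
The cells that follow rely on each entry of `data['intents']` having a `tag`, a list of `patterns`, and a list of `responses`. The contents of `BluBot.json` are not reproduced here, so the snippet below is only a minimal sketch of that assumed shape, using a made-up two-intent sample (`sample_json` and `sample_pairs` are illustrative names), together with the flattening into `[pattern, tag]` pairs.

```python
import json

# Hypothetical sample mirroring the assumed layout of BluBot.json;
# the real file is not reproduced here.
sample_json = """
{
  "intents": [
    {"tag": "Intro", "patterns": ["hi", "hello"],
     "responses": ["Hello! how can i help you ?"]},
    {"tag": "Exit", "patterns": ["thanks", "cya"],
     "responses": ["I hope I was able to assist you, Good Bye"]}
  ]
}
"""

sample = json.loads(sample_json)

# Flatten every pattern into a [pattern, tag] pair, one row per training example
sample_pairs = [
    [pattern, intent["tag"]]
    for intent in sample["intents"]
    for pattern in intent["patterns"]
]
print(sample_pairs)
```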

In [4]:
tagPatterns = []

In [5]:
for intent in data['intents']:
  tag = intent['tag']
  print(tag)
  for pattern in intent['patterns']:
    tagPatterns.append([pattern, tag])

Intro
Exit
Bot
Profane


In [6]:
tagPatterns

[['hi', 'Intro'],
 ['how are you', 'Intro'],
 ['is anyone there', 'Intro'],
 ['hello', 'Intro'],
 ['whats up', 'Intro'],
 ['hey', 'Intro'],
 ['yo', 'Intro'],
 ['listen', 'Intro'],
 ['please help me', 'Intro'],
 ['i belong to', 'Intro'],
 ['i am from', 'Intro'],
 ['hey ya', 'Intro'],
 ['talking to you for first time', 'Intro'],
 ['thank you', 'Exit'],
 ['thanks', 'Exit'],
 ['cya', 'Exit'],
 ['see you', 'Exit'],
 ['later', 'Exit'],
 ['see you later', 'Exit'],
 ['goodbye', 'Exit'],
 ['i am leaving', 'Exit'],
 ['have a Good day', 'Exit'],
 ['you helped me', 'Exit'],
 ['thanks a lot', 'Exit'],
 ['thanks a ton', 'Exit'],
 ['you are the best', 'Exit'],
 ['great help', 'Exit'],
 ['too good', 'Exit'],
 ['what is your name', 'Bot'],
 ['who are you', 'Bot'],
 ['name please', 'Bot'],
 ['when are your hours of opertions', 'Bot'],
 ['what are your working hours', 'Bot'],
 ['hours of operation', 'Bot'],
 ['working hours', 'Bot'],
 ['hours', 'Bot'],
 ['what the hell', 'Profane'],
 ['bloody stupid bot'

In [7]:
intents = pd.DataFrame(tagPatterns, columns =['pattern', 'tag'])

In [8]:
intents.head()

|   | pattern | tag |
|---|---------|-----|
| 0 | hi | Intro |
| 1 | how are you | Intro |
| 2 | is anyone there | Intro |
| 3 | hello | Intro |
| 4 | whats up | Intro |


In [9]:
#Save the intents as csv
intents.to_csv('intents.csv')

In [10]:
responses = {}

In [11]:
for intent in data['intents']:
  tag = intent['tag']
  for response in intent['responses']:
    # The given JSON has one response per intent, so a dict keyed by the
    # lower-cased tag gives O(1) lookup later (a later response would overwrite an earlier one)
    responses[tag.lower()] = response

In [12]:
with open('responses.pkl', 'wb') as f:
    pickle.dump(responses, f)

with open('responses.pkl', 'rb') as f:
    responses = pickle.load(f)

In [13]:
def get_response_for_tag(tag):
  # Tags are stored lower-cased, so the lookup is case-insensitive
  return responses.get(tag.lower(), "Can you please rephrase that perhaps?")

In [15]:
get_response_for_tag("iNTRO")

'Hello! how can i help you ?'
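
The `responses` dictionary keeps a single reply per tag. If an intent ever carried several replies, one possible variation is to keep them all and pick one at random. This is a sketch under that assumption: `sample_intents`, `responses_by_tag`, and `pick_response` are hypothetical names, and the sample replies are made up.

```python
import random

# Hypothetical intents with more than one response each (not from BluBot.json)
sample_intents = [
    {"tag": "Intro", "responses": ["Hello!", "Hi there, how can I help?"]},
    {"tag": "Exit", "responses": ["Good bye"]},
]

# Keep every response per lower-cased tag instead of overwriting
responses_by_tag = {
    intent["tag"].lower(): list(intent["responses"]) for intent in sample_intents
}

def pick_response(tag, rng=random):
    """Return one stored response for the tag, or a fallback prompt."""
    options = responses_by_tag.get(tag.lower())
    if not options:
        return "Can you please rephrase that perhaps?"
    return rng.choice(options)

print(pick_response("iNTRO"))
print(pick_response("unknown"))
```

Passing a seeded `random.Random` instance as `rng` would make the choice repeatable in tests.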

### 2. Clean the inputs

In [16]:
!pip install contractions
import nltk
import inflect
import contractions
from bs4 import BeautifulSoup
import re, string, unicodedata
from nltk import word_tokenize, sent_tokenize
from nltk.corpus import stopwords
from nltk.stem import LancasterStemmer, WordNetLemmatizer
from sklearn.preprocessing import LabelEncoder
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

Collecting contractions
  Downloading contractions-0.1.68-py2.py3-none-any.whl (8.1 kB)
Collecting textsearch>=0.0.21
  Downloading textsearch-0.0.21-py2.py3-none-any.whl (7.5 kB)
Collecting pyahocorasick
  Downloading pyahocorasick-1.4.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (106 kB)
Collecting anyascii
  Downloading anyascii-0.3.0-py3-none-any.whl (284 kB)
Installing collected packages: pyahocorasick, anyascii, textsearch, contractions
Successfully installed anyascii-0.3.0 contractions-0.1.68 pyahocorasick-1.4.4 textsearch-0.0.21
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpor

True

In [17]:
# Pipeline for text cleaning

def denoise_text(text):
    # Strip html if any. For ex. removing <html>, <p> tags
    soup = BeautifulSoup(text, "html.parser")
    text = soup.get_text()
    # Replace contractions in the text. For ex. didn't -> did not
    text = contractions.fix(text)
    return text

def tokenize(text):
    return nltk.word_tokenize(text)

def remove_non_ascii(words):
    """Remove non-ASCII characters from list of tokenized words"""
    new_words = []
    for word in words:
        new_word = unicodedata.normalize('NFKD', word).encode('ascii', 'ignore').decode('utf-8', 'ignore')
        new_words.append(new_word)
    return new_words
def to_lowercase(words):
    """Convert all characters to lowercase from list of tokenized words"""
    new_words = []
    for word in words:
        new_word = word.lower()
        new_words.append(new_word)
    return new_words
def remove_punctuation(words):
    """Remove punctuation from list of tokenized words"""
    new_words = []
    for word in words:
        new_word = re.sub(r'[^\w\s]', '', word)
        if new_word != '':
            new_words.append(new_word)
    return new_words
def replace_numbers(words):
    """Replace all integer occurrences in list of tokenized words with textual representation"""
    p = inflect.engine()
    new_words = []
    for word in words:
        if word.isdigit():
            new_word = p.number_to_words(word)
            new_words.append(new_word)
        else:
            new_words.append(word)
    return new_words
def remove_numbers(words):
    """Remove all integer tokens from list of tokenized words"""
    new_words = []
    for word in words:
        # Skip digit-only tokens instead of appending empty strings
        if not word.isdigit():
            new_words.append(word)
    return new_words
def remove_stopwords(words):
    """Remove stop words from list of tokenized words"""
    # Build the stop-word set once rather than reloading the list for every token
    stop_words = set(stopwords.words('english'))
    new_words = []
    for word in words:
        if word not in stop_words:
            new_words.append(word)
    return new_words
def stem_words(words):
    """Stem words in list of tokenized words"""
    stemmer = LancasterStemmer()
    stems = []
    for word in words:
        stem = stemmer.stem(word)
        stems.append(stem)
    return stems

def lemmatize_verbs(words):
    """Lemmatize verbs in list of tokenized words"""
    lemmatizer = WordNetLemmatizer()
    lemmas = []
    for word in words:
        lemma = lemmatizer.lemmatize(word, pos='v')
        lemmas.append(lemma)
    return lemmas

In [18]:
def normalize_text(words):
    words = remove_non_ascii(words)
    words = to_lowercase(words)
    words = remove_punctuation(words)
    #words = remove_numbers(words)
    #words = remove_stopwords(words)
    #words = stem_words(words)
    words = lemmatize_verbs(words)
    return words
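
To see what the non-NLTK steps of `normalize_text` do to a token list, the sketch below re-implements three of them (ASCII folding, lowercasing, punctuation removal) with the standard library only. The whitespace tokenizer and the `simple_normalize` name are stand-ins for illustration; they are not `nltk.word_tokenize`, and the lemmatization step is left out.

```python
import re
import unicodedata

def simple_normalize(text):
    """Fold to ASCII, lowercase, and drop punctuation (illustrative helper)."""
    tokens = text.split()  # crude whitespace tokenizer, not nltk.word_tokenize
    cleaned = []
    for token in tokens:
        # Decompose accented characters, then drop anything outside ASCII
        token = unicodedata.normalize("NFKD", token).encode("ascii", "ignore").decode("ascii")
        token = token.lower()
        # Remove every character that is neither a word character nor whitespace
        token = re.sub(r"[^\w\s]", "", token)
        if token:
            cleaned.append(token)
    return cleaned

print(simple_normalize("Héllo, World!  What's UP?"))
# ['hello', 'world', 'whats', 'up']
```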

In [19]:
def text_clean(text):
    text = denoise_text(text)
    # Tokenize, normalize, and join the tokens back into a single string
    text = ' '.join(normalize_text(tokenize(text)))
    return text
intents['pattern'] = [text_clean(x) for x in intents['pattern']]

In [20]:
intents.shape

(45, 2)

In [21]:
intents.head()

|   | pattern | tag |
|---|---------|-----|
| 0 | hi | Intro |
| 1 | how be you | Intro |
| 2 | be anyone there | Intro |
| 3 | hello | Intro |
| 4 | what be up | Intro |


### 3. Train-Test Split Data

In [22]:
from sklearn.model_selection import train_test_split

In [23]:
X_train, X_test, y_train, y_test = train_test_split(intents.pattern, intents.tag, random_state=42, test_size=0.3, shuffle=True)

In [24]:
print(f"Training Size:{y_train.shape[0]} \n Test Size: {y_test.shape[0]}")

Training Size:31 
 Test Size: 14
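
With only 45 patterns, a purely random 70/30 split can leave a tag thinly represented on one side. One option is a stratified split; scikit-learn's `train_test_split` accepts a `stratify` argument for this. The sketch below only illustrates the idea in plain Python on made-up labels: `stratified_split` is a hypothetical helper, not the scikit-learn implementation, and the toy data is not the notebook's dataset.

```python
import random
from collections import defaultdict

def stratified_split(items, labels, test_size=0.3, seed=42):
    """Split items so each label keeps roughly the same test share."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    rng = random.Random(seed)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        # At least one test example per label, but never the whole group
        n_test = min(len(group) - 1, max(1, round(len(group) * test_size)))
        test += [(item, label) for item in group[:n_test]]
        train += [(item, label) for item in group[n_test:]]
    return train, test

# Toy data: 10 "Intro" and 5 "Exit" patterns (illustrative only)
items = [f"pattern {i}" for i in range(15)]
labels = ["Intro"] * 10 + ["Exit"] * 5
train, test = stratified_split(items, labels)
print(len(train), len(test))
```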


### 4. Vectorize the inputs:

We can use TF-IDF here to turn each cleaned pattern into a weighted bag-of-words vector.

In [25]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidfvectorizer = TfidfVectorizer(min_df=1, max_features=1000)
tfidf_X = tfidfvectorizer.fit_transform(X_train)
tfidfvectorizer.get_feature_names_out()

array(['anyone', 'be', 'belong', 'bloody', 'bot', 'cya', 'day', 'do',
       'from', 'good', 'hate', 'have', 'hell', 'help', 'hey', 'hi',
       'hours', 'how', 'jerk', 'later', 'leave', 'listen', 'lot', 'me',
       'name', 'of', 'operation', 'opertions', 'piece', 'please', 'see',
       'shit', 'smart', 'stupid', 'thank', 'the', 'there', 'think', 'to',
       'too', 'useless', 'very', 'what', 'when', 'who', 'work', 'ya',
       'you', 'your'], dtype=object)
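
To make the weighting concrete, here is a small worked sketch in plain Python. It follows what are, to my understanding, the default `TfidfVectorizer` settings: raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row. The three-document corpus and the `tfidf_vector` helper are made up for illustration. The default token pattern keeps only tokens of two or more word characters, which is consistent with single-letter words such as "i" being absent from the vocabulary above.

```python
import math
import re
from collections import Counter

docs = ["how be you", "who be you", "thank you"]  # toy corpus, not the training set

# Keep tokens of two or more word characters, as the default token pattern does
tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})

n_docs = len(docs)
# Document frequency: in how many documents each term appears
doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
# Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab}

def tfidf_vector(tokens):
    """Return the L2-normalized TF-IDF row for one tokenized document."""
    counts = Counter(tokens)
    raw = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm if norm else 0.0 for v in raw]

vectors = [tfidf_vector(doc) for doc in tokenized]
print(vocab)
for doc, vec in zip(docs, vectors):
    print(doc, [round(v, 3) for v in vec])
```

In this toy corpus "you" appears in every document, so it receives the lowest IDF and contributes less to each vector than rarer words such as "how".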

In [26]:
# Convert the sparse matrix to a dense DataFrame with one column per vocabulary term
tfidf_X_train = pd.DataFrame(tfidf_X.toarray(), columns=tfidfvectorizer.get_feature_names_out())

### 5. Prepare Model and Train

In [33]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y_train_le = le.fit_transform(y_train)

In [67]:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
clf = OneVsRestClassifier(SVC(random_state=42)).fit(tfidf_X_train, y_train_le)

### 6. Evaluate Trained Model

In [35]:
# Reuse the vectorizer fitted on the training data; do not refit on the test set
tfidf_X_test = tfidfvectorizer.transform(X_test)
tfidf_X_test = pd.DataFrame(tfidf_X_test.toarray(), columns=tfidfvectorizer.get_feature_names_out())

In [68]:
predicted = clf.predict(tfidf_X_test)
predicted_labels = le.inverse_transform(predicted)

In [69]:
y_test_le = le.transform(y_test)

In [70]:
from sklearn.metrics import accuracy_score, f1_score
def print_report(modelName, predicted_labels):
  # Compare the decoded predictions against the true string labels in y_test
  print("Model: " + modelName)
  accuracy = accuracy_score(y_test, predicted_labels)
  print(f"Accuracy: {accuracy}")
  # Macro averaging gives every tag equal weight regardless of its frequency
  f1 = f1_score(y_test, predicted_labels, average="macro")
  print(f"F1 Score: {f1}")

### 7. Evaluation Report of Trained Model

In [71]:
print_report("tfidf",predicted_labels)

Model: tfidf
Accuracy: 0.5
F1 Score: 0.5077922077922078
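
For reference, macro F1 is the unweighted mean of the per-class F1 scores, so a rarely seen tag counts as much as a frequent one. The sketch below computes accuracy and macro F1 by hand on made-up labels; `macro_f1` is an illustrative helper and the label lists are not the notebook's test split.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Made-up labels for illustration only
y_true = ["Intro", "Intro", "Exit", "Exit", "Bot", "Bot"]
y_pred = ["Intro", "Exit", "Exit", "Exit", "Bot", "Intro"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 3), round(macro_f1(y_true, y_pred), 3))
```

Here the per-class F1 scores are 2/3 for Bot, 0.8 for Exit, and 0.5 for Intro, so the macro average is about 0.656 while accuracy is about 0.667.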


### 8. Create Preprocessing pipeline for input from chat

In [37]:
def preprocess_input(text):
  # Clean the raw chat text, then vectorize it with the already fitted TF-IDF model
  text = text_clean(text)
  text_vectors = tfidfvectorizer.transform([text])
  text_vectors = pd.DataFrame(text_vectors.toarray(), columns=tfidfvectorizer.get_feature_names_out())
  return text_vectors


In [38]:
def get_tag(text):
  text_vectors = preprocess_input(text)
  prediction = clf.predict(text_vectors)
  tag = le.inverse_transform(prediction)
  return tag[0]

In [72]:
# Save labelencoder, classifier, tfidfvectorizer
with open('labelEncoder.pkl', 'wb') as leModel:
    pickle.dump(le, leModel)

with open('customResponseClassifier.pkl', 'wb') as clfModel:
    pickle.dump(clf, clfModel)

with open('tfIdfModel.pkl', 'wb') as tfIdfVectorModel:
    pickle.dump(tfidfvectorizer, tfIdfVectorModel)


### 9. Test Sample Inputs

In [39]:
get_tag("what is your name")

'Bot'

In [40]:
get_tag("Thanks a ton")

'Exit'

In [41]:
get_tag("What is deep learning")

'Bot'

In [None]:
get_tag("See you")

'Exit'

In [42]:
get_tag("Bye")   

'Exit'

### Put all components together

In [76]:
def load_model(modelFileName):
  # The context manager closes the file even if unpickling fails
  with open(modelFileName, 'rb') as pkl:
    model = pickle.load(pkl)
  print("Loading: " + str(type(model)))
  return model

In [77]:

tfidfvectorizer = load_model('tfIdfModel.pkl')
le = load_model('labelEncoder.pkl')
clf = load_model('customResponseClassifier.pkl')




Loading: <class 'sklearn.feature_extraction.text.TfidfVectorizer'>
Loading: <class 'sklearn.preprocessing._label.LabelEncoder'>
Loading: <class 'sklearn.multiclass.OneVsRestClassifier'>


In [50]:
def manual_corrections(text):
  if text.lower() == "bye":
    return "Exit", True
  elif text.lower() == "hi":
    return "Intro",True
  else:
    return "",False

In [51]:
def ask_bot(text):
  # Hand-written overrides take precedence over the classifier's prediction
  tag, override = manual_corrections(text)
  if not override:
    tag = get_tag(text)
  return get_response_for_tag(tag)

### Test the backend

In [78]:
ask_bot("Hi, how are you?")

'Hello! how can i help you ?'

In [None]:
ask_bot("Bye")  

'I hope I was able to assist you, Good Bye'

In [None]:
ask_bot("Does god exists?")

'Tarnsferring the request to your PM'
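
A classifier like this always predicts one of the tags it was trained on, so an unrelated question is still mapped to the closest tag. One possible way to make the hand-off to a human explicit is to look at per-tag confidence scores and fall back when none is high enough. The sketch below assumes such scores are already available as a dictionary (how they would be obtained from the trained model is not shown); `route_request`, the `HANDOFF` tag, and the 0.5 threshold are all hypothetical.

```python
def route_request(scores, threshold=0.5, fallback_tag="HANDOFF"):
    """Return the best-scoring tag, or fallback_tag when confidence is low."""
    if not scores:
        return fallback_tag
    best_tag = max(scores, key=scores.get)
    return best_tag if scores[best_tag] >= threshold else fallback_tag

# Confident prediction: the top tag clears the threshold
print(route_request({"Intro": 0.9, "Exit": 0.05, "Bot": 0.05}))
# Ambiguous prediction: no tag clears the threshold, so hand off
print(route_request({"Intro": 0.3, "Exit": 0.35, "Bot": 0.35}))
```

The threshold would need tuning on held-out examples; the value here is only a placeholder.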

In [None]:
def chat_window():
  try:
    print("Welcome to Great Learning !")
    print("How can I help you today?")
    while True:
      inp = input("\n\n >>")
      response = ask_bot(inp.lower())
      print(response)
      if response == "I hope I was able to assist you, Good Bye":
        break
  except KeyboardInterrupt:
    print("I hope I was able to assist you, Good Bye")


In [None]:
chat_window()

Welcome to Great Learning !
How can I help you today?


 >>Hi
Hello! how can i help you ?


 >>how are you?
Hello! how can i help you ?


 >>what is deep learning?
Link: Neural Nets wiki


 >>Cannot access olympus
Link: Olympus wiki


 >>what are some ensemble techniques
Link: Machine Learning wiki 


 >>who are youi
I am your virtual learning assistant


 >>There is some problem
Hello! how can i help you ?


 >>can you create a ticket?
Tarnsferring the request to your PM


 >>Thank you
I hope I was able to assist you, Good Bye


In [None]:
## Misc to print HTML
!sudo apt-get install texlive-xetex texlive-fonts-recommended texlive-generic-recommended


In [None]:
%cd '/content/drive/MyDrive/Colab Notebooks/'
!jupyter nbconvert --to html 'NLP Project 1 Part B ChatBot.ipynb'

/content/drive/MyDrive/Colab Notebooks
[NbConvertApp] Converting notebook NLP Project 1 Part B ChatBot.ipynb to html
[NbConvertApp] Writing 366019 bytes to NLP Project 1 Part B ChatBot.html
