<a href="https://colab.research.google.com/github/toshkumarashu/mnproject/blob/master/Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Chatbot using NLP and Neural Networks in Python**

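Each intent has three fields.

`tag` is the class label the model learns to predict.

`patterns` are example messages a user might send.

`responses` are the replies the chatbot can choose from.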

In [27]:
data={"intents":[
    {"tag":"greeting",
     "patterns":["Hello","How are you?","Hi There","Hi","What's up"],
     "responses":["Howdy Partner!","Hello","How are you doing?","Greetings!","How do you do"]
     },
    {"tag":"age",
     "patterns":["how old are you","wheh is your birthday","when was you born"],
     "responses":["I am 24 years old","I was born in 1966","My birthday is July 3rd and I was born in 1996","03/07/1996"]
     },
    {"tag":"date",
     "patterns":["what are you doing this weekend",
                 "do you want to hangout sometime?","what are your plans for this week"],
     "responses":["I am available this week","I don't have any plans","I am not busy"]
     },
    {"tag":"name",
     "patterns":["what's your name","what are you called","who are you"],
     "responses":["My name is kippi","i'm kippi","Kippi"]
    },
    {"tag":"goodbye",
     "patterns":["bye","g2g","see ya","adios","cya"],
     "responses":["It was nice speaking to you","See you later","Speak Soon"]
     },
]}

Each tag comes with a list of patterns, which are the different ways a user might phrase the same query.

The patterns serve as training data: the model learns to map a message to a tag, and the chatbot then replies with one of the responses stored under that tag.
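
Because every intent is expected to carry the same three keys, a quick structural check can catch a misspelled key before training. The sketch below is only an illustration: `find_malformed_intents`, `REQUIRED_KEYS` and the two-intent `sample` are hypothetical names and data, not part of the notebook.

```python
# Hypothetical helper (not part of the original notebook): report intents
# that lack one of the keys the rest of the code relies on.
REQUIRED_KEYS = {"tag", "patterns", "responses"}

def find_malformed_intents(intents_json):
    """Return a list of (tag, missing_keys) for intents lacking a required key."""
    problems = []
    for intent in intents_json["intents"]:
        missing = REQUIRED_KEYS - set(intent)
        if missing:
            problems.append((intent.get("tag", "<no tag>"), sorted(missing)))
    return problems

# Tiny made-up sample; the second intent has a misspelled "responses" key.
sample = {"intents": [
    {"tag": "greeting", "patterns": ["Hi"], "responses": ["Hello"]},
    {"tag": "age", "patterns": ["how old are you"], "reposnses": ["I am 24 years old"]},
]}

print(find_malformed_intents(sample))   # [('age', ['responses'])]
```

An empty list means every intent has the expected shape.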

In [28]:
import json
import string
import random

import nltk
import numpy as np
from nltk.stem import WordNetLemmatizer

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense,Dropout
nltk.download("punkt")
nltk.download("wordnet")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

To create the training data, follow the steps below.

Create a vocabulary of all the words used in the patterns.

Create a list of the classes, that is, the tag of each intent.

Create a list of all the patterns within the intents file.

Create a list of the tags associated with each pattern in the intents file.

Initialise the lemmatizer, which reduces each word to its base form (lemma).

In [29]:
lemmatizer=WordNetLemmatizer()

words=[]
classes=[]
doc_x=[]
doc_y=[]

Loop through all the intents.

Tokenize each pattern, add its tokens to `words`, and append the pattern and its tag to `doc_x` and `doc_y` respectively.


In [30]:
for intent in data["intents"]:
  for pattern in intent["patterns"]:
    tokens=nltk.word_tokenize(pattern)
    words.extend(tokens)
    doc_x.append(pattern)
    doc_y.append(intent["tag"])
  if intent["tag"] not in classes:
    classes.append(intent["tag"])
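
The same bookkeeping can be sketched on a toy dataset without NLTK. This is a simplified illustration under stated assumptions: `simple_tokenize` is a stand-in regex tokenizer (the notebook uses `nltk.word_tokenize`), no lemmatization is applied, and `sample_intents` is made-up data.

```python
import re

# Stand-in tokenizer (an assumption for this sketch): lowercase, then keep
# runs of letters, digits and apostrophes.
def simple_tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

sample_intents = [
    {"tag": "greeting", "patterns": ["Hi there", "Hello"]},
    {"tag": "goodbye", "patterns": ["bye", "see ya"]},
]

vocab, tags, patterns, pattern_tags = set(), set(), [], []
for intent in sample_intents:
    tags.add(intent["tag"])                      # one entry per class
    for pattern in intent["patterns"]:
        vocab.update(simple_tokenize(pattern))   # every token seen in a pattern
        patterns.append(pattern)                 # the raw pattern
        pattern_tags.append(intent["tag"])       # the tag at the same position

vocab, tags = sorted(vocab), sorted(tags)
print(vocab)   # ['bye', 'hello', 'hi', 'see', 'there', 'ya']
print(tags)    # ['goodbye', 'greeting']
print(list(zip(patterns, pattern_tags)))
```

The pattern list and the tag list stay parallel, so position `i` in one always describes position `i` in the other.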

Lemmatize all the words in the vocab and convert them to lowercase

In [31]:
words=[lemmatizer.lemmatize(word.lower()) for word in words if word not in string.punctuation]

Sort the vocabulary and the classes alphabetically, converting each to a set first so that no duplicates remain.

In [32]:
words=sorted(set(words))
classes=sorted(set(classes))

In [33]:
print(words)

["'s", 'adios', 'are', 'birthday', 'born', 'bye', 'called', 'cya', 'do', 'doing', 'for', 'g2g', 'hangout', 'hello', 'hi', 'how', 'is', 'name', 'old', 'plan', 'see', 'sometime', 'there', 'this', 'to', 'up', 'wa', 'want', 'week', 'weekend', 'what', 'wheh', 'when', 'who', 'ya', 'you', 'your']


In [34]:
print(classes)

['age', 'date', 'goodbye', 'greeting', 'name']


In [35]:
print(doc_x)

['Hello', 'How are you?', 'Hi There', 'Hi', "What's up", 'how old are you', 'wheh is your birthday', 'when was you born', 'what are you doing this weekend', 'do you want to hangout sometime?', 'what are your plans for this week', "what's your name", 'what are you called', 'who are you', 'bye', 'g2g', 'see ya', 'adios', 'cya']


In [36]:
print(doc_y)

['greeting', 'greeting', 'greeting', 'greeting', 'greeting', 'age', 'age', 'age', 'date', 'date', 'date', 'name', 'name', 'name', 'goodbye', 'goodbye', 'goodbye', 'goodbye', 'goodbye']



Build the training data: each pattern becomes a bag-of-words vector, paired with a one-hot vector for its tag.

In [37]:
training=[]
out_empty=[0] *len(classes)

# creating a bag of words model

for idx, doc in enumerate(doc_x):
  # tokenize and lemmatize the pattern exactly as the vocabulary was built,
  # so a word matches only whole tokens (not "hi" inside "this")
  tokens={lemmatizer.lemmatize(tok.lower()) for tok in nltk.word_tokenize(doc)}
  bow=[1 if word in tokens else 0 for word in words]
  # one-hot encode the tag of this pattern
  output_row=list(out_empty)
  output_row[classes.index(doc_y[idx])]=1

  training.append([bow,output_row])

random.shuffle(training)

training=np.array(training,dtype=object)

train_x=np.array(list(training[:,0]))
train_y=np.array(list(training[:,1]))

In [38]:
train_x[:2]

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]])

In [39]:
train_y[:2]

array([[0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1]])
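
The two encodings shown above can be reproduced by hand on a small example. The sketch below uses a made-up five-word vocabulary and two classes, with `str.split` standing in for the tokenizer; `encode` and `one_hot` are illustrative names. It also shows why membership is best tested against tokens and not against the raw string.

```python
# Illustrative vocabulary and classes (a subset chosen for this sketch).
vocab = ["hi", "is", "this", "what", "weekend"]
classes = ["date", "greeting"]

def encode(tokens, vocab):
    """Binary bag of words: 1 if the vocabulary word is one of the tokens."""
    token_set = set(tokens)
    return [1 if word in token_set else 0 for word in vocab]

def one_hot(label, classes):
    """Vector of zeros with a single 1 at the position of the label."""
    return [1 if c == label else 0 for c in classes]

sentence = "what is this weekend"
tokens = sentence.split()

print(encode(tokens, vocab))       # [0, 1, 1, 1, 1]
print(one_hot("date", classes))    # [1, 0]

# Substring matching against the raw sentence also flags "hi",
# because "hi" occurs inside "this".
print([1 if word in sentence else 0 for word in vocab])   # [1, 1, 1, 1, 1]
```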

In [40]:
input_shape=(len(train_x[0]),)
output_shape=len(train_y[0])
epochs=500

In [41]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout

# create a Sequential model
model=Sequential()
model.add(Dense(128,input_shape=input_shape,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64,activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(output_shape,activation='softmax'))

# create the Adam optimizer with a specified learning rate
adam=tf.keras.optimizers.Adam(learning_rate=0.01)

# compile the model using the Adam optimizer
model.compile(loss='categorical_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])
# summary() prints the layer table itself and returns None,
# so it does not need to be wrapped in print()
model.summary()



In [42]:
model.fit(x=train_x,y=train_y,epochs=epochs,verbose=1)

Epoch 1/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 3s 3s/step - accuracy: 0.2105 - loss: 1.6097
Epoch 2/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 120ms/step - accuracy: 0.5263 - loss: 1.4794
Epoch 3/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 54ms/step - accuracy: 0.4737 - loss: 1.4577
Epoch 4/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - accuracy: 0.6842 - loss: 1.3319
Epoch 5/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.5789 - loss: 1.2152
Epoch 6/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - accuracy: 0.8947 - loss: 1.0196
Epoch 7/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step - accuracy: 0.8947 - loss: 0.9249
Epoch 8/500
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - accuracy: 0.7895 - loss: 0.8605
Epoch 9/500
1/1 ━━━━━━━━━━━━━━━━━━━━ ...

<keras.src.callbacks.history.History at 0x794f2899cdf0>
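
The first-epoch loss of about 1.61 is close to ln 5 ≈ 1.609, the value categorical cross-entropy takes when five classes are predicted with equal probability. The sketch below checks this by hand in plain Python; `softmax` and `categorical_crossentropy` here are small illustrative re-implementations, not the Keras functions.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]   # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_prob):
    """Loss for one example with a one-hot target."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_prob) if t)

# A network that scores all five classes equally
probs = softmax([0.0, 0.0, 0.0, 0.0, 0.0])
loss = categorical_crossentropy([0, 0, 1, 0, 0], probs)
print(round(loss, 4))   # 1.6094, i.e. ln(5)
```

A loss near this value at the start of training therefore suggests the network begins close to a uniform guess.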

In [49]:
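def clean_text(text):
  # tokenize, then lowercase and lemmatize each token so the result
  # matches the lowercase vocabulary built during training
  tokens=nltk.word_tokenize(text)
  tokens=[lemmatizer.lemmatize(word.lower()) for word in tokens]
  return tokens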

def bag_of_words(text,vocab):
  tokens=clean_text(text)
  bow=[0] * len(vocab)
  for w in tokens:
    for idx,word in enumerate(vocab):
      if word==w:
        bow[idx]=1
  return np.array(bow)

In [50]:
def pred_class(text,vocab,labels):
  bow=bag_of_words(text,vocab)
  result=model.predict(np.array([bow]))[0]
  thresh=0.2
  y_pred=[[idx,res] for idx,res in enumerate(result) if res>thresh]

  y_pred.sort(key=lambda x:x[1],reverse=True)
  return_list=[]
  for r in y_pred:
    return_list.append(labels[r[0]])
  return return_list

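def get_response(intents_list,intents_json):
  # default reply when no intent clears the threshold or the tag is unknown
  result="Sorry, I did not understand that."
  if not intents_list:
    return result
  tag=intents_list[0]
  list_of_intents=intents_json["intents"]
  for i in list_of_intents:
    if i["tag"]==tag:
      result=random.choice(i["responses"])
      break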
  return result
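
The thresholding and ranking step in `pred_class` can be tried in isolation. The sketch below is a minimal stand-alone version in plain Python; `rank_labels` is a hypothetical name and the probability vectors are made up for illustration.

```python
def rank_labels(probabilities, labels, thresh=0.2):
    """Keep labels whose probability exceeds thresh, most likely first."""
    kept = [(p, label) for p, label in zip(probabilities, labels) if p > thresh]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in kept]

labels = ["age", "date", "goodbye", "greeting", "name"]

# Made-up softmax outputs for illustration
print(rank_labels([0.05, 0.05, 0.25, 0.60, 0.05], labels))   # ['greeting', 'goodbye']

# When every class scores exactly 0.2, nothing passes the strict ">" test,
# so a caller has to be ready for an empty list.
print(rank_labels([0.2, 0.2, 0.2, 0.2, 0.2], labels))        # []
```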

Running the chatbot

In [None]:
while True:
  message=input("")
  intents=pred_class(message,words,classes)
  result=get_response(intents,data)
  print(result)

hell0
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 26ms/step
See you later
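
In the session above, `hell0` contains no vocabulary word, so its bag-of-words vector is all zeros and the network still returns some class. One possible guard, sketched here under assumptions, is to skip prediction when the vector is empty. `reply_or_fallback`, the fallback text and the `fake_predict` stub are hypothetical and stand in for the trained model plus response lookup.

```python
def reply_or_fallback(bow, predict, fallback="Sorry, I did not understand that."):
    """Return fallback when bow has no 1s; otherwise defer to predict."""
    if not any(bow):
        return fallback
    return predict(bow)

# Stub standing in for the trained model plus response lookup
def fake_predict(bow):
    return "Hello"

print(reply_or_fallback([0, 0, 0], fake_predict))   # Sorry, I did not understand that.
print(reply_or_fallback([0, 1, 0], fake_predict))   # Hello
```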
