## Building a Simple Chatbot from Scratch in Python (using NLTK)

> This chatbot uses NLTK to tokenize and lemmatize text, and scikit-learn to compute TF-IDF vectors for the user's input and for every sentence in the corpus `chatbot.txt`. It answers with the corpus sentence whose vector has the highest cosine similarity to the input, or with a fallback message if no sentence shares any term with the input.
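
To make the idea concrete before the notebook cells, here is a minimal pure-Python sketch of TF-IDF weighting followed by cosine similarity. It is not the code the notebook runs (that uses scikit-learn's `TfidfVectorizer`); the helper names `tfidf_vectors` and `cosine`, the whitespace tokenizer, and the two-sentence corpus are assumptions made for illustration. The IDF term uses the smoothed form `ln((1 + n) / (1 + df)) + 1`.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (raw tf * smoothed idf)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Illustrative corpus and query (not taken from chatbot.txt)
corpus = [
    "a chatbot conducts a conversation",
    "the turing test is a criterion of intelligence",
]
query = "what is the turing test"

# Vectorize the corpus together with the query, as the notebook does
vectors = tfidf_vectors(corpus + [query])
query_vec = vectors[-1]
scores = [cosine(query_vec, doc_vec) for doc_vec in vectors[:-1]]
best = max(range(len(scores)), key=scores.__getitem__)
print(corpus[best])
```

The first sentence shares no term with the query, so its score is exactly zero; the second shares several, so it is chosen as the answer.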

### Chatbot demo

![](https://cdn-images-1.medium.com/max/800/1*pPcVfZ7i-gLMabUol3zezA.gif)

### Sources

+ Original blog [Building a Simple Chatbot from Scratch in Python (using NLTK)](https://medium.com/analytics-vidhya/building-a-simple-chatbot-in-python-using-nltk-7c8c8215ac6e)
+ Original code [Building-a-Simple-Chatbot-in-Python-using-NLTK](https://github.com/parulnith/Building-a-Simple-Chatbot-in-Python-using-NLTK)

In [1]:
# coding: utf-8

# # Meet Robo: your friend

import nltk
import warnings
warnings.filterwarnings("ignore")

# nltk.download() # for downloading packages

import numpy as np
import random
import string # to process standard python strings

with open('chatbot.txt', 'r', errors='ignore') as f:
    raw = f.read()

In [2]:
raw = raw.lower()  # convert to lowercase
# nltk.download('punkt')    # first-time use only
# nltk.download('wordnet')  # first-time use only
sent_tokens = nltk.sent_tokenize(raw)  # list of sentences
word_tokens = nltk.word_tokenize(raw)  # list of words

In [3]:
print(sent_tokens[:2])
print(word_tokens[:5])

['a chatbot (also known as a talkbot, chatterbot, bot, im bot, interactive agent, or artificial conversational entity) is a computer program or an artificial intelligence which conducts a conversation via auditory or textual methods.', 'such programs are often designed to convincingly simulate how a human would behave as a conversational partner, thereby passing the turing test.']
['a', 'chatbot', '(', 'also', 'known']


In [4]:
lemmer = nltk.stem.WordNetLemmatizer()
def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

In [5]:
# Map every punctuation code point to None so that str.translate() deletes it
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

In [6]:
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))
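
The punctuation step can be tried on its own, without NLTK. The sketch below rebuilds the same translation table, shows the equivalent table produced by `str.maketrans`, and applies it to a made-up sample string (the string is an assumption for illustration).

```python
import string

# Same mapping as in the notebook: every punctuation code point maps to None,
# which tells str.translate() to delete that character.
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

# Equivalent table built with the standard-library helper.
remove_punct_table = str.maketrans("", "", string.punctuation)

sample = "IBM's Watson, a chatbot-based toy!"
cleaned = sample.lower().translate(remove_punct_dict)
print(cleaned)  # ibms watson a chatbotbased toy
```

Note that characters are deleted rather than replaced by spaces, so `ibm's` becomes `ibms` and `chatbot-based` becomes `chatbotbased`. Because the corpus and the user input pass through the same normalization, the resulting tokens still line up.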

In [7]:
GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]

In [8]:
# Checking for greetings
def greeting(sentence):
    """If user's input is a greeting, return a greeting response"""
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
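
Because `greeting()` compares whitespace-separated words, the two-word entry `"what's up"` can never match, and an input such as `hello!` is missed because of the trailing punctuation. One possible way around both, sketched under the assumption that phrase-level matching is acceptable, is shown below; `greeting_relaxed` is a hypothetical name and is not part of the original code.

```python
import random
import string

GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello",
                      "I am glad! You are talking to me"]

def greeting_relaxed(sentence):
    """Hypothetical variant of greeting(): tolerates punctuation and phrases."""
    # Strip punctuation at the edges of each word, keeping inner apostrophes.
    words = [w.strip(string.punctuation) for w in sentence.lower().split()]
    normalized = " ".join(w for w in words if w)
    # Pad with spaces so that "hi" does not match inside "this".
    padded = f" {normalized} "
    for phrase in GREETING_INPUTS:
        if f" {phrase} " in padded:
            return random.choice(GREETING_RESPONSES)
    return None

print(greeting_relaxed("What's up?"))
print(greeting_relaxed("this is a test"))  # None
```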

In [9]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [10]:
# Generating response
def response(user_response):
    """Return the corpus sentence most similar to user_response.

    Note: appends user_response to sent_tokens; the caller removes it afterwards.
    """
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    # Similarity of the user's sentence (last row) to every row, itself included
    vals = cosine_similarity(tfidf[-1], tfidf)
    # [-1] is the user's sentence itself (similarity 1.0), so take the runner-up
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if req_tfidf == 0:
        # No term in common with any corpus sentence
        return "I am sorry! I don't understand you"
    return sent_tokens[idx]
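
`response()` refits the vectorizer on the corpus plus the user's sentence on every call and relies on the caller to remove that sentence again. A possible alternative, sketched here and not taken from the original blog, fits on the corpus only and then transforms the query with the same vocabulary. The function name `best_match`, the default scikit-learn tokenizer (used instead of `LemNormalize` so the sketch runs without NLTK data), and the three sample sentences are all assumptions. Scores can differ slightly from the notebook's, because the query no longer contributes to the IDF statistics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

FALLBACK = "I am sorry! I don't understand you"

def best_match(query, sentences, fallback=FALLBACK):
    """Hypothetical helper: return the sentence most similar to query."""
    vectorizer = TfidfVectorizer(stop_words="english")
    corpus_tfidf = vectorizer.fit_transform(sentences)  # fit on the corpus only
    query_tfidf = vectorizer.transform([query])         # reuse that vocabulary
    scores = cosine_similarity(query_tfidf, corpus_tfidf).ravel()
    best = scores.argmax()
    # A score of zero means the query shares no term with any sentence.
    return sentences[best] if scores[best] > 0 else fallback

# Illustrative sentences (not taken from chatbot.txt)
sentences = [
    "a chatbot conducts a conversation",
    "the turing test is a criterion of intelligence",
    "chatbot platforms require ongoing maintenance",
]
print(best_match("what is the turing test", sentences))
print(best_match("bananas", sentences))
```

Since the query is never added to the list, there is nothing to remove afterwards and no need to skip the query's own row with `[-2]`.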

In [11]:
flag = True
print("ROBO: My name is Robo. I will answer your queries about Chatbots. If you want to exit, type Bye!")

while flag:
    user_response = input().lower()
    if user_response != 'bye':
        if user_response in ('thanks', 'thank you'):
            flag = False
            print("ROBO: You are welcome..")
        else:
            greeting_reply = greeting(user_response)  # call once: the reply is random
            if greeting_reply is not None:
                print("ROBO: " + greeting_reply)
            else:
                print("ROBO: ", end="")
                print(response(user_response))
                # response() appended the input to sent_tokens; undo that here
                sent_tokens.remove(user_response)
    else:
        flag = False
        print("ROBO: Bye! take care..")

ROBO: My name is Robo. I will answer your queries about Chatbots. If you want to exit, type Bye!


 Hello


ROBO: hi there


 Maintenance


ROBO: maintenance
to keep chatbots up to speed with changing company products and services, traditional chatbot development platforms require ongoing maintenance.


 In 1950, Alan


ROBO: background
in 1950, alan turing's famous article "computing machinery and intelligence" was published, which proposed what is now called the turing test as a criterion of intelligence.


 Bye


ROBO: Bye! take care..


### Analyzing `response(user_response)` step by step

In [12]:
user_response = "IBM's Watson computer has"

In [13]:
sent_tokens.append(user_response)

In [14]:
sent_tokens[-3:]

['messenger, windows live messenger, aol instant messenger and other instant messaging protocols.',
 "there has also been a published report of a chatbot used in a fake personal ad on a dating service's website.",
 "IBM's Watson computer has"]

In [15]:
TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
tfidf = TfidfVec.fit_transform(sent_tokens)

In [16]:
tfidf

<70x656 sparse matrix of type '<class 'numpy.float64'>'
	with 1053 stored elements in Compressed Sparse Row format>

In [17]:
vals = cosine_similarity(tfidf[-1], tfidf)
print(vals.shape)

(1, 70)


In [18]:
print(vals)

[[0.08347842 0.         0.         0.         0.         0.
  0.         0.0734716  0.         0.         0.         0.
  0.         0.         0.0761791  0.         0.09238494 0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.07883241 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.06920467 0.         0.
  0.         0.         0.         0.         0.05018282 0.07954668
  0.         0.         0.42861197 0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.
  0.         0.         0.09972515 1.        ]]


In [19]:
idx=vals.argsort()[0][-2]
flat = vals.flatten()

In [20]:
vals.argsort()

array([[34, 33, 66, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 48, 49, 65,
        51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 31, 32, 29,
         1,  2,  3,  4,  5,  6,  8,  9, 10, 11, 12, 13, 15, 30, 64, 67,
        28, 27, 25, 24, 23, 22, 20, 19, 18, 17, 21, 46, 39,  7, 14, 26,
        47,  0, 16, 68, 50, 69]])

In [21]:
vals.argsort()[0][-2]

50
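
The reason for indexing with `[-2]` rather than `[-1]` is easier to see on a small array. In the run above, the last index (69) is the query compared with itself, with similarity 1.0, and index 50 holds the best real match (0.42861197). The toy values below are made up to mirror that situation.

```python
import numpy as np

# Toy similarity row shaped like the (1, n) output of cosine_similarity.
# The last entry is the query compared with itself, so it is 1.0.
vals = np.array([[0.08, 0.0, 0.43, 0.10, 1.0]])

order = vals.argsort()   # indices from lowest to highest score, shape (1, 5)
idx = order[0][-2]       # runner-up: the best match that is not the query
best_score = np.sort(vals.flatten())[-2]

print(order[0][-1], idx, best_score)  # 4 2 0.43
```

If `best_score` is zero, every corpus sentence has zero similarity to the query, which is the case the fallback message handles.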

In [22]:
flat.sort()
req_tfidf = flat[-2]

In [23]:
robo_response = ""
if(req_tfidf==0):
    robo_response=robo_response+"I am sorry! I don't understand you"
else:
    robo_response = robo_response+sent_tokens[idx]

In [24]:
print(robo_response)

ibm's watson computer has been used as the basis for chatbot-based educational toys for companies such as cognitoys intended to interact with children for educational purposes.
