<h2>Chatbot using Natural Language Processing</h2>

Natural language processing (NLP) is a field that focuses on making natural human language usable by computer programs. NLTK, or Natural Language Toolkit, is a Python package that you can use for NLP.

In [1]:
# Import required libraries
import nltk
import numpy as np
import random
import string  # to process standard Python strings
import warnings
warnings.filterwarnings('ignore')  # hide library warnings to keep the notebook output tidy

<h3>Tokenizing</h3>
By tokenizing, you can conveniently split up text by word or by sentence. This will allow you to work with smaller pieces of text that are still relatively coherent and meaningful even outside of the context of the rest of the text. It’s your first step in turning unstructured data into structured data, which is easier to analyze.

When you’re analyzing text, you’ll be tokenizing by word and tokenizing by sentence. Here’s what both types of tokenization bring to the table:

Tokenizing by word: Words are like the atoms of natural language. They’re the smallest unit of meaning that still makes sense on its own. Tokenizing your text by word allows you to identify words that come up particularly often. For example, if you were analyzing a group of job ads, then you might find that the word “Python” comes up often. That could suggest high demand for Python knowledge, but you’d need to look deeper to know more.

Tokenizing by sentence: When you tokenize by sentence, you can analyze how those words relate to one another and see more context. Are there a lot of negative words around the word “Python” because the hiring manager doesn’t like Python? Are there more terms from the domain of herpetology than the domain of software development, suggesting that you may be dealing with an entirely different kind of python than you were expecting?
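Here is a deliberately simplified sketch that makes the two kinds of tokenization concrete without downloading any NLTK data. It swaps NLTK's tokenizers for two regular expressions, counts word frequencies, and then checks which words share a sentence with "python". The helper names `simple_sent_tokenize` and `simple_word_tokenize` and the sample job-ad text are made up for illustration.

```python
import re
from collections import Counter

def simple_sent_tokenize(text):
    """Split after '.', '!' or '?' followed by whitespace (a rough stand-in for NLTK's sentence tokenizer)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def simple_word_tokenize(text):
    """Return lowercase runs of letters, digits, '+' or '#'; all other punctuation is dropped."""
    return re.findall(r"[a-z0-9+#]+", text.lower())

ads = ("We need Python developers. Python experience is a plus! "
       "Our zoo is hiring a keeper for the python enclosure.")

sentences = simple_sent_tokenize(ads)
words = simple_word_tokenize(ads)

# Word level: which words come up most often?
print(Counter(words).most_common(2))

# Sentence level: which other words appear in the same sentence as "python"?
for sentence in sentences:
    tokens = simple_word_tokenize(sentence)
    if "python" in tokens:
        print([t for t in tokens if t != "python"])
```

These regexes are much cruder than NLTK. An abbreviation such as "e.g." would wrongly end a sentence, and all punctuation is discarded. By contrast, NLTK's `word_tokenize` keeps punctuation such as `(` as separate tokens.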

Here’s how to load the sample document, download the NLTK data that the tokenizers and the lemmatizer need, and tokenize the text by sentence and by word:

In [2]:
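# Open the sample text document and read it into memory (the file is closed automatically).
with open('Chatbot.txt', 'r', errors='ignore') as f:
    raw = f.read()
raw = raw.lower()  # convert to lowercase
nltk.download('punkt')      # tokenizer models (first-time use only)
nltk.download('punkt_tab')  # tokenizer tables required by newer NLTK releases
nltk.download('wordnet')    # WordNet data for the lemmatizer (first-time use only)
nltk.download('omw-1.4')    # Open Multilingual WordNet, used alongside WordNet
sent_tokens = nltk.sent_tokenize(raw)  # list of sentences
word_tokens = nltk.word_tokenize(raw)  # list of words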

[nltk_data] Downloading package punkt to /home/alpha/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /home/alpha/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /home/alpha/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [3]:
# Display an example of the sentence tokens
sent_tokens[:2]  # the first two strings that NLTK considers to be sentences

['a chatbot (also known as a talkbot, chatterbot, bot, im bot, interactive agent, or artificial conversational entity) is a computer program or an artificial intelligence which conducts a conversation via auditory or textual methods.',
 'such programs are often designed to convincingly simulate how a human would behave as a conversational partner, thereby passing the turing test.']

In [4]:
word_tokens[:6]  # You got a list of strings that NLTK considers to be words, such as:

['a', 'chatbot', '(', 'also', 'known', 'as']

WordNet is a large, freely and publicly available lexical database of English that organizes words into structured semantic relationships. It also supports lemmatization, and the WordNet lemmatizer is one of the earliest and most commonly used lemmatizers.

Lemmatization is the process of converting a word to its base form.
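For intuition only, you can picture lemmatization as a lookup from inflected forms to base forms. The `LEMMAS` table and the `toy_lemmatize` function below are hypothetical and hand-written. NLTK's WordNet lemmatizer works differently: it consults the WordNet database using morphological rules and, unless told otherwise, treats each word as a noun.

```python
# Hypothetical lookup table: inflected form -> base form (lemma).
LEMMAS = {
    "chatbots": "chatbot",
    "programs": "program",
    "wolves": "wolf",
    "was": "be",
}

def toy_lemmatize(token):
    """Return the base form if the table knows the token, otherwise the token unchanged."""
    return LEMMAS.get(token, token)

tokens = ["chatbots", "are", "computer", "programs"]
print([toy_lemmatize(t) for t in tokens])
```

`are` comes back unchanged because the table does not list it. A real lemmatizer behaves the same way: it returns a word unchanged when it finds no base form for it.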

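The next cell defines two helpers. `LemTokens` lemmatizes a list of tokens. `LemNormalize` lowercases a text, removes its punctuation marks, tokenizes it by word and lemmatizes the result. `LemNormalize` is used later as the tokenizer for the TF-IDF vectorizer.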


In [5]:
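lemmer = nltk.stem.WordNetLemmatizer()
# WordNet is a semantically-oriented dictionary of English included in NLTK.

def LemTokens(tokens):
    """Lemmatize each token with WordNet."""
    return [lemmer.lemmatize(token) for token in tokens]

# Translation table that maps every punctuation character to None, i.e. deletes it.
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

def LemNormalize(text):
    """Lowercase, strip punctuation, tokenize by word, then lemmatize."""
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))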
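`remove_punct_dict` is a translation table for `str.translate`. It maps the code point of every character in `string.punctuation` to `None`, which tells `translate` to delete that character. The standard library's `str.maketrans` builds an equivalent table. The short check below compares the two on a made-up sample sentence.

```python
import string

# The notebook's table: code point of each punctuation mark -> None (delete it).
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

# Equivalent table from the standard library: the third argument lists characters to delete.
maketrans_table = str.maketrans("", "", string.punctuation)

sample = "Hello, world! Isn't NLP (natural-language processing) fun?"
print(sample.lower().translate(remove_punct_dict))
print(remove_punct_dict == maketrans_table)
```

Deleting punctuation, instead of replacing it with a space, merges some words. "isn't" becomes "isnt" and "natural-language" becomes "naturallanguage" before the lemmatizer ever sees them.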

The next cell hard-codes the chatbot's intents and the responses for each one. I have included greetings, jokes and facts as the intents, so this chatbot can greet you, tell jokes and share facts.

In [6]:
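GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up", "hey")
GREETING_RESPONSES = ["hi", "hey", "*nods*", "hi there", "hello", "I am glad! You are talking to me"]

def greeting(sentence):
    """Return a random greeting if the sentence contains a greeting keyword, else None."""
    # A multi-word keyword such as "what's up" can only match the whole sentence,
    # because the loop below compares one word at a time.
    if sentence.lower().strip() in GREETING_INPUTS:
        return random.choice(GREETING_RESPONSES)
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None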
        
# Keywords are compared one lowercase word at a time, so each entry must be a
# single lowercase word (a phrase such as "Tell me a joke" could never match).
JOKE_INPUTS = ("joke", "jokes", "funny")
JOKE_RESPONSES = ["I invented a new word! Plagiarism!",
                  "Helvetica and Times New Roman walk into a bar. “Get out of here!” shouts the bartender. “We don’t serve your type.”",
                  "Hear about the new restaurant called Karma? There’s no menu: You get what you deserve.",
                  "Did you hear about the claustrophobic astronaut? He just needed a little space.",
                  "Why don’t scientists trust atoms? Because they make up everything.",
                  "A man tells his doctor, “Doc, help me. I’m addicted to Twitter!” The doctor replies, “Sorry, I don’t follow you …”"]

def jokes(sentence):
    """Return a random joke if the sentence contains a joke keyword, else None."""
    for word in sentence.split():
        if word.lower() in JOKE_INPUTS:
            return random.choice(JOKE_RESPONSES)
    return None
        
# Single lowercase words only, since the check below compares one lowercase word at a time.
FACT_INPUTS = ("fact", "facts", "interesting")
FACT_RESPONSES = ["Human teeth are the only part of the body that cannot heal themselves. Teeth are coated in enamel which is not a living tissue.",
                  "The Ancient Romans used to drop a piece of toast into their wine for good health - hence why we 'raise a toast'.",
                  "There is actually a word for someone giving an opinion on something they know nothing about. An 'ultracrepidarian' is someone who voices thoughts beyond their expertise.",
                  "The Japanese word 'Kuchi zamishi' is the act of eating when you're not hungry because your mouth is lonely. We do this all the time.",
                  "Competitive art used to be an Olympic sport. Between 1912 and 1948, the international sporting events awarded medals for music, painting, sculpture and architecture. Shame it didn't catch on, the famous pottery scene in Ghost could have won an Olympic medal as well as an Academy Award for the best screenplay.",
                  "It's illegal to own just one guinea pig in Switzerland. It's considered animal abuse because they're social beings and get lonely."]

def facts(sentence):
    """Return a random fact if the sentence contains a fact keyword, else None."""
    for word in sentence.split():
        if word.lower() in FACT_INPUTS:
            return random.choice(FACT_RESPONSES)
    return None
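All three intent functions above follow one pattern: they split the input on whitespace and compare each lowercased word with a keyword tuple. This approach has two consequences. A keyword that contains a space can never equal a single word, and punctuation stays attached to words, so "hello!" is not "hello". The sketch below, with hypothetical `INTENTS`, `normalize`, `match_intent` and `reply` names, trims punctuation and also matches multi-word phrases as whole words. It is one possible approach, not the notebook's code.

```python
import random
import string

# Hypothetical intent table: intent name -> (keywords, possible responses).
INTENTS = {
    "greeting": (("hello", "hi", "hey", "what's up"), ["hi", "hello"]),
    "joke": (("joke", "jokes", "funny"), ["Why don't scientists trust atoms? Because they make up everything."]),
    "fact": (("fact", "facts", "interesting"), ["Teeth are coated in enamel which is not a living tissue."]),
}

def normalize(sentence):
    """Lowercase, split on whitespace and trim punctuation from both ends of each word."""
    words = (w.strip(string.punctuation) for w in sentence.lower().split())
    return [w for w in words if w]

def match_intent(sentence):
    """Return the name of the first intent with a keyword in the sentence, else None."""
    words = normalize(sentence)
    padded = " " + " ".join(words) + " "
    for name, (keywords, _responses) in INTENTS.items():
        for keyword in keywords:
            if " " in keyword:
                # Phrases must appear as whole words, so pad both sides with spaces.
                if f" {keyword} " in padded:
                    return name
            elif keyword in words:
                # Single keywords must equal a whole token ("hi" does not match "this").
                return name
    return None

def reply(sentence):
    """Return a random response for the matched intent, or None if nothing matches."""
    name = match_intent(sentence)
    return random.choice(INTENTS[name][1]) if name else None

for message in ["Hello!", "So, what's up?", "Tell me a joke", "This is interesting", "What is a chatbot"]:
    print(message, "->", match_intent(message))
```

Returning the intent name separately from the random response keeps the matching logic testable without depending on `random`.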

In [7]:
# TfidfVectorizer turns raw text into TF-IDF feature vectors (one row per sentence)
from sklearn.feature_extraction.text import TfidfVectorizer

In [8]:
# cosine_similarity scores how alike two TF-IDF vectors are; it is used to find
# the sentence in the document that is closest to the user's input
from sklearn.metrics.pairwise import cosine_similarity
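To show what these two imports compute, here is a small pure-Python sketch of TF-IDF weighting followed by cosine similarity. It assumes scikit-learn's default `TfidfVectorizer` settings (`smooth_idf=True`, `norm='l2'`, raw term counts). Under those settings, the inverse document frequency for n documents is idf(t) = ln((1 + n) / (1 + df(t))) + 1. The function names and the three-sentence corpus are made up. Tokenization here is a plain whitespace split rather than `LemNormalize`, and no stop words are removed.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dict per document (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        weights = {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "a chatbot is a computer program",
    "chatbot competitions focus on the turing test",
    "what is a chatbot",  # plays the role of the user's input, appended last
]
vectors = tfidf_vectors(docs)
scores = [cosine(vectors[-1], v) for v in vectors]
print([round(s, 3) for s in scores])
```

The first sentence shares three terms with the input, while the second shares only "chatbot", a term that occurs in every document and therefore gets the lowest idf. So the first sentence scores higher. The input compared with itself scores 1.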

Method 1: answer from the text document (Chatbot.txt)

In [9]:
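# Here is where the magic happens.
# This method answers with the sentence from the text document (Chatbot.txt)
# that is most similar to the user's input.
def response(user_response):
    robo_response = ''
    # Add the input as the last "sentence"; the chat loop removes it again afterwards.
    sent_tokens.append(user_response)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(sent_tokens)
    # Similarity of the input (last row) to every sentence, including itself.
    vals = cosine_similarity(tfidf[-1], tfidf)
    # The highest score is usually the input matching itself, so take the second highest.
    idx = vals.argsort()[0][-2]
    flat = vals.flatten()
    flat.sort()
    req_tfidf = flat[-2]
    if req_tfidf == 0:
        # No sentence shares a (non-stop-word) term with the input.
        robo_response = robo_response + "I am sorry! I don't understand you"
        return robo_response
    else:
        robo_response = robo_response + sent_tokens[idx]
        return robo_response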
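The index arithmetic in `response` relies on the user's input being the last row. Its similarity with itself is normally the largest score, so the runner-up is the best-matching sentence from the document. The sketch below replays that selection on a plain list of made-up scores, without NumPy. `pick_reply` is a hypothetical name.

```python
def pick_reply(sentences, scores, fallback="I am sorry! I don't understand you"):
    """Return the sentence with the second-highest score, or fallback if that score is 0.

    scores[i] is the similarity between the user's input and sentences[i];
    the input itself is the last sentence, so its own (highest) score is skipped.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    best = ranked[-2]  # same role as vals.argsort()[0][-2]
    if scores[best] == 0:
        return fallback
    return sentences[best]

sentences = [
    "a chatbot is a computer program",
    "chatbot competitions focus on the turing test",
    "what is a chatbot",  # the user's input, appended last
]
print(pick_reply(sentences, [0.575, 0.087, 1.0]))
print(pick_reply(sentences, [0.0, 0.0, 0.0]))
```

When the input shares no terms with the document, its own vector is all zeros. Every score is then 0, so the fallback message is returned.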

Method 2: hard-coded intents first, with Method 1 as the fallback

In [10]:
# Main chat loop: try the hard-coded intents (greetings, jokes, facts) first and
# fall back to response(), which answers from Chatbot.txt, when none of them match.
flag = True
print("Cartman: My name is Cartman. I also like telling jokes and saying random facts. If you wish to exit, type Bye!")
while flag:
    # Lowercase and drop a trailing "!" or "." so that "Bye!" is recognized as 'bye'.
    user_response = input().lower().strip().rstrip('!.')
    if user_response != 'bye':
        if user_response == 'thanks' or user_response == 'thank you':
            flag = False
            print("Cartman: You are welcome..")
        else:
            # Ask each intent handler once and reuse its answer.
            reply = greeting(user_response) or jokes(user_response) or facts(user_response)
            if reply is not None:
                print("Cartman: " + reply)
            else:
                print("Cartman: ", end="")
                print(response(user_response))
                sent_tokens.pop()  # drop the input that response() appended to sent_tokens
    else:
        flag = False
        print("Cartman: Bye! take care..")

Cartman: My name is Cartman. I also like telling jokes and saying random facts. If you wish to exit, type Bye!


 Hello


Cartman: hi


 What is a chatbot


Cartman: design
the chatbot design is the process that defines the interaction between the user and the chatbot.the chatbot designer will define the chatbot personality, the questions that will be asked to the users, and the overall interaction.it can be viewed as a subset of the conversational design.


 What is artificial intelligence


Cartman: a chatbot (also known as a talkbot, chatterbot, bot, im bot, interactive agent, or artificial conversational entity) is a computer program or an artificial intelligence which conducts a conversation via auditory or textual methods.


 What is AI


Cartman: one pertinent field of ai research is natural language processing.


 Science


Cartman: however, a study conducted by narrative science in 2015 found that 80 percent of their respondents believe ai improves worker performance and creates jobs.


 What are APIs


Cartman: apis
there are many apis available for building your own chatbots, such as aarc.


 What is aarc


Cartman: apis
there are many apis available for building your own chatbots, such as aarc.


 Are there any chatbot competitions


Cartman: chatbot competitions focus on the turing test or more specific goals.


 What about Virtual Assistant


Cartman: the term "chatterbot" was originally coined by michael mauldin (creator of the first verbot, julia) in 1994 to describe these conversational programs.today, most chatbots are either accessed via virtual assistants such as google assistant and amazon alexa, via messaging apps such as facebook messenger or wechat, or via individual organizations' apps and websites.


 How can i maintain my chatbot


Cartman: design
the chatbot design is the process that defines the interaction between the user and the chatbot.the chatbot designer will define the chatbot personality, the questions that will be asked to the users, and the overall interaction.it can be viewed as a subset of the conversational design.


 Say something funny


Cartman: A man tells his doctor, “Doc, help me. I’m addicted to Twitter!” The doctor replies, “Sorry, I don’t follow you …”


 Say something interesting


Cartman: The Japanese word 'Kuchi zamishi' is the act of eating when you're not hungry because your mouth is lonely. We do this all the time.


 Thanks


Cartman: You are welcome..
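The chat loop in Method 2 reads from `input()`, which makes it awkward to test. The sketch below shows one possible restructuring. It assumes the intent handlers and the document-based fallback can be passed in as plain functions, moves one conversational turn into a hypothetical `chat_turn` function, and drives that function from a list of scripted messages. `toy_greeting` and `toy_fallback` are placeholders, not the notebook's `greeting` and `response`.

```python
def chat_turn(message, handlers, fallback):
    """Return (reply, keep_going) for one user message.

    handlers: functions tried in order; each returns a reply string or None.
    fallback: function used when no handler answers.
    """
    text = message.lower().strip().rstrip("!.")
    if text == "bye":
        return "Bye! take care..", False
    if text in ("thanks", "thank you"):
        return "You are welcome..", False
    for handler in handlers:
        answer = handler(text)
        if answer is not None:
            return answer, True
    return fallback(text), True

# Toy stand-ins for greeting() and response().
def toy_greeting(text):
    return "hi" if "hello" in text.split() else None

def toy_fallback(text):
    return "I am sorry! I don't understand you"

script = ["Hello", "What is a chatbot", "Bye!"]
for message in script:
    reply, keep_going = chat_turn(message, [toy_greeting], toy_fallback)
    print("Cartman: " + reply)
    if not keep_going:
        break
```

Keeping the input/output (`input()` and `print`) outside `chat_turn` allows the same function to serve both the interactive loop and scripted tests.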
