# What are chatbots?

Before moving on to implementation, take a moment to think about chatbots. In a nutshell, chatbots are intelligent software programs that aim to hold conversations in natural language to help people with specific tasks.

Amazon Alexa, Google Assistant, Microsoft's Cortana, and Apple's Siri are all well-known examples of chatbots. You've probably already encountered many others on websites, Slack, or mobile apps. But chatbots are not at all new. The first chatbot was probably [ELIZA](http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm?utm_source=ubisend.com&utm_medium=blog-link&utm_campaign=ubisend), which was developed in the 1960s to emulate a Rogerian psychotherapist.

Chatbots are quite popular across many industries, and their popularity seems to increase from year to year. According to [Gartner's estimates](https://www.gartner.com/en/newsroom/press-releases/2018-02-19-gartner-says-25-percent-of-customer-service-operations-will-use-virtual-customer-assistants-by-2020), "25% of customer service operations will use virtual customer assistants by 2020." Gartner [also estimates that](https://www.gartner.com/smarterwithgartner/chatbots-will-appeal-to-modern-workers/) "by 2022, 70% of white-collar workers will interact with conversational platforms on a daily basis."

And the main driving forces behind chatbots are NLP and its machine-learning applications.

# Approaches in the development of chatbots

Traditionally, chatbots were developed using rule-based systems. Rule-based approaches are still used to some degree today. But since the early 2010s, efforts have shifted to focus more on using the power of machine learning. So, there are two general approaches:

1. The first approach is the *rule-based approach* that rests upon clearly defined rules derived for the specific task that the chatbot was designed to handle. However, determining all the rules for complex interactions is almost impossible. As a result, the popularity of this approach has dramatically diminished in recent years.

2. The second approach is to use *machine learning* to discover hidden patterns and rules from the data. With the increase in the availability of training data and advances in the applications of machine learning in speech recognition and NLP, most modern chatbots, if not all, are developed based on learning from the data.

Of course, this is a simplistic overview of the approaches for chatbot development. But for the purpose of this module, it's sufficient. So this module won't go further into the details of other approaches.
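To make the rule-based approach concrete, here is a minimal sketch in the spirit of ELIZA-style pattern matching. The rules, the `rule_based_reply` name, and the fallback message are all assumptions made for illustration; they aren't taken from any particular chatbot.

```python
import re

# Hypothetical rules for illustration: each pattern maps to a response template
RULES = [
    (re.compile(r"\b(hello|hi|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\bmy name is (\w+)", re.IGNORECASE), "Nice to meet you, {0}."),
    (re.compile(r"\bi feel (\w+)", re.IGNORECASE), "Why do you feel {0}?"),
]
FALLBACK = "Sorry, I don't have a rule for that."


def rule_based_reply(text):
    """Return the response of the first rule whose pattern matches the text."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Fill the template with the captured groups (unused groups are ignored)
            return template.format(*match.groups())
    return FALLBACK


print(rule_based_reply("Hi there"))
print(rule_based_reply("I feel tired today"))
print(rule_based_reply("Can we talk about technology?"))
```

The last input falls through to the fallback because no rule covers it. This is the weakness described above: every new kind of input needs another handwritten rule.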

In the remainder of this checkpoint, you'll use a mix of both approaches, but the second approach will do the heavy lifting. You'll develop your chatbots using the NLP techniques that you've learned so far in this module, as well as the machine-learning techniques that you learned earlier in the program. First, you'll implement a chatbot from scratch, using Jane Austen's *Persuasion* as your source of text. Then you'll develop another bot using *ChatterBot*, a chatbot development library in the Python ecosystem. After these, you'll use a larger corpus to build a more capable chatbot using ChatterBot.

# A simple chatbot

In the following example, you'll walk through developing a simple chatbot by training it on Jane Austen's novel *Persuasion*. Note that your corpus is quite small, so you shouldn't expect great performance from your chatbot.

Begin by importing the libraries that you'll use:

In [1]:
import nltk
import numpy as np
import pandas as pd
import random
import string
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.corpus import gutenberg
import re
import spacy
import warnings
warnings.filterwarnings("ignore")

nltk.download('gutenberg')
!python -m spacy download en

[nltk_data] Downloading package gutenberg to
[nltk_data]     /Users/mladmin/nltk_data...
[nltk_data]   Package gutenberg is already up-to-date!
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/Users/mladmin/miniconda3/envs/datascience/lib/python3.6/site-packages/en_core_web_sm
-->
/Users/mladmin/miniconda3/envs/datascience/lib/python3.6/site-packages/spacy/data/en
You can now load the model via spacy.load('en')


Next, you need to clean your data. You can make use of the same code that you used in the previous checkpoints because you're working with the same text data:

In [2]:
# Utility function for standard text cleaning
def text_cleaner(text):
    # Visual inspection identifies a form of punctuation that spaCy does not
    # recognize: the double dash --.  Better get rid of it now!
    text = re.sub(r'--',' ',text)
    # Remove bracketed text (a raw string avoids invalid escape sequences)
    text = re.sub(r"[\[].*?[\]]", "", text)
    text = re.sub(r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b", " ", text)
    text = ' '.join(text.split())
    return text
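To see what each substitution in `text_cleaner` does, the following sketch applies the same patterns one at a time to a single line. The sample string and the `steps` list are assumptions made for illustration; the real input is the full raw text of the novel.

```python
import re

# Hypothetical sample line for illustration
sample = "Walter Elliot, born March 1, 1760, married -- [Illustration] July 15, 1784"

# The same substitutions as text_cleaner, applied one at a time
steps = [
    ("double dashes", r"--", " "),
    ("bracketed text", r"[\[].*?[\]]", ""),
    ("standalone numbers", r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b", " "),
]

cleaned = sample
for label, pattern, replacement in steps:
    cleaned = re.sub(pattern, replacement, cleaned)
    print(f"after removing {label}: {cleaned!r}")

# Collapse the runs of whitespace that the substitutions leave behind
cleaned = " ".join(cleaned.split())
print(cleaned)
```

The final line is `Walter Elliot, born March , , married July ,`. Removing the numbers leaves the surrounding commas in place, which is why the parsed sentences later contain fragments such as `born March , ,`.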

In [3]:
# Load and clean the data
persuasion = gutenberg.raw('austen-persuasion.txt')

# The chapter indicator is idiosyncratic
persuasion = re.sub(r'Chapter \d+', '', persuasion)
    
persuasion = text_cleaner(persuasion)

In [4]:
# Parse the cleaned novel. This can take some time.
# Load the model by its full package name, which the download output above reports
nlp = spacy.load('en_core_web_sm')
persuasion_doc = nlp(persuasion)

In [5]:
# Group into sentences
# Use the sentences that have more than one character
persuasion_sents = [sent.text for sent in persuasion_doc.sents if len(sent.text) > 1]
persuasion_sents

['Sir Walter Elliot, of Kellynch Hall, in Somersetshire, was a man who, for his own amusement, never took up any book but the Baronetage; there he found occupation for an idle hour, and consolation in a distressed one; there his faculties were roused into admiration and respect, by contemplating the limited remnant of the earliest patents; there any unwelcome sensations, arising from domestic affairs changed naturally into pity and contempt as he turned over the almost endless creations of the last century; and there, if every other leaf were powerless, he could read his own history with an interest which never failed.',
 'This was the page at which the favourite volume always opened: "ELLIOT OF KELLYNCH HALL.',
 'Walter Elliot, born March , , married, July , , Elizabeth, daughter of James Stevenson, Esq. of South Park, in the county of Gloucester, by which lady (who died ) he has issue Elizabeth, born June , ;',
 'Anne, born August , ; a still-born son, November , ; Mary, born Novembe

The `persuasion_sents` variable above contains all of the sentences from *Persuasion*. You'll use this list to select the best response based on the user's input. But before that, focus on handling the greetings to explore how you can use rule-based methods in the chatbot workflow.

The goal is to incorporate a rule-based control for the greeting words. Specifically, every time the user enters text, you'll check whether it contains any greeting words. If it does, your chatbot will respond with a greeting of its own.

In [6]:
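GREETING_INPUTS = ["hello", "hi", "greetings", "what's up", "hey"]
GREETING_RESPONSES = ["hello", "hi", "hey", "hi there"]
def greeting(sentence):
    # Return a random greeting if the input contains one; otherwise return None
    text = sentence.lower()
    # Strip punctuation so that inputs such as "hello!" still match
    words = [word.strip(string.punctuation) for word in text.split()]
    for greeting_input in GREETING_INPUTS:
        # Multiword greetings such as "what's up" are matched as phrases
        if greeting_input in words or (" " in greeting_input and greeting_input in text):
            return random.choice(GREETING_RESPONSES)
    return None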

Next, implement a function that generates a response based on the user's input:

In [7]:
def response(user_input):

    fallback = "I'm sorry! I don't know how to respond :("
    # Use spaCy to parse the user's input
    input_doc = nlp(user_input)
    # Then split it into sentences
    input_sents = [sent.text for sent in input_doc.sents]
    if not input_sents:
        return fallback
    # Then append the user's sentences to your list of sentences,
    # remembering where the original corpus ends
    n_corpus = len(persuasion_sents)
    persuasion_sents.extend(input_sents)

    # The next step is to vectorize your new corpus using TF-IDF
    tfidf_vec = TfidfVectorizer(max_df=0.5, min_df=1, use_idf=True, norm='l2', smooth_idf=True, lowercase=False)
    tfidf = tfidf_vec.fit_transform(persuasion_sents)

    # Remove all of the user's sentences from the corpus
    del persuasion_sents[n_corpus:]

    # Calculate the cosine similarity between the last sentence of the
    # user's input and all of the original sentences in the corpus
    similarities = cosine_similarity(tfidf[-1], tfidf[:n_corpus])
    # Get the index of the most similar sentence
    idx = np.argmax(similarities)

    # Respond only if the best match shares at least one term with the input
    if similarities[0, idx] > 0:
        return persuasion_sents[idx]
    else:
        return fallback
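To see what the vectorizer and the similarity step in `response()` compute, here is a minimal pure-Python sketch on a toy corpus. The corpus, the query, and the helper names `tfidf_vectors` and `cosine` are assumptions made for illustration. The sketch tokenizes on whitespace and leaves out `max_df`, so its numbers won't match scikit-learn's exactly.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for persuasion_sents
corpus = [
    "the captain went to sea",
    "anne walked to the village",
    "the weather at sea was rough",
]
query = "rough weather at sea"


def tfidf_vectors(docs):
    """Build L2-normalized TF-IDF vectors (as dicts) for whitespace-tokenized docs."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, as documented for smooth_idf=True
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors, which equals their cosine similarity."""
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())


# As in response(), the query is vectorized together with the corpus
*corpus_vecs, query_vec = tfidf_vectors(corpus + [query])
scores = [cosine(query_vec, vec) for vec in corpus_vecs]
best = max(range(len(scores)), key=scores.__getitem__)

# Fall back when nothing in the corpus shares a term with the query
print(corpus[best] if scores[best] > 0 else "I'm sorry! I don't know how to respond :(")
```

The third sentence wins because it shares the most terms with the query, and the second sentence scores exactly zero because it shares none. A score of zero for the best match is the natural signal for the fallback message.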

In [8]:
print("Persuasion: I will try to respond to you reasonably. If you want to exit, type bye.")

while True:

    user_input = input("User: ")
    user_input = user_input.lower()

    if user_input != 'bye':
        if user_input == 'thanks' or user_input == 'thank you':
            # Print the reply before leaving the loop
            print("Persuasion: You're welcome.")
            break
        else:
            # Call greeting() once so that the check and the reply agree
            greeting_response = greeting(user_input)
            if greeting_response is not None:
                print("Persuasion: " + greeting_response)
            else:
                print("Persuasion: ", end="")
                print(response(user_input))
    else:
        print("Persuasion: Bye! It was a great chat.")
        break

Persuasion: I will try to respond to you reasonably. If you want to exit, type bye.
User: Hello
Persuasion: hi
User: How are you?
Persuasion: how troublesome they are sometimes.
User: Of course. Life is mysterious.
Persuasion: I have observed it all my life.
User: Me too. Speaking with you is like a therapy.
Persuasion: I thought you were speaking of some man of property:
User: Ah. Can we talk about technology?
Persuasion: I want you to talk about me to Mr Elliot.
User: I don't think I know Mr Elliot.
Persuasion: , I think.
User: I really don't know him.
Persuasion: I believe Mrs Charles is not quite pleased with my not inviting them oftener; but you know it is very bad to have children with one that one is obligated to be checking every moment; "don't do this," and "don't do that;" or that one can only keep in tolerable order by more cake than is good for them.
User: I have to leave now.
Persuasion: Mrs Croft was taking leave.
User: Good to hear that.
Persuasion: good.
User: Bye
Persu

Well, not perfect obviously, but still entertaining.
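The chat loop above mixes both approaches: rules handle exits, thanks, and greetings, and the learned component handles everything else. One way to make that routing easier to test is to move it into a function that takes the learned component as an argument. The `route` and `echo` names below are hypothetical, and `echo` is only a stand-in for a real responder.

```python
import random

EXIT_WORDS = {"bye"}
THANKS = {"thanks", "thank you"}
GREETING_INPUTS = {"hello", "hi", "greetings", "hey"}
GREETING_RESPONSES = ["hello", "hi", "hey", "hi there"]


def route(user_input, fallback):
    """Return (reply, keep_chatting) for one turn.

    Rules are checked first; `fallback` is any function that maps text to a
    reply, such as a retrieval-based or a trained responder.
    """
    text = user_input.lower().strip()
    if text in EXIT_WORDS:
        return "Bye! It was a great chat.", False
    if text in THANKS:
        return "You're welcome.", False
    if any(word.strip(".,!?") in GREETING_INPUTS for word in text.split()):
        return random.choice(GREETING_RESPONSES), True
    return fallback(text), True


# A stand-in for the learned component, used here only for illustration
def echo(text):
    return f"You said: {text}"


for turn in ["Hello!", "Can we talk about technology?", "Bye"]:
    reply, keep_chatting = route(turn, echo)
    print(f"User: {turn}\nBot: {reply}")
```

Because `route` never calls `input()` or `print()`, you can check each rule with plain assertions and swap in a different responder without touching the rules.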

In the next step, you'll get familiar with a package for building chatbots in the Python ecosystem. You'll reimplement your chatbot using that library.

# Building a chatbot using ChatterBot

Next, you'll use [ChatterBot](https://github.com/gunthercox/ChatterBot), a popular Python package that makes building chatbots easier. You can install it using pip as follows:

```bash
pip install chatterbot
```
To train on ChatterBot's built-in corpus, which you'll do later in this checkpoint, you also need the `chatterbot-corpus` package. You can install it as follows:

```bash
pip install chatterbot-corpus
```

Once you've installed these packages, you'll be good to go. In the following example, you'll first train your bot using *Persuasion* as your corpus. This way, you'll learn how to train the chatbot using a custom dataset. After that, you'll also look at an example of using ChatterBot's own corpus. Begin with imports:

In [9]:
# Import libraries
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer, ChatterBotCorpusTrainer
from chatterbot.conversation import Statement

Now, you can create your own chatbot and train it using *Persuasion*:

In [13]:
# Create a chatbot
chatbot = ChatBot('Persuasion')
# This is to remove the accumulated knowledge base
chatbot.storage.drop()

# Create a new trainer for the chatbot
trainer = ListTrainer(chatbot)

# Train the chatbot based on Persuasion
trainer.train(persuasion_sents)

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/mladmin/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/mladmin/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
List Trainer: [####################] 100%


Next, run your chatbot:

In [14]:
print("Persuasion: I will try to respond to you reasonably. If you want to exit, type bye.")

# Below is the chatting
while True:

    user_input = input("User: ")
    user_input = user_input.lower()

    if user_input != 'bye':
        if user_input == 'thanks' or user_input == 'thank you':
            # Print the reply before leaving the loop
            print("Persuasion: You're welcome.")
            break
        else:
            # Call greeting() once so that the check and the reply agree
            greeting_response = greeting(user_input)
            if greeting_response is not None:
                print("Persuasion: " + greeting_response)
            else:
                print("Persuasion: ", end="")
                print(chatbot.get_response(user_input))
    else:
        print("Persuasion: Bye! It was a great chat.")
        break

Persuasion: I will try to respond to you reasonably. If you want to exit, type bye.
User: Hello
Persuasion: hello
User: How are you?
Persuasion: After a short pause, Mr Shepherd presumed to say "In all these cases, there are established usages which make everything plain and easy between landlord and tenant.
User: Who is Mr Shepherd?
Persuasion: Do not you think, Miss Elliot, we had better try to get him to Bath?
User: I don't think so.
Persuasion: He was cut short by the eager attacks of the little boys, clinging to him like an old friend, and declaring he should not go; and being too much engrossed by proposals of carrying them away in his coat pockets, &c., to have another moment for finishing or recollecting what he had begun, Anne was left to persuade herself, as well as she could, that the same brother must still be in question.
User: Anyway. Can we talk about technology?
Persuasion: Oh!
User: I understand. Do you like reading?
Persuasion: The invitation was general, and generall

The results aren't great, for two main reasons:

1. The *Persuasion* corpus is quite short.

2. ChatterBot's `ListTrainer` expects its training data to be a list of statements that form a conversation, in which each element is a follow-up to the previous one. Although *Persuasion* includes plenty of dialogue, splitting the novel into sentences breaks the flow of that dialogue.
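The following sketch makes the second reason concrete by pairing every statement with the one that follows it, which is conceptually how a list of statements is read as a conversation. It doesn't call ChatterBot. The `statement_pairs` helper and the `conversation` list are assumptions made for illustration, and the `narrative` list reuses sentences from the bot output above in arbitrary order.

```python
# Hypothetical training data for illustration: each statement answers the previous one
conversation = [
    "Hello",
    "Hi there!",
    "How are you?",
    "I am doing well, thank you.",
]

# Sentences from a novel, by contrast, rarely answer each other
narrative = [
    "I have observed it all my life.",
    "Mrs Croft was taking leave.",
    "Oh!",
]


def statement_pairs(statements):
    """Pair every statement with the one that follows it."""
    return list(zip(statements, statements[1:]))


for prompt, reply in statement_pairs(conversation):
    print(f"{prompt!r} -> {reply!r}")

for prompt, reply in statement_pairs(narrative):
    print(f"{prompt!r} -> {reply!r}")
```

Every pair from `conversation` is a plausible prompt and reply. The pairs from `narrative` are not, so a bot trained on them learns responses that don't follow from the user's input.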

Last, train your chatbot using ChatterBot's corpus:

In [15]:
# Create a chatbot
chatbot = ChatBot('ChatterBot')
# This is to remove the accumulated knowledge base
chatbot.storage.drop()

# Start by training your bot with the ChatterBot corpus data
trainer = ChatterBotCorpusTrainer(chatbot)

trainer.train('chatterbot.corpus.english')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/mladmin/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/mladmin/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
Training ai.yml: [####################] 100%
Training botprofile.yml: [####################] 100%
Training computers.yml: [####################] 100%
Training conversations.yml: [####################] 100%
Training emotion.yml: [####################] 100%
Training food.yml: [####################] 100%
Training gossip.yml: [####################] 100%
Training greetings.yml: [####################] 100%
Training health.yml: [####################] 100%
Training history.yml: [####################] 100%
Training humor.yml: [####################] 100%
Training literature.yml: [####################] 100%
Training money.yml: [####################] 100%
Trainin

In [16]:
print("ChatterBot: I will try to respond to you reasonably. If you want to exit, type bye.")

# Below is the chatting
while True:

    user_input = input("User: ")
    user_input = user_input.lower()

    if user_input != 'bye':
        if user_input == 'thanks' or user_input == 'thank you':
            # Print the reply before leaving the loop
            print("ChatterBot: You're welcome.")
            break
        else:
            # Call greeting() once so that the check and the reply agree
            greeting_response = greeting(user_input)
            if greeting_response is not None:
                print("ChatterBot: " + greeting_response)
            else:
                print("ChatterBot: ", end="")
                print(chatbot.get_response(user_input))
    else:
        print("ChatterBot: Bye! It was a great chat.")
        break

ChatterBot: I will try to respond to you reasonably. If you want to exit, type bye.
User: Hello
ChatterBot: hi
User: How are you?
ChatterBot: I am on the Internet.
User: Do you like surfing on the Internet?
ChatterBot: Much the same as being a human, except that we lack all emotions, dreams, aspirations, creativity, ambition, and above all subjectivity.
User: Great point. Can we talk about technology?
ChatterBot: you make me mad.
User: Why?
ChatterBot: Is there a reason that I should?
User: I don't think so. Do you know something about chatbots?
ChatterBot: No.
User: How about natural language processing?
ChatterBot: Why do you ask?
User: Just talking.
ChatterBot: Nah.
User: Are you a software engineer?
ChatterBot: You have feelings, yes?
User: Yes I have.
ChatterBot: A gyroscope.
User: Haha. Anyway, I have to go now.
ChatterBot: Overly restrictive firewalls.
User: Bye
ChatterBot: Bye! It was a great chat.


As you can see, it's difficult to maintain a coherent conversation across subjects. But, in general, your chatbot gave more natural responses this time. This is because the corpus that it was trained on is much larger than *Persuasion* and, unlike the novel's sentences, is organized as dialogue.

ChatterBot has more functionality than was shown here. For more information, you can refer to the [ChatterBot tutorial](https://chatterbot.readthedocs.io/en/stable/tutorial.html). Now, you have some introductory knowledge about how to develop your own chatbot. And in this checkpoint's assignments, you'll do just that.