# Creating a Chatbot from Scratch using Python and Scikit-learn

### Applications
- FAQ bots
- Recommendations
- Airports
- Taxi Bookings
- Hotel Bookings
    

### Chatbot architecture
- Humans (No engineering involved)
- Rule-based (Regular Expressions)
- Predictive (Retrieval Based)
- Generative

### Training - Finding Intents
- where is my hotel
    - where is my hotel
    - hotel location
    - how do i get to the hotel?
- when is checkout_time
    - when is the checkout time?
    - when do i need to check out?

### Task: Creating a chatbot that will help you answer some basic questions
- Import the libraries
- Create training phrases
- Collect all user utterances
- Create Bag of Words Model
- Create a Classifier
- Train the classifier
- Predict on the trained classifier

### Import the libraries

In [1]:
from sklearn.metrics import precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB

### Create Training phrases
Each entry maps an intent to its user utterances. Create more entries like these for your own domain chatbot.
- intent: [user utterances]

In [2]:
training_phrases = {
    "help-me": ' '.join(["I have a problem",
                         "Hey i need some answers",
                         "Can you help me with this?",
                         "I need help",
                         "Please help me"
                        ]),
}

In [3]:
training_phrases

{'help-me': 'I have a problem Hey i need some answers Can you help me with this? I need help Please help me'}

### Collect all training documents as user utterances

In [4]:
training_documents = list(training_phrases.values())
labels = list(training_phrases.keys())

In [5]:
training_documents

['I have a problem Hey i need some answers Can you help me with this? I need help Please help me']

### Tokenize the user utterances

In [7]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
stop_words = set(stopwords.words('english')) 

word_tokens = []
for sent in training_documents:
    word_tokens.append(word_tokenize(sent))

print(word_tokens)

filtered_sentence = [w for w in word_tokens[0] if w not in stop_words]

print(word_tokens)
print(filtered_sentence)

[['I', 'have', 'a', 'problem', 'Hey', 'i', 'need', 'some', 'answers', 'Can', 'you', 'help', 'me', 'with', 'this', '?', 'I', 'need', 'help', 'Please', 'help', 'me']]
[['I', 'have', 'a', 'problem', 'Hey', 'i', 'need', 'some', 'answers', 'Can', 'you', 'help', 'me', 'with', 'this', '?', 'I', 'need', 'help', 'Please', 'help', 'me']]
['I', 'problem', 'Hey', 'need', 'answers', 'Can', 'help', '?', 'I', 'need', 'help', 'Please', 'help']


### Create Bag of Words Model
- Import CountVectorizer
- fit_transform on training documents
- get_feature_names_out() (named get_feature_names() in scikit-learn versions before 1.0)

In [20]:
# Make a bag of words model

  (0, 16)	1
  (0, 27)	1
  (0, 18)	1
  (0, 25)	2
  (0, 29)	1
  (0, 8)	1
  (0, 10)	1
  (0, 37)	1
  (0, 17)	3
  (0, 24)	2
  (0, 36)	1
  (0, 32)	1
  (0, 26)	1
  (1, 6)	2
  (1, 0)	1
  (1, 34)	1
  (1, 1)	2
  (1, 23)	1
  (1, 11)	1
  (1, 7)	1
  (1, 2)	1
  (2, 16)	1
  (2, 6)	5
  (2, 12)	1
  (2, 22)	1
  (2, 14)	1
  (2, 15)	1
  (2, 4)	1
  (2, 5)	1
  (2, 28)	2
  (2, 35)	1
  (2, 3)	1
  (2, 30)	1
  (2, 33)	1
  (3, 18)	1
  (3, 37)	1
  (3, 19)	2
  (3, 31)	1
  (3, 20)	1
  (3, 21)	1
  (3, 9)	1
  (3, 13)	1


['addicted',
 'alcohol',
 'alcoholic',
 'all',
 'alone',
 'always',
 'am',
 'an',
 'answers',
 'are',
 'can',
 'daily',
 'depressed',
 'doing',
 'dont',
 'friends',
 'have',
 'help',
 'hey',
 'hi',
 'hola',
 'how',
 'lonely',
 'love',
 'me',
 'need',
 'please',
 'problem',
 'sad',
 'some',
 'the',
 'there',
 'this',
 'time',
 'to',
 'why',
 'with',
 'you']

### Create a Classifier
- Import
- Create
- Train

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
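
A minimal sketch of the import, create, and train steps might look like this. The three intents and their phrases are hypothetical examples, and the variable names `vectorizer`, `bag_of_words`, and `classifier` are assumptions chosen to match the rest of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical intents, each with its utterances joined into one document.
training_phrases = {
    "help-me": "I have a problem I need help Please help me",
    "greeting": "Hi Hey there Hola How are you doing",
    "alcohol-addiction": "I am addicted to alcohol I love alcohol I am an alcoholic",
}
training_documents = list(training_phrases.values())
labels = list(training_phrases.keys())

# Bag-of-words features for the training documents.
vectorizer = CountVectorizer()
bag_of_words = vectorizer.fit_transform(training_documents)

# Create the classifier, then train it on (features, intent labels).
classifier = MultinomialNB()
classifier.fit(bag_of_words, labels)

print(classifier.classes_)  # intent labels known to the classifier, sorted
```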

### Predict on user utterances
- Create a user query
- Transform the query using vectorizer
- Run prediction on the transformed query

array(['depression-problem'], dtype='<U18')
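
The prediction steps can be sketched as below, again with hypothetical training data. The point to notice is that the query goes through `vectorizer.transform`, not `fit_transform`, so it is mapped onto the vocabulary learned during training.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data (same shape as the notebook's training_phrases).
training_phrases = {
    "help-me": "I have a problem I need help Please help me",
    "greeting": "Hi Hey there Hola How are you doing",
    "alcohol-addiction": "I am addicted to alcohol I love alcohol I am an alcoholic",
}
vectorizer = CountVectorizer()
bag_of_words = vectorizer.fit_transform(list(training_phrases.values()))
classifier = MultinomialNB().fit(bag_of_words, list(training_phrases.keys()))

# 1. Create a user query (a list, because transform expects an iterable of strings).
user_query = ["Can you help me?"]

# 2. Transform the query with the already fitted vectorizer.
query_vector = vectorizer.transform(user_query)

# 3. Run the prediction on the transformed query.
prediction = classifier.predict(query_vector)
print(prediction)
```

Words the vectorizer never saw during training (here "can") are simply ignored.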

### Putting it all together inside predict( )

In [9]:
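def predict(raw_queries):
    # Assumes `vectorizer` is the CountVectorizer fitted on the training documents
    queries = vectorizer.transform(raw_queries)
    return classifier.predict(queries)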

In [None]:
predicted = predict(["I am very much sad", "can we talk?", "Can you help me?", "i take wine everyday and i cant live without it"])
expected = ["depression-problem", "help-me", "help-me", "alcohol-addiction"]

In [37]:
predicted

array(['depression-problem', 'help-me', 'help-me', 'alcohol-addiction'],
      dtype='<U18')

In [38]:
evaluation = precision_recall_fscore_support(expected, predicted)
evaluation

(array([1., 1., 1.]),
 array([1., 1., 1.]),
 array([1., 1., 1.]),
 array([1, 1, 2]))

In [39]:
metrics = {}
(metrics['p'], metrics['r'], metrics['f1'], _) = evaluation
metrics

{'p': array([1., 1., 1.]), 'r': array([1., 1., 1.]), 'f1': array([1., 1., 1.])}

### Challenges / Questions to be answered
- Return the answer
- Exclude unimportant words ("stop words")
- Handle synonyms (e.g. "lobby" = "front desk")
- Handle typos
- Return "Unknown" when no intent fits the query
- Handle an entity/parameter ("set my check out time to 3 PM")
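
One possible way to return "Unknown", sketched here as an assumption rather than the notebook's own solution, is to look at the class probabilities from `predict_proba` and fall back when the best one is below a threshold. The function name `predict_or_unknown`, the threshold of 0.5, and the training phrases are all hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data.
training_phrases = {
    "help-me": "I have a problem I need help Please help me",
    "greeting": "Hi Hey there Hola How are you doing",
    "alcohol-addiction": "I am addicted to alcohol I love alcohol I am an alcoholic",
}
vectorizer = CountVectorizer()
features = vectorizer.fit_transform(list(training_phrases.values()))
classifier = MultinomialNB().fit(features, list(training_phrases.keys()))


def predict_or_unknown(raw_query, threshold=0.5):
    """Return the predicted intent, or "Unknown" if the classifier is unsure."""
    probabilities = classifier.predict_proba(vectorizer.transform([raw_query]))[0]
    best = probabilities.argmax()
    if probabilities[best] < threshold:
        return "Unknown"
    return str(classifier.classes_[best])


print(predict_or_unknown("please help me"))
print(predict_or_unknown("what is the weather"))
```

A query made up entirely of unseen words produces an all-zero vector, so the probabilities collapse to the class priors and stay below the threshold. The threshold itself would need tuning on real data.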

### Returning the Answer

In [54]:
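responses = {
    "alcohol-response": "Good to know that. Now I can work with you in making you better.",
    # ... fill in the responses for each remaining intent
}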

In [56]:
responses['alcohol-response']

'Good to know that. Now I can work with you in making you better.'

In [61]:
predicted = predict(["i take wine everyday and i cant live without it"])
# expected = ["alcohol-addiction"]
predicted

array(['alcohol-addiction'], dtype='<U18')

In [64]:
def send_response(raw_queries):
    predicted = predict(raw_queries)
    print(predicted[0])
    if predicted[0] == "alcohol-addiction":
        return responses["alcohol-response"]
    else:
        return "You are not an alcoholic!"

In [65]:
bot_response = send_response(["i take wine everyday and i cant live without it"])
print(bot_response)

alcohol-addiction
Good to know that. Now I can work with you in making you better.


### Stop words

In [60]:
from nltk.corpus import stopwords

In [None]:
# Filter stop words
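
A minimal sketch of the filtering step follows. To keep it runnable without downloading the NLTK corpus, it uses a tiny hypothetical stop list; in the notebook this would presumably be `set(stopwords.words('english'))`. The helper name `filter_stop_words` is also an assumption. Tokens are lowercased before the lookup, because a stop list in lowercase would otherwise miss capitalized tokens such as 'I' and 'Can'.

```python
def filter_stop_words(tokens, stop_words):
    """Keep only the tokens whose lowercase form is not a stop word."""
    return [token for token in tokens if token.lower() not in stop_words]


# Tiny illustrative stop list (hypothetical); swap in NLTK's list in practice.
stop_words = {"i", "have", "a", "can", "you", "me", "with", "this", "some"}

tokens = ['I', 'have', 'a', 'problem', 'Hey', 'i', 'need', 'some', 'answers',
          'Can', 'you', 'help', 'me', 'with', 'this', '?']

filtered = filter_stop_words(tokens, stop_words)
print(filtered)
```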

### Typos - Edit distance, Phonetics

In [100]:
tokens = ['problem', 'Hey', 'need', 'some', 'answers', 'help', 'me', 'with', 'this', 'Please', 'addicted', 'alcohol', 'love', 'daily', 'alcoholic', 'depressed', 'lonely', 'dont', 'friends', 'alone', 'always', 'sad', 'Why', 'all', 'time', 'Hi', 'Hey', 'there', 'Hola', 'Hi', 'How', 'are', 'you', 'doing', '?']

In [105]:
from difflib import get_close_matches
def spell_checker(token):
    # Implement this function by checking the documentation

In [111]:
corrected_word, flag = spell_checker('alcholic')
corrected_word

'alcoholic'
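
One way to fill in `spell_checker`, offered as a sketch rather than the intended solution, is to look up the closest known token with `difflib.get_close_matches`. The meaning of the returned flag (whether a close match was found), the `cutoff` of 0.8, and the shortened token list are assumptions.

```python
from difflib import get_close_matches

# A subset of the notebook's tokens, used as the known vocabulary.
tokens = ['problem', 'need', 'answers', 'help', 'addicted', 'alcohol',
          'alcoholic', 'depressed', 'lonely', 'friends', 'sad']


def spell_checker(token, vocabulary=tokens, cutoff=0.8):
    """Return (best_match, True) if a close match exists, else (token, False)."""
    matches = get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    if matches:
        return matches[0], True
    return token, False


corrected_word, flag = spell_checker('alcholic')
print(corrected_word, flag)
```

`get_close_matches` ranks candidates by similarity ratio, so 'alcoholic' is preferred over 'alcohol' for this input. A lower `cutoff` tolerates more typos but also produces more false corrections.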

### Synonyms

In [114]:
from nltk.corpus import wordnet
syns = wordnet.synsets("wine")
print(syns)

[Synset('wine.n.01'), Synset('wine.n.02'), Synset('wine.v.01'), Synset('wine.v.02')]
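
WordNet returns several senses per word, and not all of them fit a given domain. A lighter alternative, shown here as an assumption and not as part of the original notebook, is a small hand-maintained synonym table that rewrites variants to one canonical term before vectorizing. The table contents and the helper name `normalize_synonyms` are hypothetical.

```python
import re

# Hypothetical synonym table: each variant maps to one canonical term.
SYNONYMS = {
    "front desk": "lobby",
    "reception": "lobby",
}


def normalize_synonyms(text, synonyms=SYNONYMS):
    """Lowercase the text and replace each known variant with its canonical term."""
    normalized = text.lower()
    for variant, canonical in synonyms.items():
        # Word boundaries keep the replacement from firing inside longer words.
        pattern = r"\b" + re.escape(variant) + r"\b"
        normalized = re.sub(pattern, canonical, normalized)
    return normalized


print(normalize_synonyms("Where is the front desk?"))
```

Running queries and training phrases through the same normalization means "lobby" and "front desk" end up as the same bag-of-words feature.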
