Welcome to Lab 6! 😊 <br>
This week, we'll learn about NLP and how to create a chatbot!<br>
💡 Read all the instructions carefully and complete the blanks ( `_____` ) in the cells that follow each question. Compare your output with the expected output given below the cell.


# Building a simple Chatbot
Chatbots can combine complex processes to streamline and automate common and repetitive tasks through a few simple voice or text requests, reducing execution time and improving business efficiencies.

Let's create a very basic chatbot using Python's spaCy library. It's a very simple bot with hardly any cognitive skills, but still a good way to get into NLP and get to know chatbots. 😀

First of all, install and import the required libraries (you may need to install other libraries).

In [None]:
!pip install spacy
!pip install beautifulsoup4
!pip install experta

In [None]:
import random
import json
import spacy
import difflib
from difflib import get_close_matches, SequenceMatcher
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

from random import choice
from experta import *

In [None]:
# You may need to restart the kernel after running these two commands for it to work.
import spacy.cli
spacy.cli.download("en_core_web_sm")

In [None]:
# If you are using Colab, upload the data folder and use its path here
intentions_path = "data/intentions.json"
sentences_path = "data/sentences.txt"

# NLP
NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way. By utilizing NLP, developers can organize and structure knowledge to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.

Sometimes identifying a subject is as simple as checking keywords, and sometimes it requires complex analysis. Let's start with keywords.

## Generating Responses by Keyword Matching

There are different approaches to keyword matching, and the right one depends on the complexity of the query. Here we use simple keyword matching for greetings. We could also use regular expressions (regex) or spaCy's `Matcher` for keyword matching.
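As a rough illustration of the regex alternative mentioned above, here is a minimal sketch using Python's built-in `re` module. The keyword lists and the `match_intention` helper are hypothetical and are not the contents of `intentions.json`.

```python
import re

# Hypothetical keyword lists, for illustration only.
KEYWORDS = {
    "greeting": ["hi", "hello", "hey"],
    "thanks": ["thanks", "thank you"],
}

# One case-insensitive pattern per intention.
# \b (word boundary) keeps "hi" from matching inside "this".
PATTERNS = {
    label: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    for label, words in KEYWORDS.items()
}

def match_intention(sentence):
    """Return the first intention whose pattern occurs in the sentence, else None."""
    for label, pattern in PATTERNS.items():
        if pattern.search(sentence):
            return label
    return None

print(match_intention("Hi there."))             # greeting
print(match_intention("Thank you very much!"))  # thanks
print(match_intention("This is a test."))       # None
```

Compared with splitting on whitespace, a pattern like this can match multi-word phrases such as "thank you" and is not thrown off by punctuation attached to a word.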


💡 In the first step, the goal is to be able to recognize a keyword related to greeting, goodbye or thanks, then show a related response to the user.

Before [loading the JSON file in Python](https://docs.python.org/3/library/json.html), inspect the `intentions.json` file.

In [None]:
# Opening JSON file and return JSON object as a dictionary
with open(intentions_path) as f:
    intentions = json.load(f)

print(json.dumps(intentions, indent=4))

❓ <font color='red'>Question: </font>  Complete and run the following cell.<br>(Hint: Each sentence must be `split` into words and then if the word matches one of the `patterns`, the corresponding `responses` will be displayed.) 

In [None]:
final_chatbot = False

def check_intention_by_keyword(sentence):
    for word in sentence._____():
        for type_of_intention in intentions:
            if word.lower() in intentions[type_of_intention]["_____"]:
                print("BOT: " + random.choice(intentions[type_of_intention]["_____"]))
                
                # Do not change these lines
                if type_of_intention == 'greeting' and final_chatbot:
                    print("BOT: We can talk about the time, date, and train tickets.\n(Hint: What time is it?)")
                return type_of_intention
    return None

sample_user_input = "Hi there."
print(sample_user_input)
print(f'Detected intention: {check_intention_by_keyword(sample_user_input)}')
print('*' * 50)
sample_user_input = "Thank you very much!"
print(sample_user_input)
print(f'Detected intention: {check_intention_by_keyword(sample_user_input)}')

<font color='blue'>Your output should be (some chatbot responses may be random):</font>
<br>
```
Hi there.
BOT: Thanks for checking in.
Detected intention: greeting
**************************************************
Thank you very much!
BOT: You're welcome
Detected intention: thanks
```

## Generating Responses by Similarity of Text

Text data arrives as strings, but machine learning algorithms need a numerical feature vector to work with. You can find more information in [this article](https://medium.com/@adriensieg/text-similarities-da019229c894); here we use a spaCy function for this purpose.
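To make the idea of turning text into numbers concrete, here is a minimal sketch that represents each sentence as word counts (a bag of words) and compares two sentences with cosine similarity. This is a simplified stand-in for illustration, not the method spaCy uses, and the helper names are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Map a sentence to word counts: a sparse numerical vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (between 0.0 and 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)

v1 = bag_of_words("what time is it")
v2 = bag_of_words("what is the time")
v3 = bag_of_words("book a train ticket")

print(cosine_similarity(v1, v2))  # 0.75 (three of four words shared)
print(cosine_similarity(v1, v3))  # 0.0  (no words shared)
```

Counting words ignores meaning, so "hour" and "time" would score zero here; approaches based on word vectors are meant to capture that kind of relatedness.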

Suppose our chatbot needs to answer a user's question about the current time or today's date. We keep several sample sentences for each kind of question and identify the user's intention by measuring how similar the user's question is to these sentences. First, we need to read these sentences from the `sentences.txt` file.

In [None]:
# Reading `sentences.txt` file and printing its content.
time_sentences = ''
date_sentences = ''
with open(sentences_path) as file:
    for line in file:
        parts = line.split(' | ')
        if parts[0] == 'time':
            time_sentences = time_sentences + ' ' + parts[1].strip()
        elif parts[0] == 'date':
            date_sentences = date_sentences + ' ' + parts[1].strip()

print(time_sentences)
print('*' * 50)
print(date_sentences)

Then we need to match each sentence to its label (`time` or `date`) to use them in the next steps.

In [None]:
nlp = spacy.load('en_core_web_sm')

labels = []
sentences = []

doc = nlp(time_sentences)
for sentence in doc.sents:
    labels.append("time")
    sentences.append(sentence.text.lower().strip())

doc = nlp(date_sentences)
for sentence in doc.sents:
    labels.append("date")
    sentences.append(sentence.text.lower().strip())

for label, sentence in zip(labels, sentences):
    print(label + " : " + sentence)

❓ <font color='red'>Question: </font>  Complete and run the following cell. We need a function that lemmatizes the text, removes stop words and punctuation, and returns the cleaned text.

In [None]:
def lemmatize_and_clean(text):
    doc = nlp(text.lower())
    out = ""
    for token in doc:
        if not token._____ and not token.______:
            out = out + token._____ + " "
    return out.strip()

sample_user_input = "Tell me the time!"
lemmatize_and_clean(sample_user_input)

<font color='blue'>Your output should be (some chatbot responses may be random):</font>
<br><br>
'tell time'

❓ <font color='red'>Question: </font>  Then we need a function that measures the similarity of the user's input with chatbot data (loaded sentences from previous cells). You should use the previous function in the upcoming function to clean the texts and then measure their similarity. For more information about similarity measurement, you can use [this link](https://spacy.io/usage/linguistic-features#vectors-similarity).<br>
Complete and run the following cell. 

In [None]:
final_chatbot = False

def date_time_response(user_input):
    cleaned_user_input = _____
    doc_1 = nlp(cleaned_user_input)
    similarities = {}
    for idx, sentence in enumerate(sentences):
        cleaned_sentence = _____
        doc_2 = nlp(cleaned_sentence)
        similarity = _____
        similarities[idx] = similarity

    max_similarity_idx = max(similarities, key=similarities.get)
    
    # Minimum acceptable similarity between user's input and our Chatbot data
    # This number can be changed
    min_similarity = 0.75

    # Do not change these lines
    if similarities[max_similarity_idx] > min_similarity:
        if labels[max_similarity_idx] == 'time':
            print("BOT: " + "It’s " + str(datetime.now().strftime('%H:%M:%S')))
            if final_chatbot:
                print("BOT: You can also ask me what the date is today. (Hint: What is the date today?)")
        elif labels[max_similarity_idx] == 'date':
            print("BOT: " + "It’s " + str(datetime.now().strftime('%Y-%m-%d')))
            if final_chatbot:
                print("BOT: Now can you tell me where you want to go? (Hints: you can type in a city's name, or an organisation. I am going to London or I want to visit the University of East Anglia.)")
        return True
    
    return False

sample_user_input = "Tell me the time!"
print(sample_user_input)
date_time_response(sample_user_input)

<font color='blue'>Your output should be (some chatbot responses may be random):</font>
<br>
```
Tell me the time!
BOT: It’s 13:18:39
```

## Generating Responses by Named-Entity Recognition (NER)

Suppose our chatbot can recognize a city or country and tell the user its weather, or, if the user names a UK university (an organization), give its location and ranking.

(For this, an API is needed, but here we use some fixed sentences for all the responses, and only identifying the location or organization is important.)

Let's scrape some data about universities and their ranking in the UK (from https://www.4icu.org/gb/a-z/).

Web scraping is the process of gathering information from the Internet. [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) is a Python package for parsing HTML and XML documents. It builds a parse tree that makes it easy to extract data.

In [None]:
url = "https://www.4icu.org/gb/a-z/"
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, 'html.parser')

universities = {}
# Skip the first two <tr> rows and keep only rows with exactly three cells.
for tr in soup.find_all('tr')[2:]:
    tds = tr.find_all('td')
    if len(tds) == 3:
        university = {
            'Rank': tds[0].text,
            'Name': tds[1].text,
            'City': tds[2].text.replace(' ...', ''),
        }
        # Index each record by university name for later lookup.
        universities[university['Name']] = university
        print(f"{university['Name']}, Rank: {university['Rank']}, City: {university['City']}")


This time, we use another library called [difflib](https://docs.python.org/3/library/difflib.html) to check the similarity of the user's input text with chatbot data (here the names of universities). As mentioned earlier, there are many ways to perform such operations, and the purpose of using multiple methods is only to show different approaches to solving the problem.
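To get a feel for the two `difflib` tools before applying them to the university names, here is a small sketch on a made-up list of cities. The list and the misspelled query are hypothetical.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical candidate list, for illustration only.
cities = ["Norwich", "Manchester", "London", "Cambridge"]

# get_close_matches returns up to n candidates scoring at least `cutoff`, best first.
matches = get_close_matches("Manchestr", cities, n=1, cutoff=0.6)
print(matches)  # ['Manchester']

# SequenceMatcher.ratio() gives a similarity score between 0 and 1.
score = SequenceMatcher(None, "Manchestr", "Manchester").ratio()
print(round(score, 2))  # 0.95
```

`get_close_matches` returns an empty list when nothing reaches the cutoff, so the result should be checked before indexing into it.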

❓ <font color='red'>Question: </font>  Complete and run the following cell.<br>
Check [this link](https://docs.python.org/3/library/difflib.html) and use `get_close_matches` and `SequenceMatcher` functions to complete this code.

In [None]:
def get_best_match_university(user_input):
  university_list = universities.keys()
  matches = _____
  if len(matches) > 0:
    best_match = matches[0]
  else:
    return None
  
  sm = _____
  score = sm.ratio()
  if score >= 0.6:
    return best_match
  else:
    return None

matched_university = get_best_match_university("I am studying at University of Manchester.")
print(universities[matched_university])
print('*' * 50)
# Identifying the university despite the misspelling in its name 
# (Univerity --> University & Angla --> Anglia)
matched_university = get_best_match_university("I am studying at Univerity of East Angla.")
print(universities[matched_university])

<font color='blue'>Your output should be (some chatbot responses may be random):</font>
<br>
```
{'Rank': '6', 'Name': 'The University of Manchester', 'City': 'Manchester'}
**************************************************
{'Rank': '27', 'Name': 'University of East Anglia', 'City': 'Norwich'}
```

❓ <font color='red'>Question: </font>  Complete and run the following cell.<br>
The purpose of this function is to determine the type of request using named-entity recognition (NER). You can complete this code with the help of the Part A code and [this link](https://v2.spacy.io/api/annotation#named-entities).

In [None]:
final_chatbot = False

def ner_response(user_input):
    doc = nlp(user_input)
    
    for ent in doc._____:
        # use entity type for countries, cities, states.
        if ent._____ == "_____":
          # Do not change these lines
          print("BOT: Weather is very " + random.choice(["windy ", "cold ", "warm "]) + "in " + ent.text + " today.")
          if final_chatbot:
            print("BOT: Could you please tell me what kind of ticket you are looking for? (You can just ask for one way, round and open return tickets.)")
          return True
    
    for ent in doc._____:
        # use entity type for companies, agencies, institutions, etc.
        if ent._____ == "_____":
          # Do not change these lines
          matched_university = get_best_match_university(ent.text)
          if matched_university != None:
            university = universities[matched_university]
            print(f"BOT: You asked for {university['Name']} which is located in the {university['City']} and its rank is {university['Rank']} in the UK.")
            if final_chatbot:
              print(f"BOT: Could you please tell me what kind of ticket you are looking for? (You can just ask for one way, round and open return tickets.)")
          else:
            print("BOT: I don't have any information about it.")

          return True
    
    return False


sample_user_input = "How is weather in London?"
print(sample_user_input)
ner_response(sample_user_input)
print('*' * 50)
sample_user_input = "Give me information about University of East Anglia."
print(sample_user_input)
ner_response(sample_user_input)

<font color='blue'>Your output should be (some chatbot responses may be random):</font>
<br>
```
How is weather in London?
BOT: Weather is very windy in London today.
**************************************************
Give me information about University of East Anglia.
BOT: You asked for University of East Anglia which is located in the Norwich and its rank is 27 in the UK.
```

# Expert System
An expert system aims to capture knowledge from a human expert as a set of hardcoded rules that are applied to the input data.

A classic example of a rule-based system is the domain-specific expert system that uses rules to make deductions or choices. For example, an expert system might help a doctor choose the correct diagnosis based on a cluster of symptoms, or select tactical moves to play a game.

**The Basics**

An expert system is a program capable of pairing up a set of facts with a set of rules that apply to those facts, and executing actions based on the matching rules.

**Facts**

Facts are the basic unit of information in Experta. They are used by the system to reason about the problem.

**Rules**

In their most basic form, rules are conditional statements (if a, then do x; else if b, then do y).
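The pairing of facts with rules can be sketched in plain Python before bringing in a library. The toy below is only an illustration of the idea; it is not Experta's API, and all names in it are hypothetical.

```python
# Each rule pairs a condition on the facts with an action to run when it holds.
# These names are hypothetical and not part of Experta.
rules = [
    (lambda facts: facts.get("light") == "green", lambda: "walk"),
    (lambda facts: facts.get("light") == "red", lambda: "wait"),
]

def run(facts):
    """Fire every rule whose condition matches the facts; return the actions' results."""
    return [action() for condition, action in rules if condition(facts)]

print(run({"light": "green"}))  # ['walk']
print(run({"light": "blue"}))   # []
```

Keeping rules as data rather than as a chain of `if` statements means new rules can be added without touching the code that evaluates them.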

**Experta** 

Experta is a Python library for building expert systems strongly inspired by CLIPS ([Experta Documentation](https://readthedocs.org/projects/experta/downloads/pdf/stable/)).


The goal is to use the expert system to recognize the type of train ticket from the user's input message and give a suitable answer. You might say that this could also be implemented with a simple if-else condition. That is true, but this is only an introduction; in a bigger project (like your coursework) the value of an expert system becomes clearer.

In [None]:
final_chatbot = False

class Book(Fact):
    """Info about the booking ticket."""
    pass

class TrainBot(KnowledgeEngine):
  @Rule(Book(ticket='one way'))
  def one_way(self):
    print("BOT: You have selected a one way ticket. Have a good trip.")
    if final_chatbot:
      print("BOT: If you don't have any other questions you can type bye.")

  @Rule(Book(ticket='round'))
  def round_way(self):
    print("BOT: You have selected a round ticket. Have a good trip.")
    if final_chatbot:
      print("BOT: If you don't have any other questions you can type bye.")

  @Rule(AS.ticket << Book(ticket=L('open ticket') | L('open return')))
  def open_ticket(self, ticket):
    print("BOT: You have selected an " + ticket["ticket"] + ". Have a good trip.")
    if final_chatbot:
      print("BOT: If you don't have any other questions you can type bye.")

In [None]:
engine = TrainBot()
engine.reset()
engine.declare(Book(ticket=choice(['one way', 'round', 'open ticket', 'open return'])))
engine.run()

❓ <font color='red'>Question: </font>  Complete and run the following cell.

In [None]:
def check_ticket(user_input):
  user_input = user_input.lower()
  ticket_list = ['one way', 'round', 'open ticket', 'open return']

  # Return the first known ticket type mentioned in the input.
  for ticket in ticket_list:
    if ticket in user_input:
      return ticket

  return None
  

def expert_response(user_input):
    engine = TrainBot()
    engine._____
    ticket = check_ticket(user_input)
    if ticket is not None:
        engine._____
        engine._____
        return True
    
    return False

sample_user_input = "I want a one way ticket."
print(sample_user_input)
expert_response(sample_user_input)

<font color='blue'>Your output should be (some chatbot responses may be random):</font>
<br>
```
I want a one way ticket.
BOT: You have selected a one way ticket. Have a good trip.
```

# Chatbot

Finally, we combine all the pieces so that the bot chooses what to say based on the user's input.

This is a simple chatbot and only recognizes a few simple dialogues. You can use queries like the following for conversation (the answers related to weather are fake).


*   Hi there!
*   Tell me the time
*   Tell me the date
*   What is the weather like in Norwich?
*   Where is the University of East Anglia?



In [None]:
final_chatbot = True
flag = True
print("BOT: Hi there! How can I help you?\n (If you want to exit, just type bye!)")
while flag:
    user_input = input()
    intention = check_intention_by_keyword(user_input)
    if intention == 'goodbye':
        flag = False
    elif intention is None:
        # Try each responder in turn; apologise if none of them handles the input.
        if not ner_response(user_input):
            if not date_time_response(user_input):
                if not expert_response(user_input):
                    print("BOT: Sorry I don't understand that. Please rephrase your statement.")
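The nested `if not` checks in the loop above try each responder in order until one of them handles the input. The same fall-through can be sketched with `any()` over a list of handlers. The stub handlers below are hypothetical stand-ins for the lab's functions, used only to keep the example self-contained.

```python
def handles_time(text):
    """Stub responder: claims the input if it mentions 'time'."""
    if "time" in text.lower():
        print("BOT: (time response)")
        return True
    return False

def handles_ticket(text):
    """Stub responder: claims the input if it mentions 'ticket'."""
    if "ticket" in text.lower():
        print("BOT: (ticket response)")
        return True
    return False

def respond(text, handlers):
    """Try handlers in order; any() stops at the first one that returns True."""
    if not any(handler(text) for handler in handlers):
        print("BOT: Sorry I don't understand that. Please rephrase your statement.")
        return False
    return True

respond("Tell me the time", [handles_time, handles_ticket])
respond("What is your name?", [handles_time, handles_ticket])
```

With this layout, adding a new capability to the bot means appending one function to the list rather than adding another level of nesting.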