# Project 2: Making Chatbots!

## Part 1: Rule Based Chatbot (ELIZA)
In this section, you will implement a rule-based chatbot, ELIZA, using the provided eliza.py file. The eliza.py file contains the rules for the model to follow; you need to complete the code so that it uses this file to run a chat agent and saves the chat history.

In [12]:
import eliza

### Load the Eliza model from eliza.py

In [13]:
#eliza = eliza.FILL_IN_CODE
eliza = eliza.Eliza()

### Define and open the file to save chat history

In [24]:
from datetime import datetime

# Get the current date and time
#now = FILL_IN_CODE
now = datetime.now()

# Format the date and time to create the file name
file_name = f"ELIZA_CHAT_{now.strftime('%Y_%m_%d_%H_%M_%S')}.txt"
f = open(file_name, "w")
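
The same timestamped naming scheme is needed again for the corpus chatbot in Part 2, so it could be wrapped in a small helper. The sketch below is only one way to do it; `make_log_name` and its `prefix` and `now` parameters are hypothetical names introduced for illustration, not part of the assignment template.

```python
from datetime import datetime

def make_log_name(prefix, now=None):
    """Build a timestamped log file name, e.g. ELIZA_CHAT_2025_01_02_03_04_05.txt.

    `prefix` and `now` are illustrative parameters (assumptions, not from the template).
    Passing `now` explicitly makes the result reproducible; by default the current time is used.
    """
    if now is None:
        now = datetime.now()
    return f"{prefix}_{now.strftime('%Y_%m_%d_%H_%M_%S')}.txt"

# Example with a fixed timestamp so the result does not depend on the clock
example_name = make_log_name("ELIZA_CHAT", datetime(2025, 1, 2, 3, 4, 5))
print(example_name)
```

Calling `make_log_name("CORPUS_CHAT")` would then produce the file name used for the second chatbot.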

### Define the regex for exit
Define a regular expression that searches the user input for words such as 'bye' or 'exit', so the program can recognize that the user wants to end the chat.

In [25]:
import re

def is_end(input_string):
    exit_list = ['bye', 'exit', 'see you', 'chao']  # phrases that end the chat
    #pattern = re.compile(FILL_IN_CODE) # compile and find exit words
    pattern = re.compile('|'.join(exit_list), re.IGNORECASE)
    #if FILL_IN_CODE:
    if pattern.search(input_string):
        return True
    return False

In [26]:
# testing code
exit_list = ['bye', 'exit', 'see you', 'chao']
pattern = re.compile('|'.join(exit_list), re.IGNORECASE)
if pattern.search("bye"):
    print("Found")
else:
    print("Not found")

Found
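
A plain alternation like the one above matches substrings, so an input such as "My life is chaos" would also end the chat because it contains "chao". One possible refinement, sketched here under the assumption that whole-word matching is wanted, wraps the alternatives in `\b` word boundaries. The names `EXIT_WORDS`, `EXIT_PATTERN`, and `is_end_strict` are hypothetical, and "goodbye" is added to the list because `\bbye\b` alone no longer matches inside it.

```python
import re

# Illustrative exit phrases (an assumption; adjust to taste)
EXIT_WORDS = ["bye", "goodbye", "exit", "see you", "chao"]

# re.escape protects any regex metacharacters in the phrases;
# \b on both sides requires each phrase to start and end at a word boundary.
EXIT_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in EXIT_WORDS) + r")\b",
    re.IGNORECASE,
)

def is_end_strict(input_string):
    """Return True only when an exit phrase appears as a whole word or phrase."""
    return EXIT_PATTERN.search(input_string) is not None

print(is_end_strict("Bye for now"))       # True
print(is_end_strict("My life is chaos"))  # False
```

Whether the stricter behavior is preferable depends on how forgiving the chat should be; the substring version ends the chat more eagerly.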


### Define a chat loop
In this part, define a loop that asks for user input and prints the agent's response until the user wants to end the chat.

In [27]:
while True:
    user_input = input("User:")
    if is_end(user_input):
        # Output the goodbye message, write the last inputs and outputs to the log and close the file
        goodbye_message = "Goodbye, please take care"
        print("Eliza:", goodbye_message)
        f.write("User: " + user_input + "\n") # Log the user's last input
        f.write("Eliza: "+ goodbye_message + "\n") # Log the agent's goodbye message
        f.close()
        break
    response = eliza.respond(user_input)
    print("Eliza:", response)
    f.write("User: " + user_input + "\n") # Log the user's input
    f.write("Eliza: "+ response + "\n") # Log the agent's response

Eliza: Can you elaborate on that?
Eliza: I see.
Eliza: What other reasons come to mind?
Eliza: Why do you say that you also like the taste?
Eliza: Is that the real reason?
Eliza: OK, but can you elaborate a bit?
Eliza: How do you feel when you say that?
Eliza: When you feel warm, what do you do?
Eliza: Can you elaborate on that?
Eliza: Goodbye, please take care
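
The loop above closes the log file only when an exit phrase is seen, so an interrupted session (for example, stopping the cell) can leave the file open. A hedged sketch of an alternative uses a `with` block, which closes the file however the loop ends. The function `run_chat` and its parameters `respond`, `is_end`, `log_path`, `read_input`, and `agent_name` are hypothetical names chosen for this example; `respond` stands in for `eliza.respond`.

```python
def run_chat(respond, is_end, log_path, read_input=input, agent_name="Eliza"):
    """Run a chat loop and log every turn.

    respond:    callable mapping the user's text to the agent's reply
    is_end:     callable returning True when the user wants to stop
    read_input: callable used to read a line (defaults to the built-in input)
    """
    goodbye_message = "Goodbye, please take care"
    # The with block closes the file even if an exception interrupts the loop
    with open(log_path, "w") as log:
        while True:
            user_input = read_input("User:")
            done = is_end(user_input)
            response = goodbye_message if done else respond(user_input)
            print(f"{agent_name}:", response)
            log.write(f"User: {user_input}\n")            # log the user's input
            log.write(f"{agent_name}: {response}\n")      # log the agent's reply
            if done:
                break
```

With the objects defined earlier in this notebook, a call might look like `run_chat(eliza.respond, is_end, file_name)`. Passing a different `respond` function and `agent_name` would let the same loop serve another chatbot.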


Congrats, you are done with Part 1. Now test your model in 3 chat conversations (at least 10 utterances in each conversation) and report the results of the human survey.

## Part 2: Corpus Based Chatbot

In this section, you will implement a corpus-based chatbot using the given dialogues.csv corpus. You will first load the dataset, then compute sentence embeddings for the corpus sentences using the SentenceTransformer library, and finally use these embeddings to retrieve the most appropriate response.

Note: This part is slow in a CPU-based environment (up to 5 minutes) but should be very fast in a Colab GPU environment (close to 5 seconds), because transformer models benefit heavily from GPU acceleration.

In [13]:
!pip install -U sentence-transformers


Collecting sentence-transformers
  Downloading sentence_transformers-4.0.2-py3-none-any.whl.metadata (13 kB)
Downloading sentence_transformers-4.0.2-py3-none-any.whl (340 kB)
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-4.0.2


In [28]:
import pandas as pd
from sentence_transformers import SentenceTransformer, util
import numpy as np


### Load the dataset
Load the dialogues.csv file using the pandas library.

In [29]:
data = pd.read_csv("dialogues.csv")
data.head()

Unnamed: 0,emotion,User,Agent
0,sentimental,I remember going to see the fireworks with my ...,"Was this a friend you were in love with, or ju..."
1,sentimental,This was a best friend. I miss her.,Where has she gone?
2,sentimental,We no longer talk.,Oh was this something that happened because of...
3,sentimental,"Was this a friend you were in love with, or ju...",This was a best friend. I miss her.
4,sentimental,Where has she gone?,We no longer talk.


### Load the SentenceTransformer model
docs: https://sbert.net/docs/sentence_transformer/usage/usage.html

Load the ```all-MiniLM-L6-v2``` sentence transformer model for computing the contextual embeddings.

In [30]:
#model = FILL_IN_CODE
model = SentenceTransformer("all-MiniLM-L6-v2")

### Compute the sentence embeddings
For the 'User' column of the dataset, compute the sentence embeddings using the sentence transformer model.

In [31]:
user_dialogues = data['User'].tolist() # sentences to encode
#user_embeddings = model.FILL_IN_CODE
user_embeddings = model.encode(user_dialogues) # calculate embeddings

The above line took about 30 seconds to run on a Mac M4.
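
Because encoding the whole corpus can take a while, the embeddings could be saved to disk and reloaded on later runs. The following is a minimal sketch, assuming the encoder returns something NumPy can convert to an array; `load_or_compute_embeddings`, `encode_fn`, and `cache_path` are hypothetical names, and `encode_fn` stands in for `model.encode`.

```python
import os
import numpy as np

def load_or_compute_embeddings(sentences, encode_fn, cache_path):
    """Return cached embeddings if present; otherwise compute and cache them.

    cache_path should end in ".npy", because np.save appends that suffix
    when it is missing and the existence check would then never succeed.
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)
    embeddings = np.asarray(encode_fn(sentences))
    np.save(cache_path, embeddings)
    return embeddings
```

In this notebook the call might look like `load_or_compute_embeddings(user_dialogues, model.encode, "user_embeddings.npy")`, where the file name is an arbitrary choice. The cache is not invalidated automatically, so it should be deleted if the corpus or the model changes.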

### Retrieve the agent response
In the get_response() function, use user_embeddings to find the entry in the data whose 'User' sentence is most similar to the input, measured by cosine similarity. For the selected entry, return the corresponding value in the 'Agent' column as the agent's response.

In [32]:
def get_response(user_input, data, model, user_embeddings):
    # Convert the input of the user to its sentence embedding
    input_embedding = model.encode(user_input)
    
    # Compute cosine similarities between the input embedding and every corpus 'User' embedding
    # (util.cos_sim is the current name; pytorch_cos_sim is its deprecated alias)
    cosine_scores = util.cos_sim(input_embedding, user_embeddings)
    
    # Find the index of the highest cosine similarity using np.argmax.
    best_match_idx = np.argmax(cosine_scores.numpy())
    
    # Return the corresponding string for the 'Agent' column
    return data['Agent'].iloc[best_match_idx]
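
To make the retrieval step concrete, the sketch below computes cosine similarity by hand with NumPy on toy 2-D vectors. It is an illustration only: the vectors and replies are made up, `cosine_scores_np` is a hypothetical helper rather than the library routine used above, and it assumes no vector has zero length.

```python
import numpy as np

def cosine_scores_np(query, corpus):
    """Cosine similarity between one query vector and each row of corpus.

    cos(q, c) = (q . c) / (|q| * |c|); assumes all vectors are nonzero.
    """
    query = np.asarray(query, dtype=float)
    corpus = np.asarray(corpus, dtype=float)
    return (corpus @ query) / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query))

# Toy stand-ins for real sentence embeddings and agent replies (made-up data)
toy_user_embeddings = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_agent_replies = ["reply A", "reply B", "reply C"]

scores = cosine_scores_np([0.9, 0.1], toy_user_embeddings)
best_match_idx = int(np.argmax(scores))   # index of the most similar corpus row
print(toy_agent_replies[best_match_idx])  # reply A
```

The query points almost along the first axis, so the first row scores highest and its paired reply is returned, which mirrors how the 'Agent' entry is looked up from the best-matching 'User' sentence.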


### Define and open the file to save chat history

In [35]:
from datetime import datetime

# Get the current date and time
now = datetime.now()

# Format the date and time to create the file name
file_name = f"CORPUS_CHAT_{now.strftime('%Y_%m_%d_%H_%M_%S')}.txt"
f = open(file_name, "w")

### Define a chat loop
In this part, define a loop that asks for user input and prints the agent's response until the user wants to end the chat. Use the same regular expression as before to identify when the user wants to end the chat.

In [36]:
while True:
    user_input = input("User:")
    if is_end(user_input):
        # Output the goodbye message, write the last inputs and outputs to the log and close the file
        goodbye_message = "Goodbye, please take care"
        print("Corpus Agent:", goodbye_message)
        f.write("User: " + user_input + "\n") # Log the user's last input
        f.write("Corpus Agent: " + goodbye_message + "\n") # Log the agent's goodbye message
        f.close()
        break
    response = get_response(user_input, data, model, user_embeddings)
    print("Corpus Agent:", response)
    f.write("User: " + user_input + "\n") # Log the user's input
    f.write("Corpus Agent: "+ response + "\n") # Log the agent's response

Corpus Agent: You would have been mad at me this morning!
Corpus Agent: I am happy for you
Corpus Agent: A large caramel frappe with whip cream. It's delicious.
Corpus Agent: Are the coffees there expensive?
Corpus Agent: What kind of coffee will you get?
Corpus Agent: A large caramel frappe with whip cream. It's delicious.
Corpus Agent: Goodbye, please take care


Congrats, you are done with Part 2. Now test your model in 3 chat conversations (at least 10 utterances in each conversation) and report the results of the human survey.