<a href="https://colab.research.google.com/github/yaqoobtbs/chatbot/blob/main/chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Project Title: Design and Development of Topical Chatbot


In [6]:
import re

def load_dataset(file_path):
    """Load tab-separated (question, answer) pairs, one pair per line."""
    conversations = []
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            parts = line.strip().split('\t')
            # Skip lines that do not have exactly one question and one answer.
            if len(parts) == 2:
                question, answer = parts
                conversations.append((question, answer))
    return conversations

def preprocess(text):
    """Lowercase, replace . ! ? with spaces, and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"[.!?]", " ", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

dataset = load_dataset('dialogs.txt')
dataset = [(preprocess(question), preprocess(answer)) for question, answer in dataset]


In [7]:
from sklearn.model_selection import train_test_split

train_data, test_data = train_test_split(dataset, test_size=0.2, random_state=42)
test_data


[("i want to i heard it's going to be really fun",
  'i know, it does sound pretty awesome'),
 ('who did you vote for', 'i voted for obama'),
 ('so has everyone else', 'nothing seems to work'),
 ("of course i'm sure", 'well, i have to go back upstairs anyway'),
 ('did he ever take art lessons',
  "i can't believe it i drew paintings like that in third grade"),
 ("she's one of the prettiest girls at the school", 'what does she look like'),
 ('what makes him so bad', "he's rude and he yells a lot"),
 ('i always do did you go to school today', "no, i didn't"),
 ('i know that',
  "so why are you arguing with me don't lie in the sun too long"),
 ('i wonder if i should bring my gloves',
  'maybe you should, just in case it gets colder'),
 ('do you think the bananas fell from the sky', 'what do you mean'),
 ('i love most how it is at night after it rains', 'how come'),
 ("yes that's all i ever give her",
  'she raised you, and all you ever give her is a card'),
 ('i agree', 'this city is full

In [8]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tfidf_vectorizer = TfidfVectorizer()

train_questions = [item[0] for item in train_data]
train_answers = [item[1] for item in train_data]
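
# Learn the vocabulary from questions and answers, but index only the questions
# so that row i of tfidf_matrix lines up with train_answers[i]. Indexing both
# lists would let argmax return a position past the end of train_answers.
tfidf_vectorizer.fit(train_questions + train_answers)
tfidf_matrix = tfidf_vectorizer.transform(train_questions)

def get_response(user_input):
    """Return the answer paired with the most similar training question."""
    user_input = preprocess(user_input)
    user_vector = tfidf_vectorizer.transform([user_input])
    similarities = cosine_similarity(user_vector, tfidf_matrix)
    index_of_best_match = similarities.argmax()
    return train_answers[index_of_best_match]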

user_question = "How are you doing today?"
response = get_response(user_question)
print(response)


i'm doing great what about you


In [17]:
user_question = "what's your email address ?"
response = get_response(user_question)
print(response)

456 cherry drive, pasadena, ca 91170


In [21]:
user_question = "it was great?"
response = get_response(user_question)
print(response)

what was the score at the end of the game


In [22]:
user_question = "I am not intrested in this game?"
response = get_response(user_question)
print(response)

i had a long day


In [27]:
def calculate_accuracy(model_responses, reference_responses):
    """
    Calculate accuracy using an "exact match" metric.

    Parameters:
    - model_responses: A list of model-generated responses.
    - reference_responses: A list of reference (ground truth) responses.

    Returns:
    - Accuracy as a percentage (0 to 100).
    """
    if len(model_responses) != len(reference_responses):
        raise ValueError("Input lists must have the same length.")

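    total = len(model_responses)
    if total == 0:
        return 0.0  # Avoid dividing by zero on empty input.

    # Exact match is sensitive to case and punctuation.
    correct = sum(
        model == reference
        for model, reference in zip(model_responses, reference_responses)
    )

    accuracy = (correct / total) * 100.0
    return accuracy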

# Example usage:
reference_responses = ["I'm doing great.", "456 cherry drive, pasadena, ca 91170", "What was the score at the end of the game", "I had a long day"]
model_responses = ["i'm doing great what about you", "456 cherry drive, pasadena, ca 91170", "what was the score at the end of the game", "i had a long day"]

accuracy = calculate_accuracy(model_responses, reference_responses)
print(f"Accuracy: {accuracy:.2f}%")


Accuracy: 25.00%
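
Exact match compares raw strings, so a reference such as "What was the score at the end of the game" does not count as a match for the lowercased model output. The sketch below is one possible variant, not part of the original notebook: it applies the same normalization steps as `preprocess` to both sides before comparing. The names `normalize` and `normalized_accuracy` are hypothetical, and the model responses are the four outputs printed by the cells above.

```python
import re

def normalize(text):
    # Same steps as the notebook's preprocess(): lowercase, replace . ! ?
    # with spaces, collapse whitespace.
    text = text.lower()
    text = re.sub(r"[.!?]", " ", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()

def normalized_accuracy(model_responses, reference_responses):
    """Exact-match accuracy (0 to 100) after normalizing both sides."""
    if len(model_responses) != len(reference_responses):
        raise ValueError("Input lists must have the same length.")
    if not model_responses:
        return 0.0
    correct = sum(
        normalize(model) == normalize(reference)
        for model, reference in zip(model_responses, reference_responses)
    )
    return correct / len(model_responses) * 100.0

references = [
    "I'm doing great.",
    "456 cherry drive, pasadena, ca 91170",
    "What was the score at the end of the game",
    "I had a long day",
]
responses = [
    "i'm doing great what about you",
    "456 cherry drive, pasadena, ca 91170",
    "what was the score at the end of the game",
    "i had a long day",
]

print(f"Normalized accuracy: {normalized_accuracy(responses, references):.2f}%")
# prints: Normalized accuracy: 75.00%
```

Only the first pair still fails, because the model response contains extra words rather than a formatting difference.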


In [28]:
def calculate_f1_score(tp, fp, fn):
    """
    Calculate the F1 score.

    Parameters:
    - tp: Number of true positives.
    - fp: Number of false positives.
    - fn: Number of false negatives.

    Returns:
    - F1 score.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0

    f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    return f1_score

# Example usage:
tp = 90  # Number of true positives
fp = 10  # Number of false positives
fn = 20  # Number of false negatives

f1_score = calculate_f1_score(tp, fp, fn)
print(f"F1 Score: {f1_score:.2f}")


F1 Score: 0.86


In [29]:
def calculate_precision(tp, fp):
    """
    Calculate precision.

    Parameters:
    - tp: Number of true positives.
    - fp: Number of false positives.

    Returns:
    - Precision.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    return precision

def calculate_recall(tp, fn):
    """
    Calculate recall.

    Parameters:
    - tp: Number of true positives.
    - fn: Number of false negatives.

    Returns:
    - Recall.
    """
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    return recall

# Example usage:
tp = 90  # Number of true positives
fp = 10  # Number of false positives
fn = 20  # Number of false negatives

precision = calculate_precision(tp, fp)
recall = calculate_recall(tp, fn)

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")


Precision: 0.90
Recall: 0.82
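
The `tp`, `fp`, and `fn` values in the two cells above are hard-coded for illustration. One way to derive them from actual responses, assumed here rather than taken from the notebook, is to count overlapping whitespace tokens between a model response and its reference. The helper names `token_counts` and `token_f1` are hypothetical.

```python
from collections import Counter

def token_counts(prediction, reference):
    """Return (tp, fp, fn) from whitespace-token overlap between two strings."""
    pred_tokens = Counter(prediction.split())
    ref_tokens = Counter(reference.split())
    # Tokens present in both, counted with multiplicity.
    tp = sum((pred_tokens & ref_tokens).values())
    fp = sum(pred_tokens.values()) - tp  # predicted but not in the reference
    fn = sum(ref_tokens.values()) - tp   # in the reference but not predicted
    return tp, fp, fn

def token_f1(prediction, reference):
    """F1 computed from token-overlap counts, using the same formula as above."""
    tp, fp, fn = token_counts(prediction, reference)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0
    if precision + recall == 0:
        return 0
    return 2 * precision * recall / (precision + recall)

prediction = "i'm doing great what about you"
reference = "i'm doing great"

print(token_counts(prediction, reference))       # prints: (3, 3, 0)
print(f"{token_f1(prediction, reference):.2f}")  # prints: 0.67
```

Unlike exact match, this gives partial credit: the response covers every reference token (recall 1.0) but half of its own tokens are extra (precision 0.5).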


The chatbot presented here is retrieval-based: it converts text to TF-IDF vectors and answers each user input by looking up the closest match in the training data under cosine similarity. It does not implement a Transformer-based encoder-decoder model. Such a model uses self-attention to generate contextually relevant responses and can produce a more natural and coherent conversation flow, but moving to one would require substantial changes to this code and extensive training on a comprehensive conversational dataset.
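
The TF-IDF and cosine-similarity retrieval described above can be sketched in plain Python to show what happens at each step. This is a simplified illustration under stated assumptions, not the notebook's implementation: it tokenizes on whitespace and uses a smoothed IDF formula, so its weights will not match scikit-learn's `TfidfVectorizer` exactly. The functions `build_index`, `cosine`, and `retrieve` are hypothetical names, and the three sample pairs are taken from the test split printed earlier.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_index(questions):
    """Compute smoothed IDF weights and one sparse TF-IDF vector per question."""
    docs = [Counter(tokenize(q)) for q in questions]
    n_docs = len(docs)
    # Number of questions in which each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(user_input, idf, vectors, answers):
    """Return the answer of the most similar question and its similarity score."""
    counts = Counter(tokenize(user_input))
    # Terms never seen during indexing are ignored.
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return answers[best], scores[best]

pairs = [
    ("who did you vote for", "i voted for obama"),
    ("what makes him so bad", "he's rude and he yells a lot"),
    ("did he ever take art lessons",
     "i can't believe it i drew paintings like that in third grade"),
]
questions = [q for q, _ in pairs]
answers = [a for _, a in pairs]

idf, vectors = build_index(questions)
answer, score = retrieve("what makes him bad", idf, vectors, answers)
print(answer)  # prints: he's rude and he yells a lot
```

Returning the score alongside the answer lets a caller see how weak a match is; a query that shares no terms with any question scores 0.0 against all of them.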
