**CS Interview QA Chatbot Usage**

This notebook contains the code for training and running the GPT-2 Transformer model that answers CS interview questions.

Please ensure that you have run the preprocessing.ipynb notebook before running this one, and that its output is stored in your Google Drive.
- Ensure that combined_data.csv (the file this notebook loads) is stored in your project directory

To train from scratch:
1. [Import libraries](#scrollTo=ZNjjbPXpWiqa&line=1&uniqifier=1)
2. [Load the preprocessed dataset](#scrollTo=ErDcd3pqnL5N&line=1&uniqifier=1)
3. [Run Utility Functions](#scrollTo=-NASuplC3w-N)
4. [Define Model Architecture](#scrollTo=3a-zptK6DXDJ)
5. [Define Training Function](#scrollTo=SFjeoyzLfRmz)
6. [Initialise and Train Model](#scrollTo=3HQZcsVhKU8I)
7. [Run evaluation functions](#scrollTo=03W05HyA5Sxh)
    - Fallback Mechanism with BERTScore
8. [Evaluate with BERTScore, ROUGE, BLEU (quantitative)](#scrollTo=-ABw4DJ089aC&line=3&uniqifier=1)
9. [Host on Telegram](#scrollTo=-ABw4DJ089aC&line=3&uniqifier=1)

### Imports

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
FOLDER_PATH = '/content/drive/My Drive/Colab Notebooks/nlc/project'

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import torch
from torch.jit import script, trace
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
import csv
import random
import re
import os
import unicodedata
import codecs
from io import open
import itertools
import math


USE_CUDA = torch.cuda.is_available()
device = torch.device("cuda" if USE_CUDA else "cpu")

### Loading Datasets

In [None]:
import pandas as pd

# read from csv
df = pd.read_csv(os.path.join(FOLDER_PATH, 'combined_data.csv'))
df

| | Question | Answer |
|---|---|---|
| 0 | how does randomised algorithm work | the algorithm typically uses uniformly random ... |
| 1 | what do you mean by bestfirst search | bestfirst search is a search algorithm which e... |
| 2 | how do you explain a daemon | daemon disk and execution monitor is a process... |
| 3 | what is phonetic algorithm | a phonetic algorithm is an algorithm for index... |
| 4 | what do you mean by uniform costsearch | a tree search that finds the lowestcost route ... |
| ... | ... | ... |
| 3770 | explain biasvariance tradeoff | biasvariance tradeoff is a concept in machine ... |
| 3771 | what is stochastic gradient descent sgd in mac... | stochastic gradient descent sgd is an optimiza... |
| 3772 | explain stochastic gradient descent | stochastic gradient descent sgd is an optimiza... |
| 3773 | what is the backpropagation algorithm in machi... | the backpropagation algorithm is a widely used... |


In [None]:
# convert the df to 2d list
qa_pairs = df.values.tolist()
len(qa_pairs), qa_pairs[0]

(3775,
 ['how does randomised algorithm work',
  'the algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior in the hope of achieving good performance in the average case over all possible choices of random bits'])

### Functions

In [None]:
# Default word tokens
PAD_token = 0  # Used for padding short sentences
SOS_token = 1  # Start-of-sentence token
EOS_token = 2  # End-of-sentence token
UNK_token = 3  # Unknown word token

class Voc:
    def __init__(self, name):
        self.name = name
        self.trimmed = False
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS", UNK_token: "UNK"}
        self.num_words = 4  # Count SOS, EOS, PAD, UNK

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.num_words
            self.word2count[word] = 1
            self.index2word[self.num_words] = word
            self.num_words += 1
        else:
            self.word2count[word] += 1

    # Remove words below a certain count threshold
    def trim(self, min_count):
        if self.trimmed:
            return
        self.trimmed = True

        keep_words = []

        for k, v in self.word2count.items():
            if v >= min_count:
                keep_words.append(k)

        print('keep_words {} / {} = {:.4f}'.format(
            len(keep_words), len(self.word2index), len(keep_words) / len(self.word2index)
        ))

        # Reinitialize dictionaries
        self.word2index = {}
        self.word2count = {}
        self.index2word = {PAD_token: "PAD", SOS_token: "SOS", EOS_token: "EOS", UNK_token: "UNK"}
        self.num_words = 4 # Count default tokens

        for word in keep_words:
            self.addWord(word)

In [None]:
# Turn a Unicode string to plain ASCII, thanks to
# http://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

# Lowercase, trim, and remove non-letter characters
def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    s = re.sub(r"\s+", r" ", s).strip()
    return s

# Returns True iff both sentences in a pair 'p' are under the MAX_LENGTH threshold
def filterPair(p, max_length):
    # Input sequences need to preserve the last word for EOS token
    return len(p[0].split(' ')) < max_length and len(p[1].split(' ')) < max_length

# Filter pairs using filterPair condition
def filterPairs(pairs, max_length):
    return [pair for pair in pairs if filterPair(pair, max_length)]
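
To make the effect of these helpers concrete, the sketch below repeats `unicodeToAscii`, `normalizeString` and `filterPair` so that it runs on its own, then applies them to a few made-up inputs. Note that the regular expression keeps only letters and `.`, `!`, `?`, so apostrophes, hyphens and digits are replaced by spaces.

```python
import re
import unicodedata

def unicodeToAscii(s):
    # Decompose accented characters and drop the combining marks
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)      # put a space before . ! ?
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)  # replace everything else with a space
    s = re.sub(r"\s+", r" ", s).strip()    # collapse repeated whitespace
    return s

def filterPair(p, max_length):
    # Strictly fewer than max_length words on both sides
    return len(p[0].split(' ')) < max_length and len(p[1].split(' ')) < max_length

print(normalizeString("What's a B-tree?"))
print(normalizeString("  Café  "))
print(filterPair(["a b", "c d e"], 4), filterPair(["a b c d", "x"], 4))
```

The last line shows that a sentence of exactly `max_length` words is rejected, because the comparison is strict.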


In [None]:
voc = Voc("qa")
MAX_LENGTH = 50  # Maximum sentence length to consider

# filter pairs based on MAX_LENGTH
print(f'before filtering, no. of qa pairs: {len(qa_pairs)}')
qa_pairs = filterPairs(qa_pairs, MAX_LENGTH)
print(f'after filtering, no. of qa pairs: {len(qa_pairs)}')

before filtering, no. of qa pairs: 3775
after filtering, no. of qa pairs: 3145


In [None]:
# normalise each qa pair and add word from question and answer into vocab
for i in range(len(qa_pairs)):
    question, answer = qa_pairs[i]
    question = normalizeString(question)
    answer = normalizeString(answer)
    voc.addSentence(question)
    voc.addSentence(answer)
    qa_pairs[i] = [question, answer]

In [None]:
vocab_size = voc.num_words # words + 4 default tokens (PAD, SOS, EOS, UNK)
vocab_size

4980

In [None]:
qa_pairs[:5]

[['how does randomised algorithm work',
  'the algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior in the hope of achieving good performance in the average case over all possible choices of random bits'],
 ['what do you mean by bestfirst search',
  'bestfirst search is a search algorithm which explores a graph by expanding the most promising node chosen according to a specified rule'],
 ['how do you explain a daemon',
  'daemon disk and execution monitor is a process that runs in the background without users interaction they usually start at the booting time and terminate when the system is shut down'],
 ['what is phonetic algorithm',
  'a phonetic algorithm is an algorithm for indexing of words by their pronunciation'],
 ['what do you mean by uniform costsearch',
  'a tree search that finds the lowestcost route where costs vary']]

In [None]:
questions = [pair[0] for pair in qa_pairs]
answers = [pair[1] for pair in qa_pairs]
len(questions), len(answers)

(3145, 3145)

## Transformer Model Architecture

In [None]:
!pip install transformers



In [None]:
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch

class GPT2QA:
    """GPT-2 Model for Question Answering."""

    def __init__(self, model_name="gpt2"):
        self.tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
        self.model = GPT2LMHeadModel.from_pretrained(model_name)
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model.to(self.device)

    def generate_response(self, input_text):
        """Generate a response using the GPT-2 model with appropriate configurations."""
        inputs = self.tokenizer.encode(input_text, return_tensors="pt").to(self.device)

        # do_sample=True combined with num_beams > 1 gives beam-search sampling
        outputs = self.model.generate(
            inputs,
            max_length=150,  # Total length in tokens, prompt included
            do_sample=True,  # Sample so that top_k and top_p take effect
            top_k=50,  # Keep only the 50 most likely next tokens
            top_p=0.95,  # Then keep the smallest set reaching 0.95 cumulative probability
            num_beams=2,  # Use beam search to leverage early stopping
            early_stopping=True,
            pad_token_id=self.tokenizer.eos_token_id  # GPT-2 has no pad token by default; reuse EOS
        )

        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
        return response.strip()
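
The `top_k` and `top_p` arguments above restrict which tokens can be sampled at each step. The block below is a simplified, pure-Python sketch of that idea on a toy distribution. It is not the library's implementation, and the function name `top_k_top_p_filter` and the example probabilities are assumptions made for illustration.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Return the renormalised distribution left after top-k, then top-p filtering."""
    # Top-k: keep the k most probable tokens and renormalise
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Made-up next-token probabilities
next_token_probs = {"tree": 0.5, "graph": 0.3, "heap": 0.15, "sofa": 0.05}
filtered = top_k_top_p_filter(next_token_probs, top_k=3, top_p=0.8)
print(filtered)
```

With `top_k=3` the unlikely token "sofa" is dropped first; with `top_p=0.8` the remaining tail ("heap") is dropped as well, and sampling then happens over the two tokens that are left.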

### Training

Hyperparameters (batch size 8, 3 epochs, learning rate 5e-5, maximum sequence length 50 tokens) were chosen to fit within the available computational resources.

In [None]:
import pandas as pd
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW
import time

class QADataset(Dataset):
    """Dataset class for question-answer pairs."""

    def __init__(self, qa_pairs, tokenizer, max_length=50):
        self.qa_pairs = qa_pairs
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.qa_pairs)

    def __getitem__(self, idx):
        question, answer = self.qa_pairs[idx]
        inputs = self.tokenizer.encode_plus(
            question,
            answer,
            add_special_tokens=True,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        return {
            'input_ids': inputs['input_ids'].squeeze(),
            'attention_mask': inputs['attention_mask'].squeeze(),
            'labels': inputs['input_ids'].squeeze()  # Causal LM training: labels equal input_ids; the model shifts them internally
        }

def train_model(qa_pairs, model, batch_size=8, num_epochs=3):
    """Train the GPT-2 model on the question-answer pairs."""

    tokenizer = model.tokenizer
    dataset = QADataset(qa_pairs, tokenizer)
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    optimizer = AdamW(model.model.parameters(), lr=5e-5)

    model.model.train()
    for epoch in range(num_epochs):
        total_loss = 0
        for batch in dataloader:
            optimizer.zero_grad()

            input_ids = batch['input_ids'].to(model.device)
            attention_mask = batch['attention_mask'].to(model.device)
            labels = batch['labels'].to(model.device)

            # Start timing for the current batch
            start_time = time.time()

            # Forward pass
            outputs = model.model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss

            # Backward pass
            loss.backward()
            optimizer.step()

            total_loss += loss.item()

            # End timing for current batch
            end_time = time.time()
            print(f"Time taken for batch: {end_time - start_time:.4f} seconds")

        # Average loss for the epoch
        avg_loss = total_loss / len(dataloader)
        print(f"Epoch {epoch + 1}/{num_epochs}, Average Loss: {avg_loss:.4f}")

    # Save the fine-tuned model
    model.model.save_pretrained("fine_tuned_gpt2")
    model.tokenizer.save_pretrained("fine_tuned_gpt2")
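
In `QADataset`, the labels are an exact copy of `input_ids`, so padded positions also contribute to the loss. A common alternative is to set the labels at padded positions to -100, the value that PyTorch's cross-entropy loss (and the Hugging Face language-model heads) ignore. The sketch below shows that masking step on plain lists. The helper name `mask_padding_labels` and the token ids are made up for illustration, and this is not what the notebook's training code does. If the pad token is the EOS token, masking every padded position also removes the EOS target, so an explicit EOS may need to be appended to the answer.

```python
IGNORE_INDEX = -100  # targets with this value are skipped by the loss

def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        token_id if is_real == 1 else IGNORE_INDEX
        for token_id, is_real in zip(input_ids, attention_mask)
    ]

# Made-up token ids; the last two positions are padding
input_ids = [2061, 318, 257, 8931, 50256, 50256]
attention_mask = [1, 1, 1, 1, 0, 0]
labels = mask_padding_labels(input_ids, attention_mask)
print(labels)
```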

In [None]:
import pandas as pd
from torch.utils.data import DataLoader
from torch.optim import AdamW
import time

def load_data(file_path):
    """Load and preprocess the QA dataset."""
    df = pd.read_csv(file_path)
    qa_pairs = df.values.tolist()
    # Normalize each pair
    qa_pairs = [(q.lower().strip(), a.lower().strip()) for q, a in qa_pairs]
    return qa_pairs

def main():
    # Load the QA data
    file_path = '/content/drive/My Drive/Colab Notebooks/nlc/project/combined_data.csv'
    qa_pairs = load_data(file_path)

    # Initialize the model
    gpt2_qa_model = GPT2QA()

    # Add a padding token if not already set
    if gpt2_qa_model.tokenizer.pad_token is None:
        gpt2_qa_model.tokenizer.add_special_tokens({'pad_token': '[PAD]'})
        gpt2_qa_model.model.resize_token_embeddings(len(gpt2_qa_model.tokenizer))

    # Create DataLoader
    dataset = QADataset(qa_pairs, gpt2_qa_model.tokenizer)
    dataloader = DataLoader(dataset, batch_size=8, shuffle=True)

    # Set up the optimizer
    optimizer = AdamW(gpt2_qa_model.model.parameters(), lr=5e-5)

    # Training loop
    gpt2_qa_model.model.train()  # Ensure model is in training mode
    num_epochs = 3

    for epoch in range(num_epochs):
        total_loss = 0
        start_epoch_time = time.time()  # Track epoch time

        for batch in dataloader:
            optimizer.zero_grad()  # Reset gradients

            input_ids = batch['input_ids'].to(gpt2_qa_model.device)
            attention_mask = batch['attention_mask'].to(gpt2_qa_model.device)
            labels = batch['labels'].to(gpt2_qa_model.device)

            # Start timing for the current batch
            start_time = time.time()

            # Forward pass
            outputs = gpt2_qa_model.model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss

            # Backward pass
            loss.backward()
            optimizer.step()

            total_loss += loss.item()

            # End timing for current batch
            end_time = time.time()
            print(f"Time taken for batch: {end_time - start_time:.4f} seconds")

        # Average loss for the epoch
        avg_loss = total_loss / len(dataloader)
        print(f"Epoch {epoch + 1}/{num_epochs}, Average Loss: {avg_loss:.4f}")

    # Save the fine-tuned model
    gpt2_qa_model.model.save_pretrained("fine_tuned_gpt2")
    gpt2_qa_model.tokenizer.save_pretrained("fine_tuned_gpt2")

if __name__ == "__main__":
    main()

Time taken for batch: 1.4572 seconds
Time taken for batch: 0.1630 seconds
Time taken for batch: 0.1539 seconds
Time taken for batch: 0.1367 seconds
Time taken for batch: 0.1358 seconds
Time taken for batch: 0.1297 seconds
Time taken for batch: 0.1373 seconds
Time taken for batch: 0.1403 seconds
Time taken for batch: 0.1371 seconds
Time taken for batch: 0.1360 seconds
Time taken for batch: 0.1377 seconds
Time taken for batch: 0.1368 seconds
Time taken for batch: 0.1358 seconds
Time taken for batch: 0.1359 seconds
Time taken for batch: 0.1366 seconds
Time taken for batch: 0.1377 seconds
Time taken for batch: 0.1373 seconds
Time taken for batch: 0.1372 seconds
Time taken for batch: 0.1344 seconds
Time taken for batch: 0.1365 seconds
Time taken for batch: 0.1355 seconds
Time taken for batch: 0.1357 seconds
Time taken for batch: 0.1346 seconds
Time taken for batch: 0.1392 seconds
Time taken for batch: 0.1387 seconds
Time taken for batch: 0.1355 seconds
Time taken for batch: 0.1376 seconds
...

### Fine-tuning

A custom dataset class, CustomQADataset, formats each pair as "Question: {question} Answer: {answer}", tokenizes it, and pads or truncates it to max_length (100 tokens by default). GPT-2 has no padding token of its own, so the EOS token is reused for padding. The fine_tune_gpt2 function runs the training loop, using an AdamW optimizer to minimise the language-modelling loss over a specified number of epochs (3 by default, with batch size 8 and learning rate 5e-5). After training, the fine-tuned model and tokenizer are saved both locally and to Google Drive so that they can be reloaded later to answer user queries.

In [None]:
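import os
import torch
# transformers.AdamW is deprecated and absent from newer releases; use the PyTorch optimizer.
# Note: torch's AdamW defaults to weight_decay=0.01, whereas transformers' defaulted to 0.0.
from torch.optim import AdamW
from torch.utils.data import Dataset, DataLoader
from transformers import GPT2LMHeadModel, GPT2TokenizerFast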

class CustomQADataset(Dataset):
    def __init__(self, qa_pairs, tokenizer, max_length=100):
        self.qa_pairs = qa_pairs
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.qa_pairs)

    def __getitem__(self, idx):
        question, answer = self.qa_pairs[idx]
        text = f"Question: {question} Answer: {answer}"
        inputs = self.tokenizer.encode_plus(
            text,
            add_special_tokens=True,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )
        return {
            'input_ids': inputs['input_ids'].squeeze(),
            'attention_mask': inputs['attention_mask'].squeeze(),
            'labels': inputs['input_ids'].squeeze()
        }

def fine_tune_gpt2(qa_pairs, model_name="gpt2", num_epochs=3, batch_size=8, learning_rate=5e-5):
    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)

    # Set the padding token
    tokenizer.pad_token = tokenizer.eos_token

    # Save paths in Google Drive
    save_path_pretrained = '/content/drive/MyDrive/Colab Notebooks/nlc/project/models/pretrained_gpt2'
    save_path_finetuned = '/content/drive/MyDrive/Colab Notebooks/nlc/project/models/fine_tuned_gpt2'
    # Create directories if they don't exist
    os.makedirs(save_path_pretrained, exist_ok=True)
    os.makedirs(save_path_finetuned, exist_ok=True)

    # Save the pre-trained model to Google Drive
    model.save_pretrained(save_path_pretrained)
    tokenizer.save_pretrained(save_path_pretrained)

    # Prepare dataset and dataloader
    dataset = CustomQADataset(qa_pairs, tokenizer)
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Set up the optimizer
    optimizer = AdamW(model.parameters(), lr=learning_rate)

    # Training loop
    model.train()
    for epoch in range(num_epochs):
        total_loss = 0

        for batch in dataloader:
            optimizer.zero_grad()

            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss

            loss.backward()
            optimizer.step()

            total_loss += loss.item()

        avg_loss = total_loss / len(dataloader)
        print(f"Epoch {epoch + 1}/{num_epochs}, Average Loss: {avg_loss:.4f}")

    # Save the fine-tuned model
    model.save_pretrained("fine_tuned_gpt2")
    tokenizer.save_pretrained("fine_tuned_gpt2")

    # Save the fine-tuned model to Google Drive
    model.save_pretrained(save_path_finetuned)
    tokenizer.save_pretrained(save_path_finetuned)

if __name__ == "__main__":
    fine_tune_gpt2(qa_pairs)



Epoch 1/3, Average Loss: 1.2321
Epoch 2/3, Average Loss: 0.8862
Epoch 3/3, Average Loss: 0.6758


In [None]:
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def load_fine_tuned_model(model_dir="fine_tuned_gpt2"):
    """Load the fine-tuned model and tokenizer."""
    tokenizer = GPT2TokenizerFast.from_pretrained(model_dir)
    model = GPT2LMHeadModel.from_pretrained(model_dir)
    model.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))

    return model, tokenizer

def generate_response(model, tokenizer, input_text):
    """Generate a response using the fine-tuned GPT-2 model."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

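    # Wrap the input in a prompt that cues the model to produce an answer.
    # Note: fine-tuning used the template "Question: ... Answer: ...",
    # whereas this prompt uses "Q: ...\nA:".
    prompt = f"Q: {input_text}\nA:"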
    inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)

    outputs = model.generate(
        inputs,
        max_length=150,
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Process the response to remove any leading prompts, if necessary
    response = response.split("A:", 1)[-1].strip()
    return response
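
The fine-tuning data uses the template "Question: ... Answer: ...", while `generate_response` prompts with "Q: ...\nA:". One way to keep the two in step is to build both from a single template. The sketch below is an illustration under that assumption; the helper names `build_training_text`, `build_prompt` and `extract_answer` are hypothetical and do not appear in the notebook.

```python
QUESTION_PREFIX = "Question: "
ANSWER_PREFIX = " Answer:"

def build_training_text(question, answer):
    """Text shown to the model during fine-tuning."""
    return f"{QUESTION_PREFIX}{question}{ANSWER_PREFIX} {answer}"

def build_prompt(question):
    """Inference prompt: the training text cut off right before the answer."""
    return f"{QUESTION_PREFIX}{question}{ANSWER_PREFIX}"

def extract_answer(generated_text):
    """Keep only what follows the first answer marker."""
    return generated_text.split(ANSWER_PREFIX, 1)[-1].strip()

training_text = build_training_text("what is a stack", "a lifo data structure")
prompt = build_prompt("what is a stack")
print(training_text)
print(extract_answer(training_text))
```

Because the prompt is a prefix of the training text, the model is asked to continue a pattern it saw during fine-tuning.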

### Begin Interaction

In [None]:
def chat():
    model, tokenizer = load_fine_tuned_model()

    print("What can I answer for you today? Type 'q' or 'quit' to exit.")
    while True:
        user_input = input("You: ")

        if user_input.lower() in ['q', 'quit']:
            print("Exiting chatbot! All the best for your job search :)")
            break

        user_input = user_input.strip()  # Simple text normalization
        response = generate_response(model, tokenizer, user_input)
        print("Bot:", response)

if __name__ == "__main__":
    chat()

What can I answer for you today? Type 'q' or 'quit' to exit.
You: define recursion
Bot: recursive algorithm is one that invokes some or all of its outer nodes in a recursive program
You: please explain recursion
Bot: recursion is a recursive algorithm used to solve a class of computational problems by enclosing an accumulator with an upper bound on the number of possible solutions that may be made by taking the solution of a recursion as a whole and dividing it into parts to solve the bigger problem
You: what is a decision tree?
Bot: a decision tree is a tree which represents a decision in terms of its children
You: explain binary search
Bot: binary search is a search algorithm that finds a shortest path between two vertices or edges using binary search algorithms often used in computer graphics and cryptography
You: which furniture would you describe yourself as?
Bot: a computer programmer
You: q
Exiting chatbot! All the best for your job search :)


#### BERT Score evaluation

In [None]:
!pip install bert_score


In [None]:
import torch
from bert_score import score as bert_score
import random
from transformers import logging

# Set logging level to suppress informational logs
logging.set_verbosity_error()  # Only errors will be shown

model, tokenizer = load_fine_tuned_model()
generated_responses = []
all_results = []  # For storing results

for question, answer in zip(questions, answers):
    generated_response = generate_response(model, tokenizer, question)
    generated_responses.append(generated_response)

    # Prepare candidates and references for BERTScore calculation
    candidates = [generated_response]  # Generated response
    references = [answer]              # Ground truth answer

    # Calculate BERTScore
    P, R, F1 = bert_score(candidates, references, lang='en', device='cuda' if torch.cuda.is_available() else 'cpu')

    # Append results to the list
    all_results.append({
        "Question": question,
        "Ground Truth": answer,
        "Generated Response": generated_response,
        "Precision": P.mean().item(),
        "Recall": R.mean().item(),
        "F1": F1.mean().item()
    })

# Create a DataFrame for results
results_df = pd.DataFrame(all_results)

# Save the DataFrame to a CSV file
results_df.to_csv('/content/drive/My Drive/Colab Notebooks/nlc/project/gpt2_bertscores.csv', index=False)

# Calculate average BERTScores
avg_precision = results_df["Precision"].mean()
avg_recall = results_df["Recall"].mean()
avg_f1 = results_df["F1"].mean()

# Print average scores
print(f"Average Precision: {avg_precision:.4f}")
print(f"Average Recall: {avg_recall:.4f}")
print(f"Average F1 Score: {avg_f1:.4f}")


Average Precision: 0.8790
Average Recall: 0.8724
Average F1 Score: 0.8755


### Fallback Mechanism with BERTScore

In [None]:
import time
import random
import torch
import warnings
from bert_score import score as bert_score
from transformers import logging

# Suppress FutureWarnings from transformers library
warnings.simplefilter(action='ignore', category=FutureWarning)

# Set logging level to suppress informational logs
logging.set_verbosity_error()

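# Note: encoder, decoder, searcher and evaluate() are not defined in this notebook.
# They appear to belong to a separate encoder-decoder (seq2seq) chatbot and must be
# defined before this function is called.
def evaluateInput(encoder, decoder, searcher, voc, max_length=MAX_LENGTH, threshold=0.7):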
    print("What can I answer for you today? Type 'q' or 'quit' to exit.")

    known_questions = [pair[0] for pair in qa_pairs]  # Assuming this is pre-loaded from your data

    while True:
        # Get input sentence
        input_sentence = input('> ')

        # Check if it is quit case
        if input_sentence.lower() in ['q', 'quit']:
            print("Exiting chatbot! All the best for your job search :)")
            break

        # Normalize sentence
        input_sentence = normalizeString(input_sentence)

        # Prepare to calculate BERTScore
        batch_size = 128
        all_bert_scores = []
        candidates = [input_sentence] * len(known_questions)  # Create a list of candidates matching reference list size

        # Start timing for total time per input
        start_time = time.time()

        # Process known questions in batches
        for i in range(0, len(known_questions), batch_size):
            refs_batch = known_questions[i:i + batch_size]
            # Create a corresponding batch of candidates
            cand_batch = candidates[i:i + batch_size]  # Match candidates to the size of reference batch

            # Print the current batch number
            print(f"Computing BERTScore for batch {i // batch_size + 1}")

            P, R, F1 = bert_score(cand_batch, refs_batch, lang='en', device='cuda' if torch.cuda.is_available() else 'cpu', rescale_with_baseline=True)
            max_f1 = F1.max().item()
            all_bert_scores.append(max_f1)

        # Get the maximum BERTScore across all batches
        max_bertscore = max(all_bert_scores)
        print(f"Maximum BERTScore: {max_bertscore}")

        # Check if BERTScore meets the threshold before generating a response
        if max_bertscore >= threshold:
            # If the score is sufficient, proceed to generate a model response
            output_words = evaluate(encoder, decoder, searcher, voc, input_sentence)

            # Filter out EOS and PAD tokens
            output_words[:] = [x for x in output_words if x not in ['EOS', 'PAD']]

            # Print the bot's response if valid
            if output_words:
                print("Bot:", ' '.join(output_words))
            else:
                print("Bot: I generated no valid response, but your question was similar enough.")

        else:
            # Provide fallback response if BERTScore is below the threshold
            resources = [
                "Perhaps this article might help: https://www.linkedin.com/advice/3/what-should-you-research-before-computer-science-r6tuc",
                "You might find this helpful: https://medium.com/@andreimargeloiu/the-definitive-guide-to-the-coding-interview-2704d166664c",
                "Here is a more light-hearted video to help with your preparation: https://youtu.be/1t1_a1BZ04o?feature=shared"
            ]
            selected_resource = random.choice(resources)
            print("Bot: I'm sorry, I didn't quite understand your question.")
            print(selected_resource)
            print("Alternatively, please specify more details or clarify your query!")

        # End timing for total time per input
        end_time = time.time()
        total_time = end_time - start_time
        # print(f"Total time taken for computing BERTScores: {total_time:.4f} seconds")
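
The fallback logic above reduces to one decision: score the user's input against every known question, and only generate an answer if the best score reaches the threshold. The sketch below isolates that decision. To stay dependency-free it swaps BERTScore for `difflib.SequenceMatcher`, a character-level similarity from the standard library, so the scores are not comparable with BERTScore values. The function names and the example questions are assumptions for illustration.

```python
from difflib import SequenceMatcher

def best_match_score(query, known_questions):
    """Highest similarity between the query and any known question (0.0 if none)."""
    return max(
        (SequenceMatcher(None, query, known).ratio() for known in known_questions),
        default=0.0,
    )

def should_answer(query, known_questions, threshold=0.7):
    """True if the query is close enough to the training questions to attempt an answer."""
    return best_match_score(query, known_questions) >= threshold

known_questions = ["what is a binary search tree", "explain recursion"]
for query in ["explain recursion", "zzzz"]:
    action = "generate answer" if should_answer(query, known_questions) else "fallback"
    print(f"{query!r}: {action}")
```

Taking the maximum over all known questions mirrors `max(all_bert_scores)` in the function above: a single close match is enough to attempt an answer.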

### BLEU, ROUGE, BERT Score

1. Install Necessary Packages and Libraries

In [None]:
!pip install nltk rouge-score


In [None]:
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
import pandas as pd

2.  Evaluation with BLEU and ROUGE

In [None]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [None]:
import torch
from bert_score import score as bert_score
import random
from transformers import logging
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
import pandas as pd

# Set logging level to suppress informational logs
logging.set_verbosity_error()  # Only errors will be shown

# Load the fine-tuned model
model, tokenizer = load_fine_tuned_model()
generated_responses = []
all_results = []  # For storing results

# Initialize BLEU smoothing function
smoothie = SmoothingFunction().method4

# Initialize ROUGE scorer
rouge = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)

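# Hold out 20% of the pairs for evaluation. Note that fine_tune_gpt2 above was
# called on all of qa_pairs, so these test pairs were also seen during training.
from sklearn.model_selection import train_test_split

# questions and answers are the lists built from qa_pairs earlier in the notebook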
train_questions, test_questions, train_answers, test_answers = train_test_split(
    questions, answers, test_size=0.2, random_state=42
)

for question, answer in zip(test_questions, test_answers):
    generated_response = generate_response(model, tokenizer, question)
    generated_responses.append(generated_response)

    # Prepare candidates and references
    candidate = generated_response.strip()  # Generated response
    reference = answer.strip()              # Ground truth answer

    # Calculate BERTScore
    P, R, F1 = bert_score(
        [candidate], [reference],
        lang='en',
        device='cuda' if torch.cuda.is_available() else 'cpu'
    )

    # Tokenize the texts for BLEU using the model's tokenizer (ROUGE tokenizes the raw strings itself)
    reference_tokens = tokenizer.tokenize(reference)
    candidate_tokens = tokenizer.tokenize(candidate)

    # Calculate BLEU score
    bleu_score = sentence_bleu(
        [reference_tokens], candidate_tokens,
        smoothing_function=smoothie
    )

    # Calculate ROUGE scores
    rouge_scores = rouge.score(reference, candidate)
    rouge1_f1 = rouge_scores['rouge1'].fmeasure
    rougeL_f1 = rouge_scores['rougeL'].fmeasure

    # Append results to the list
    all_results.append({
        "Question": question,
        "Ground Truth": reference,
        "Generated Response": candidate,
        "BERTScore_Precision": P.mean().item(),
        "BERTScore_Recall": R.mean().item(),
        "BERTScore_F1": F1.mean().item(),
        "BLEU": bleu_score,
        "ROUGE-1_F1": rouge1_f1,
        "ROUGE-L_F1": rougeL_f1
    })

# Create a DataFrame for results
results_df = pd.DataFrame(all_results)

# Save the per-question results to the project folder on Google Drive
results_df.to_csv(os.path.join(FOLDER_PATH, 'gpt2_scores_w_bleu_rogue_bert.csv'), index=False)

# Calculate average scores
avg_bert_precision = results_df["BERTScore_Precision"].mean()
avg_bert_recall = results_df["BERTScore_Recall"].mean()
avg_bert_f1 = results_df["BERTScore_F1"].mean()
avg_bleu = results_df["BLEU"].mean()
avg_rouge1_f1 = results_df["ROUGE-1_F1"].mean()
avg_rougeL_f1 = results_df["ROUGE-L_F1"].mean()

# Print average scores
print(f"Average BERTScore Precision: {avg_bert_precision:.4f}")
print(f"Average BERTScore Recall: {avg_bert_recall:.4f}")
print(f"Average BERTScore F1 Score: {avg_bert_f1:.4f}")
print(f"Average BLEU Score: {avg_bleu:.4f}")
print(f"Average ROUGE-1 F1 Score: {avg_rouge1_f1:.4f}")
print(f"Average ROUGE-L F1 Score: {avg_rougeL_f1:.4f}")

Average BERTScore Precision: 0.8803
Average BERTScore Recall: 0.8761
Average BERTScore F1 Score: 0.8780
Average BLEU Score: 0.1822
Average ROUGE-1 F1 Score: 0.3889
Average ROUGE-L F1 Score: 0.3415
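
To make the ROUGE-1 F1 column easier to interpret, here is a minimal pure-Python sketch of unigram-overlap F1. It is an illustration only: it splits on whitespace and does no stemming, so its values will not exactly match `rouge_scorer` with `use_stemmer=True`. The function name `rouge1_f1_sketch` and the example sentences are hypothetical and not part of the notebook.

```python
from collections import Counter


def rouge1_f1_sketch(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate (illustrative only)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a token counts at most as often as it occurs in both texts
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())  # share of candidate tokens matched
    recall = overlap / sum(ref_counts.values())      # share of reference tokens matched
    return 2 * precision * recall / (precision + recall)


# Made-up example pair, not taken from the dataset
score = rouge1_f1_sketch(
    "a stack is a last in first out data structure",
    "a stack is a first in first out structure",
)
print(f"{score:.4f}")
```

A score near 0.39, as reported above, therefore means that well under half of the unigrams are shared between generated and reference answers, even though BERTScore rates them as semantically close.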


### Telegram Implementation
- Replace the placeholder token with your own API token, obtained from BotFather on Telegram.
- The token is set in `main()`: `application = Application.builder().token('YOUR_TOKEN').build()`
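
Rather than pasting the token into the notebook, one option is to read it from an environment variable. The following is a minimal sketch, assuming a hypothetical variable name `TELEGRAM_BOT_TOKEN` and a hypothetical helper `get_bot_token`; neither is part of the original notebook or a python-telegram-bot convention.

```python
import os


def get_bot_token(env_var: str = "TELEGRAM_BOT_TOKEN") -> str:
    """Return the bot token stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your BotFather token"
        )
    return token
```

The returned value could then be passed to `Application.builder().token(...)` in place of the hard-coded string.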

In [None]:
!pip install python-telegram-bot --upgrade
!pip install nest_asyncio

Collecting python-telegram-bot
  Downloading python_telegram_bot-21.7-py3-none-any.whl.metadata (17 kB)
Downloading python_telegram_bot-21.7-py3-none-any.whl (654 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 654.9/654.9 kB 9.9 MB/s eta 0:00:00
Installing collected packages: python-telegram-bot
Successfully installed python-telegram-bot-21.7


In [None]:
import logging
import asyncio
import nest_asyncio
from telegram import Update
from telegram.ext import Application, MessageHandler, filters, ContextTypes
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
import torch

# Apply the nest_asyncio patch for Jupyter environments
nest_asyncio.apply()

# Configure logging
logging.basicConfig(
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    level=logging.INFO
)

# Load the fine-tuned GPT-2 model and tokenizer
def load_fine_tuned_model(model_dir="fine_tuned_gpt2"):
    tokenizer = GPT2TokenizerFast.from_pretrained(model_dir)
    model = GPT2LMHeadModel.from_pretrained(model_dir)
    model.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
    return model, tokenizer

# Generate a response using the fine-tuned GPT-2 model
def generate_response(model, tokenizer, input_text):
    # Prompt format: "Q: <question>\nA:"
    prompt = f"Q: {input_text}\nA:"
    # Tokenize (input_ids and attention_mask) and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_length=150,  # total length in tokens, prompt included
        do_sample=True,
        temperature=0.7,
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )

    # Decode and keep only the text after the first "A:" marker
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    response = response.split("A:", 1)[-1].strip()
    return response

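# Define the Telegram bot's response function
async def respond(update: Update, context: ContextTypes.DEFAULT_TYPE):
    # effective_message also covers edited messages, where update.message is None
    message = update.effective_message
    if message is None or not message.text:
        return
    user_message = message.text.strip()
    try:
        # Generation is synchronous, so the bot handles one message at a time
        response = generate_response(model, tokenizer, user_message)
        await message.reply_text(response)
    except Exception as e:
        logging.error(f"Error generating response: {e}")
        await message.reply_text("Sorry, something went wrong!")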

# Main function to initialize and start the bot
async def main():
    # Replace 'YOUR_TOKEN' with your bot's API token
    application = Application.builder().token('YOUR_TOKEN').build()

    # Add handler for text messages
    application.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, respond))

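    await application.initialize()
    await application.start()
    await application.updater.start_polling()
    try:
        # Keep the bot running until the cell is interrupted
        await asyncio.Event().wait()
    finally:
        # Stop polling and release resources on shutdown
        await application.updater.stop()
        await application.stop()
        await application.shutdown()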

# Run the bot
if __name__ == "__main__":
    model, tokenizer = load_fine_tuned_model()
    asyncio.run(main())

ERROR:telegram.ext.Updater:Exception happened while polling for updates.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/telegram/ext/_updater.py", line 743, in _network_loop_retry
    if not await do_action():
  File "/usr/local/lib/python3.10/dist-packages/telegram/ext/_updater.py", line 737, in do_action
    return action_cb_task.result()
  File "/usr/lib/python3.10/asyncio/futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/usr/lib/python3.10/asyncio/tasks.py", line 232, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.10/dist-packages/telegram/ext/_updater.py", line 367, in polling_action_cb
    updates = await self.bot.get_updates(
  File "/usr/local/lib/python3.10/dist-packages/telegram/ext/_extbot.py", line 647, in get_updates
    updates = await super().get_updates(
  File "/usr/local/lib/python3.10/dist-packages/telegram/_bot.py", line 4421, in get_updates
    await self

KeyboardInterrupt: 
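
`generate_response` above wraps the user's text in a `Q: ...\nA:` prompt and keeps everything after the first `A:`. Because sampling can run up to `max_length` tokens, the text after `A:` may sometimes continue into a further `Q:` line. That is an assumption about model behaviour, not something observed in this notebook. A minimal sketch of a stricter post-processing step, using a hypothetical helper `extract_answer`:

```python
def extract_answer(generated_text: str) -> str:
    """Keep the text after the first 'A:' and before any following 'Q:' line."""
    answer = generated_text.split("A:", 1)[-1]
    # Cut at the start of a follow-on question, if the model produced one
    answer = answer.split("\nQ:", 1)[0]
    return answer.strip()


# Made-up generation that runs on into a second question
sample = (
    "Q: what is a stack\n"
    "A: a last in first out structure\n"
    "Q: what is a queue\n"
    "A: ..."
)
print(extract_answer(sample))
```

If no `A:` marker is present, the helper returns the stripped input unchanged, as the original `split("A:", 1)[-1].strip()` does.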