<a href="https://colab.research.google.com/github/thomas-e-jung/resume-chatbot/blob/main/resume_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Resume Chatbot
Author: Thomas Jung (www.github.com/thomas-e-jung)

Implementation of RAG (Retrieval-Augmented Generation) using PDF input as context.

## Ensure a GPU is being used: Runtime tab > Change runtime type > T4 GPU

In [None]:
%pip install -U -q langchain_community pypdf bitsandbytes faiss-cpu

In [None]:
!git clone https://github.com/thomas-e-jung/resume-chatbot.git

In [None]:
from google.colab import files
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.utils import DistanceStrategy
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch
from IPython.display import HTML, display

## Check whether a GPU is being used

In [None]:
uploaded = {}  # stays empty unless the optional upload cell below is run
if torch.cuda.is_available():
    print("Using CUDA")
else:
    print("Warning! GPU not enabled. Runtime tab > Change runtime type > T4 GPU")

Using CUDA


Optional: Uncomment and run the following cell to upload your own resume in PDF format

In [None]:
# uploaded = files.upload()

In [None]:
if uploaded:
    fileName = list(uploaded.keys())[0]
else:
    fileName = "/content/resume-chatbot/Resume - Thomas Jung.pdf"

loader = PyPDFLoader(fileName)
pages = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(pages)

EMBEDDING_MODEL_NAME = "thenlper/gte-small"

embedding_model = HuggingFaceEmbeddings(
    model_name=EMBEDDING_MODEL_NAME,
    multi_process=True,
    model_kwargs={"device": "cuda"},
    encode_kwargs={"normalize_embeddings": True},
)

KNOWLEDGE_VECTOR_DATABASE = FAISS.from_documents(
    docs, embedding_model, distance_strategy=DistanceStrategy.COSINE
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
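
To make the `chunk_size=500` and `chunk_overlap=50` settings concrete, here is a minimal sketch of fixed-width chunking with overlap. It is a simplification, not the library's algorithm: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line boundaries before falling back to raw characters. The helper name `sliding_chunks` and the sample text are hypothetical.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-width chunks; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Made-up 1200-character sample standing in for the text of a PDF page
sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])
print(chunks[0][-50:] == chunks[1][:50])
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of indexing some text twice.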


Loading LLM pipeline...

Please wait, it may take a few minutes.

In [None]:
READER_MODEL_NAME = "HuggingFaceH4/zephyr-7b-beta"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    READER_MODEL_NAME,
    quantization_config=bnb_config
)
tokenizer = AutoTokenizer.from_pretrained(READER_MODEL_NAME)

READER_LLM = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    do_sample=True,
    temperature=0.2,
    repetition_penalty=1.1,
    return_full_text=False,
    max_new_tokens=500,
)

`low_cpu_mem_usage` was None, now default to True since model is quantized.


Device set to use cuda:0
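
As rough arithmetic for why the model is loaded in 4-bit: the sketch below estimates the memory taken by the weights alone at different precisions. It assumes a round figure of 7 billion parameters and ignores quantization constants, activations, and the KV cache, so treat the numbers as back-of-the-envelope estimates rather than measurements. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 7e9  # assumed round parameter count for a "7b" model

for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions the 16-bit weights would nearly fill a 16 GB GPU on their own, while the 4-bit weights leave room for the embedding model and generation.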


In [None]:
prompt_in_chat_format = [
    {
        "role": "system",
        "content": """Using the information contained in the context,
give a comprehensive answer to the question.
Respond only to the question asked, response should be concise and relevant to the question.
Provide the number of the source document when relevant.
If the answer cannot be deduced from the context, do not give an answer.""",
    },
    {
        "role": "user",
        "content": """Context:
{context}
---
Now here is the question you need to answer.

Question: {question}""",
    },
]

RAG_PROMPT_TEMPLATE = tokenizer.apply_chat_template(prompt_in_chat_format, tokenize=False, add_generation_prompt=True)
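
The `{context}` and `{question}` placeholders are filled in later with `str.format`. The sketch below shows that step on a stand-in template; the real `RAG_PROMPT_TEMPLATE` also contains the model's chat markers, which are omitted here, and the sample context and question are made up.

```python
# Stand-in for RAG_PROMPT_TEMPLATE (assumption: same placeholders, no chat markers)
template = "Context:\n{context}\n---\nQuestion: {question}"

prompt = template.format(
    context="Document 0:::\nExample chunk with {braces} in it.",
    question="Where has the candidate worked?",
)
print(prompt)
```

Only the template string is scanned for placeholders, so braces inside the retrieved text are passed through unchanged.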

In [None]:
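def set_css(*args):
    # Wrap long lines in cell output. *args absorbs the event info that
    # some IPython versions pass to 'pre_run_cell' callbacks.
    display(HTML('''
    <style>
        pre {
            white-space: pre-wrap;
        }
    </style>
    '''))
get_ipython().events.register('pre_run_cell', set_css)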

def RAG_QA(user_query, llm, knowledge_index, num_docs=5):
    """Retrieve the num_docs most similar chunks and answer the query from them."""
    print("\nGetting context...")
    # Search the index that was passed in as an argument
    retrieved_docs = knowledge_index.similarity_search(query=user_query, k=num_docs)
    retrieved_docs_text = [doc.page_content for doc in retrieved_docs]

    # Number each chunk, one per block, so the model can cite its source document
    context = "\nExtracted documents:\n"
    context += "\n".join(f"Document {i}:::\n{doc}" for i, doc in enumerate(retrieved_docs_text))
    final_prompt = RAG_PROMPT_TEMPLATE.format(question=user_query, context=context)
    print("Generating answer...\n")
    result = llm(final_prompt)[0]["generated_text"]
    return result
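
To illustrate what the retrieval step does conceptually, here is a minimal sketch that ranks chunks by cosine similarity and assembles a numbered context string in a `Document i:::` layout. It is not how FAISS is implemented; the three-dimensional vectors, chunk texts, and the helpers `cosine` and `top_k` are all made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, indexed, k):
    """Return the texts of the k entries whose vectors are most similar to query_vec."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical index: (chunk text, embedding) pairs with toy 3-d embeddings
toy_index = [
    ("Chunk about work history", [0.9, 0.1, 0.0]),
    ("Chunk about education", [0.1, 0.9, 0.0]),
    ("Chunk about skills", [0.2, 0.2, 0.9]),
]
query_vec = [1.0, 0.2, 0.1]  # pretend embedding of the user's question

retrieved = top_k(query_vec, toy_index, k=2)
context = "\nExtracted documents:\n" + "\n".join(
    f"Document {i}:::\n{doc}" for i, doc in enumerate(retrieved)
)
print(context)
```

A real embedding model produces vectors with hundreds of dimensions, but the ranking idea is the same: the chunks whose vectors point in the direction closest to the query's are passed to the LLM as context.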

In [None]:
while True:
    user_query = input("Questions about the candidate? Ask away:\n(Type exit to quit)\n")
    if user_query == "exit":
        break
    print(RAG_QA(user_query, READER_LLM, KNOWLEDGE_VECTOR_DATABASE), "\n")

Questions about the candidate? Ask away:
(Type exit to quit)
Which companies has this candidate worked at?

Getting context...
Generating answer...

The candidate has worked at Brookfield Asset Management and Enwave Energy Corporation, as stated in Documents 2, 3, and 4. In Document 1, they also mention their employment history at Brookfield Asset Management during a specific time period. Therefore, the candidate has worked at two companies, Brookfield Asset Management and Enwave Energy Corporation. 

Questions about the candidate? Ask away:
(Type exit to quit)
Based on his past work experience, what kind of roles would he qualify for?

Getting context...
Generating answer...

Based on his past work experience, the individual would qualify for roles that involve utilizing advanced analytics techniques such as predictive modeling, data mining, and machine learning. He has experience working with various tools and technologies including Python, SQL Server, Power BI, AWS, HDFS, MapReduce,