# Use Case Description
Recruiters and headhunters in the HRTech industry deal with enormous amounts of data in job descriptions and resumes. Matching a job description to a resume has many stages. Most recruiters gather keywords from the job description (based on their own understanding, which can be limited) and hunt for them in resumes before moving to the next step of reaching out to candidates. A few noteworthy points:
- The speed at which a recruiter works is a function of their domain knowledge and of how well they extract keywords and semantically related terms by hand (e.g. "spring framework" implies "java")
- Some jobs attract 1,000 to 10,000 resumes, and applying the first filter by hand (the first pass that narrows down the pool of resumes) might take days to weeks even for the most experienced recruiter
- Any human can make errors when matching, and their own biases come into play when applying the first filter (e.g. not matching "spring boot" to "java", or filtering out a resume because "java" is not mentioned often enough)
- The result can be loss of excellent talent, delays that translate into $ cost, and more  
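
The matching error described in the bullets above can be made concrete with a small sketch. It is not part of the notebook's pipeline: the `RELATED_TERMS` map and both function names are hypothetical, and a hand-written map only stands in for the semantic search introduced later.

```python
# Hypothetical, hand-written map of related terms. A real system would get
# these relationships from embeddings instead of hard-coding them.
RELATED_TERMS = {
    "java": {"java", "spring boot", "spring framework"},
}


def exact_keyword_match(resume_text: str, keyword: str) -> bool:
    """First-pass filter as a hurried reader might apply it: literal keyword only."""
    return keyword.lower() in resume_text.lower()


def related_term_match(resume_text: str, keyword: str) -> bool:
    """Same filter, but it also accepts terms known to imply the keyword."""
    text = resume_text.lower()
    terms = RELATED_TERMS.get(keyword.lower(), {keyword.lower()})
    return any(term in text for term in terms)


resume = "Built REST microservices with Spring Boot and deployed them on AWS."
print(exact_keyword_match(resume, "java"))   # the literal filter rejects this resume
print(related_term_match(resume, "java"))    # the related-term filter keeps it
```

Plain substring matching has its own pitfalls (for example, "java" also matches "javascript"), which is one more reason to prefer semantic search over keyword lists.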

<strong>For the first pass filter, i.e.</strong> <i>looking for a keyword in a resume in the form of question-answering</i>, <strong>augmenting the human search with LLM + AI semantic search</strong> seems a great start.

- We will use my own resume from [LinkedIn](https://www.linkedin.com/in/pradeepmacharla/)
- Note: The PDF version is not available in this repo; use your own resume instead

# Technical
- This notebook implements the same workflow that [privateGPT](https://github.com/imartinez/privateGPT) packages into a nice Python script (Path 1: ingest documents > tokenize > embeddings/vectors > vector store; Path 2: input query > embed and find similarity scores using the vector store > pass the matches as context to the LLM > respond)
- privateGPT uses duckdb; we use FAISS and persist the index by pickling it (a local file-based store, very much like sqlite)
- <strong>Ubuntu 20.04 with no GPU is used for this</strong>, on 8 GB RAM and 4 CPUs
- Python 3.10.12 is used, as you can see below (it doesn't matter whether you manage the runtime with pyenv or conda; both should produce identical results)
- The libraries and frameworks installed below are used in this notebook and in app.py  
- Most of the code is adapted from this [blog](https://blog.streamlit.io/langchain-tutorial-4-build-an-ask-the-doc-app/)
- We did not use OpenAI, because most readers might not want to spend $ on a POC for themselves or for learning
- A Hugging Face account is needed to download a model; once the model is downloaded, the account is no longer needed
- We chose values for many parameters and hyperparameters (e.g. max_tokens); knowing which value to set is highly dependent on the domain and use case
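
The two paths in the first bullet can be sketched end to end in plain Python. This is a toy illustration under loud assumptions: the bag-of-words `embed` function stands in for a real embedding model, a Python list stands in for FAISS, and the sample chunks and all names are made up for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Path 1: ingest documents -> embed -> store (made-up sample chunks)
toy_chunks = [
    "Contact: jane@example.com, Denver, Colorado",
    "Skills: Python, SQL, data engineering",
    "Experience: Quality Assurance Lead, 2010 to 2012",
]
toy_store = [(chunk, embed(chunk)) for chunk in toy_chunks]


# Path 2: embed the query -> rank stored chunks by similarity -> build context
def retrieve(query: str, k: int = 1) -> list[str]:
    query_vector = embed(query)
    ranked = sorted(toy_store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


query = "Which skills are listed?"
context = retrieve(query, k=1)
prompt = f"Context:\n{context[0]}\n\nQuestion: {query}"
print(prompt)  # this prompt is what would be handed to the LLM
```

A bag-of-words vector only matches on shared words; the learned embeddings used in the notebook are what allow related terms to land close together.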

In [2]:
import sys
sys.version_info


sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0)

In [None]:
!pip3 install streamlit==1.24.1
!pip3 install PyPDF2==3.0.1
!pip3 install langchain==0.0.234
!pip3 install faiss-cpu==1.7.4

## Import modules

In [1]:
import streamlit as st
from PyPDF2 import PdfReader
import pickle
import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain import HuggingFaceHub

from langchain.embeddings import HuggingFaceEmbeddings, SentenceTransformerEmbeddings
from langchain.embeddings.spacy_embeddings import SpacyEmbeddings
from langchain.embeddings import GPT4AllEmbeddings

from langchain.llms import GPT4All

### Hugging Face initial setup
- Needed to download models with the transformers library or to interact with the Hugging Face Hub
- To run outside this notebook, run `huggingface-cli login` at a shell prompt and complete the process
- See this [help page](https://huggingface.co/docs/hub/security-tokens) for how to get an access token

In [None]:
# If notebook_login() doesn't work, use interpreter_login() instead
from huggingface_hub import notebook_login, interpreter_login
notebook_login()
# interpreter_login()

In [2]:
# Substitute with whatever pdf you want to ask questions on 
SAMPLE_RESUME = "pradeep_resume.pdf"

In [3]:
# Note: we are not doing any preprocessing or cleaning, even though text extracted from a PDF can contain special characters (aka noise for data science models)
# Instead we use pretrained LLMs directly (the tokenizers and embeddings hopefully take care of it - let's see below)
pdf_reader = PdfReader(SAMPLE_RESUME)
text = ""
for page in pdf_reader.pages:
    text += page.extract_text()

len(text)

3540
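
The notebook deliberately skips cleaning, but if the extracted text turns out to be noisy, a light pass could look like the sketch below. It is an assumption-laden example, not part of the original workflow: the function name is hypothetical, and the "Page N of M" footer pattern is guessed from typical LinkedIn PDF exports.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Light cleanup for PDF-extracted text (hypothetical helper)."""
    # Drop page footers such as "Page 3 of 4" (assumed footer format)
    text = re.sub(r"Page\s+\d+\s+of\s+\d+", " ", raw)
    # Collapse runs of spaces and tabs, but keep newlines so that a
    # separator-based splitter can still break on line boundaries
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse blank (or whitespace-only) lines into a single newline
    text = re.sub(r"\n\s*\n+", "\n", text)
    return text.strip()


sample = "Quality Assurance Lead\nMay 2010 -  August 2012\nPage 3 of 4\nwww.naviance.com"
print(clean_extracted_text(sample))
```

Cleaning changes the chunk boundaries and therefore the retrieved context, so it is worth comparing answers with and without it before adopting it.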

In [4]:
text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len)

chunks = text_splitter.split_text(text=text)
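
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window splitter. It is only an approximation for intuition: `RecursiveCharacterTextSplitter` prefers to break on separators such as newlines and spaces, so its chunks will not line up exactly with these windows. The function name and sample text are hypothetical.

```python
def fixed_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    start = 0
    while start < len(text):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return windows


demo = fixed_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)  # each window repeats the last 2 characters of the previous one

# Same parameters as the notebook, on a placeholder text of the same length
placeholder = "x" * 3540
print(len(fixed_window_chunks(placeholder, chunk_size=1000, chunk_overlap=200)))
```

The overlap exists so that a sentence cut at a window boundary still appears whole in at least one chunk.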

## Create vector store and index
- If the pickled store already exists, reuse it; otherwise create it
- This might take a minute to run (depending on content)
- We will use HuggingFaceEmbeddings

In [5]:
store_name = "myvector_store"
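# Reuse the pickled vector store if it exists; otherwise build it and cache it
if os.path.exists(f"{store_name}.pkl"):
    with open(f"{store_name}.pkl", "rb") as f:
        vector_store = pickle.load(f)
else:
    # Uses the default Hugging Face embedding model (downloaded on first use)
    embeddings = HuggingFaceEmbeddings()
    vector_store = FAISS.from_texts(chunks, embedding=embeddings)
    with open(f"{store_name}.pkl", "wb") as f:
        pickle.dump(vector_store, f)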
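
The "reuse if it exists, else create" pattern in the cell above can be factored into a small helper. This is a sketch using only the standard library: `load_or_build` and `build_index` are hypothetical names, and a plain dict stands in for the FAISS store.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def load_or_build(path: str, build: Callable[[], Any]) -> Any:
    """Return the object pickled at path, or build it, pickle it, and return it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    obj = build()
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return obj


build_calls = []


def build_index() -> dict:
    """Stand-in for the expensive embedding and indexing step."""
    build_calls.append(1)
    return {"chunks": ["first chunk", "second chunk"]}


with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "myvector_store.pkl")
    first = load_or_build(cache_path, build_index)   # builds and writes the cache
    second = load_or_build(cache_path, build_index)  # reads the cache, no rebuild

print(len(build_calls))  # number of times the expensive step actually ran
```

Two caveats apply to the notebook's cell as well: the cache is not invalidated when the source PDF changes (delete the `.pkl` file to rebuild), and `pickle.load` should only be used on files you created yourself.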

### Model
- As you can see, we are not using OpenAI but a locally downloaded GPT4All model
- There are many models to choose from, and they trade off mostly along the dimensions of speed, accuracy, and performance
- Go to [GPT4All](https://gpt4all.io/index.html) and use the Model Explorer to download a model
- We used orca-mini-3b.ggmlv3.q4_0.bin, which is about 1.8 GB and which most personal laptops should be able to handle
- A response should come back within 10-20 seconds on this CPU-only setup (if you guessed it would take nano- or picoseconds on a GPU - of course, smarty pants!)

In [14]:
query = "What is the contact address in the input document?"
docs = vector_store.similarity_search(query=query, k=2)

llm = GPT4All(model="/home/ubuntu/Downloads/orca-mini-3b.ggmlv3.q4_0.bin",max_tokens=2048)
chain = load_qa_chain(llm, chain_type="stuff")
response = chain.run(input_documents=docs,question=query)
response

Found model file at  /home/ubuntu/Downloads/orca-mini-3b.ggmlv3.q4_0.bin


llama.cpp: loading model from /home/ubuntu/Downloads/orca-mini-3b.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 3200
llama_model_load_internal: n_mult     = 240
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 26
llama_model_load_internal: n_rot      = 100
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 8640
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 3B
llama_model_load_internal: ggml ctx size =    0.06 MB
llama_model_load_internal: mem required  = 2862.72 MB (+  682.00 MB per state)
llama_new_context_with_model: kv self size  =  650.00 MB


' The contact information in the input document is "macharla.pradeep.kumar@gmai l.com"'
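
Conceptually, `chain_type="stuff"` places all retrieved chunks into a single prompt together with the question. The sketch below illustrates that idea only: the function name and the prompt wording are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(documents: list[str], question: str) -> str:
    """Concatenate every retrieved chunk into one prompt (illustrative template)."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "Contact: jane@example.com",       # made-up chunk 1
    "Skills: Python, SQL, leadership", # made-up chunk 2
]
stuffed = build_stuff_prompt(retrieved, "What is the contact address in the input document?")
print(stuffed)
```

With `k=2` and `chunk_size=1000`, the context is at most about 2,000 characters. That matters because the model log above reports `n_ctx = 2048`, and the prompt and the generated answer generally have to share that context window, so raising `k` or the chunk size leaves less room for the answer.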

In [15]:
query = "Can you list all skills please?"
docs = vector_store.similarity_search(query=query, k=2)

llm = GPT4All(model="/home/ubuntu/Downloads/orca-mini-3b.ggmlv3.q4_0.bin",max_tokens=2048)
chain = load_qa_chain(llm, chain_type="stuff")
response = chain.run(input_documents=docs,question=query)
response

Found model file at  /home/ubuntu/Downloads/orca-mini-3b.ggmlv3.q4_0.bin


llama.cpp: loading model from /home/ubuntu/Downloads/orca-mini-3b.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 3200
llama_model_load_internal: n_mult     = 240
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 26
llama_model_load_internal: n_rot      = 100
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 8640
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 3B
llama_model_load_internal: ggml ctx size =    0.06 MB
llama_model_load_internal: mem required  = 2862.72 MB (+  682.00 MB per state)
llama_new_context_with_model: kv self size  =  650.00 MB


' Sure, here are some of the skills listed by Pradeep Macharla on his LinkedIn profile: IT Leadership | Strategic Planning | P&L | Team Development | Vendor Negotiation & Management \nData Solutions Design & Development | Hands-on Data Analysis, Engineering and Science DOMAIN SKILLS Technical - Business Intelligence | Enterprise Software Development| Big Data & AI/ML | Software Automation Solution Design, implementation and maintenance | Digital Marketing Tools | ETL Tools INDIVIDUAL ACHIEVEMENTS \n1) Book Author - Android Continuous Integration - https://www.amazon.com/Android-Continuous-Integration-Automating-Software-Development/dp/032159487X \nDaily Mail and General Trust plc Quality Assurance Lead May 2010 - August 2012 (2 years 4 months) Naviance, a subsidiary of hobsons, in turn the subsidiary of Daily Mail and Trust PLC, is the premier education solution provider for k-12 schools in the US. www.naviance.com www.hobsons.com www.dmgt.co.uk Page 3 of'

## Change Model

In [7]:
query = "Can you list all skills for pradeep?"
docs = vector_store.similarity_search(query=query, k=2)

llm = GPT4All(model="/home/ubuntu/Downloads/ggml-gpt4all-j-v1.3-groovy.bin",max_tokens=2048)
chain = load_qa_chain(llm, chain_type="stuff")
response = chain.run(input_documents=docs,question=query)
response

Found model file at  /home/ubuntu/Downloads/ggml-gpt4all-j-v1.3-groovy.bin
gptj_model_load: loading model from '/home/ubuntu/Downloads/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx   = 2048
gptj_model_load: n_embd  = 4096
gptj_model_load: n_head  = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot   = 64
gptj_model_load: f16     = 2
gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size  =  896.00 MB
gptj_model_load: ................................... done
gptj_model_load: model size =  3609.38 MB / num tensors = 285


' Pradeep has the following skills listed on his LinkedIn profile: Project Delivery Service Delivery Account Management Certifications AWS Certified Solutions Architect - AssociatePradeep Macha'
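
Swapping models by editing and rerunning a cell works, but a small harness makes the speed and answer-quality trade-off easier to compare side by side. This is a sketch with hypothetical names; the stub callables stand in for real models so that the example runs without any model files.

```python
import time
from typing import Callable


def compare_models(models: dict[str, Callable[[str], str]], query: str) -> dict[str, dict]:
    """Run the same query through each model callable and record answer and latency."""
    results = {}
    for name, ask in models.items():
        started = time.perf_counter()
        answer = ask(query)
        results[name] = {
            "answer": answer,
            "seconds": time.perf_counter() - started,
        }
    return results


# Stub callables standing in for real models (assumption for the example)
stub_models = {
    "model-a": lambda q: f"A says: {q}",
    "model-b": lambda q: f"B says: {q}",
}

report = compare_models(stub_models, "Can you list all skills?")
for name, row in report.items():
    print(f"{name}: {row['seconds']:.4f}s -> {row['answer']}")
```

In the notebook, each callable could wrap one chain, for example a function that calls `chain.run(input_documents=docs, question=q)` for a chain built around a given model file.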