## Set Up Environment - Pip Install

```python
!pip install langchain llama-index pypdf faiss-cpu chromadb
!pip install sentence-transformers langchain-community langchain-huggingface langchain_ollama
```

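```python
import os

# Disable Hugging Face tokenizers parallelism to avoid fork warnings (and possible deadlocks) in Jupyter
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```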

## Load the Resume (PDF) as Data Source

```python
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("Vignesh_R_Resume.pdf")
pages = loader.load()  # one Document per PDF page, with page metadata
```

## Chunk & Embed the Text

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
```

```python
# Step 1: Chunk the content
# chunk_size and chunk_overlap are measured in characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
docs = text_splitter.split_documents(pages)
```
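`RecursiveCharacterTextSplitter` first tries to split on paragraph breaks, then newlines, then spaces, and only falls back to individual characters (its default separators are `"\n\n"`, `"\n"`, `" "`, `""`), so real chunk boundaries follow the text's structure. The effect of `chunk_size=500` and `chunk_overlap=100` is easiest to see in a simplified fixed-window splitter. The sketch below is a hypothetical `sliding_window_chunks` helper, not the LangChain implementation.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Simplified splitter: fixed-size character windows overlapping by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(1200))  # 1,200 characters of dummy text
chunks = sliding_window_chunks(text)
print([len(c) for c in chunks])  # → [500, 500, 400]
```

The 100-character overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of storing some text twice.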

```python
# Step 2: Create embeddings
# Smaller, faster alternative: "all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name="all-mpnet-base-v2")
```
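`HuggingFaceEmbeddings` turns each chunk into a fixed-length vector (768 dimensions for `all-mpnet-base-v2`, 384 for `all-MiniLM-L6-v2`), and texts with similar meaning end up with vectors that point in similar directions. A common way to compare such vectors is cosine similarity. The 3-dimensional vectors below are made-up toy values for illustration, not real model outputs; real vectors would come from `embeddings.embed_query(...)`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" standing in for real model vectors
skills = [0.9, 0.1, 0.0]
python_question = [0.8, 0.2, 0.1]
hobbies = [0.0, 0.1, 0.9]

print(round(cosine_similarity(skills, python_question), 3))  # high: similar direction
print(round(cosine_similarity(skills, hobbies), 3))          # low: nearly orthogonal
```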

```python
# Step 3: Store in FAISS vector store
vectorstore = FAISS.from_documents(docs, embedding=embeddings)
```
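With its default settings, LangChain's `FAISS.from_documents` builds an exact (flat) index that compares the query vector against every stored chunk vector using Euclidean (L2) distance, and `vectorstore.as_retriever()` then returns the closest chunks (four by default). For a two-page resume this exhaustive search is cheap. The stdlib sketch below mirrors the brute-force idea with a hypothetical `top_k_l2` helper and toy 2-D vectors; FAISS performs the same kind of comparison in optimized native code.

```python
import math


def top_k_l2(query: list[float], vectors: list[list[float]], k: int = 4) -> list[tuple[int, float]]:
    """Return (index, distance) for the k stored vectors closest to query by Euclidean distance."""
    distances = [(i, math.dist(query, v)) for i, v in enumerate(vectors)]
    distances.sort(key=lambda pair: pair[1])  # smallest distance = most similar
    return distances[:k]


# Hypothetical 2-D chunk vectors standing in for the real 768-D embeddings
chunk_vectors = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [5.0, 5.0], [0.9, 0.8]]
query_vector = [0.1, 0.1]

for idx, dist in top_k_l2(query_vector, chunk_vectors, k=2):
    print(idx, round(dist, 3))
```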

## Connect to Ollama Model

```python
# Requires a running Ollama server with the model pulled first: `ollama pull gemma3:4b`
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model="gemma3:4b")
```
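The LangChain wrapper is a client for the local Ollama server, which listens on `http://localhost:11434` by default and exposes a `/api/generate` endpoint for text completion. The sketch below only *builds* such a request with the standard library, so it runs without a server. `build_generate_request` is a hypothetical helper, and actually sending the request assumes Ollama is running and `gemma3:4b` has been pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(prompt: str, model: str = "gemma3:4b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("Summarise this resume in one sentence.")
print(req.get_method(), req.full_url)

# With a running server, the generated text is in the "response" field:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```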

## Build the RetrievalQA Pipeline

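```python
from langchain.chains import RetrievalQA

# Defaults: chain_type="stuff" and a retriever returning the top 4 matching chunks
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,  # also return the chunks the answer was based on
)
```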
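`RetrievalQA.from_chain_type` uses the `"stuff"` chain type unless told otherwise: the retrieved chunks are concatenated ("stuffed") into a single prompt together with the question, and that one prompt is sent to the LLM. The sketch below assembles a prompt in that spirit with a hypothetical `build_stuff_prompt` helper; the template wording is an assumption and only approximates LangChain's built-in QA prompt.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)  # every retrieved chunk goes into the same prompt
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "SKILLS: Python, SQL, Machine Learning",      # hypothetical chunk texts
    "Built scalable ETL pipelines with Airflow",
]
prompt = build_stuff_prompt("What are my key skills?", retrieved)
print(prompt)
```

"Stuffing" is the simplest strategy and works as long as all retrieved chunks fit in the model's context window, which is easily the case for four chunks of about 500 characters each.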

## Ask Questions About the Resume

```python
%%time
from IPython.display import Markdown, display

questions = [
    "1. What are my key skills and technologies mentioned in the resume?",
    "2. What kind of projects have I worked on?",
    "3. What job roles am I best suited for?"
]

for q in questions:
    result = qa_chain.invoke(q)
    display(Markdown(f"### ✅ {q}\n**Answer:**\n{result['result']}"))
```


### ✅ 1. What are my key skills and technologies mentioned in the resume?
**Answer:**
Here’s a list of Vignes’ key skills and technologies mentioned in the resume:

**Programming Languages:** Python, SQL

**Data Technologies & Methodologies:** Data Analytics, Data Engineering, Data Science, Machine Learning, Natural Language Processing (NLP), Computer Vision, Large Language Models (LLM), NLTK, Big Data, Data Modeling, ETL Pipelines, Medallion Architecture, PySpark, Cloud

**Platforms & Tools:** AWS, Google Cloud Platform (GCP), Airflow, Apache Kafka, Hadoop, MySQL, Postgres, Tableau, Looker, Power BI, Databricks, Superset, JIRA, Git

**Other Relevant Skills:** Automation, Business Analytics, Sales Retrospective Analytics, Effort Score Calculation.

### ✅ 2. What kind of projects have I worked on?
**Answer:**
Based on the provided context, V Vignesh has worked on projects including:

*   **Scalable ETL pipelines and data models:** Architecting and building these for seamless data flow.
*   **Sales Retrospective Analytics:** Measuring the effectiveness of sales and marketing campaigns.
*   **Effort Score Calculation:** Developing a heuristic model to analyze customer struggles during the customer journey.



### ✅ 3. What job roles am I best suited for?
**Answer:**
Based on the provided information, V Vignesh is best suited for job roles in Data Analytics, Data Engineering, and Data Science. Specifically, his experience with ETL pipelines, data modeling, Airflow, and various data technologies (SQL, Python, Big Data tools) aligns well with these roles.

CPU times: user 285 ms, sys: 807 ms, total: 1.09 s
Wall time: 18 s


```python
# `result` holds the response to the last question in the loop above
print("\nSources used:\n", result["source_documents"])
```
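Printing the raw `Document` list produces a long, hard-to-read dump, as the output below shows. A more readable option is one line per retrieved chunk with its page label and a short snippet. The sketch uses a hypothetical `Doc` dataclass as a stand-in so it runs without LangChain; LangChain's `Document` exposes the same `page_content` and `metadata` attributes, so `format_sources(result["source_documents"])` should work on the real objects.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_sources(docs, snippet_len: int = 60) -> list[str]:
    """One line per source: page label plus the first snippet_len characters, whitespace collapsed."""
    lines = []
    for doc in docs:
        page = doc.metadata.get("page_label", "?")  # PyPDFLoader stores a 1-based page_label
        snippet = " ".join(doc.page_content.split())[:snippet_len]
        lines.append(f"p.{page}: {snippet}")
    return lines


docs = [
    Doc("Data Analytics, Data Engineering & Data Science. \nSKILLS\n Python, SQL",
        {"source": "Vignesh_R_Resume.pdf", "page": 0, "page_label": "1"}),
    Doc("Built ETL pipelines", {"page": 1, "page_label": "2"}),
]
for line in format_sources(docs):
    print(line)
```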


Sources used:
 [Document(id='a22703d6-ca78-4ce0-8671-64c1a3f32646', metadata={'producer': 'react-pdf', 'creator': 'react-pdf', 'creationdate': '2025-07-17T03:51:31+00:00', 'source': 'Vignesh_R_Resume.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='Data Analytics, Data Engineering & Data Science. \nSKILLS\n \n Python, SQL, Machine Learning, Natural Language Processing (NLP), Computer Vision, Large Language \nModels (LLM), NLTK, Big Data, Data Analytics, Business Analytics, AWS, Google Cloud Platform, Airflow, \nApache Kafka, Hadoop, MySQL, NumPy, Pandas, Postgres, Tableau, Looker, Power BI, Data Analysis, JIRA, \nGit, Analytics, Databricks, GCP, Cloud, PySpark, Superset, Data Engineering, Data Modeling, Automation, Data'), Document(id='a175a73d-f157-4d6a-8914-16f6f4e047b0', metadata={'producer': 'react-pdf', 'creator': 'react-pdf', 'creationdate': '2025-07-17T03:51:31+00:00', 'source': 'Vignesh_R_Resume.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_co