## Set Up Environment - Pip Install

In [27]:
!pip install langchain llama-index pypdf faiss-cpu chromadb



In [28]:
!pip install sentence-transformers langchain-community langchain-huggingface langchain_ollama



In [29]:
import os

# Disable Hugging Face tokenizers parallelism to avoid fork-related warnings/deadlocks in Jupyter
os.environ["TOKENIZERS_PARALLELISM"] = "false"

## Load the Resume (PDF) as Data Source

In [30]:
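# Document loaders now live in langchain_community (the langchain.document_loaders path is deprecated)
from langchain_community.document_loaders import PyPDFLoader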

loader = PyPDFLoader("Vignesh_R_Resume.pdf")
pages = loader.load()

## Chunk & Embed the Text

In [31]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

In [32]:
# Step 1: Chunk the content
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
docs = text_splitter.split_documents(pages)
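`RecursiveCharacterTextSplitter` tries a list of separators (paragraph breaks, newlines, spaces) before falling back to raw characters, so its chunks tend to follow the text's natural boundaries, and by default `chunk_size` is measured in characters. The sketch below drops that recursion and keeps only the sliding-window arithmetic behind `chunk_size=500` and `chunk_overlap=100`: each window starts `500 - 100 = 400` characters after the previous one, so neighboring chunks share 100 characters. `chunk_text` is a hypothetical helper for illustration, not part of LangChain.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter: it ignores
    separators and slides a window of chunk_size characters forward by
    chunk_size - chunk_overlap characters each time.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_text("x" * 1200)` returns three chunks of 500, 500, and 400 characters.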

In [33]:
# Step 2: Create embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
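The embedding model turns each chunk into a fixed-length vector (384 dimensions for `all-MiniLM-L6-v2`), and retrieval ranks chunks by how close their vectors are to the question's vector. A minimal pure-Python sketch of cosine similarity, one common closeness measure, on toy vectors; in the notebook the real vectors would come from `embeddings.embed_query(...)` or `embeddings.embed_documents(...)`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; scaled copies point the same way
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```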

In [34]:
# Step 3: Store in FAISS vector store
vectorstore = FAISS.from_documents(docs, embedding=embeddings)
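`FAISS.from_documents` embeds every chunk and stores the vectors in an index. As far as I know, LangChain's wrapper uses an exact (flat) index with Euclidean distance by default, and `as_retriever()` returns the top 4 matches unless `search_kwargs` says otherwise. The sketch below shows that exhaustive search in pure Python; `top_k_l2` is a hypothetical helper, and FAISS performs the same kind of comparison in optimized native code.

```python
import math


def top_k_l2(query: list[float], vectors: list[list[float]], k: int = 4) -> list[int]:
    """Return indices of the k stored vectors closest to query (Euclidean distance).

    Exhaustive search: compare the query against every stored vector,
    then keep the k smallest distances (ties broken by index).
    """
    scored = sorted((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in scored[:k]]


stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.0]]
print(top_k_l2([0.0, 0.0], stored, k=2))
```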

## Connect to the Ollama Model

In [35]:
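# langchain_community's Ollama class is deprecated; langchain_ollama (installed above) provides OllamaLLM
from langchain_ollama import OllamaLLM

# Requires a running local Ollama server with the model pulled (`ollama pull gemma3:4b`)
llm = OllamaLLM(model="gemma3:4b")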
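The LangChain `llm` object above is a thin client over the local Ollama server. When debugging, it can help to call the server directly. This sketch assumes Ollama's default address `http://localhost:11434` and its `/api/generate` endpoint, where `"stream": false` returns a single JSON object whose `response` field holds the generated text; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a local Ollama server
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "gemma3:4b",
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma3:4b") -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

With the server running, `generate("Say hello in one word.")` should return the model's reply as a plain string.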

## Build the RetrievalQA Pipeline

In [36]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)
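`RetrievalQA.from_chain_type` defaults to `chain_type="stuff"`: it retrieves the top chunks and "stuffs" them all into a single prompt alongside the question before calling the LLM. The sketch below shows that idea with a made-up template; it is not LangChain's exact prompt text, and `build_stuff_prompt` is a hypothetical helper.

```python
def build_stuff_prompt(question: str, contexts: list[str]) -> str:
    """Concatenate retrieved chunks into one prompt (the "stuff" strategy)."""
    context_block = "\n\n".join(contexts)
    # Illustrative template only; the real chain ships its own default prompt
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"{context_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_stuff_prompt("What are my key skills?", ["Python, SQL", "Airflow, PySpark"]))
```

Stuffing is the simplest strategy and works well here because a two-page resume yields only a few short chunks; it stops scaling once the retrieved text no longer fits in the model's context window.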

## Ask Questions About the Resume

In [55]:
%%time
from IPython.display import Markdown, display

questions = [
    "1. What are my key skills and technologies mentioned in the resume?",
    "2. What kind of projects have I worked on?",
    "3. What job roles am I best suited for?"
]

for q in questions:
    result = qa_chain.invoke(q)
    display(Markdown(f"### ✅ {q}\n**Answer:**\n{result['result']}"))


### ✅ 1. What are my key skills and technologies mentioned in the resume?
**Answer:**
Here’s a breakdown of Vignes R’s key skills and technologies based on the resume:

**Programming Languages:** Python, SQL

**Data Technologies & Tools:**

*   **Cloud Platforms:** AWS, Google Cloud Platform (GCP)
*   **Big Data:** Hadoop, Apache Kafka, Apache Hive, PySpark, Databricks
*   **Data Warehousing & Databases:** MySQL, Postgres, Clickhouse
*   **Data Visualization:** Tableau, Looker, Power BI, Superset
*   **Machine Learning & AI:** Machine Learning, Large Language Models (LLM), NLTK, Gen AI
*   **Data Engineering:** Airflow, Git

**Other Relevant Skills:** Data Analysis, Data Modeling, Automation, Problem-Solving, Governance, Business Analytics, Data Structures & Algorithms, Natural Language Processing (NLP), Computer Vision.

### ✅ 2. What kind of projects have I worked on?
**Answer:**
Based on the provided context, here’s a breakdown of the types of projects you’ve worked on:

*   **Gen AI Analytics App:** You built an end-to-end analytics application with Gen AI capabilities, specifically automating insights from uploaded Excel data using Large Language Models (LLMs).
*   **Platform Migration:** You converted SAS codes into a PySpark-Enterprise Analytics Platform.
*   **Routine Analytics & Impact Assessment:** You executed Business As Usual (BAU) deliverables to identify issues in existing campaigns.
*   **Data Modeling & ETL Pipelines:** You architected and constructed scalable ETL pipelines and data models.

### ✅ 3. What job roles am I best suited for?
**Answer:**
Based on the provided context, you are best suited for job roles in Data Analytics, Data Engineering, and Data Science. Your experience includes building analytics apps with Gen AI, architecting ETL pipelines, managing Airflow, and leading a team.

CPU times: user 123 ms, sys: 269 ms, total: 392 ms
Wall time: 14.9 s


In [56]:
# `result` still holds the last loop iteration, so these are the sources for question 3 only
print("\nSources used:\n", result["source_documents"])


Sources used:
 [Document(id='2f6adc6e-d904-4c9b-89de-2908a4a3466a', metadata={'producer': 'react-pdf', 'creator': 'react-pdf', 'creationdate': '2025-07-17T03:51:31+00:00', 'source': 'Vignesh_R_Resume.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='Data Analytics, Data Engineering & Data Science. \nSKILLS\n \n Python, SQL, Machine Learning, Natural Language Processing (NLP), Computer Vision, Large Language \nModels (LLM), NLTK, Big Data, Data Analytics, Business Analytics, AWS, Google Cloud Platform, Airflow, \nApache Kafka, Hadoop, MySQL, NumPy, Pandas, Postgres, Tableau, Looker, Power BI, Data Analysis, JIRA, \nGit, Analytics, Databricks, GCP, Cloud, PySpark, Superset, Data Engineering, Data Modeling, Automation, Data'), Document(id='f98d75b6-94fe-4acf-9fd2-ff8ddd09c0ad', metadata={'producer': 'react-pdf', 'creator': 'react-pdf', 'creationdate': '2025-07-17T03:51:31+00:00', 'source': 'Vignesh_R_Resume.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_co
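The loop above reassigns `result` on each pass, and the raw `Document` repr is hard to scan. A small sketch, using hypothetical helpers `ask_all` and `summarize_sources`, that keeps every result and prints one short line per retrieved chunk. It relies only on each source exposing `metadata` and `page_content`, as the output above shows.

```python
def ask_all(invoke, questions):
    """Run every question and keep each result keyed by its question text."""
    return {q: invoke(q) for q in questions}


def summarize_sources(result, preview_chars=80):
    """Return one short line per retrieved chunk: page label plus a text preview."""
    lines = []
    for doc in result["source_documents"]:
        # The metadata above carries both a 0-based 'page' and a human 'page_label'
        page = doc.metadata.get("page_label", doc.metadata.get("page"))
        # Collapse newlines and repeated spaces so each source fits on one line
        preview = " ".join(doc.page_content[:preview_chars].split())
        lines.append(f"page {page}: {preview}")
    return lines


# In the notebook:
# results = ask_all(qa_chain.invoke, questions)
# for q, r in results.items():
#     print(q)
#     print("\n".join(summarize_sources(r)))
```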