# RAG example - Monica resume

Import packages

In [11]:
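import re

# Third-party integrations now live in langchain_community / langchain_core;
# the older langchain.* import paths are deprecated shims.
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA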


Load resume

In [None]:
loader = PDFPlumberLoader("/Users/yuhsuanko/Downloads/Yu-Hsuan_Ko_Resume.pdf")
pages = loader.load()

full_text = "\n".join([p.page_content for p in pages])

CropBox missing from /Page, defaulting to MediaBox


Split text

In [3]:
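# Split the resume text into chunks at section headings
def split_resume_by_section(text):
    # Split at each newline whose next line starts with an uppercase letter
    # followed by at least two uppercase letters or spaces (e.g. EXPERIENCE, PROJECTS)
    sections = re.split(r"\n(?=[A-Z][A-Z ]{2,})", text)
    chunks = []
    for section in sections:
        # Skip fragments of 50 characters or fewer
        if len(section.strip()) > 50:
            chunks.append(section.strip())
    return chunks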

resume_chunks = split_resume_by_section(full_text)

# Preview the first 500 characters of each chunk
for i, chunk in enumerate(resume_chunks):
    print(f"--- Chunk {i+1} ---\n{chunk[:500]}\n")


--- Chunk 1 ---
Yu-Hsuan (Monica) Ko
Chicago, Illinois, USA | (312) 284-9394 | yuhsuanko@uchicago.edu | linkedin.com/in/yu-hsuan-ko

--- Chunk 2 ---
EDUCATION
University of Chicago Chicago, Illinois
Master of Science in Applied Data Science Sep 2024 - Dec 2025
Relevant Courses: Big Data and Cloud Computing, Bayesian Machine Learning with Generative AI Applications,
Time Series Analysis and Forecasting, Natural Language Processing and Cognitive Computing
National Taiwan Normal University Taipei, Taiwan
Bachelor of Business Administration Sep 2018 - Jun 2022
Relevant Courses: Advanced Statistics, Calculus, Management Mathematics, Text Mining, 

--- Chunk 3 ---
SKILLS & CERTIFICATIONS
• Programming Languages: Python(scikit-learn,PySpark,TensorFlow,Pytorch,transformers),SQL,R,Java,JavaScript
• Technology Tools: Google Cloud Platform (GCP), AWS, Azure, Linux, Hadoop, Spark, Hive, Git, Docker, Confluence,

--- Chunk 4 ---
JIRA, UiPath
• ML / DL / LLM:RandomForest,NaturalLanguageProcessing(NL
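
Chunk 4 above begins at "JIRA, UiPath", in the middle of the skills list, because the lookahead `[A-Z][A-Z ]{2,}` matches any line whose first three characters are uppercase letters or spaces. One possible tightening, sketched below under the assumption that a section heading fills a whole line of uppercase letters, spaces, and `&`, is to anchor the lookahead to the end of the line. The name `split_resume_by_heading` and the sample text are illustrative and not part of the notebook.

```python
import re

# Assumption: a section heading is a full line made only of uppercase
# letters, spaces, and "&" (e.g. "EDUCATION", "SKILLS & CERTIFICATIONS").
HEADING_LOOKAHEAD = re.compile(r"\n(?=[A-Z][A-Z &]{2,}(?:\n|$))")

def split_resume_by_heading(text, min_chars=50):
    """Split on newlines that are followed by a full-line uppercase heading."""
    sections = HEADING_LOOKAHEAD.split(text)
    return [s.strip() for s in sections if len(s.strip()) > min_chars]

# Illustrative sample: the wrapped "JIRA, UiPath" line must stay in SKILLS.
sample = (
    "SKILLS & CERTIFICATIONS\n"
    "• Technology Tools: Git, Docker, Confluence,\n"
    "JIRA, UiPath\n"
    "EXPERIENCE\n"
    "Data Analyst at E.SUN Commercial Bank Taipei, Taiwan (Jul 2023 - Sep 2024)\n"
)

chunks = split_resume_by_heading(sample, min_chars=20)
for i, chunk in enumerate(chunks, start=1):
    print(f"--- Chunk {i} ---\n{chunk}\n")
```

With this pattern the wrapped line "JIRA, UiPath" stays inside the skills chunk, since the comma prevents it from matching as a full-line heading. Headings that contain other characters (digits, slashes) would need the character class extended.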

Embed chunks and build FAISS index

In [12]:
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

docs = [Document(page_content=chunk) for chunk in resume_chunks]
db = FAISS.from_documents(docs, embedding_model)
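
Conceptually, the vector store embeds each chunk once and, at query time, returns the chunks whose vectors lie closest to the query vector. The sketch below mimics that idea with a toy bag-of-words embedding and brute-force Euclidean distance in plain Python. It is a stand-in for illustration only, not how `all-MiniLM-L6-v2` or FAISS work internally, and `embed`, `top_k`, and the three sample chunks are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text, vocab):
    """Toy embedding: unit-length word-count vector over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]

def top_k(query, chunks, k=2):
    """Return the k chunks closest to the query by Euclidean distance."""
    vocab = sorted({w for c in chunks for w in re.findall(r"[a-z]+", c.lower())})
    q = embed(query, vocab)
    scored = [(math.dist(q, embed(c, vocab)), c) for c in chunks]
    return [c for _, c in sorted(scored)[:k]]

# Hypothetical, shortened chunks for demonstration
toy_chunks = [
    "EDUCATION Master of Science in Applied Data Science",
    "SKILLS Python SQL R Java JavaScript",
    "EXPERIENCE Deployed robotic process automation pipelines with Python",
]

results = top_k("robotic process automation", toy_chunks, k=1)
print(results[0])
```

The query shares words only with the third chunk, so that chunk is the nearest neighbor. A learned embedding model serves the same purpose but can also match text that uses different wording for the same idea.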


Query with Llama 3 via Ollama

In [13]:
llm = Ollama(model="llama3")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())

query = "Does she have experience with RPA or AML?"
print(qa.run(query))


Based on the provided context, Yu-Hsuan (Monica) Ko has experience with:

* Robotic Process Automation (RPA): She mentions deploying 3 robotic process automation (RPA) pipelines with Python to automate systems and API tasks in her role as Data Analyst at E.SUN Commercial Bank.
* Anti-Money Laundering (AML): She designed innovative anti-money laundering (AML) typologies using Python and PostgreSQL, and revamped transaction monitoring system using Python, SAS, and T-SQL to improve AML detection rates by 15% in her role as Data Analyst at E.SUN Commercial Bank.
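
`RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places the retrieved chunks into a single prompt together with the question. A rough sketch of that assembly step in plain Python follows; the template wording and the `build_stuff_prompt` helper are assumptions made for illustration, not LangChain's actual prompt or API.

```python
# Hypothetical template; LangChain's built-in prompt is worded differently.
TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def build_stuff_prompt(question, retrieved_chunks):
    """Join the retrieved chunks and insert them into one prompt string."""
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return TEMPLATE.format(context=context, question=question)

prompt = build_stuff_prompt(
    "Does she have experience with RPA or AML?",
    ["EXPERIENCE\nData Analyst at E.SUN Commercial Bank", "SKILLS & CERTIFICATIONS"],
)
print(prompt)
```

Because every retrieved chunk goes into one prompt, the chunk size and the number of chunks the retriever returns together bound the prompt length sent to the model.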


In [14]:
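# Interactive question loop; type 'exit' to quit
while True:
    query = input("Ask a question about the resume (type 'exit' to quit): ")
    # Ignore surrounding whitespace and letter case when checking for 'exit'
    if query.strip().lower() == "exit":
        break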
    response = qa.run(query)
    print("Question:", query)
    print("Answer:", response)

Question: her work experience
Answer: Based on the provided context, here is an overview of their work experience:

* Data Management Specialist at LINE Taiwan Limited Taipei, Taiwan (Oct 2022 - Feb 2023):
	+ Optimized a customer service chatbot and improved natural language classification system.
	+ Identified technical issues from customer interactions and reported problems to relevant teams.
* Data Analyst at E.SUN Commercial Bank Taipei, Taiwan (Jul 2023 - Sep 2024):
	+ Designed innovative anti-money laundering typologies using Python and PostgreSQL.
	+ Revamped transaction monitoring system using Python, SAS, and T-SQL.
	+ Built machine learning models to identify suspicious transactions and conducted feature selection.
	+ Created visualized transaction networks to uncover complex interconnections across bank-wide transaction channels.
	+ Deployed robotic process automation (RPA) pipelines with Python to automate systems and API tasks.
* Data Scientist Intern at The Shanghai Comme