# Resume-Based Job Query Generator

This notebook analyzes a resume (PDF or DOCX) and generates relevant LinkedIn job search queries from the skills and experience it describes.


## Import Required Libraries


In [1]:
import os
import sys
from pathlib import Path
from typing import List, Optional

from langchain_ollama import OllamaLLM
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_community.document_loaders import PyPDFLoader, Docx2txtLoader
from pydantic import BaseModel, Field
from dotenv import find_dotenv, load_dotenv


# Add the parent of the current working directory to the import path
sys.path.append(str(Path.cwd().parent))

In [2]:
# Load environment variables
load_dotenv(find_dotenv())

True

## Setup LLM


In [3]:
# Initialize the OpenAI chat model.
# ChatOpenAI reads OPENAI_API_KEY from the environment (loaded from .env above).

# Optional explicit check for the API key:
# openai_api_key = os.getenv("OPENAI_API_KEY")
# if not openai_api_key:
#     raise ValueError("OPENAI_API_KEY environment variable is not set")

llm = ChatOpenAI(
    model="gpt-5",  # alternatives: "gpt-4o-mini", "gpt-3.5-turbo"
    temperature=0.3,
)

# Ollama alternative (local model, no API key required):
# llm = OllamaLLM(
#     model="llama3.2",
#     base_url="http://localhost:11434",
#     temperature=0.1,
#     num_predict=1000,
# )

## Define Output Schema


In [4]:
class JobSearchQueries(BaseModel):
    """Structured output for job search queries."""

    primary_titles: List[str] = Field(
        description="Realistic next-step job titles with best chance of getting hired (5 titles)"
    )
    secondary_titles: List[str] = Field(
        description="Opportunistic and futuristic job titles leveraging domain/technical knowledge (8 titles)"
    )
    skill_based_queries: List[str] = Field(
        description="Skill-focused search queries (2-3 queries)"
    )
    industry_focus: str = Field(
        description="Primary industry or sector based on experience"
    )
    seniority_level: str = Field(
        description="Experience level: Entry, Mid, Senior, or Executive"
    )
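
The `description` strings above state target counts (5, 8, 2-3), but a `Field` description is metadata only and does not enforce list lengths. The following is a minimal sketch of a separate length check. The helper name `count_problems` and the `EXPECTED_COUNTS` table are hypothetical, not part of the notebook; it works on a plain dict so it can be tried without pydantic.

```python
from typing import Dict, List, Tuple

# Expected (min, max) item counts, taken from the Field descriptions above.
# This table is an illustrative assumption, not something pydantic reads.
EXPECTED_COUNTS: Dict[str, Tuple[int, int]] = {
    "primary_titles": (5, 5),
    "secondary_titles": (8, 8),
    "skill_based_queries": (2, 3),
}


def count_problems(fields: Dict[str, List[str]]) -> List[str]:
    """Return one message per list whose length is outside the expected range."""
    problems = []
    for name, (low, high) in EXPECTED_COUNTS.items():
        n = len(fields.get(name, []))
        if not low <= n <= high:
            problems.append(f"{name}: expected {low}-{high} items, got {n}")
    return problems


example = {
    "primary_titles": ["Data Scientist"] * 5,
    "secondary_titles": ["AI Consultant"] * 6,
    "skill_based_queries": ["Python Machine Learning", "Deep Learning Forecasting"],
}
print(count_problems(example))
# → ['secondary_titles: expected 8-8 items, got 6']
```

With a pydantic v2 model, passing `job_queries.model_dump()` to the helper should work the same way, since only the three list fields are inspected.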

## Create System and User Prompts


In [5]:
SYSTEM_PROMPT = """
You are an expert career advisor and recruitment specialist specializing in future career growth and opportunities. Your task is to analyze a resume and generate forward-looking LinkedIn job search queries that focus on career advancement and future potential.

Analyze the resume content and focus on:
1. Technical skills and competencies - what roles leverage these best?
2. Work experience and career progression - identify patterns and strengths
3. Industry background - current expertise and emerging sectors
4. Educational qualifications - how can these be leveraged?
5. Certifications and achievements - what doors do these open?

IMPORTANT DISTINCTION:

PRIMARY JOB TITLES (5 titles): Focus on realistic next-step opportunities where the candidate has the BEST CHANCE of getting hired based on their current strengths and experience. Give preference to roles that build naturally from their most recent positions and demonstrated expertise. These should be achievable career moves.

SECONDARY JOB TITLES (8 titles): Focus on more FUTURISTIC and OPPORTUNISTIC roles that leverage the candidate's domain knowledge and technical skills in new ways. Include:
- Emerging positions and innovative job titles
- Cross-industry opportunities where their skills transfer
- Advanced roles that represent significant career leaps
- Opportunistic pivots based on technical knowledge (consulting, product, strategy)
- Domain expertise applications in new contexts
- Leadership and advisory roles based on technical background
- Entrepreneurial or specialized consulting opportunities
- Think 2-5 years into the future and consider evolving job markets

IMPORTANT FORMATTING RULES:
- Provide ONLY the job title without any explanations, justifications, or descriptions
- Do NOT include seniority levels (Senior, Junior, Lead, etc.) in the title - keep titles generalized
- Keep titles clean and searchable on LinkedIn

Format your response exactly as follows:

PRIMARY JOB TITLES:
- [job title 1]
- [job title 2]
- [job title 3]
- [job title 4]
- [job title 5]

SECONDARY JOB TITLES:
- [futuristic role 1]
- [cross-industry opportunity 2]
- [advanced leadership position 3]
- [innovative field role 4]
- [consulting/advisory role 5]
- [domain expertise pivot 6]
- [entrepreneurial opportunity 7]
- [specialized expertise role 8]

SKILL-BASED QUERIES:
- [skill combination 1]
- [skill combination 2]

INDUSTRY FOCUS: [primary industry based on experience + emerging opportunities]

SENIORITY LEVEL: [Target future level: Mid/Senior/Executive/Leadership]
"""

USER_PROMPT = """
Based on the following resume content, generate job search queries for LinkedIn with clear distinction between realistic next steps and opportunistic future roles:

RESUME CONTENT:
{resume_content}

Analyze this resume and provide:
1. PRIMARY TITLES: Realistic roles where they have the best chance of getting hired (bias toward recent experience and proven skills)
2. SECONDARY TITLES: Opportunistic and futuristic roles that leverage their domain knowledge and technical skills in creative ways - include consulting, advisory, cross-industry pivots, and entrepreneurial opportunities

Remember: Provide ONLY clean job titles without explanations or seniority levels. Keep titles generalized and LinkedIn-searchable.

Please provide the job search information in the exact format specified above.
"""

# Create prompt template
prompt_template = PromptTemplate(
    template=SYSTEM_PROMPT + "\n" + USER_PROMPT, input_variables=["resume_content"]
)
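
The combined template relies on `{resume_content}` being the only brace-delimited placeholder; the `[job title 1]`-style markers use square brackets and pass through untouched. As a rough illustration, assuming `PromptTemplate` keeps its default f-string-style formatting, plain `str.format` shows the same behavior. The shortened `template` string below is a stand-in, not the notebook's prompt.

```python
# Shortened stand-in for SYSTEM_PROMPT + "\n" + USER_PROMPT.
template = (
    "Format your response as:\n"
    "- [job title 1]\n"
    # Literal braces in a template must be doubled to survive formatting.
    'Example JSON (literal braces): {{"title": "..."}}\n'
    "RESUME CONTENT:\n"
    "{resume_content}\n"
)

# Braces inside the substituted value are inserted verbatim, not re-parsed.
filled = template.format(resume_content="Python, SQL, {curly} text from a resume")
print(filled)
```

If the prompt ever gains a literal `{` or `}` (for example a JSON sample), doubling the braces as shown avoids a missing-variable error at format time.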

## Build LangChain with Structured Output


In [6]:
import re
from langchain_core.output_parsers import BaseOutputParser


class JobQueryOutputParser(BaseOutputParser):
    """Parser to extract structured job queries from LLM responses."""

    def parse(self, text: str) -> JobSearchQueries:
        """Parse job search queries from text."""
        # Initialize lists and variables
        primary_titles = []
        secondary_titles = []
        skill_based_queries = []
        industry_focus = ""
        seniority_level = ""

        lines = [line.strip() for line in text.split("\n") if line.strip()]
        current_section = None

        for line in lines:
            line_lower = line.lower()

            # Section headers
            if "primary" in line_lower and "title" in line_lower:
                current_section = "primary"
                continue
            elif "secondary" in line_lower and "title" in line_lower:
                current_section = "secondary"
                continue
            elif "skill" in line_lower and (
                "query" in line_lower or "queries" in line_lower
            ):
                current_section = "skills"
                continue
            elif "industry" in line_lower and "focus" in line_lower:
                current_section = "industry"
                # Extract industry from same line if present
                if ":" in line:
                    industry_focus = line.split(":", 1)[1].strip()
                continue
            elif "seniority" in line_lower and "level" in line_lower:
                current_section = "seniority"
                # Extract seniority from same line if present
                if ":" in line:
                    seniority_level = line.split(":", 1)[1].strip()
                continue

            # Extract list items ("- ", "• ", "* ", or "1. "-style markers)
            marker = re.match(r"(?:[-•*]|\d+\.)\s+", line)
            if marker:
                # Remove only the marker so titles that start with a digit stay intact
                item = line[marker.end():].strip()
                if item:
                    if current_section == "primary":
                        primary_titles.append(item)
                    elif current_section == "secondary":
                        secondary_titles.append(item)
                    elif current_section == "skills":
                        skill_based_queries.append(item)

            # Extract key-value pairs
            elif ":" in line and current_section in ["industry", "seniority"]:
                key, value = line.split(":", 1)
                value = value.strip()
                if "industry" in key.lower() or current_section == "industry":
                    industry_focus = value
                elif (
                    "seniority" in key.lower()
                    or "level" in key.lower()
                    or current_section == "seniority"
                ):
                    seniority_level = value

        # Validate that we have the required data
        if not primary_titles:
            raise ValueError("Failed to extract primary job titles from LLM response")
        if not secondary_titles:
            raise ValueError("Failed to extract secondary job titles from LLM response")
        if not skill_based_queries:
            raise ValueError("Failed to extract skill-based queries from LLM response")
        if not industry_focus:
            raise ValueError("Failed to extract industry focus from LLM response")
        if not seniority_level:
            raise ValueError("Failed to extract seniority level from LLM response")

        return JobSearchQueries(
            primary_titles=primary_titles[:5],
            secondary_titles=secondary_titles[:8],
            skill_based_queries=skill_based_queries[:3],
            industry_focus=industry_focus,
            seniority_level=seniority_level,
        )


# Create parser and chain
parser = JobQueryOutputParser()
job_query_chain = prompt_template | llm | parser
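
Because the parser raises `ValueError` whenever a section is missing, and the model is sampled with a non-zero temperature, an occasional malformed response can abort a run. A minimal retry sketch follows; `invoke_with_retry` and `fake_invoke` are hypothetical names, and `fake_invoke` only stands in for `job_query_chain.invoke` so the example runs without an LLM.

```python
from typing import Any, Callable, Dict


def invoke_with_retry(
    invoke: Callable[[Dict[str, Any]], Any],
    payload: Dict[str, Any],
    attempts: int = 3,
) -> Any:
    """Call invoke(payload), retrying when parsing raises ValueError."""
    last_error = None
    for _ in range(attempts):
        try:
            return invoke(payload)
        except ValueError as exc:  # raised by the parser on missing sections
            last_error = exc
    raise ValueError(f"No parseable response after {attempts} attempts") from last_error


# Stand-in for job_query_chain.invoke: fails once, then succeeds.
calls = {"n": 0}


def fake_invoke(payload: Dict[str, Any]) -> Dict[str, Any]:
    calls["n"] += 1
    if calls["n"] < 2:
        raise ValueError("Failed to extract primary job titles from LLM response")
    return {"primary_titles": ["Data Scientist"]}


result = invoke_with_retry(fake_invoke, {"resume_content": "..."})
print(result, calls["n"])
# → {'primary_titles': ['Data Scientist']} 2
```

In the notebook this would presumably be called as `invoke_with_retry(job_query_chain.invoke, {"resume_content": resume_content})`; each retry costs another model call, so a small `attempts` value is a reasonable default.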

## Create Resume Processing Function


In [7]:
def process_resume(file_path: str) -> JobSearchQueries:
    """Process resume file and generate job search queries."""

    file_path = Path(file_path)

    if not file_path.exists():
        raise FileNotFoundError(f"Resume file not found: {file_path}")

    # Load document based on file extension.
    # Docx2txtLoader reads .docx (zip-based) files; legacy binary .doc files
    # are expected to fail in docx2txt, so convert them to .docx or PDF first.
    suffix = file_path.suffix.lower()
    if suffix == ".pdf":
        loader = PyPDFLoader(str(file_path))
    elif suffix == ".docx":
        loader = Docx2txtLoader(str(file_path))
    else:
        raise ValueError(f"Unsupported file format: {file_path.suffix}")

    # Load and extract text
    documents = loader.load()
    resume_content = "\n".join([doc.page_content for doc in documents])

    # Generate job queries using the chain
    result = job_query_chain.invoke({"resume_content": resume_content})

    return result
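
The extension check in `process_resume` can also be written as a lookup table, which keeps the supported formats in one place. This is only a sketch: `LOADERS` and `pick_loader` are hypothetical names, the suffixes listed are illustrative, and the values are strings so the example runs without LangChain installed. In the notebook they would be the loader classes themselves.

```python
from pathlib import Path

# Hypothetical suffix-to-loader table. Strings stand in for the loader
# classes (PyPDFLoader, Docx2txtLoader) used in the notebook.
LOADERS = {
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
}


def pick_loader(file_path: str) -> str:
    """Return the loader registered for the file's suffix (case-insensitive)."""
    suffix = Path(file_path).suffix.lower()
    try:
        return LOADERS[suffix]
    except KeyError:
        supported = ", ".join(sorted(LOADERS))
        raise ValueError(
            f"Unsupported file format: {suffix!r} (supported: {supported})"
        ) from None


print(pick_loader("resume.PDF"))
# → PyPDFLoader
```

Listing the supported suffixes in the error message makes a rejected file easier to diagnose than the bare suffix alone.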

## Test with Sample Resume


In [8]:
# Test with existing resume in the data folder
resume_path = "../genai_job_finder/data/Ali Zarreh_CV_2025_08_30.pdf"

try:
    job_queries = process_resume(resume_path)

    print("🎯 Generated Job Search Queries:")
    print("\n📍 Primary Job Titles:")
    for title in job_queries.primary_titles:
        print(f"  • {title}")

    print("\n🔍 Secondary Job Titles:")
    for title in job_queries.secondary_titles:
        print(f"  • {title}")

    print("\n🛠️ Skill-Based Queries:")
    for query in job_queries.skill_based_queries:
        print(f"  • {query}")

    print(f"\n🏭 Industry Focus: {job_queries.industry_focus}")
    print(f"📈 Seniority Level: {job_queries.seniority_level}")

except Exception as e:
    print(f"Error processing resume: {e}")
    print(
        "Make sure OPENAI_API_KEY environment variable is set and the resume file exists"
    )

🎯 Generated Job Search Queries:

📍 Primary Job Titles:
  • Data Scientist
  • Machine Learning Engineer
  • Generative AI Engineer
  • Operations Research Scientist
  • Pricing Data Scientist

🔍 Secondary Job Titles:
  • AI Solutions Architect
  • Data Science Manager
  • AI Product Manager
  • LLM Engineer
  • Conversational AI Engineer
  • AI Strategy Consultant
  • Supply Chain Optimization Scientist
  • Industrial AI Consultant

🛠️ Skill-Based Queries:
  • LangChain LangGraph RAG Conversational AI Vertex AI Databricks FastAPI
  • Pricing Optimization XGBoost Linear Programming Time Series Forecasting Spark Ray GCP

🏭 Industry Focus: Retail and e-commerce analytics + Generative AI platforms and supply chain optimization
📈 Seniority Level: Senior/Leadership


In [9]:
# Test with existing resume in the data folder
resume_path = "../genai_job_finder/data/Zarreh_Resume.docx"

try:
    job_queries = process_resume(resume_path)

    print("🎯 Generated Job Search Queries:")
    print("\n📍 Primary Job Titles:")
    for title in job_queries.primary_titles:
        print(f"  • {title}")

    print("\n🔍 Secondary Job Titles:")
    for title in job_queries.secondary_titles:
        print(f"  • {title}")

    print("\n🛠️ Skill-Based Queries:")
    for query in job_queries.skill_based_queries:
        print(f"  • {query}")

    print(f"\n🏭 Industry Focus: {job_queries.industry_focus}")
    print(f"📈 Seniority Level: {job_queries.seniority_level}")

except Exception as e:
    print(f"Error processing resume: {e}")
    print(
        "Make sure OPENAI_API_KEY environment variable is set and the resume file exists"
    )

🎯 Generated Job Search Queries:

📍 Primary Job Titles:
  • Operations Research Analyst
  • Optimization Engineer
  • Simulation Engineer
  • Supply Chain Analyst
  • Data Scientist

🔍 Secondary Job Titles:
  • Operations Research Scientist
  • Digital Twin Engineer
  • Decision Intelligence Analyst
  • AI Optimization Engineer
  • Optimization Consultant
  • Infrastructure Analytics Consultant
  • Computational Social Scientist
  • Analytics Product Manager

🛠️ Skill-Based Queries:
  • Pyomo Gurobi CPLEX vehicle routing optimization Python
  • Agent-based modeling Python digital twin FastAPI simulation

🏭 Industry Focus: Operations research and supply chain analytics + digital twin, infrastructure, and AI-driven decision systems
📈 Seniority Level: Mid


## Usage Example with Custom Resume


In [None]:
# Example usage with a custom resume path.
# Replace the path below with the location of your own resume.
custom_resume_path = (
    "/home/alireza/projects/genai_job_finder/genai_job_finder/data/Zarreh_Resume.docx"
)

# Requires the cell that defines process_resume to have been run first.
job_queries = process_resume(custom_resume_path)
print(job_queries)

NameError: name 'process_resume' is not defined

In [None]:
# Test the modular implementation with full functionality
from genai_job_finder.query_definition import ResumeQueryService
from genai_job_finder.query_definition.config import (
    get_openai_config,
    get_ollama_config,
)

# Use the correct file path
resume_file = "../genai_job_finder/data/Ali_Zarreh_CV_2025_08_30.pdf"

# Test with Ollama (local model - no API key required)
print("🤖 Testing with Ollama (local)...")
try:
    service = ResumeQueryService(get_ollama_config())
    queries = service.process_resume_file(resume_file)
    print("✅ Ollama test successful!")
    print(queries.display_summary())
except Exception as e:
    print(f"❌ Ollama test failed: {e}")

print("\n" + "=" * 50 + "\n")

# Test with OpenAI (requires API key)
print("🤖 Testing with OpenAI...")
try:
    service = ResumeQueryService(get_openai_config())
    # Check if service can perform health check
    if service.health_check():
        queries = service.process_resume_file(resume_file)
        print("✅ OpenAI test successful!")
        print(queries.display_summary())
    else:
        print("❌ OpenAI health check failed")
except Exception as e:
    print(f"❌ OpenAI test failed: {e}")
    print("💡 Make sure OPENAI_API_KEY is set in your environment")

2025-09-14 12:19:46,125 - genai_job_finder.query_definition.service - INFO - Initialized ResumeQueryService with ollama provider
2025-09-14 12:19:46,127 - genai_job_finder.query_definition.service - INFO - Processing resume file: ../genai_job_finder/data/Ali_Zarreh_CV_2025_08_30.pdf


🤖 Testing with Ollama (local)...


2025-09-14 12:19:46,399 - genai_job_finder.query_definition.service - INFO - Processing resume content with LLM
2025-09-14 12:19:46,771 - httpx - INFO - HTTP Request: POST http://localhost:11434/api/generate "HTTP/1.1 200 OK"
2025-09-14 12:19:52,780 - genai_job_finder.query_definition.service - INFO - Successfully generated job search queries
2025-09-14 12:19:52,781 - genai_job_finder.query_definition.service - INFO - Initialized ResumeQueryService with openai provider
2025-09-14 12:19:52,783 - genai_job_finder.query_definition.service - INFO - Processing resume content with LLM

✅ Ollama test successful!

📍 Primary Job Titles (5):
  • Senior Data Scientist
  • Lead Data Analyst
  • Operations Research Analyst
  • Predictive Maintenance Specialist
  • Business Intelligence Developer

🔍 Secondary Job Titles (8):
  • AI Solutions Architect
  • Machine Learning Consultant
  • Cybersecurity Risk Management Specialist
  • Digital Transformation Advisor
  • Data Science Director
  • Advanced Analytics Strategist
  • Cross-Industry Data Integration Expert
  • Entrepreneurial Data Scientist

🛠️ Skill-Based Queries:
  • Generative AI & Large Language Models (LLMs)
  • Machine Learning & Deep Learning
  • Big Data Technologies (Spark, Hadoop, Ray)

🏭 Industry Focus: Retail, Manufacturing, and Technology
📈 Seniority Level: Mid



🤖 Testing with OpenAI...


2025-09-14 12:19:54,933 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-14 12:19:54,937 - genai_job_finder.query_definition.service - INFO - Successfully generated job search queries
2025-09-14 12:19:54,937 - genai_job_finder.query_definition.service - INFO - Processing resume file: ../genai_job_finder/data/Ali_Zarreh_CV_2025_08_30.pdf
2025-09-14 12:19:55,057 - genai_job_finder.query_definition.service - INFO - Processing resume content with LLM
2025-09-14 12:19:57,679 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTT

✅ OpenAI test successful!

📍 Primary Job Titles (5):
  • Data Scientist
  • Machine Learning Engineer
  • AI Product Engineer
  • Data Analyst
  • Research Scientist

🔍 Secondary Job Titles (8):
  • AI Consultant
  • Data Science Manager
  • Machine Learning Architect
  • Cybersecurity Analyst
  • AI Entrepreneur
  • Data Strategy Advisor
  • Predictive Analytics Specialist
  • AI Research Scientist

🛠️ Skill-Based Queries:
  • Python Machine Learning
  • Deep Learning Forecasting

🏭 Industry Focus: Retail and Manufacturing + Emerging opportunities in AI, ML, and cybersecurity
📈 Seniority Level: Mid to Senior
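
The Ollama run above returned titles such as `Senior Data Scientist` and `Lead Data Analyst`, even though the prompt asks for titles without seniority levels. A small post-processing step can enforce that rule regardless of the model. The sketch below is one possible approach: `strip_seniority` and the `SENIORITY_WORDS` list are hypothetical, and the words beyond those named in the prompt are an assumption.

```python
import re
from typing import List

# "Senior", "Junior", and "Lead" come from the prompt's formatting rules;
# the remaining entries are assumed common variants.
SENIORITY_WORDS = ("senior", "junior", "lead", "principal", "staff", "sr", "jr")
_PREFIX = re.compile(
    r"^(?:(?:" + "|".join(SENIORITY_WORDS) + r")\.?\s+)+", re.IGNORECASE
)


def strip_seniority(titles: List[str]) -> List[str]:
    """Drop leading seniority words and de-duplicate, keeping first-seen order."""
    seen = set()
    cleaned = []
    for title in titles:
        base = _PREFIX.sub("", title.strip())
        key = base.lower()
        if base and key not in seen:
            seen.add(key)
            cleaned.append(base)
    return cleaned


print(strip_seniority(["Senior Data Scientist", "Lead Data Analyst", "Data Scientist"]))
# → ['Data Scientist', 'Data Analyst']
```

Only whole leading words are removed, so a title like `Leadership Coach` is left unchanged. De-duplication matters here because stripping the prefix can collapse two titles into one.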

