# 🎯 SmartMatch Resume Analyzer - Part 1: Setup and Data

> **Interactive tutorial demonstrating setup, dependencies, and data models for AI-powered resume optimization**

This is the first notebook in a 3-part series exploring the SmartMatch Resume Analyzer's AI pipeline. In this notebook, you'll learn how to set up the environment and understand the data models that power the analysis.

## 📚 Tutorial Series Overview

1. **Part 1: Setup and Data** (This notebook) - Environment setup, dependencies, and data models
2. **Part 2: Analysis Pipeline** - Core AI analysis engine and LangChain integration  
3. **Part 3: Results and Interpretation** - Running analyses and understanding results

## 📋 What You'll Learn

- **Environment Setup**: Configure OpenAI API and required dependencies
- **Data Models**: Understand Pydantic models for type safety
- **Sample Data**: Work with realistic resume and job description examples
- **Production Patterns**: See how to structure NLP applications for reliability

## 🚀 Technical Stack

- **LangChain**: Document processing and LLM chain orchestration
- **OpenAI GPT-3.5-turbo**: Semantic analysis and text generation
- **Pydantic**: Type safety and automatic response validation
- **Python 3.11+**: Modern Python with async/await patterns

## 📦 Setup and Dependencies

First, let's install the required dependencies. This notebook demonstrates the same pipeline used in the production application.

In [None]:
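# Install required dependencies
# Quote each specifier so the shell does not treat ">" as output redirection
!pip install "langchain>=0.3.0" "langchain-openai>=0.3.0" "langchain-community>=0.3.0"
!pip install "openai>=1.0.0" "pydantic>=2.5.3" "python-dotenv>=1.0.0"
!pip install nest-asyncio  # asyncio ships with Python; only nest-asyncio is needed for Jupyter async support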
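
After installation, it can help to confirm which versions actually ended up in the kernel's environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the `REQUIRED` list are illustrative assumptions, not part of the SmartMatch codebase.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


# Distribution names taken from the install cell above
REQUIRED = [
    "langchain", "langchain-openai", "langchain-community",
    "openai", "pydantic", "python-dotenv", "nest-asyncio",
]

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` suggests the install cell ran against a different environment than the active kernel.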

In [None]:
# Import required libraries
import asyncio
import nest_asyncio
import json
import os
from typing import Any, Dict, List, Optional
from datetime import datetime

# Pydantic for data validation
from pydantic import BaseModel, Field

# Enable async in Jupyter
nest_asyncio.apply()

print("✅ Dependencies loaded successfully!")

## 🔐 Environment Configuration

Configure your OpenAI API key. For production use, always use environment variables or secure configuration management.

In [None]:
# Configure OpenAI API key
# Option 1: From environment variable (recommended)
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

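# Option 2: Interactive prompt (for tutorial only - not recommended for production)
# getpass hides the key so it is not echoed into the notebook output
if not OPENAI_API_KEY:
    from getpass import getpass
    OPENAI_API_KEY = getpass("Enter your OpenAI API key: ").strip()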

# Verify API key is configured
if OPENAI_API_KEY and len(OPENAI_API_KEY) > 20:
    print(f"✅ API key configured (length: {len(OPENAI_API_KEY)})")
else:
    print("❌ Please configure your OpenAI API key")
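
For the production case mentioned above, one common pattern is to fail fast when a required variable is missing and to mask secrets before they reach logs or notebook output. The sketch below is an assumption about how that might look. The helpers `require_env` and `mask_secret` and the `DEMO_API_KEY` variable are hypothetical names introduced only for illustration.

```python
import os


def mask_secret(secret, visible=4):
    """Return a log-safe version of a secret, showing only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


def require_env(name):
    """Read a required environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo with a placeholder value; a real key would come from the shell or a .env file
os.environ.setdefault("DEMO_API_KEY", "sk-demo-1234567890abcd")
print(mask_secret(require_env("DEMO_API_KEY")))
```

Raising an error at startup, rather than prompting, tends to suit unattended deployments where no one is present to type a key.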

## 📊 Data Models

Define Pydantic models for type safety and automatic validation - a crucial pattern for production NLP applications.

In [None]:
class BulletSuggestion(BaseModel):
    """Model for bullet point improvement suggestions."""
    original: str = Field(..., description="Original bullet point")
    improved: str = Field(..., description="AI-improved version")
    reason: str = Field(..., description="Explanation of improvements")

class AnalysisResponse(BaseModel):
    """Complete analysis response model with validation."""
    match_percentage: float = Field(..., ge=0, le=100, description="Match percentage")
    matched_keywords: List[str] = Field(default_factory=list, description="Keywords found in both texts")
    missing_keywords: List[str] = Field(default_factory=list, description="Job keywords missing from resume")
    suggestions: List[BulletSuggestion] = Field(default_factory=list, description="Improvement suggestions")
    strengths: List[str] = Field(default_factory=list, description="Resume strengths")
    areas_for_improvement: List[str] = Field(default_factory=list, description="Areas needing improvement")
    overall_feedback: str = Field(..., description="Summary feedback")
    processing_time: Optional[float] = Field(None, description="Analysis processing time")

print("✅ Data models defined with Pydantic validation")
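
To see what "automatic validation" means in practice, the sketch below parses a JSON payload and then shows an out-of-range score being rejected. It uses a trimmed, hypothetical `MiniAnalysisResponse` so that it runs standalone without shadowing the models above. The JSON payload is made up for illustration and is not real model output.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class MiniAnalysisResponse(BaseModel):
    """Trimmed stand-in for AnalysisResponse, for demonstration only."""
    match_percentage: float = Field(..., ge=0, le=100)
    matched_keywords: List[str] = Field(default_factory=list)
    missing_keywords: List[str] = Field(default_factory=list)
    overall_feedback: str
    processing_time: Optional[float] = None


# Hypothetical LLM output; the values are invented for this example
raw = (
    '{"match_percentage": 62.5, '
    '"matched_keywords": ["Python", "AWS"], '
    '"overall_feedback": "Solid engineering base."}'
)
result = MiniAnalysisResponse.model_validate_json(raw)
print(result.match_percentage, result.missing_keywords)

# An out-of-range score is rejected instead of being silently accepted
try:
    MiniAnalysisResponse(match_percentage=140, overall_feedback="n/a")
except ValidationError as exc:
    print(f"Rejected: {len(exc.errors())} validation error(s)")
```

Fields omitted from the payload fall back to their defaults, while a missing required field or a violated constraint raises `ValidationError` before bad data can propagate further.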

## 📝 Sample Data

Let's prepare realistic sample data to demonstrate the AI analysis capabilities in subsequent notebooks.

In [None]:
# Sample resume - Software Engineer transitioning to ML
SAMPLE_RESUME = """
John Smith
Software Engineer
Email: john.smith@email.com

PROFESSIONAL SUMMARY
Experienced software engineer with 5+ years developing scalable web applications and data pipelines.
Strong background in Python, cloud technologies, and agile development practices.

TECHNICAL SKILLS
Languages: Python, JavaScript, SQL, Java
Frameworks: Django, Flask, React, Node.js
Databases: PostgreSQL, MongoDB, Redis
Cloud: AWS (EC2, S3, Lambda), Docker, Kubernetes
Tools: Git, Jenkins, JIRA, Prometheus

EXPERIENCE
Senior Software Engineer | TechCorp | 2021-2024
• Developed real-time data processing pipeline using Apache Kafka handling 100k+ messages/hour
• Optimized database queries improving response time by 40% through indexing and query optimization
• Led team of 3 engineers in implementing microservices architecture using Docker and Kubernetes
• Mentored junior developers and conducted code reviews maintaining 95% code quality standards

Software Engineer | StartupXYZ | 2019-2021
• Built REST APIs using Django and Flask serving 10,000+ daily active users
• Implemented automated testing and CI/CD pipelines reducing deployment time by 60%
• Collaborated with product team using agile methodologies and sprint planning

EDUCATION
Bachelor of Science in Computer Science | University of Technology | 2019
"""

# Sample job description - Machine Learning Engineer
SAMPLE_JOB_DESCRIPTION = """
Machine Learning Engineer
Company: AI Innovations Inc.

We are seeking a skilled Machine Learning Engineer to join our AI team and help build next-generation ML solutions.

REQUIREMENTS:
• 3+ years of experience in machine learning and data science
• Strong proficiency in Python and machine learning frameworks (TensorFlow, PyTorch, Scikit-learn)
• Experience with MLOps practices, model deployment, and monitoring
• Knowledge of deep learning, neural networks, and NLP techniques
• Experience with cloud platforms (AWS, GCP) and containerization (Docker)
• Strong background in statistics, mathematics, and data analysis
• Experience with model training, evaluation, and optimization

RESPONSIBILITIES:
• Design and implement machine learning models for various business problems
• Build and maintain ML pipelines from data ingestion to model deployment
• Collaborate with data scientists and engineers to productionize ML solutions
• Monitor model performance and implement improvements
• Research and evaluate new ML techniques and technologies

PREFERRED QUALIFICATIONS:
• MS/PhD in Computer Science, Machine Learning, or related field
• Experience with distributed computing and big data technologies
• Publications in ML conferences or journals
• Experience with recommendation systems, computer vision, or NLP
"""

print("📄 Sample data loaded:")
print(f"   Resume: {len(SAMPLE_RESUME)} characters")
print(f"   Job Description: {len(SAMPLE_JOB_DESCRIPTION)} characters")
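
Before any LLM is involved, a naive keyword check gives a rough sense of what `matched_keywords` and `missing_keywords` might contain for this resume and job pair. This is only a baseline sketch, not the analysis built in Part 2. The `JOB_KEYWORDS` list is hand-picked from the job description, and `keyword_overlap` is a hypothetical helper.

```python
import re

# Hypothetical keyword list, hand-picked from the sample job description
JOB_KEYWORDS = [
    "Python", "TensorFlow", "PyTorch", "Scikit-learn",
    "MLOps", "AWS", "GCP", "Docker", "NLP",
]


def keyword_overlap(resume_text, keywords):
    """Split keywords into those present in the resume and those missing."""
    matched, missing = [], []
    for kw in keywords:
        # Whole-token, case-insensitive match so "Java" does not match "JavaScript"
        pattern = r"(?<![A-Za-z0-9])" + re.escape(kw) + r"(?![A-Za-z0-9])"
        if re.search(pattern, resume_text, re.IGNORECASE):
            matched.append(kw)
        else:
            missing.append(kw)
    pct = 100.0 * len(matched) / len(keywords) if keywords else 0.0
    return matched, missing, round(pct, 1)


# Excerpt of the skills section from the sample resume
resume_excerpt = (
    "Languages: Python, JavaScript, SQL, Java. "
    "Cloud: AWS (EC2, S3, Lambda), Docker, Kubernetes"
)
matched, missing, pct = keyword_overlap(resume_excerpt, JOB_KEYWORDS)
print(f"Matched: {matched}")
print(f"Missing: {missing}")
print(f"Overlap: {pct}%")
```

Exact string matching misses synonyms and related experience (for example, data pipelines as evidence of ML readiness), which is the gap semantic analysis is meant to close.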

## ✅ Setup Complete!

Perfect! You've successfully:

- ✅ **Installed Dependencies**: All required packages for NLP analysis
- ✅ **Configured API Access**: OpenAI API key ready for use
- ✅ **Defined Data Models**: Type-safe Pydantic models for validation
- ✅ **Loaded Sample Data**: Realistic resume and job description examples

## 🚀 Next Steps

Continue to **Part 2: Analysis Pipeline** to build the core AI analysis engine and see how LangChain and OpenAI work together to provide intelligent resume optimization.

---

*Part of the SmartMatch Resume Analyzer tutorial series. Built with ❤️ using LangChain, OpenAI, and modern Python.*