# Reading and Extracting Text from a Resume PDF

This notebook demonstrates how to safely read and extract text from a PDF file in Python. We'll use the `PyPDF2` library, installing it first if it is not already available.

In [1]:
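# Import required libraries
import os
import sys
from pathlib import Path

# Verify PyPDF2 is installed; if it is missing, install it and retry the import
try:
    import PyPDF2
    print("PyPDF2 version:", PyPDF2.__version__)
except ImportError:
    print("Installing PyPDF2...")
    # %pip installs into the environment of the running kernel,
    # which a plain !pip call does not guarantee
    %pip install PyPDF2
    import PyPDF2
    print("PyPDF2 installed successfully")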

PyPDF2 version: 3.0.1
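
The availability check above can also be done without importing the package first. The following is a minimal sketch, assuming Python 3.8 or later for `importlib.metadata`. The helper name `package_version` is hypothetical and is not part of this notebook.

```python
from importlib import metadata
from importlib.util import find_spec


def package_version(import_name, dist_name=None):
    """Return the installed version string, or None if the module is not importable.

    `import_name` is the name used in an import statement; `dist_name` is the
    name used with pip, which can differ. It defaults to `import_name`.
    """
    if find_spec(import_name) is None:
        return None
    try:
        return metadata.version(dist_name or import_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a standard-library module)
        return "unknown"


print(package_version("PyPDF2"))
```

A `None` result signals that the install step is needed, and nothing is imported as a side effect of the check.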


In [3]:
# Set up the file path and validate it exists.
# In a notebook __file__ is undefined, so fall back to the current working directory.
script_dir = Path(__file__).parent if '__file__' in globals() else Path.cwd()
# Assumes the notebook lives one level below the project root
project_root = script_dir.parent
resume_path = (project_root / 'public' / 'Resume1.pdf').resolve()

if not resume_path.is_file():
    raise FileNotFoundError(f"Could not find PDF file at {resume_path}")

print(f"Found resume at: {resume_path}")

Found resume at: C:\Users\vinee\OneDrive\Desktop\academics\portfolio2\project\public\Resume1.pdf
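
The path logic above assumes the notebook is launched from a directory directly below the project root. Here is a hedged sketch of a more tolerant lookup, which walks upward from a starting directory until it finds the file. `find_upwards` is a hypothetical helper, and the relative path `public/Resume1.pdf` is taken from the cell above.

```python
from pathlib import Path


def find_upwards(relative_path, start=None):
    """Search `start` and each of its parent directories for `relative_path`.

    Returns the path of the first match, or None if no directory contains it.
    """
    start_dir = Path(start) if start is not None else Path.cwd()
    start_dir = start_dir.resolve()
    for directory in (start_dir, *start_dir.parents):
        candidate = directory / relative_path
        if candidate.is_file():
            return candidate
    return None


located = find_upwards(Path("public") / "Resume1.pdf")
print(located if located else "Resume1.pdf not found in any parent directory")
```

Returning `None` instead of raising leaves it to the caller to decide whether a missing file is an error. The notebook cell above raises `FileNotFoundError` in that case.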


In [4]:
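def extract_text_from_pdf(pdf_path):
    """Return the text of every page joined by newlines, or None if reading fails."""
    try:
        # Open in binary mode; PdfReader parses the raw bytes
        with open(pdf_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)

            # Pages without a text layer (e.g. scanned images) may yield no text,
            # so fall back to an empty string to keep the join safe
            text = [page.extract_text() or '' for page in pdf_reader.pages]

            return '\n'.join(text)
    except Exception as e:
        # Report the problem and signal failure to the caller with None
        print(f"Error reading PDF: {e}")
        return None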

# Extract text from the resume
resume_text = extract_text_from_pdf(resume_path)
if resume_text:
    print("Successfully extracted text from resume. Preview of first 500 characters:")
    print("-" * 80)
    print(resume_text[:500])
    print("-" * 80)
    print("\nFull text:")
    print(resume_text)
else:
    print("Failed to extract text from resume")

Successfully extracted text from resume. Preview of first 500 characters:
--------------------------------------------------------------------------------
Vineet Agarwal
240-353-9811 — vineet54@umd.edu — linkedin.com/in/vineet-agarwal-540abc/ — github.com/vineetagarwal54
Summary —Software Engineer with 2+ years of experience developing scalable web and mobile applications across
distributed systems. Proficient in Python, React Native, and AWS. Experienced in the full software development lifecycle
(SDLC) including design, development, testing, and deployment. Passionate about building efficient, cloud-native solutions and
solving complex challenge
--------------------------------------------------------------------------------

Full text:
Vineet Agarwal
240-353-9811 — vineet54@umd.edu — linkedin.com/in/vineet-agarwal-540abc/ — github.com/vineetagarwal54
Summary —Software Engineer with 2+ years of experience developing scalable web and mobile applications across
distributed systems. Pro