# Web Scraping Job Vacancies

## Introduction

In this project, we'll build a web scraper to extract job listings from a popular job search platform. We'll extract job titles, companies, locations, job descriptions, and other relevant information.

Here are the main steps we'll follow in this project:

1. Set up our development environment
2. Understand the basics of web scraping
3. Analyze the website structure of our job search platform
4. Write the Python code to extract job data from our job search platform
5. Save the data to a CSV file
6. Test our web scraper and refine our code as needed

## Prerequisites

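Before starting this project, you should have some basic knowledge of Python programming and HTML structure. The code uses the following packages:

- requests (third-party; install with `pip install requests`)
- BeautifulSoup (third-party; install with `pip install beautifulsoup4`, imported as `bs4`)
- csv (part of the Python standard library)
- datetime (part of the Python standard library)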

## Step 1: Importing Required Libraries

In [1]:
# Task 1: Import required libraries
from datetime import datetime
import csv
import requests
from bs4 import BeautifulSoup
import time

## Step 2: Generating a URL with a function

In [2]:
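# Task 2: Generating a URL with a function
from urllib.parse import quote_plus

def generate_url(position, location):
    """Build the search URL for a job position and location."""
    # Example URL template; replace it with the actual URL template of the job posting site
    base_url = "https://example.com/jobs?q={}&l={}"
    # URL-encode both values so spaces and special characters do not break the query string
    url = base_url.format(quote_plus(position), quote_plus(location))
    return url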
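
A search term such as "data analyst" contains a space, and a location such as "New York, NY" contains a comma, so both need URL-encoding before they go into a query string. The following is a minimal sketch of one way to do that with the standard library's `urllib.parse.urlencode`. The function name `build_search_url` is hypothetical, and the `example.com` endpoint with its `q` and `l` parameters is the same placeholder used in this step, not a real site's API.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); swap in the real site's search URL.
BASE_URL = "https://example.com/jobs"


def build_search_url(position, location):
    """Return the search URL with both values safely URL-encoded."""
    # urlencode escapes spaces, commas, ampersands, and other reserved characters.
    query = urlencode({"q": position, "l": location})
    return f"{BASE_URL}?{query}"


print(build_search_url("data analyst", "New York, NY"))
# prints: https://example.com/jobs?q=data+analyst&l=New+York%2C+NY
```

Passing a dictionary to `urlencode` also makes it easy to add further query parameters later without editing a format string.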

## Step 3: Extract the Job Data from a single job posting card

In [3]:
# Task 3: Extract the Job Data from a single job posting card
def extract_job_data(job_posting):
    try:
        job_title = job_posting.find('h2', class_='job-title').text.strip()
    except AttributeError:
        job_title = ""
    
    try:
        company_name = job_posting.find('span', class_='company-name').text.strip()
    except AttributeError:
        company_name = ""
    
    try:
        location = job_posting.find('span', class_='location').text.strip()
    except AttributeError:
        location = ""
    
    # Continue extracting other relevant job data
    
    return {
        'Job Title': job_title,
        'Company': company_name,
        'Location': location,
        # Add more fields as needed
    }
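
To check the extraction logic without sending any request, it can help to run it against a small hand-written HTML fragment. The sketch below assumes markup whose class names mirror the selectors used in this step; real job sites will differ. The helper `get_text` is a hypothetical alternative to the repeated `try`/`except` blocks: it returns an empty string when an element is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class names mirror the selectors used in extract_job_data.
SAMPLE_HTML = """
<div class="job-posting">
  <h2 class="job-title"> Data Analyst </h2>
  <span class="company-name">Acme Corp</span>
</div>
"""


def get_text(card, tag, css_class):
    """Return the stripped text of the first matching element, or "" if absent."""
    element = card.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else ""


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
card = soup.find("div", class_="job-posting")

record = {
    "Job Title": get_text(card, "h2", "job-title"),
    "Company": get_text(card, "span", "company-name"),
    "Location": get_text(card, "span", "location"),  # not present in the sample card
}
print(record)
```

The sample card deliberately has no location element, so the `Location` field comes back empty rather than raising an error.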

## Step 4: Define the main function

In [4]:
# Task 4: Define the main function
def main(job_position, location):
    # 1. Set headers
    headers = {'User-Agent': 'Mozilla/5.0'}
    
    # 2. Construct URL
    url = generate_url(job_position, location)
    
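    # 3. Send HTTP request and retrieve HTML
    # Use a timeout so the call cannot hang, and fail early on HTTP errors (4xx/5xx)
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')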
    
    # 4. Parse HTML and select job postings
    job_postings = soup.find_all('div', class_='job-posting')
    
    job_data_list = []
    
    # 5. Extract job posting information
    for job_posting in job_postings:
        job_data = extract_job_data(job_posting)
        job_data_list.append(job_data)
    
    # 6. Write data to CSV file
    filename = f"{job_position}_{location}_jobs_{datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}.csv"
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['Job Title', 'Company', 'Location']  # Add more fields as needed
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        
        writer.writeheader()
        for job_data in job_data_list:
            writer.writerow(job_data)
    
    # 7. Print success message
    print("Job data extraction successful. CSV file created:", filename)
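
The CSV step can be tried in isolation by writing to an in-memory buffer instead of a file. This is a small sketch with made-up records standing in for the output of `extract_job_data`; the `restval` and `extrasaction` arguments are optional `csv.DictWriter` settings that are not used in `main` above, shown here as one way to tolerate rows with missing or extra keys.

```python
import csv
import io

FIELDNAMES = ["Job Title", "Company", "Location"]

# Made-up records standing in for the output of extract_job_data.
rows = [
    {"Job Title": "Data Analyst", "Company": "Acme Corp", "Location": "Remote"},
    {"Job Title": "Data Engineer", "Company": "Globex"},  # no location key
]

buffer = io.StringIO()
# restval fills in keys that a row is missing; extrasaction="ignore" drops
# keys that are not listed in FIELDNAMES instead of raising ValueError.
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, restval="", extrasaction="ignore")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Replacing `io.StringIO()` with `open(filename, 'w', newline='', encoding='utf-8')` gives the same rows on disk.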

## Step 5: Conclusions

This project implemented a web scraper that extracts job posting data, which supports the recruitment agency's goal of collecting vacancy information more efficiently. Variation in the HTML of individual job cards is handled by catching `AttributeError` for each field, so a missing element yields an empty string instead of stopping the run. The modular design (`generate_url`, `extract_job_data`, and `main`) keeps the scraper adaptable: search parameters are passed as arguments, and a change in the site's markup only requires updating the URL template or the selectors. Future enhancements could include multithreading for faster scraping and caching to avoid downloading the same page twice. Overall, the project delivers a working pipeline from search query to CSV file.
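
As an illustration of the multithreading and caching ideas mentioned above, here is a rough sketch using only the standard library. It is not part of the scraper: `fetch_page` is a stand-in that sleeps instead of downloading, and the paginated URLs with a `start` parameter are assumptions made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=128)
def fetch_page(url):
    """Stand-in for a real download; returns fake HTML after a short delay."""
    time.sleep(0.1)  # simulates network latency
    return f"<html>results for {url}</html>"


# Hypothetical paginated URLs; the `start` parameter is an assumption.
urls = [f"https://example.com/jobs?q=analyst&start={n}" for n in range(0, 50, 10)]

# Threads overlap the waiting time, so the five pages are fetched concurrently.
with ThreadPoolExecutor(max_workers=5) as executor:
    pages = list(executor.map(fetch_page, urls))

# A second pass over the same URLs is served from the cache.
pages_again = [fetch_page(url) for url in urls]

info = fetch_page.cache_info()
print(len(pages), info.hits, info.misses)
```

In a real scraper, `fetch_page` would wrap `requests.get`, and the number of workers should stay small so the target site is not overloaded.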