## Business Understanding
### Problem Statement
Matching job seekers with relevant job opportunities is crucial for both candidates and employers. However, traditional keyword-based search systems match on literal terms rather than on an individual's holistic profile (skills, experience, and preferences), which leads to irrelevant results and longer searches.
#### Objective
Develop an intelligent NLP-based job recommendation system that uses the job description dataset to recommend suitable roles for applicants based on:
- Their skills, qualifications, and experience.
- Job descriptions, required skills, and responsibilities.
#### Key Stakeholders
- Applicants: Need personalized recommendations to find jobs aligned with their skillset and career goals.
- Employers: Want efficient shortlisting of relevant candidates.
- Recruitment Platforms: Seek to enhance user engagement and improve match accuracy.
#### Goals
- For Job Seekers: Deliver precise job recommendations tailored to their profiles, and save time by reducing the need for extensive manual searches.
- For Employers: Improve applicant-job alignment, reducing hiring timelines.
- For the Platform: Enhance user satisfaction and retention through advanced recommendations.

The project aims to analyze job descriptions to identify patterns in required skills, qualifications, and experience across industries and roles. It will map competencies to specific job titles using techniques like skill extraction and clustering. Recommendations will consider various factors, including geographical preferences, salary expectations, and work type, to create a comprehensive match.
To measure success, metrics like recommendation accuracy, user engagement, and system performance will be tracked. The next steps involve exploring the dataset further, defining applicant features, and designing the recommendation model, potentially leveraging collaborative filtering and content-based approaches.
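One way to make "recommendation accuracy" concrete is precision at k: the fraction of the top k recommended jobs that the applicant actually found relevant. The sketch below is illustrative only. The function name `precision_at_k`, the job IDs, and the idea of a "relevant" set (for example, jobs a user saved or applied to) are assumptions, since the dataset described here contains no user feedback.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended job IDs that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant = set(relevant)
    # Count how many of the first k recommendations are relevant
    hits = sum(1 for job_id in recommended[:k] if job_id in relevant)
    return hits / k


# Hypothetical example: five ranked recommendations, two of which are relevant
recommended_ids = [101, 205, 309, 412, 518]
relevant_ids = {205, 412, 999}
score = precision_at_k(recommended_ids, relevant_ids, k=5)  # 2 hits out of 5
```

Dividing by k rather than by the number of items returned penalizes a system that produces fewer than k recommendations, which is the usual convention for this metric.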
#### System Functionality Overview
The job recommendation system will allow users to input their skills, work experience, job title, employment duration, and job description. Using NLP, the system will analyze this information to extract key skills and industry-specific terms, matching the user's profile against job descriptions in the dataset.
Recommendations will be tailored based on factors like required skills, experience, location, work type, and salary, ensuring a personalized and relevant list of job opportunities. This approach streamlines the job search process, helping users identify roles aligned with their expertise and career goals.
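The profile-to-job matching step described above could be prototyped as a content-based ranking: represent the user's profile and each job description as TF-IDF vectors and sort jobs by cosine similarity. The project does not fix a technique at this stage, so this is a minimal sketch under that assumption. It uses only the standard library, and the helper names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `recommend`) and the toy job texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into tokens (keeps '+' and '#' for terms like c++)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Term frequency times a smoothed inverse document frequency
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(profile_text, job_descriptions, top_n=3):
    """Rank job descriptions by similarity to the profile; return (index, score) pairs."""
    vectors = tfidf_vectors([profile_text] + list(job_descriptions))
    profile_vec, job_vecs = vectors[0], vectors[1:]
    scores = [(i, cosine_similarity(profile_vec, vec)) for i, vec in enumerate(job_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical toy data, not taken from the dataset
jobs = [
    "Data analyst with SQL, Python and dashboard reporting experience",
    "Registered nurse for inpatient care and patient monitoring",
    "Machine learning engineer: Python, NLP, model deployment",
]
profile = "Python developer with NLP and machine learning experience"
ranked = recommend(profile, jobs, top_n=2)  # list of (job index, similarity score)
```

Common words such as "with" and "and" still contribute to the score here, so stop-word removal would be a natural refinement. Filters on location, work type, or salary could then be applied to the ranked list.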

In [7]:
#Importing libraries
import pandas as pd
import numpy as np

In [9]:
#Loading the dataset
df = pd.read_csv('data/job_descriptions.csv', low_memory=False)

In [10]:
def understand_data(data):
    # Print the summary information of the dataset including data types and non-null counts
    print("\nDataset Information:")
    data.info()
    # Print the first 5 rows of the dataset to give a preview of the data
    print("\nSample Data:")
    print(data.head())

In [12]:
understand_data(df)


Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1615940 entries, 0 to 1615939
Data columns (total 23 columns):
 #   Column            Non-Null Count    Dtype  
---  ------            --------------    -----  
 0   Job Id            1615940 non-null  int64  
 1   Experience        1615940 non-null  object 
 2   Qualifications    1615940 non-null  object 
 3   Salary Range      1615940 non-null  object 
 4   location          1615940 non-null  object 
 5   Country           1615940 non-null  object 
 6   latitude          1615940 non-null  float64
 7   longitude         1615940 non-null  float64
 8   Work Type         1615940 non-null  object 
 9   Company Size      1615940 non-null  int64  
 10  Job Posting Date  1615940 non-null  object 
 11  Preference        1615940 non-null  object 
 12  Contact Person    1615940 non-null  object 
 13  Contact           1615940 non-null  object 
 14  Job Title         1615940 non-null  object 
 15  Role              1615940 n

In [11]:
# Keep only the first 50,000 rows
df_first_50k = df.head(50000)

# Optionally, save the subset to a new CSV file
df_first_50k.to_csv('data/dataset_first_50k.csv', index=False)
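If only the first 50,000 rows are needed, an alternative is to read just those rows with the `nrows` parameter of `pd.read_csv`, which avoids loading all 1,615,940 rows into memory first. This is a sketch rather than the notebook's own approach, and the helper name `load_subset` is hypothetical.

```python
import pandas as pd


def load_subset(path, n_rows=50_000):
    """Read only the first n_rows data rows of a CSV file (the header is not counted)."""
    return pd.read_csv(path, nrows=n_rows)


# Usage with the notebook's file (commented out so the sketch runs without the data):
# df_subset = load_subset('data/job_descriptions.csv')
```

Column dtypes are inferred from the rows actually read, so they may differ slightly from those obtained by loading the full file and then calling `head`.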