<a href="https://colab.research.google.com/github/virguleria/python-assignment/blob/main/Numerical_Programming_in_Python_Analyze_it_Yourself.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Problem Statement: Navigating the Data Science Job Landscape**

🚀 Unleash your creativity in crafting a solution that taps into the heartbeat of the data science job market! Envision an ingenious project that seamlessly wields cutting-edge web scraping techniques and illuminating data analysis.

🔍 Your mission? To engineer a tool that effortlessly gathers job listings from a multitude of online sources, extracting pivotal nuggets such as job descriptions, qualifications, locations, and salaries.

🧩 However, the true puzzle lies in deciphering this trove of data. Can your solution discern patterns that spotlight the most coveted skills? Are there threads connecting job types to compensation packages? How might it predict shifts in industry demand?

🎯 The core objectives of this challenge are as follows:

1. Web Scraping Mastery: Forge an adaptable and potent web scraping mechanism. Your creation should adeptly harvest data science job postings from a diverse array of online platforms. Be ready to navigate evolving website structures and process hefty data loads.

2. Data Symphony: Skillfully distill vital insights from the harvested job listings. Extract and cleanse critical information like job titles, company names, descriptions, qualifications, salaries, locations, and deadlines. Think data refinement and organization.

3. Market Wizardry: Conjure up analytical tools that draw meaningful revelations from the gathered data. Dive into the abyss of job demand trends, geographic distribution, salary variations tied to experience and location, favored qualifications, and emerging skill demands.

4. Visual Magic: Weave a tapestry of visualization magic. Design captivating charts, graphs, and visual representations that paint a crystal-clear picture of the analyzed data. Make these visuals the compass that guides users through job market intricacies.

🌐 While the web scraping universe is yours to explore, consider these platforms as potential stomping grounds:

* LinkedIn Jobs
* Indeed
* Naukri
* Glassdoor
* AngelList

🎈 Your solution should not only decode the data science job realm but also empower professionals, job seekers, and recruiters to harness the dynamic shifts of the industry. The path is open, the challenge beckons – are you ready to embark on this exciting journey?






## GitHub Link -

## Project Summary:

This Python script is a web scraping project that automates the collection of job listings for the position of "Python Developer" from the TimesJobs website. The project demonstrates how to use the requests library to send HTTP GET requests, BeautifulSoup for parsing HTML content, and pandas to organize the extracted data into a structured format. It iterates through multiple search result pages and extracts key details from each job listing, including the company name, required skills, years of experience, location, job description, and a direct link to the job posting. The collected data is then stored in a Pandas DataFrame and saved to a CSV file for further analysis or reference.

## Explanation:

This Python web scraping project is designed to automate the process of gathering job listings for the position of "Python Developer" from the TimesJobs website. The project demonstrates the step-by-step process of collecting data from a web source using various Python libraries.

1. Import Necessary Libraries: The script begins by importing the required Python libraries: requests, BeautifulSoup, and pandas.

2. Set the Base URL and Search Parameters: The base URL for the TimesJobs website, "https://www.timesjobs.com/candidate/job-search.html", is passed to the scrape_jobs function. A dictionary named "search_params" stores the search parameters: the search type, the search keywords (e.g., "Python Developer"), the location (e.g., "India"), the number of results per page, and the initial page number (startPage).

3. Create an Empty List for Job Data: An empty list named "jobs_data" is created to store the scraped job data.

4. Scrape Data from Multiple Pages: The script loops over the search result pages. It keeps scraping until the "startPage" parameter exceeds the "sequence" parameter, which sets the number of pages to scrape, or until a page returns no job listings.

5. Send an HTTP GET Request: Inside the loop, an HTTP GET request is sent to the TimesJobs website with the specified parameters using the requests library.

6. Parse HTML Content: The HTML content of the response is parsed using BeautifulSoup with the 'lxml' parser.

7. Find Job Listings: Job listings are identified by finding li elements with the class 'clearfix job-bx wht-shd-bx'.

8. Extract Job Data: For each job listing, relevant information is extracted and stored in a dictionary named "data", including the company name, required skills, years of experience, location(s), job description, and a link to the job posting.

9. Append Data to the List: The "data" dictionary is appended to the "jobs_data" list for each job listing found on the page.

10. Increment the Page Parameter: The "startPage" parameter is incremented to move to the next page of search results.

11. Create a Pandas DataFrame: Once all the job data has been collected, a Pandas DataFrame is created from the "jobs_data" list.

12. Save Data to a CSV File: The DataFrame is saved to a CSV file named "job_listings10.csv", excluding the index column.

This project demonstrates how to scrape job listings from a website, parse the HTML content, extract specific data, and store it in a structured format using Python. It showcases a practical use case of web scraping for job-related information and provides a foundation for similar data extraction tasks from other websites.
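The search parameters and page counter described above can be tried out without any network access. The snippet below is a minimal sketch, assuming the parameter names used in the scrape_jobs function; build_page_urls is a hypothetical helper and not part of the original script, and whether TimesJobs still honours these parameters is not verified here. It uses the standard library's urlencode, which should give roughly the same query string that requests builds from its params argument.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.timesjobs.com/candidate/job-search.html"


def build_page_urls(keywords, location, pages):
    """Return the search URL for each results page, from 1 to `pages`."""
    urls = []
    for page in range(1, pages + 1):
        params = {
            "searchType": "personalizedSearch",
            "from": "submit",
            "luceneResultSize": 50,   # results per page
            "txtKeywords": keywords,
            "txtLocation": location,
            "sequence": pages,        # used by the script as the page limit
            "startPage": page,        # the only value that changes per request
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


page_urls = build_page_urls("Python Developer", "India", 3)
for url in page_urls:
    print(url)
```

Printing the URLs before scraping is a cheap way to confirm that only startPage changes between requests.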

Implementing web scraping on TimesJobs.com

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

def scrape_jobs(url, keywords, location, sequence, output_filename):
    # Define search parameters for TimesJobs website
    search_params = {
        'searchType': 'personalizedSearch',
        'from': 'submit',
        'luceneResultSize': 50,  # Number of items per page
        'txtKeywords': keywords,  # Keywords to search for
        'txtLocation': location,  # Location for the search
        'sequence': sequence,  # Number of pages to scrape
        'startPage': 1  # Start page for scraping
    }

    jobs_data = []  # Initialize an empty list to store job data

    while search_params['startPage'] <= search_params['sequence']:
        # Send an HTTP GET request to the TimesJobs website with the search parameters
        response = requests.get(url, params=search_params, timeout=30)
        response.raise_for_status()  # Stop early on HTTP errors instead of parsing an error page
        soup = BeautifulSoup(response.text, 'lxml')  # Parse the response with BeautifulSoup
        jobs = soup.find_all('li', class_='clearfix job-bx wht-shd-bx')  # Find job listings on the page

        if not jobs:
            break  # If no job listings are found, exit the loop

        for job in jobs:
            data = {}  # Create a dictionary to store job data
            data['Company'] = job.find('h3', class_='joblist-comp-name').get_text(strip=True)  # Extract company name
            data['Skills'] = job.find('span', class_='srp-skills').get_text(strip=True)  # Extract required skills

            ul = job.find('ul', class_='top-jd-dtl clearfix').find_all(recursive=False)
            data['Exp'] = ul[0].find(string=True, recursive=False)  # Extract job experience

            data['Location(s)'] = ul[1].span.text if ul[1].span else None  # Extract job location(s)

            ul1 = job.find('ul', class_='list-job-dtl clearfix').find_all(recursive=False)
            data['Desc'] = ul1[0].find('label').next_sibling.strip()  # Extract job description

            data['link'] = job.header.h2.a['href']  # Extract job link
            jobs_data.append(data)  # Append the extracted job data to the list

        search_params['startPage'] += 1  # Increment the startPage parameter to scrape the next page

    df = pd.DataFrame(jobs_data)  # Create a Pandas DataFrame from the scraped job data

    # Save the DataFrame to a CSV file, excluding the index column
    df.to_csv(output_filename, index=False)

    return df  # Return the DataFrame containing the scraped job data

In [None]:
# calling the function to scrape jobs and save to a CSV file
df = scrape_jobs("https://www.timesjobs.com/candidate/job-search.html", "Python Developer", "India", 3, "job_listings10.csv")

In [None]:
df

In [None]:
# Let's check the summary
df.info()

## Data cleaning

#### Checking Null Values if Any

In [None]:
df.isnull().sum()

##### No null values present

#### Several listings share the same company name. These are separate job positions rather than duplicates, so we will not drop them.
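The difference between several listings from one company and the same listing scraped twice can be made explicit by comparing every field rather than the company name alone. The following is a minimal sketch that uses plain dictionaries with hypothetical sample records, so it runs without the scraped data; drop_exact_duplicates is an illustrative helper, not part of the original notebook. On the DataFrame itself, pandas provides df.duplicated() and df.drop_duplicates() for the same purpose.

```python
def drop_exact_duplicates(records):
    """Keep the first copy of each record whose fields all match."""
    seen = set()
    unique = []
    for record in records:
        # A sorted tuple of (field, value) pairs is a hashable fingerprint of the row
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample records, not real scraped data
sample_records = [
    {"Company": "Acme", "Skills": "python, django", "link": "https://example.com/job/1"},
    {"Company": "Acme", "Skills": "python, docker", "link": "https://example.com/job/2"},
    {"Company": "Acme", "Skills": "python, django", "link": "https://example.com/job/1"},
]

unique_records = drop_exact_duplicates(sample_records)
print(len(sample_records), "->", len(unique_records))
```

Both remaining records belong to the same company, which is the case the note above wants to keep.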

## Cleaning and formatting the 'Exp' column to extract the first number in each value as the years of experience.

In [None]:
# Keep the first number in each 'Exp' string (for a range, this is the lower bound) as a float
df['Exp'] = df['Exp'].str.extract(r'(\d+)', expand=False).astype(float)
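Because the pattern `(\d+)` keeps only the first number, any upper bound in the experience text is lost. If both ends of the range are wanted, something along these lines could be used. This is a sketch only: it assumes the raw values look like "0 - 3 yrs", which is not confirmed by the notebook output, and parse_experience_range is a hypothetical helper.

```python
import re


def parse_experience_range(text):
    """Return (min_years, max_years) from a string such as '2 - 5 yrs'.

    A single number is treated as both bounds; text with no digits
    (or None) yields (None, None).
    """
    numbers = [int(n) for n in re.findall(r"\d+", text or "")]
    if not numbers:
        return (None, None)
    return (numbers[0], numbers[-1])


# Assumed example formats, not real scraped values
for raw in ["0 - 3 yrs", "5 - 8 yrs", "2 yrs", ""]:
    print(repr(raw), "->", parse_experience_range(raw))
```

It would need to be applied to the raw text, for example with `df['Exp'].apply(parse_experience_range)`, before the column is converted to float.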

## Cleaning and formatting the 'Skills' column to remove extra spaces.

In [None]:
# Clean and format the 'Skills' column by stripping extra whitespace
df['Skills'] = df['Skills'].str.strip()

In [None]:
df.head(n=20)

#### The data is now cleaned, so let's do some visualizations

## Data Visualization

#### Count of Job Listings by Company

In [None]:
import matplotlib.pyplot as plt

# Count job listings by company
company_counts = df['Company'].value_counts()

# Plot the top 10 companies with the most job listings
top_10_companies = company_counts.head(10)

plt.figure(figsize=(10, 6))
top_10_companies.plot(kind='bar')
plt.title('Top 10 Companies with Most Job Listings')
plt.xlabel('Company')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

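#### Observations:

1. As the chart shows, Anicalls Pty Ltd and Tanda HR Solutions have posted the most listings in this sample, which suggests they are hiring aggressively.
2. This may indicate that they are growing companies, although listing counts alone cannot confirm it.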

#### Job Experience Distribution

In [None]:
import seaborn as sns

# Plot the distribution of job experience levels
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='Exp')
plt.title('Job Experience Distribution')
plt.xlabel('Experience Level')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

#### Observations:

1. There are plenty of jobs for experienced Python developers in India.
2. Even freshers with 0 years of experience have a few openings.

#### Skills Required

In [None]:
# Plot the most common skills required
skills_counts = df['Skills'].str.split(', ').explode().value_counts()

# Plot the top 10 skills
top_10_skills = skills_counts.head(10)

plt.figure(figsize=(10, 6))
top_10_skills.plot(kind='bar')
plt.title('Top 10 Required Skills')
plt.xlabel('Skill')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

#### Observations:

Python developer roles most often ask for Django, Python, REST, NoSQL, and Docker, so these are the in-demand skills that companies are hiring for.
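The skill counts depend on how the 'Skills' text is split. Splitting on the exact string ', ' treats "python ,django" as a single skill and counts "Python" and "python" separately. A more tolerant count could look like the sketch below; it uses only the standard library, the sample strings are hypothetical, and count_skills is an illustrative helper rather than part of the notebook.

```python
import re
from collections import Counter


def count_skills(skill_strings):
    """Count skills across comma-separated strings, ignoring case and stray spaces."""
    counts = Counter()
    for raw in skill_strings:
        # Split on commas together with any whitespace around them
        for skill in re.split(r"\s*,\s*", raw.strip()):
            skill = skill.lower()
            if skill:  # skip empty pieces from leading or trailing commas
                counts[skill] += 1
    return counts


# Hypothetical sample values, not real scraped data
sample_skills = ["python , django,rest", "Python, docker", " nosql ,python "]
skill_counts = count_skills(sample_skills)
print(skill_counts.most_common(3))
```

With the DataFrame, the same function could be called as `count_skills(df['Skills'].dropna())`.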

#### Bar Plot of Top 10 Locations with Most Job Listings

In [None]:
import matplotlib.pyplot as plt

# Count job listings by location
location_counts = df['Location(s)'].value_counts()

# Plot the top 10 locations with the most job listings
top_10_locations = location_counts.head(10)

plt.figure(figsize=(10, 6))
top_10_locations.plot(kind='bar')
plt.title('Top 10 Locations with Most Job Listings')
plt.xlabel('Location')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

#### Observations:

1. Most of these companies are hiring for Bengaluru.
2. Many are also hiring for Hyderabad and Chandigarh.
3. This is expected, because these cities are IT hubs in India.

#### Word Cloud of Job Descriptions

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Combine all job descriptions into a single string
job_descriptions = " ".join(df['Desc'].dropna().astype(str))  # Skip missing descriptions

# Create a word cloud
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(job_descriptions)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.title('Word Cloud of Job Descriptions')
plt.show()

#### Observations:

1. This word cloud displays the most common words used in job descriptions, where the size of each word corresponds to its frequency.
2. Using the relevant terms in your resume, where they genuinely apply, may improve your chances of being shortlisted by an applicant tracking system (ATS).
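A word cloud shows relative frequency but not the actual counts. If exact numbers are useful, a plain frequency table can complement it. The sketch below is a rough illustration using only the standard library: the stopword list is a small hypothetical one, the sample descriptions are invented, and top_words is not part of the original notebook.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real analysis would likely need a fuller one
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "we", "with"}


def top_words(descriptions, n=10):
    """Return the n most frequent non-stopword words across the descriptions."""
    counts = Counter()
    for text in descriptions:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


# Invented sample descriptions, not real scraped data
sample_descriptions = [
    "We are looking for a Python developer with Django experience",
    "Python and SQL skills are required for the role",
]
print(top_words(sample_descriptions, n=3))
```

With the DataFrame, the call would be along the lines of `top_words(df['Desc'].dropna())`.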

## Conclusion:

This web scraping project provides a practical example of how to collect job-related information from the TimesJobs website using Python. It demonstrates the use of popular libraries like requests, BeautifulSoup, and pandas to automate the data extraction process. The script can be modified to scrape job listings for different positions, locations, or additional details, making it a versatile tool for gathering data from online job portals. The extracted data is organized in a structured format, making it easy for further analysis or reference.