# Scarping Freshersworld Jobs Data Using Python
![Banner-image](https://i.imgur.com/ZTu9Y0K.png)

Freshersworld.com (A TeamLease Company) is No.1 job Portal for freshers hiring in India with a database of over 1.5+ Crore resumes. More than 3 Lakh+ resumes are added every month from entry level graduates across the country

The page https://www.freshersworld.com/ provides thousands of jobs in various streams. In this assignment, we will retrieve information for different job profiles using web_scraping: the process of extracting information from a website in an automated fashion using code. We will use [Requests](https://docs.python-requests.org/en/latest/) and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) to scrap data from this page.

The outline of this assignment is listed below:

1. Download the webpage using [Requests](https://docs.python-requests.org/en/latest/)
2. Parse the HTML source code using [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
3. Extract Company name, Job location, Apply link, Required qualification, Experience required, Last date
4. Compile extracted information using Python lists
5. Save the extracted information to a CSV file.

The CSV file which will be created will have the following format:
```
 job role,company name,job location,required qualification,experience required,last date,apply link
 Data Analyst,CAW Studios,Hyderabad/Bangalore,BE/B.Tech/ME/M.Tech/CS/MS,2 Years,26 Feb 23,https://www.freshersworld.com/jobs/data-analyst-jobs-opening-in-caw-studios-at-gachibowli-bangalore-hyderabad-1667559
 ....
```


## How to run the code

You can execute the code using the "Run on Binder" button at the top of this page. You can make changes and save your own version of the notebook to [Jovian](https://jovian.ai/).

In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

In [3]:
project_name = "scraping-job-portal"

In [4]:
jovian.commit(project = project_name)

<IPython.core.display.Javascript object>

[jovian] Updating notebook "yatinsammal/scraping-job-portal" on https://jovian.com[0m
[jovian] Committed successfully! https://jovian.com/yatinsammal/scraping-job-portal[0m


'https://jovian.com/yatinsammal/scraping-job-portal'

## Download the webpage using `requests`

We can use the `requests` library to download the webpage.

The library can be installed using `pip`.

In [5]:
!pip install requests --upgrade --quiet

In [6]:
import requests

To download a page, we can use the `get` function from requests, which returns the response object. 

In [7]:
job_search_url = 'https://www.freshersworld.com/'
response = requests.get(job_search_url)

`requests.get` returns a response object containing the data from the webpage and some other information.

The `.status_code` property can be used to check if the request was successful.A successful response will have an HTTPstatus code between 200 and 209.

In [8]:
response.status_code

200

Let us check the number of characters in the webpage.

In [9]:
page_contents = response.text
len(page_contents)

504944

The webpage contains over 14,000 characters. Here are the first 1000 characters of the page:

In [10]:
page_contents[:1000]

'<!DOCTYPE html><html lang="en"><head prefix="og: http://ogp.me/ns#"><!-- Google Tag Manager --><script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({\'gtm.start\':\nnew Date().getTime(),event:\'gtm.js\'});var f=d.getElementsByTagName(s)[0],\nj=d.createElement(s),dl=l!=\'dataLayer\'?\'&l=\'+l:\'\';j.async=true;j.src=\n\'https://www.googletagmanager.com/gtm.js?id=\'+i+dl;f.parentNode.insertBefore(j,f);\n})(window,document,\'script\',\'dataLayer\',\'GTM-5S37R5T\');</script><!-- End Google Tag Manager --><meta charset="UTF-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><title>Jobs: Search Jobs In India, Freshers Jobs Online, Govt Jobs, Recruitment | Freshersworld.com</title><link href="/manifest.json" rel="manifest"><meta property="og:type" content="website" /><meta property="og:url" content="https://www.freshersworld.com/" /><meta property="og:title" content="Jobs: Search Jobs In India, Freshers Jobs Online, Govt Jobs, Recruitment | Freshersworld.com" /><meta name="robots" conte

The above code is the [HTML source code](https://en.wikipedia.org/wiki/HTML) of the webpage. We can also save it to a file and view locally within Jupyter using "File -> Open".

In [11]:
with open('webpage.html','w') as f:
    f.write(page_contents)

The page looks similiar to the original but none of the links will work in this webpage.

![](https://i.imgur.com/fh31xcS.png)

In this section, we have successfully used the `requests` library to download a webpage as HTML.

## Parse the HTML source code using `beautiful soup`

We can use the `BeautifulSoup` module from `bs4` library to parse the html code which was obtained using the `requests` library.

The library can be installed using `pip`.

In [12]:
!pip install beautifulsoup4 --upgrade --quiet

In [13]:
from bs4 import BeautifulSoup

To parse HTML contents of a webpage, we can pass the HTML contents to the `BeautifulSoup` class along with indication of `html parser` which returns a bs4 object. 

In [14]:
doc = BeautifulSoup(page_contents,'html.parser')

In [15]:
doc.title

<title>Jobs: Search Jobs In India, Freshers Jobs Online, Govt Jobs, Recruitment | Freshersworld.com</title>

We can now combine this step and the previous step to write a function that takes the blog_search_url variable as an argument which returns a `bs4` object which can be later used for scraping the required information.

Let's see the title of the `BeautifulSoup` doc using the webpage URL.

In [16]:
def get_pages(url):
    """Download a webpage and return a BeautifulSoup doc"""
    #Download the webpage
    response = requests.get(url)
    
    #Check if the download was successful
    if response.status_code != 200:
        raise Exception('Unable to download page {}'.format(url))
    
    #Get the page HTML
    html_contents = response.text
    
    #Create a bs4 doc
    doc = BeautifulSoup(html_contents,'html.parser')
    
    return doc

Now we will call the function `get_pages` by passing the required URL as argument and verify the title of the webpage.

In [17]:
doc = get_pages(job_search_url)
doc.title

<title>Jobs: Search Jobs In India, Freshers Jobs Online, Govt Jobs, Recruitment | Freshersworld.com</title>

From the function output and then printing the title of the `bs4` object, we can confirm the function has the same usage as the 1st and 2nd steps which we wrote before.

In this section, we have successfully used the `BeautifulSoup` module from the `bs4` library to parse the HTML file.

## Extract company name, job location, apply link, required qualification, experience required, last date from the webpages

Let's extract the details of 'Data Scientist' job and then we can extend it to obtain the CSV file information for other job profiles by writing a function.

But to extract the the details of job profiles, we need the URLs to navigate to a particular job profile. 

Let's create a function `get_job_profile` to get the url where we want to navigate for a particular job profile.

In [18]:
def get_job_profile(job_title):
    job_title.replace(' ','-')
    
    # Construct the URL
    job_title_url = 'https://www.freshersworld.com/jobs/jobsearch/' + job_title +'-jobs'
    
    # Get the HTML page content using requests
    response = requests.get(job_title_url)
    
    # Ensure that the reponse is valid
    if response.status_code != 200:
    #OR: if not response.ok:
        print('Status code:', response.status_code)
        raise Exception('Failed to fetch web page ' + job_title_url)
    
    # Construct a beautiful soup document
    doc = BeautifulSoup(response.text)
    
    return doc

In [19]:
doc = get_job_profile('Data Analyst')

In [20]:
doc.title.text

'Data%20analyst jobs - Data%20analyst Recruitment 2019'

Getting the list of jobs for another job title is now as simple as invoking the function with a different argument.

In [21]:
doc2 = get_job_profile('Data Scientist')

In [22]:
doc2.title.text

'Data%20scientist jobs - Data%20scientist Recruitment 2019'

![](https://i.imgur.com/M4mYFaN.png)

Upon inspecting the box containing the information for a job, you will find a `div` tag for each job post, with `class` attribute set to `col-md-12 col-lg-12 col-xs-12 padding-none job-container jobs-on-hover top_space`.

Let's find all the `div` tags matching this `class`.

In [23]:
div_tags = doc.find_all('div', class_='col-md-12 col-lg-12 col-xs-12 padding-none job-container jobs-on-hover top_space')

In [24]:
len(div_tags)

4

It looks like we've 11 job listed matching the `class` name. 

We need to extract the following information from each tag:

1. Company name
2. Experience required
3. Educational qualification
4. Last date to apply
5. Job location
6. Link to apply

Look at the source of any of the div tags. You will notice that the company name, job location and link to apply are all part of an `a` tag.

In [25]:
#There are 11 tags and we will look at one of them
div_tag = div_tags[0]

In [26]:
div_tag

<div class="col-md-12 col-lg-12 col-xs-12 padding-none job-container jobs-on-hover top_space"><div class="text-ago"><span class="job_posted_on">Posted on : </span><span class="ago-text">21 days ago</span></div><div class="col-md-12 col-lg-12 col-xs-12" style="margin-bottom: 2%;"><div class="col-md-12 col-xs-12 col-lg-12"><div class="col-md-12 col-xs-12 col-lg-12 padding-none left_move_up" style="margin-top: -17px;"><a href="https://www.freshersworld.com/jobs/data-analyst-jobs-opening-in-caw-studios-at-gachibowli-bangalore-hyderabad-1667559" itemprop="url" target="_blank" title="CAW Studios Recruitments"><h3 class="latest-jobs-title font-16 margin-none inline-block" itemprop="name">CAW Studios</h3></a><div>Data Analyst</div><div class="qualification-block" style="padding-top: 12px"><span class="pull-left" title="Education Qualification"><svg class="icon-16-16" viewbox="0 0 16 16"><use xlink:href="#icon-qualification-svg" xmlns:xlink="https://www.w3.org/1999/xlink"></use></svg></span><sp

To get the Job role we've to extract the 1st `div` tag inside the `class` `col-md-12 col-xs-12 col-lg-12 padding-none left_move_up`

In [27]:
role_tag=div_tag.find('div',class_='col-md-12 col-xs-12 col-lg-12 padding-none left_move_up')
role_tag.find('div').text

'Data Analyst'

We can see `a` tags contains few details lets extract all the `a` tags.  


In [28]:
a_tags = div_tag.find_all('a')
a_tags

[<a href="https://www.freshersworld.com/jobs/data-analyst-jobs-opening-in-caw-studios-at-gachibowli-bangalore-hyderabad-1667559" itemprop="url" target="_blank" title="CAW Studios Recruitments"><h3 class="latest-jobs-title font-16 margin-none inline-block" itemprop="name">CAW Studios</h3></a>,
 <a class="bold_font" href="/jobs-in-hyderabad/999903705">Hyderabad</a>,
 <a class="bold_font" href="/jobs-in-bangalore/9999016065">Bangalore</a>,
 <a class="view-apply-button view-apply-button-1667559" href="https://www.freshersworld.com/jobs/data-analyst-jobs-opening-in-caw-studios-at-gachibowli-bangalore-hyderabad-1667559" target="_blank">View &amp; Apply</a>]

We found company name, job location, link to apply. 

Now lets extract those using `a` tags.

In [29]:
company_name = a_tags[0].text
company_name

'CAW Studios'

In [30]:
apply_link = a_tags[-1]['href']
apply_link

'https://www.freshersworld.com/jobs/data-analyst-jobs-opening-in-caw-studios-at-gachibowli-bangalore-hyderabad-1667559'

In [31]:
job_location = div_tag.find_all('a',class_ ='bold_font')
job_location

[<a class="bold_font" href="/jobs-in-hyderabad/999903705">Hyderabad</a>,
 <a class="bold_font" href="/jobs-in-bangalore/9999016065">Bangalore</a>]

We can see there are 2 job locations for this job.

let's create a function to get these 2 locations as a single string seperated by `/`.

In [32]:
def get_loc(loc_list):
    loc=''
    for x in range(len(loc_list)):
        loc+=loc_list[x].text
        if x == len(loc_list)-1:
            break
        loc+='/'
    return loc

In [33]:
get_loc(job_location)

'Hyderabad/Bangalore'

Now let's extract remaining details which we can find in `span` tag

In [34]:
span_tags= div_tag.find_all('span')
span_tags

[<span class="job_posted_on">Posted on : </span>,
 <span class="ago-text">21 days ago</span>,
 <span class="pull-left" title="Education Qualification"><svg class="icon-16-16" viewbox="0 0 16 16"><use xlink:href="#icon-qualification-svg" xmlns:xlink="https://www.w3.org/1999/xlink"></use></svg></span>,
 <span class="qualifications display-block modal-open" itemprop="qualifications"><span class="bold_elig">BE/B.Tech</span>, <span class="bold_elig">ME/M.Tech</span>, <span class="bold_elig">CS</span>, <span class="bold_elig">MS</span></span>,
 <span class="bold_elig">BE/B.Tech</span>,
 <span class="bold_elig">ME/M.Tech</span>,
 <span class="bold_elig">CS</span>,
 <span class="bold_elig">MS</span>,
 <span class="desc_title">Job Description : </span>,
 <span class="desc"> Apply to Data Analyst Jobs in CAW Studios, Bangalore, Hyderabad from 2 year experience. Find part &amp; full time, work from home job opportunit...</span>,
 <span class="pull-left" title="Job Location"><svg class="icon-16-16

In [35]:
edu_qualification= div_tag.find_all('span',class_ ='bold_elig')
edu_qualification

[<span class="bold_elig">BE/B.Tech</span>,
 <span class="bold_elig">ME/M.Tech</span>,
 <span class="bold_elig">CS</span>,
 <span class="bold_elig">MS</span>]

There are 4 required education qualification.

let's create a function to get these as a single string seperated by /.

In [36]:
def all_qualification(span_list):
    edu_list=''
    for x in range(len(span_list)):
        edu_list+=span_list[x].text
        if x == len(span_list)-1:
            break
        edu_list+='/'
    return edu_list    
all_qualification(edu_qualification)
    

'BE/B.Tech/ME/M.Tech/CS/MS'

In [37]:
exp = div_tag.find_all('span',class_ ='experience')[0].text
exp

'2 Years'

In [38]:
last_day = div_tag.find_all('span',class_ ='padding-left-4')[0].text
last_day

'26 Feb 23'

Now let's define a function `parse_job_details` which can take `div` tag of the page and extract all the job details and then return the details as a dictionary.

In [39]:
def parse_job_details(div_tag):
    #Job Role
    role_tag=div_tag.find('div',class_='col-md-12 col-xs-12 col-lg-12 padding-none left_move_up')
    job_role=role_tag.find('div').text
    #All a tags
    a_tags = div_tag.find_all('a')
    # Company name
    comp_name = a_tags[0].text
    # Job location
    job_location = div_tag.find_all('a',class_ ='bold_font')
    loc = get_loc(job_location)
    # Link to apply URL
    apply_link = a_tags[-1]['href']
    # Required educational qualification
    edu_qualification= div_tag.find_all('span',class_ ='bold_elig')
    req_quali = all_qualification(edu_qualification)
    # Experience required
    exp = div_tag.find_all('span',class_ ='experience')[0].text
    # Last day to apply
    last_date = div_tag.find_all('span',class_ ='padding-left-4')[0].text
    
    
    # Return a dictionary
    return {
        'job role': job_role,
        'company name': comp_name,
        'job location': loc,        
        'required qualification': req_quali,
        'experience required': exp,
        'last date': last_date,
        'apply link': apply_link
    }

We can now use the function to parse any div tag.

In [40]:
parse_job_details(div_tags[0]) 

{'job role': 'Data Analyst',
 'company name': 'CAW Studios',
 'job location': 'Hyderabad/Bangalore',
 'required qualification': 'BE/B.Tech/ME/M.Tech/CS/MS',
 'experience required': '2 Years',
 'last date': '26 Feb 23',
 'apply link': 'https://www.freshersworld.com/jobs/data-analyst-jobs-opening-in-caw-studios-at-gachibowli-bangalore-hyderabad-1667559'}

We can use a list comprehension to parse all the `div` tags in one go.

In [41]:
jobs = [parse_job_details(x) for x in div_tags]
jobs

[{'job role': 'Data Analyst',
  'company name': 'CAW Studios',
  'job location': 'Hyderabad/Bangalore',
  'required qualification': 'BE/B.Tech/ME/M.Tech/CS/MS',
  'experience required': '2 Years',
  'last date': '26 Feb 23',
  'apply link': 'https://www.freshersworld.com/jobs/data-analyst-jobs-opening-in-caw-studios-at-gachibowli-bangalore-hyderabad-1667559'},
 {'job role': 'Data Analyst',
  'company name': 'AMC Career',
  'job location': 'Pune',
  'required qualification': 'BE/B.Tech',
  'experience required': '0 Years',
  'last date': '17 Mar 23',
  'apply link': 'https://www.freshersworld.com/jobs/data-analyst-jobs-opening-in-amc-career-at-akurdi-magarpatta-pune-1674315'},
 {'job role': 'Data Analyst',
  'company name': 'Lets Viz Technologies',
  'job location': 'Delhi',
  'required qualification': 'BCA/BE/B.Tech/BSc',
  'experience required': '0 Years',
  'last date': '12 Feb 23',
  'apply link': 'https://www.freshersworld.com/jobs/data-analyst-jobs-opening-in-lets-viz-technologies

## Compile extracted information using Python lists

Now let's a function that takes a job title and returns a list of dictionaries containing details of the job.

In [42]:
def get_job_details(job_title):
    doc = get_job_profile(job_title)
    div_tags = doc.find_all('div', class_='col-md-12 col-lg-12 col-xs-12 padding-none job-container jobs-on-hover top_space')
    jobs = [parse_job_details(tags) for tags in div_tags]
    return jobs

We can now use the functions we've defined to get the details of the jobs listed.

In [43]:
jobs = get_job_details('Digital Marketing')
len(jobs)

30

Let's try with different job title.

In [44]:
jobs = get_job_details('Data Scientist')
jobs

[{'job role': 'Data Scientist',
  'company name': 'Atlassian',
  'job location': 'Bangalore',
  'required qualification': 'BCA/BE/B.Tech/MCA/ME/M.Tech',
  'experience required': '0 to 3+ Years',
  'last date': '22 Jan 23',
  'apply link': 'https://www.freshersworld.com/jobs/data-scientist-jobs-opening-in-atlassian-at-electronic-city-bangalore-1634035'},
 {'job role': 'Data Scientist',
  'company name': 'Product company',
  'job location': 'Noida',
  'required qualification': 'BE/B.Tech/CS/Other Graduate',
  'experience required': '0 Years',
  'last date': '18 Mar 23',
  'apply link': 'https://www.freshersworld.com/jobs/data-scientist-jobs-opening-in-product-company-at-dlf-phase-4-noida-1674496'},
 {'job role': 'Data Scientist',
  'company name': 'DEEPIJA TELECOM PRIVATE LIMITED',
  'job location': 'Hyderabad',
  'required qualification': 'BE/B.Tech',
  'experience required': '0 Years',
  'last date': '28 Jan 23',
  'apply link': 'https://www.freshersworld.com/jobs/data-scientist-jobs-o

## Save the extracted information to a CSV file.

Let's create a function which takes a list of dictionaries and writes them to a CSV file.

In [45]:
def write_csv(items, path):
    # Open the file in write mode
    with open(path, 'w') as f:
        # Return if there's nothing to write
        if len(items) == 0:
            return
        
        # Write the headers in the first line
        headers = list(items[0].keys())
        f.write(','.join(headers) + '\n')
        
        # Write one item per line
        for item in items:
            values = []
            for header in headers:
                values.append(str(item.get(header, "")))
            f.write(','.join(values) + "\n")

Let's write the data stored in `jobs` into a CSV file.

In [46]:
write_csv(jobs, 'Data Scientist.csv')

We can now read the file and inspect its contents. The contents of the file can also be inspected using the "File > Open" menu option within Jupyter.

In [47]:
with open('Data Scientist.csv', 'r') as f:
    print(f.read())

job role,company name,job location,required qualification,experience required,last date,apply link
Data Scientist,Atlassian,Bangalore,BCA/BE/B.Tech/MCA/ME/M.Tech,0 to 3+ Years,22 Jan 23,https://www.freshersworld.com/jobs/data-scientist-jobs-opening-in-atlassian-at-electronic-city-bangalore-1634035
Data Scientist,Product company,Noida,BE/B.Tech/CS/Other Graduate,0 Years,18 Mar 23,https://www.freshersworld.com/jobs/data-scientist-jobs-opening-in-product-company-at-dlf-phase-4-noida-1674496
Data Scientist,DEEPIJA TELECOM PRIVATE LIMITED,Hyderabad,BE/B.Tech,0 Years,28 Jan 23,https://www.freshersworld.com/jobs/data-scientist-jobs-opening-in-deepija-telecom-private-limited-at-madhapur-hyderabad-1640089
Scientist,Vedam India Metallurgical Research Laboratory Private Limited,Kolkata,M Phil / Ph.D,0 Years,18 Feb 23,https://www.freshersworld.com/jobs/scientist-jobs-opening-in-vedam-india-metallurgical-research-laboratory-private-limited-at-jadavpur-kolkata-1659750
Data Scientist Internship,A2B P

Let's create a function which takes job title and returns csv file path

In [48]:
def scrape_job(job_title, path=None):
    if path is None:
        path = job_title + '.csv'
    jobs = get_job_details(job_title)
    write_csv(jobs, path)
    print('Jobs for "{}" written to file "{}"'.format(job_title, path))
    return path

In [49]:
scrape_job('Data Analyst')

Jobs for "Data Analyst" written to file "Data Analyst.csv"


'Data Analyst.csv'

Now that we have a CSV file, we can use the `pandas` library to view its contents.

In [50]:
import pandas as pd

In [51]:
pd.read_csv('Data Analyst.csv')

Unnamed: 0,job role,company name,job location,required qualification,experience required,last date,apply link
0,Data Analyst,CAW Studios,Hyderabad/Bangalore,BE/B.Tech/ME/M.Tech/CS/MS,2 Years,26 Feb 23,https://www.freshersworld.com/jobs/data-analys...
1,Data Analyst,AMC Career,Pune,BE/B.Tech,0 Years,17 Mar 23,https://www.freshersworld.com/jobs/data-analys...
2,Data Analyst,Lets Viz Technologies,Delhi,BCA/BE/B.Tech/BSc,0 Years,12 Feb 23,https://www.freshersworld.com/jobs/data-analys...
3,IT Business Analyst,Yes IT Lab,Delhi/Noida,Certificate Course (ITI)/BCA/BE/B.Tech/MBA/PGD...,0 Years,06 Mar 23,https://www.freshersworld.com/jobs/it-business...


Let's try with different job profiles

In [52]:
scrape_job('Software Developer')

Jobs for "Software Developer" written to file "Software Developer.csv"


'Software Developer.csv'

In [53]:
pd.read_csv('Software Developer.csv')

Unnamed: 0,job role,company name,job location,required qualification,experience required,last date,apply link
0,Software Developer,AMC Career,Pune,BE/B.Tech,0 Years,18 Mar 23,https://www.freshersworld.com/jobs/software-de...
1,Software Developer,Corporate Resources,Bangalore,BE/B.Tech,0 to 3+ Years,04 Mar 23,https://www.freshersworld.com/jobs/software-de...
2,Software Developer,CloudEQ Software Private Limited,Chandigarh,BCA/BE/B.Tech/BSc/MCA/ME/M.Tech/CS/MS/MSc,0 Years,27 Feb 23,https://www.freshersworld.com/jobs/software-de...
3,Software Developer,VoiceSnap Services Pvt Ltd,Chennai,BCA/BE/B.Tech/BSc/CS,2 Years,24 Feb 23,https://www.freshersworld.com/jobs/software-de...
4,Software Developer,Zanthium Technosoft Pvt Ltd,Noida/Rudrapur,BCA/BE/B.Tech/MCA/ME/M.Tech,0 Years,11 Feb 23,https://www.freshersworld.com/jobs/software-de...
5,Software Developer,Money Honey Financial Service Pvt Ltd,Mumbai,BE/B.Tech/BSc,0 Years,13 Feb 23,https://www.freshersworld.com/jobs/software-de...
6,Software Developer,Alchemy Solutions,Bangalore,BCA/BE/B.Tech/MCA,0 Years,12 Feb 23,https://www.freshersworld.com/jobs/software-de...
7,Software Developer,Forzo TechLabs,Chennai,BCA/BE/B.Tech/BSc/MCA/MSc,1 to 2 Years,05 Feb 23,https://www.freshersworld.com/jobs/software-de...
8,Software Developer,Magneqsoftware,Hyderabad,BE/B.Tech/BSc/MCA/ME/M.Tech,0 to 0.6 Years,01 Feb 23,https://www.freshersworld.com/jobs/software-de...
9,Software Developer,IESoft Technologies Private Limited,,BCA/BE/B.Tech/MCA/ME/M.Tech,0 Years,17 Feb 23,https://www.freshersworld.com/jobs/company/ies...


Let's create a data for top 5 job openings with high demand

In [54]:
top_jobs_list=['Data Analyst','Data Scientist','Software Developer','Digital Marketing','Customer Service']
csv_files=[scrape_job(jobs)for jobs in top_jobs_list]

Jobs for "Data Analyst" written to file "Data Analyst.csv"
Jobs for "Data Scientist" written to file "Data Scientist.csv"
Jobs for "Software Developer" written to file "Software Developer.csv"
Jobs for "Digital Marketing" written to file "Digital Marketing.csv"
Jobs for "Customer Service" written to file "Customer Service.csv"


Now we'll combine all the files in a single csv file 

In [55]:
df = pd.DataFrame()
#append all files together
for file in csv_files:
            df_temp = pd.read_csv(file)
            df = df.append(df_temp, ignore_index=True)
df.to_csv('Top Jobs.csv',index=False)

In [56]:
pd.read_csv('Top Jobs.csv')

Unnamed: 0,job role,company name,job location,required qualification,experience required,last date,apply link
0,Data Analyst,CAW Studios,Hyderabad/Bangalore,BE/B.Tech/ME/M.Tech/CS/MS,2 Years,26 Feb 23,https://www.freshersworld.com/jobs/data-analys...
1,Data Analyst,AMC Career,Pune,BE/B.Tech,0 Years,17 Mar 23,https://www.freshersworld.com/jobs/data-analys...
2,Data Analyst,Lets Viz Technologies,Delhi,BCA/BE/B.Tech/BSc,0 Years,12 Feb 23,https://www.freshersworld.com/jobs/data-analys...
3,IT Business Analyst,Yes IT Lab,Delhi/Noida,Certificate Course (ITI)/BCA/BE/B.Tech/MBA/PGD...,0 Years,06 Mar 23,https://www.freshersworld.com/jobs/it-business...
4,Data Scientist,Atlassian,Bangalore,BCA/BE/B.Tech/MCA/ME/M.Tech,0 to 3+ Years,22 Jan 23,https://www.freshersworld.com/jobs/data-scient...
...,...,...,...,...,...,...,...
95,Customer Support Service,Client Of Teamlease Service Ltd,Mumbai,Diploma/B.Com/BA/MA/BBA/BBM/BSc/12th Pass (HSE),0 Years,16 Feb 23,https://www.freshersworld.com/jobs/customer-su...
96,Customer Support Service,Client Of Teamlease Service Ltd,Mumbai,Diploma/B.Com/BA/MA/BBA/BBM/BSc/12th Pass (HSE),0 Years,16 Feb 23,https://www.freshersworld.com/jobs/customer-su...
97,Customer Support Service,Client Of Teamlease Service Ltd,Mumbai,Diploma/B.Com/BA/MA/BBA/BBM/BSc/12th Pass (HSE),0 Years,16 Feb 23,https://www.freshersworld.com/jobs/customer-su...
98,Customer Support Service,Client Of Teamlease Service Ltd,Mumbai,Diploma/B.Com/BA/MA/BBA/BBM/BSc/12th Pass (HSE),0 Years,16 Feb 23,https://www.freshersworld.com/jobs/customer-su...


## Summary

Here is what we covered in this notebook:

1. Download the webpage using `requests`
2. Parse the HTML source code using `beautiful soup`
3. Extract Blog name, author name, published date and blog URLs from webpage
4. Compile extracted information using Python lists
5. Save the extracted information to a CSV file.

The CSV file which will be created will have the following format:
```
  job role,company name,job location,required qualification,experience required,last date,apply link
 Data Analyst,CAW Studios,Hyderabad/Bangalore,BE/B.Tech/ME/M.Tech/CS/MS,2 Years,26 Feb 23,https://www.freshersworld.com/jobs/data-analyst-jobs-opening-in-caw-studios-at-gachibowli-bangalore-hyderabad-1667559
 ....
```

## Future Work

* We can fetch the details about companies which are posting jobs in the portal.
* We can fetch the details about which job profile have most number of openings.
* we can analyze which skillset are asked in most jobs.

## References

1. https://requests.readthedocs.io/en/latest/
2. https://www.crummy.com/software/BeautifulSoup/bs4/doc/
3. https://www.freshersworld.com/
4. https://stackoverflow.com/questions/2136267/beautiful-soup-and-extracting-a-div-and-its-contents-by-id

In [58]:
jovian.commit(project = project_name, files = ['Top Jobs.csv'])

<IPython.core.display.Javascript object>

[jovian] Updating notebook "yatinsammal/scraping-job-portal" on https://jovian.com[0m
[jovian] Uploading additional files...[0m
[jovian] Committed successfully! https://jovian.com/yatinsammal/scraping-job-portal[0m


'https://jovian.com/yatinsammal/scraping-job-portal'