# Web Scraping
After pulling all the Glassdoor reviews about Typeform, the next step, if you want to join us, is to actually apply for one of the jobs :)

So, in this part we will:

1. Pull the positions Typeform is currently hiring for directly from the [career site](https://www.typeform.com/careers/).

1. Count how many jobs are open right now.

1. Find the team with the most job openings.

## Libraries

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

## Scraping

In [2]:
url = 'https://www.typeform.com/careers/'

In [3]:
response = requests.get(url)
print(response.status_code)  # 200 means the request succeeded
html = response.text
# Parse the raw HTML with Python's built-in parser
soup = BeautifulSoup(html, 'html.parser')

200
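
The request above assumes the page always answers with a 200. A slightly more defensive variant might look like the sketch below. The helper name `fetch_html` and the 10-second timeout are assumptions made for illustration, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the page body; raise on network errors or 4xx/5xx responses.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    response = requests.get(url, timeout=timeout)
    # Turn HTTP error statuses into exceptions instead of parsing an error page
    response.raise_for_status()
    return response.text
```

It would be called as `html = fetch_html(url)`; any `requests.exceptions.RequestException` then signals that the scrape should not continue.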


In [4]:
# The class names look auto-generated and may change; update them if no items are found
job_posts = soup.find_all('li', attrs={'class':'styled-components__JobItem-sc-1gmvx66-1 kIeacp'})

In [5]:
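# Build one record per job posting (note: jobs_dict is a list of dicts)
jobs_dict = [{
    'Position': job.find('span', attrs={'class':'styled-components__JobName-sc-1gmvx66-4 iYySLG'}).text.strip(),
    'Team': job.find('img', alt=True)['alt'].strip(),  # team name comes from the image's alt text
    'Link': 'https://www.typeform.com' + job.find('a')['href'].strip()  # hrefs are site-relative
} for job in job_posts]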
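
Matching the full class strings breaks as soon as the trailing hash changes. One possible alternative, sketched here, is to match only the readable part of the class name with a CSS attribute selector. The markup in `SAMPLE_HTML` is hypothetical (it only imitates the structure the scraper expects, with made-up job IDs and hashes), and `parse_jobs` is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure used above; the real page may differ
SAMPLE_HTML = """
<ul>
  <li class="styled-components__JobItem-sc-1gmvx66-1 kIeacp">
    <a href="/careers/jobs/1/">
      <img src="eng.svg" alt="Engineering">
      <span class="styled-components__JobName-sc-1gmvx66-4 iYySLG"> Engineering Manager </span>
    </a>
  </li>
  <li class="styled-components__JobItem-sc-1gmvx66-1 zzzzzz">
    <a href="/careers/jobs/2/">
      <img src="data.svg" alt="Data &amp; Analytics">
      <span class="styled-components__JobName-sc-1gmvx66-4 yyyyyy">Data Devops</span>
    </a>
  </li>
</ul>
"""


def parse_jobs(html, base_url='https://www.typeform.com'):
    """Extract job records, matching class names by substring."""
    soup = BeautifulSoup(html, 'html.parser')
    jobs = []
    for item in soup.select('li[class*="JobItem"]'):
        name = item.select_one('span[class*="JobName"]')
        logo = item.find('img', alt=True)
        link = item.find('a', href=True)
        # Skip items that lack any of the three expected elements
        if name is None or logo is None or link is None:
            continue
        jobs.append({
            'Position': name.get_text(strip=True),
            'Team': logo['alt'].strip(),
            'Link': base_url + link['href'].strip(),
        })
    return jobs


sample_jobs = parse_jobs(SAMPLE_HTML)
print(sample_jobs)
```

Skipping incomplete items trades completeness for robustness; raising an error instead would make layout changes easier to notice.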

## Create DataFrame

In [6]:
jobs_df = pd.DataFrame(jobs_dict)
jobs_df

| | Position | Team | Link |
|---|---|---|---|
| 0 | Account Executive (US remote) | Business Development | https://www.typeform.com/careers/jobs/2544011/ |
| 1 | Build & Release Manager | Engineering | https://www.typeform.com/careers/jobs/2951809/ |
| 2 | Community Support Advocate | Customer Success | https://www.typeform.com/careers/jobs/3029880/ |
| 3 | Corporate FP&A Manager | Finance & Legal | https://www.typeform.com/careers/jobs/3032701/ |
| 4 | Customer Success Manager (US Remote) | Customer Success | https://www.typeform.com/careers/jobs/3033493/ |
| 5 | Customer Support Advocate (VideoAsk) | Customer Success | https://www.typeform.com/careers/jobs/2945097/ |
| 6 | Data Devops | Data & Analytics | https://www.typeform.com/careers/jobs/2871412/ |
| 7 | Data Warehouse Architect | Data & Analytics | https://www.typeform.com/careers/jobs/2995743/ |
| 8 | Director of Strategy & Organization | Strategy | https://www.typeform.com/careers/jobs/3033537/ |
| 9 | Engineering Manager | Engineering | https://www.typeform.com/careers/jobs/2450368/ |


### Sort DataFrame

In [7]:
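# Note: sort_values defaults to quicksort, which is not stable, so the second sort
# does not necessarily keep the Position order within each team.
jobs_df.sort_values(by='Position', inplace=True)
jobs_df.sort_values(by='Team', inplace=True)
# Renumber the rows and discard the old index in one step
jobs_df.reset_index(drop=True, inplace=True)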
jobs_df

| | Position | Team | Link |
|---|---|---|---|
| 0 | Account Executive (US remote) | Business Development | https://www.typeform.com/careers/jobs/2544011/ |
| 1 | Inbound SDR (Remote US) | Business Development | https://www.typeform.com/careers/jobs/2764213/ |
| 2 | Customer Support Advocate (VideoAsk) | Customer Success | https://www.typeform.com/careers/jobs/2945097/ |
| 3 | Customer Success Manager (US Remote) | Customer Success | https://www.typeform.com/careers/jobs/3033493/ |
| 4 | Video Learning Production Specialist | Customer Success | https://www.typeform.com/careers/jobs/2689018/ |
| 5 | Community Support Advocate | Customer Success | https://www.typeform.com/careers/jobs/3029880/ |
| 6 | Data Devops | Data & Analytics | https://www.typeform.com/careers/jobs/2871412/ |
| 7 | Data Warehouse Architect | Data & Analytics | https://www.typeform.com/careers/jobs/2995743/ |
| 8 | Senior Machine Learning Specialist | Data & Analytics | https://www.typeform.com/careers/jobs/2551597/ |
| 9 | Senior Data Scientist - Customer Success | Data & Analytics | https://www.typeform.com/careers/jobs/2699798/ |
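
In the output above, positions are not in alphabetical order within each team. If the goal is to order by team and then by position, one option is a single multi-key sort, sketched here on a toy frame built from four of the rows shown earlier. The names `toy_df` and `sorted_df` are illustrative.

```python
import pandas as pd

# Toy subset of the scraped data, deliberately out of order
toy_df = pd.DataFrame({
    'Position': ['Engineering Manager', 'Data Devops',
                 'Build & Release Manager', 'Data Warehouse Architect'],
    'Team': ['Engineering', 'Data & Analytics',
             'Engineering', 'Data & Analytics'],
})

# Sort by Team first, then by Position within each team, and renumber the rows
sorted_df = toy_df.sort_values(by=['Team', 'Position']).reset_index(drop=True)
print(sorted_df)
```

Passing both keys in one call avoids relying on the stability of the underlying sort algorithm.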


## Analysis

### Total job openings

In [8]:
total = len(jobs_df)  # one row per job posting
print(f'There are {total} job openings at Typeform right now.')

There are 38 job openings at Typeform right now.


### Teams with the most job openings

In [9]:
# Count postings per team once; value_counts sorts in descending order
team_counts = jobs_df['Team'].value_counts()
team1_name, team2_name = team_counts.index[:2]
team1_number, team2_number = team_counts.iloc[:2]
print(f'The team with the most job openings is "{team1_name}" with {team1_number} open positions, followed by "{team2_name}" with {team2_number} open positions.')

The team with the most job openings is "Engineering" with 8 open positions, followed by "Data & Analytics" with 6 open positions.
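
To see every team rather than only the top two, the same counts can be turned into a small table. This is a minimal sketch on made-up toy data, so the counts below are not the real ones; `toy_teams` and `openings` are illustrative names.

```python
import pandas as pd

# Toy data: the team labels come from the scrape, the counts are invented
toy_teams = pd.Series(
    ['Engineering', 'Engineering', 'Engineering',
     'Data & Analytics', 'Data & Analytics', 'Strategy'],
    name='Team',
)

# One row per team, ordered from most to fewest openings
openings = (
    toy_teams.value_counts()
    .rename_axis('Team')
    .reset_index(name='Openings')
)
print(openings)
```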


## Export

In [10]:
jobs_df.to_csv('2-job-openings.csv', sep=',', index=False)
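
The `index=False` argument matters when the file is read back: without it, pandas writes the row index as an unnamed first column, which reappears as `Unnamed: 0` on load. The sketch below illustrates this with an in-memory buffer instead of a file; `toy_df` and `round_trip` are illustrative names.

```python
import io

import pandas as pd

# A single row taken from the scraped data
toy_df = pd.DataFrame([{
    'Position': 'Data Devops',
    'Team': 'Data & Analytics',
    'Link': 'https://www.typeform.com/careers/jobs/2871412/',
}])


def round_trip(df, **to_csv_kwargs):
    """Write df to an in-memory CSV and read it back."""
    buffer = io.StringIO()
    df.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


without_index = round_trip(toy_df, index=False)
with_index = round_trip(toy_df)  # default index=True

print(list(without_index.columns))
print(list(with_index.columns))
```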