# Day25 Beautiful Soup Try-Out: Stepstone Job Postings
This is my first try at BeautifulSoup, so I start with something small: scraping the 100 job postings on the first results page of the German job site [Stepstone](https://www.stepstone.com). The code does not loop over further pages.<br>
![Title](2501.JPG)

```python
import requests
from bs4 import BeautifulSoup
```

```python
# Search URL: keyword "Junior Data Scientist", location Berlin, 100 results per page
url = "https://www.stepstone.de/5/job-search-simple.html?stf=freeText&ns=1&companyid=0&sourceofthesearchfield=resultlistpage%3Ageneral&qs=%5B%5D&ke=Junior%20Data%20Scientist&ws=Berlin&ra=10&suid=b830ebdc-e1ed-43cf-931b-006b0ad341c5&li=100&of=0&action=per_page_changed"
resp = requests.get(url, timeout=10)  # stop waiting if the server stalls for more than 10 s

# Decode resp.text as UTF-8 (this does not change the raw bytes in resp.content)
resp.encoding = 'utf-8'

# Show the HTTP status code; 200 means the page was fetched successfully
resp.status_code  # 200
```
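The search URL above carries the whole query in its parameters. Judging only from the values, `ke` looks like the keyword, `ws` the location, `ra` the radius, `li` the number of results per page, and `of` the offset of the first result; these meanings are guesses, not documented Stepstone parameters. A minimal sketch that rebuilds a shorter URL from those pieces with the standard library (the helper `build_search_url` is hypothetical, and it drops the other parameters such as `suid`):

```python
from urllib.parse import urlencode

# Base path taken from the search URL used above
BASE_URL = "https://www.stepstone.de/5/job-search-simple.html"


def build_search_url(keyword, where, radius=10, per_page=100, offset=0):
    """Hypothetical helper: assemble a Stepstone search URL.

    The meanings of ke, ws, ra, li and of are inferred from the example
    URL and are assumptions, not a documented API.
    """
    params = {
        "ke": keyword,    # search keyword, e.g. "Junior Data Scientist"
        "ws": where,      # location, e.g. "Berlin"
        "ra": radius,     # search radius (assumed)
        "li": per_page,   # results per page
        "of": offset,     # offset of the first result (assumed)
    }
    # urlencode percent-encodes the values and writes spaces as "+"
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("Junior Data Scientist", "Berlin"))
```

Whether Stepstone returns the same results for this shorter URL is untested. If the offset guess is right, later pages could be requested with `offset=100`, `offset=200`, and so on. `urlencode` writes a space as `+`, while the original URL uses `%20`; both are common encodings for a space in a query string.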

```python
# Create a BeautifulSoup object from the raw bytes; BeautifulSoup detects the encoding itself
soup = BeautifulSoup(resp.content, 'html.parser')
```

### Each job posting appears to be an `<article>` element
Inspecting the page source suggests that Stepstone wraps each job posting in an `<article>` element. Print the first one to see what it looks like.<br>
![Title](2503.JPG)

```python
# find_all returns a list of every <article> tag on the page
listing = soup.find_all('article')
print(listing[0])
```

Output (truncated):

```html
<article class="styled__JobItemWrapper-sc-11l5pt9-0 eWDSaU" id="job-item-6008235"><div class="styled__LogoContainer-ku4vl0-1 gdRgZP"><a class="styled__LogoLink-ku4vl0-2 kxJbMr" href="/cmp/en/Fresenius-Medical-Care-Deutschland-GmbH-59398/work.html"><div class="lazyload-placeholder"></div></a></div><div class="styled__JobItemContentWrapper-sc-11l5pt9-1 eurEAB"><div class="styled__JobItemFirstLineWrapper-sc-11l5pt9-2 dwJxVv"><a class="styled__TitleLink-sc-7z1cau-0 ciAyTi" href="/jobs--Junior-Data-Scientist-m-w-d-Medical-Devices-Berlin-Fresenius-Medical-Care-Deutschland-GmbH--6008235-inline.html?suid=b830ebdc-e1ed-43cf-931b-006b0ad341c5&amp;rltr=1_1_100_dynrl_m_0_0_0" target="_blank"><h2 class="styled__TitleWrapper-sc-7z1cau-1 dPEGKL">Junior Data Scientist (m/w/d) Medical Devices</h2></a><span class="styled__SaveListingIcon-sc-19q77fm-0 jZESBb sc-cQFLBn dDWfXA" data-offerid="6008235"><svg data-container-transform="translate(0 0)" data-icon="heart-empty" height="21px" viewbox="0 0 23 20.79"
```
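`find_all('article')` returns a list of every `<article>` tag in document order, and each item is itself a searchable tag. A small sketch on a hand-written HTML snippet (the markup only imitates the structure printed above; it is not real Stepstone output) shows how to read the `id` of each posting and search inside one posting only:

```python
from bs4 import BeautifulSoup

# Hand-written sample that mimics the structure above (not real Stepstone HTML)
html = """
<article id="job-item-1"><a href="/jobs--a"><h2>Junior Data Scientist</h2></a></article>
<article id="job-item-2"><a href="/jobs--b"><h2>Data Engineer</h2></a></article>
"""

soup = BeautifulSoup(html, "html.parser")
articles = soup.find_all("article")  # list of Tag objects, in document order

for article in articles:
    posting_id = article["id"]                       # attribute lookup works like a dict
    title = article.find("h2").get_text(strip=True)  # .find() on a Tag searches only inside it
    link = article.find("a")["href"]
    print(posting_id, title, link)
```

Searching inside one `<article>` at a time keeps the title, link and other fields of a posting together.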

### Save each job title into a list
Use `.find_all()` to collect every job title heading, then keep only its text.

```python
# Every job title is an <h2> with this (auto-generated) class string
job_list = soup.find_all('h2', attrs={'class': 'styled__TitleWrapper-sc-7z1cau-1 dPEGKL'})
jobs = []
for j in job_list:
    job = j.text.strip()  # text content without surrounding whitespace
    jobs.append(job)
print(jobs[0:3])
# ['Junior Data Scientist (m/w/d) Medical Devices', '(Junior) Data Scientist (m/w/d)', 'Physiker/ Mathematiker/ Naturwissenschaftler als Data Warehouse/ ETL-Developer (m/w/d)']
```
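The class string `styled__TitleWrapper-sc-7z1cau-1 dPEGKL` looks machine-generated, so the short hashed part such as `dPEGKL` may change when the site is redeployed. Passing the full string to `attrs` only matches when the attribute value is exactly that string. A sketch on hypothetical markup, assuming the readable `TitleWrapper` part stays stable, compares three ways of matching:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same heading before and after a redeploy
# changed the hashed class (dPEGKL -> xYz123)
html = """
<h2 class="styled__TitleWrapper-sc-7z1cau-1 dPEGKL">Old build title</h2>
<h2 class="styled__TitleWrapper-sc-7z1cau-1 xYz123">New build title</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# 1. Exact full class string: only matches the identical attribute value
exact = soup.find_all("h2", attrs={"class": "styled__TitleWrapper-sc-7z1cau-1 dPEGKL"})

# 2. A single class name: matches any <h2> whose class list contains it
single = soup.find_all("h2", class_="styled__TitleWrapper-sc-7z1cau-1")

# 3. CSS attribute-substring selector: matches any <h2> whose class
#    attribute contains "TitleWrapper" anywhere
substring = soup.select('h2[class*="TitleWrapper"]')

print(len(exact), len(single), len(substring))  # 1 2 2
```

Looser matching survives more changes but can also pick up unrelated elements, so it is worth checking the count (100 here) after changing a selector.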


### Save each company name into a list
Use `.find_all()` to collect the company name of every posting.

```python
# Every company name is a <div> with this class string
company_list = soup.find_all('div', attrs={'class': 'styled__CompanyName-iq4jvn-0 gakwWs'})
company = []
for c in company_list:
    comp = c.text.strip()
    company.append(comp)
print(company[0:3])
# ['Fresenius Medical Care Deutschland GmbH', '4flow', 'Senacor Technologies AG']
```
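The title, company and location cells repeat the same loop with a different tag and class. One way to avoid the repetition is a small helper; the name `texts_by_class` is hypothetical, and it matches a single class name rather than the full class string used above. It uses `get_text(" ", strip=True)`, which strips every text fragment and joins the non-empty ones with a space. This differs from `.text.strip()` when an element contains nested tags:

```python
from bs4 import BeautifulSoup


def texts_by_class(soup, tag, css_class):
    """Hypothetical helper: cleaned text of every <tag> carrying css_class."""
    return [el.get_text(" ", strip=True) for el in soup.find_all(tag, class_=css_class)]


# Hypothetical markup: the second name is split over two nested <span>s
html = """
<div class="company">  4flow  </div>
<div class="company"><span>Senacor</span>
    <span>Technologies AG</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

print(texts_by_class(soup, "div", "company"))
# ['4flow', 'Senacor Technologies AG']

# For comparison, .text.strip() keeps the inner newline and indentation
print(repr(soup.find_all("div", class_="company")[1].text.strip()))
```

On the real page a call might look like `texts_by_class(soup, 'div', 'styled__CompanyName-iq4jvn-0')`, assuming that class name is still present.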


### Save each location into a list
Use `.find_all()` to collect the location of every posting.

```python
# Every location is an <li> with this class string
location_list = soup.find_all('li', attrs={'class': 'job-element__body__location styled__IconElement-sc-1k0l2ot-1 jUROsL'})
location = []
for l in location_list:
    locat = l.text.strip()
    location.append(locat)
print(location[0:3])
# ['Berlin', 'Berlin', 'Berlin, Bonn, Frankfurt, Hamburg, Leipzig, München, Nürnberg, Stuttgart oder Wien']
```


### Save each short description into a list
Use `.find_all()` to collect the snippet links, then take the text of the `<span>` inside each one.

```python
# Each short description sits in a <span> inside a snippet link
snippet_links = soup.find_all('a', attrs={'class': 'styled__TextSnippetLink-sc-1xzea7b-1 styled__OneLineTextSnippetLink-sc-1xzea7b-2 bIjIzo'})
description = []
for link in snippet_links:
    des = link.find('span').text.strip()
    description.append(des)
print(description[0:3])
# ['Unser Bewerbungsprozess läuft standardisiert in vier Stufen ab: (1) Onlinetest, (2) Telefoninterview, (3) Data-Challenge, (4) Persönliches Kennenlernen.', '4flow * Berlin * Feste Anstellung * Vollzeit - Über uns - 4flow – das sind über 600 Teammitglieder an 15 Standorten weltweit.', 'Du modellierst Datenbanken basierend auf unterschiedlichen Paradigmen (3NF, Data Vault, Star, etc.)']

len(description)  # check that the number of postings is correct
# 100
```
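`len(description)` confirms that one list has 100 entries, but the DataFrame in the next step needs all four lists to have the same length: `pd.DataFrame` pairs them up by position and raises a `ValueError` if their lengths differ. A sketch of a check over all of them at once (the function name `check_same_length` is hypothetical):

```python
def check_same_length(**columns):
    """Hypothetical helper: raise if the scraped lists differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Lists have different lengths: {lengths}")
    return lengths


# Usage with the lists built above:
# check_same_length(jobs=jobs, company=company, location=location, description=description)
```

Equal lengths are necessary but not sufficient: if one posting lacked a company and another had an extra one, the counts could still match while the rows are shifted.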

### Combine the lists into a DataFrame and save it as a CSV file
Put the lists created above into a dictionary, turn it into a DataFrame, and then save it as a CSV file.

```python
import pandas as pd

# One key per column; pandas pairs the lists up by position
data = {'Jobs': jobs, 'Company': company, 'Location': location, 'Description': description}
df = pd.DataFrame(data)
df.head()
```

| | Jobs | Company | Location | Description |
|---|---|---|---|---|
| 0 | Junior Data Scientist (m/w/d) Medical Devices | Fresenius Medical Care Deutschland GmbH | Berlin | Unser Bewerbungsprozess läuft standardisiert i... |
| 1 | (Junior) Data Scientist (m/w/d) | 4flow | Berlin | 4flow * Berlin * Feste Anstellung * Vollzeit -... |
| 2 | Physiker/ Mathematiker/ Naturwissenschaftler a... | Senacor Technologies AG | Berlin, Bonn, Frankfurt, Hamburg, Leipzig, Mün... | Du modellierst Datenbanken basierend auf unter... |
| 3 | Data Warehouse/ ETL-Developer (m/w/d) | Senacor Technologies AG | Berlin, Bonn, Frankfurt, Hamburg, Leipzig, Mün... | Modellierung von technischen Datenbankmodellen... |
| 4 | Junior Berater Prozess- und Datenintegration (... | itelligence AG | Berlin, Bielefeld, Dortmund, Göttingen, Heidel... | Als Teil der NTT DATA Gruppe sind wir auf wert... |
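Collecting each field into its own list only lines up if every posting has every field. An alternative sketch, not the method used above, extracts all fields from inside each `<article>` and stores `None` when one is missing, so a gap cannot shift the rows. The markup is hypothetical, and the class-substring selectors are assumptions modelled on the class names found above:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: the second posting has no description snippet
html = """
<article><h2 class="styled__TitleWrapper-sc-x abc">Junior Data Scientist</h2>
  <div class="styled__CompanyName-sc-y def">4flow</div>
  <li class="job-element__body__location">Berlin</li>
  <a class="styled__TextSnippetLink-sc-z"><span>Short description</span></a></article>
<article><h2 class="styled__TitleWrapper-sc-x abc">Data Engineer</h2>
  <div class="styled__CompanyName-sc-y def">Senacor Technologies AG</div>
  <li class="job-element__body__location">Berlin</li></article>
"""

# CSS selectors per column, applied inside one <article> at a time
SELECTORS = {
    "Jobs": 'h2[class*="TitleWrapper"]',
    "Company": 'div[class*="CompanyName"]',
    "Location": "li.job-element__body__location",
    "Description": 'a[class*="TextSnippetLink"] span',
}


def parse_postings(soup):
    rows = []
    for article in soup.find_all("article"):
        row = {}
        for column, selector in SELECTORS.items():
            el = article.select_one(selector)
            # Keep the row aligned: a missing field becomes None
            row[column] = el.get_text(" ", strip=True) if el else None
        rows.append(row)
    return pd.DataFrame(rows)


df = parse_postings(BeautifulSoup(html, "html.parser"))
print(df)
```

Each row is built from a single posting, so the DataFrame no longer depends on four separate lists staying in step.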


```python
# index=False keeps the DataFrame index (0, 1, 2, ...) out of the file
df.to_csv('df.csv', index=False)
```
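By default `to_csv` writes the DataFrame index as an unnamed first column, and reading the file back then yields a column called `Unnamed: 0`. A small sketch with an in-memory buffer (the one-row DataFrame is made up for illustration) shows the difference `index=False` makes:

```python
import io

import pandas as pd

df = pd.DataFrame({"Jobs": ["Junior Data Scientist"], "Location": ["München"]})

# Default: the index is written as an unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())
# ['Unnamed: 0', 'Jobs', 'Location']

# index=False: only the data columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
print(pd.read_csv(without_index).columns.tolist())
# ['Jobs', 'Location']
```

Alternatively, keep the index and read it back with `pd.read_csv('df.csv', index_col=0)`. If the file is opened in Excel and umlauts look garbled, writing it with `encoding='utf-8-sig'` (UTF-8 with a byte-order mark) often helps.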

Please let me know if there are any mistakes in this article. Thanks for reading.

References:

[1] [Tutorial: Python Web Scraping Using BeautifulSoup](https://www.dataquest.io/blog/web-scraping-tutorial-python/)

[2] [Stepstone](https://www.stepstone.de/5/job-search-simple.html?stf=freeText&ns=1&companyid=0&sourceofthesearchfield=resultlistpage%3Ageneral&qs=%5B%5D&ke=Junior%20Data%20Scientist&ws=Berlin&ra=10&suid=b830ebdc-e1ed-43cf-931b-006b0ad341c5&li=100&of=0&action=per_page_changed)
