# How to Become a Data Analyst: Yourator Scraper 2023

> https://hahow.in/cr/dajourney

郭耀仁 <yaojenkuo@datainpoint.com>

```python
import os
import json
import pandas as pd
import requests
from bs4 import BeautifulSoup
```

## Define a function to extract job links from the search results

```python
def get_job_urls() -> list:
    with open("yourator/jobs.json", encoding="utf-8") as file:
        jobs_json = json.load(file)
    # Prefer the third-party URL; otherwise build the URL from the Yourator path.
    job_urls = [
        item["thirdPartyUrl"] if item["thirdPartyUrl"] is not None
        else "https://www.yourator.co" + item["path"]
        for item in jobs_json["payload"]["jobs"]
    ]
    return job_urls
```
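
To make the fallback rule concrete, here is a minimal sketch that applies the same logic to an in-memory payload instead of `yourator/jobs.json`. The helper name `resolve_job_url`, the sample entries, and the example URLs are hypothetical; only the `thirdPartyUrl` and `path` keys and the `https://www.yourator.co` prefix come from the function above.

```python
BASE_URL = "https://www.yourator.co"


def resolve_job_url(item: dict) -> str:
    """Prefer the third-party URL; otherwise build a Yourator URL from the path."""
    if item["thirdPartyUrl"] is not None:
        return item["thirdPartyUrl"]
    return BASE_URL + item["path"]


# Hypothetical payload shaped like the "payload" -> "jobs" list read above.
sample_payload = {
    "payload": {
        "jobs": [
            {"thirdPartyUrl": "https://example.teamdoor.io/jobs/1", "path": "/companies/example/jobs/1"},
            {"thirdPartyUrl": None, "path": "/companies/another/jobs/2"},
        ]
    }
}

sample_urls = [resolve_job_url(item) for item in sample_payload["payload"]["jobs"]]
print(sample_urls)
```

The first entry keeps its third-party URL, while the second falls back to the Yourator path.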

## Define a function to download the web pages behind the job links

```python
def download_job_descriptions(job_urls: list):
    for job_url in job_urls:
        r = requests.get(job_url)
        # "/" cannot appear in a file name, so replace it with "_".
        page_name = job_url.replace("/", "_")
        # Save the raw HTML; parsing happens in a later step.
        with open(f"yourator/job_descriptions/{page_name}.html", "w") as file:
            file.write(r.text)
```
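
The file-naming rule matters because the parsing step reads the employer back out of the file name for pages whose name contains `teamdoor`. The sketch below traces that round trip for a single hypothetical URL. The helper name `url_to_page_name` and the `example.teamdoor.io` address are assumptions for illustration, and the sketch assumes the employer appears as the subdomain.

```python
def url_to_page_name(job_url: str) -> str:
    """Mirror the naming rule used when saving pages: every "/" becomes "_"."""
    return job_url.replace("/", "_")


# Hypothetical third-party URL; the "//" after the scheme becomes "__".
page_name = url_to_page_name("https://example.teamdoor.io/jobs/123")
print(page_name)  # https:__example.teamdoor.io_jobs_123

# The parsing step splits on "__" and keeps the text before the first ".".
html_file = f"{page_name}.html"
employer = html_file.split("__")[1].split(".")[0]
print(employer)  # example
```

Note that the saved name still contains the `:` from the URL scheme, which some file systems (Windows, for example) do not accept in file names.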

## Run the functions in order

```python
job_urls = get_job_urls()
download_job_descriptions(job_urls)
```

## Extract the job description from each posting

```python
list_dir = os.listdir("yourator/job_descriptions/")
job_titles, employers, job_descriptions = [], [], []
for html_file in list_dir:
    with open(f"yourator/job_descriptions/{html_file}") as file:
        soup = BeautifulSoup(file, "html.parser")
    if "teamdoor" in html_file:
        # Pages saved from teamdoor URLs: the employer is taken from the file name.
        job_title = soup.select("h2.title")[0].text.strip()
        employer = html_file.split("__")[1].split(".")[0]
        paragraphs = soup.select("div.content-area.content > div > div > div > p")
    else:
        # All other pages: the title and employer are read from the page itself.
        job_title = soup.select("h1.basic-info__title__text")[0].text
        employer = soup.select("h4 > a")[0].text
        paragraphs = soup.select("div > section > p")
    job_titles.append(job_title)
    employers.append(employer)
    job_descriptions.append(" ".join(elem.text for elem in paragraphs))
df = pd.DataFrame({"employer": employers, "job_title": job_titles, "job_description": job_descriptions})
```
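
As a quick check of how the selectors in the second branch behave, here is a minimal sketch that runs them against a small inline HTML string. The markup is hypothetical and heavily simplified; it only assumes the class name and tag nesting that the selectors themselves imply.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily simplified markup; the real pages are much larger.
sample_html = """
<html><body>
  <h1 class="basic-info__title__text">Data Analyst</h1>
  <h4><a href="/companies/example">Example Co.</a></h4>
  <div><section><p>Write SQL queries.</p><p>Build dashboards.</p></section></div>
</body></html>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
sample_title = sample_soup.select("h1.basic-info__title__text")[0].text
sample_employer = sample_soup.select("h4 > a")[0].text
# ">" is the child combinator: only <p> tags directly inside <section> match.
sample_description = " ".join(elem.text for elem in sample_soup.select("div > section > p"))
print(sample_title, sample_employer, sample_description, sep=" | ")
```

Because `select` returns a list, indexing with `[0]` raises `IndexError` when a page has no matching element, whereas the description join simply yields an empty string.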

```python
df
```

| | employer | job_title | job_description |
|---|---|---|---|
| 0 | viewsonic | 資深數據分析師, Senior Data Analyst, myViewBoard | 1. Proficiency in Python (must), SQL or other ... |
| 1 | 國泰世華商業銀行 | 【數位金融】數位分析 Data Analyst(數數發中心, DDT) | 【職務說明 What will you do】 1. 撰寫 SQL 分析資料，使用Table... |
| 2 | KdanMobile | (台北) 資料分析師 Data Analyst | 【你需要做什麼？】• 行銷名單彙整管理• 產品數據收集參數設定與分析• 用戶行為與產品改善分... |
| 3 | 統一超商 7-ELEVEN | 統一超商 數據分析師 Data Analyst (OPEN POINT) | 福儲信託：員工提撥金額公司再加碼，協助你累積另一筆退休金\n年節賀禮：1. 三節禮金 / 禮... |
| 4 | Gogolook | 資深數據分析師 Senior Data Analyst | Job description Gogolook has been deeply invol... |
| 5 | 國泰世華商業銀行 | 【數位金融】數位分析 Data Analyst(數數發中心, DDT) | 【職務說明 What will you do】\n1. 透過數據驅動建立重點業務指標，運用工... |
| 6 | JobMenta | 全球千萬用戶AI應用軟體科技公司 Product Director | |
| 7 | 天下雜誌股份有限公司 | 《天下雜誌群》資深數據分析師 Sr.Data Analyst | 天下雜誌群共有《天下》、《康健》、《天下學習》、《親子天下》等品牌，分屬在財經管理、健康生活... |
| 8 | KdanMobile | (台北) ADNEX資料分析師 Data Analyst | 【關於ADNEX】 ADNEX是隸屬於凱鈿的數位廣告行銷品牌，我們有投放經驗很豐富的優化師、... |
| 9 | 新加坡商鈦坦科技 | 【採線上面談】Business Analyst 商業分析師(台北) | 【Job Responsibilities】\n• Conduct data analysi... |
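
Row 6 in the output above has an empty `job_description`, which suggests that neither selector matched any paragraph on that page. A minimal sketch for flagging such rows follows. It uses a hypothetical two-row frame with the same column names as `df`, so it runs without the downloaded files.

```python
import pandas as pd

# Hypothetical two-row frame with the same columns as `df`.
sample_df = pd.DataFrame({
    "employer": ["Example Co.", "Another Co."],
    "job_title": ["Data Analyst", "Product Director"],
    "job_description": ["Write SQL queries.", ""],
})

# A description counts as empty when nothing is left after stripping whitespace.
is_empty = sample_df["job_description"].str.strip() == ""
print(sample_df.loc[is_empty, ["employer", "job_title"]])
```

Applying the same mask to `df` would list the postings whose pages may need a different selector.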
