## Data Collection: Crawling Hourly Weather Data
---
* Source: [Aviation Meteorological Office](https://amo.kma.go.kr/weather/stat/stat-hourly.do)
* Collection period: January 2017 to February 2022
* Method: crawling with pandas

#### *The page to crawl is static, so the query parameters in its URL need to be analyzed
---
![Link analysis](항공편 결항 및 지연 분석/기상 상관관계 분석/link.png)
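
As a minimal sketch of that URL analysis, the query string can be assembled from its four parameters (`stnCd`, `year`, `month`, `ele`) with the standard library. The helper name `build_hourly_url` and the constant `BASE_URL` are hypothetical, introduced here only for illustration.

```python
from urllib.parse import urlencode

# Base address of the hourly statistics page (taken from the source link above)
BASE_URL = "https://amo.kma.go.kr/weather/stat/stat-hourly.do"


def build_hourly_url(station, year, month, element):
    """Return the hourly-statistics URL for one airport, year, month, and element.

    Hypothetical helper: the parameter names mirror the query keys seen in the URL.
    """
    query = urlencode({"stnCd": station, "year": year, "month": month, "ele": element})
    return f"{BASE_URL}?{query}"


# Incheon (RKSI), January 2017, element 0 (wind direction)
example_url = build_hourly_url("RKSI", 2017, 1, 0)
print(example_url)
```

Building the query from a dictionary keeps each parameter visible by name, which may be easier to audit than string concatenation.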

#### Airport codes
|Airport|Code|
|-|-|
|Incheon Airport|RKSI|
|Gimpo Airport|RKSS|
|Jeju Airport|RKPC|
|Muan Airport|RKJB|
|Ulsan Airport|RKPU|
|Yeosu Airport|RKJY|
|Yangyang Airport|RKNY|

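#### Weather element attributes
|Attribute number|Element|
|:-:|:-:|
|0|Wind direction|
|1|Wind speed|
|2|Visibility|
|3|Total cloud amount|
|4|Lowest cloud height|
|5|Temperature|
|6|Precipitation|
|7|Snow depth|
|8|New snow depth|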
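
The two tables above could be kept as lookup dictionaries, so that a code and an attribute number can be turned back into readable labels. This is only a sketch: the names `AIRPORTS`, `ELEMENTS`, and `describe` are hypothetical, and the English labels are translations of the table entries.

```python
from itertools import product

# Airport code -> airport name (from the airport code table)
AIRPORTS = {
    "RKSI": "Incheon",
    "RKSS": "Gimpo",
    "RKPC": "Jeju",
    "RKJB": "Muan",
    "RKPU": "Ulsan",
    "RKJY": "Yeosu",
    "RKNY": "Yangyang",
}

# Attribute number -> weather element (from the weather element table)
ELEMENTS = {
    0: "wind direction",
    1: "wind speed",
    2: "visibility",
    3: "total cloud amount",
    4: "lowest cloud height",
    5: "temperature",
    6: "precipitation",
    7: "snow depth",
    8: "new snow depth",
}


def describe(station, element):
    """Return a readable label for one airport and weather element pair."""
    return f"{AIRPORTS[station]} ({station}) / {ELEMENTS[element]}"


# Every (airport, element) pair that needs its own series of requests
combos = list(product(AIRPORTS, ELEMENTS))
print(len(combos))          # 63 pairs: 7 airports x 9 elements
print(describe("RKPC", 2))  # Jeju (RKPC) / visibility
```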


In [2]:
import pandas as pd

# Pieces of the query string for the hourly statistics page
url1 = "https://amo.kma.go.kr/weather/stat/stat-hourly.do?stnCd="
airports = ['RKSI', 'RKSS', 'RKPC', 'RKJB', 'RKPU', 'RKJY', 'RKNY']
url2 = "&year="
years = range(2017, 2022)  # 2017 through 2021; 2022 is not fetched by this loop
url3 = "&month="
months = range(1, 13)
url4 = "&ele="
elements = range(9)  # weather element numbers 0 to 8
# Output column order: month, day, then the hours '01' to '24'
col = ['month', 'day'] + [f"{h:02d}" for h in range(1, 25)]

for a in airports:
    for e in elements:
        for y in years:
            # Start a fresh frame for each year so that each CSV holds only that year's data
            df = pd.DataFrame()
            for m in months:
                # Read the first table on the page for this airport, year, month, and element
                df1 = pd.read_html(url1 + a + url2 + str(y) + url3 + str(m) + url4 + str(e))[0]
                # Add a month column holding the current month
                df1['month'] = m
                # Rename the column '시간날짜' to 'day'
                df1 = df1.rename(columns={'시간날짜': 'day'})
                # Reorder the columns
                df1 = df1.reindex(columns=col)
                # Append to the frame for the current year
                df = pd.concat([df, df1])
            df.to_csv("C:/Data/weather/" + a + "/" + str(e) + "/" + str(y) + ".csv")
            print(f"file : C:/Data/weather/{a}/{e}/{y}.csv")

file : C:/Data/weather/RKSI/0/2017.csv
file : C:/Data/weather/RKSI/0/2018.csv
file : C:/Data/weather/RKSI/0/2019.csv
file : C:/Data/weather/RKSI/0/2020.csv
file : C:/Data/weather/RKSI/0/2021.csv
file : C:/Data/weather/RKSI/1/2017.csv
file : C:/Data/weather/RKSI/1/2018.csv
file : C:/Data/weather/RKSI/1/2019.csv
file : C:/Data/weather/RKSI/1/2020.csv
file : C:/Data/weather/RKSI/1/2021.csv
file : C:/Data/weather/RKSI/2/2017.csv
file : C:/Data/weather/RKSI/2/2018.csv
file : C:/Data/weather/RKSI/2/2019.csv
file : C:/Data/weather/RKSI/2/2020.csv
file : C:/Data/weather/RKSI/2/2021.csv
file : C:/Data/weather/RKSI/3/2017.csv
file : C:/Data/weather/RKSI/3/2018.csv
file : C:/Data/weather/RKSI/3/2019.csv
file : C:/Data/weather/RKSI/3/2020.csv
file : C:/Data/weather/RKSI/3/2021.csv
file : C:/Data/weather/RKSI/4/2017.csv
file : C:/Data/weather/RKSI/4/2018.csv
file : C:/Data/weather/RKSI/4/2019.csv
file : C:/Data/weather/RKSI/4/2020.csv
file : C:/Data/weather/RKSI/4/2021.csv
file : C:/Data/weather/RK

file : C:/Data/weather/RKPU/6/2018.csv
file : C:/Data/weather/RKPU/6/2019.csv
file : C:/Data/weather/RKPU/6/2020.csv
file : C:/Data/weather/RKPU/6/2021.csv
file : C:/Data/weather/RKPU/7/2017.csv
file : C:/Data/weather/RKPU/7/2018.csv
file : C:/Data/weather/RKPU/7/2019.csv
file : C:/Data/weather/RKPU/7/2020.csv
file : C:/Data/weather/RKPU/7/2021.csv
file : C:/Data/weather/RKPU/8/2017.csv
file : C:/Data/weather/RKPU/8/2018.csv
file : C:/Data/weather/RKPU/8/2019.csv
file : C:/Data/weather/RKPU/8/2020.csv
file : C:/Data/weather/RKPU/8/2021.csv
file : C:/Data/weather/RKJY/0/2017.csv
file : C:/Data/weather/RKJY/0/2018.csv
file : C:/Data/weather/RKJY/0/2019.csv
file : C:/Data/weather/RKJY/0/2020.csv
file : C:/Data/weather/RKJY/0/2021.csv
file : C:/Data/weather/RKJY/1/2017.csv
file : C:/Data/weather/RKJY/1/2018.csv
file : C:/Data/weather/RKJY/1/2019.csv
file : C:/Data/weather/RKJY/1/2020.csv
file : C:/Data/weather/RKJY/1/2021.csv
file : C:/Data/weather/RKJY/2/2017.csv
file : C:/Data/weather/RK
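
The loop above writes to `C:/Data/weather/<airport>/<element>/<year>.csv`, and `DataFrame.to_csv` does not create missing folders, so the directory tree has to exist before the run. A minimal sketch of one way to prepare it follows. The helper `output_path` and its `base_dir` parameter are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def output_path(base_dir, station, element, year):
    """Create <base_dir>/<station>/<element>/ if needed and return the CSV path.

    Hypothetical helper: the layout mirrors the paths printed by the crawl loop.
    """
    folder = Path(base_dir) / station / str(element)
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{year}.csv"


if __name__ == "__main__":
    # Assumed base folder; the notebook itself uses C:/Data/weather
    path = output_path("weather_data", "RKSI", 0, 2017)
    print(path.as_posix())  # weather_data/RKSI/0/2017.csv
```

The returned path can be passed directly to `df.to_csv(...)` in place of the concatenated string.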