# Web Scraping for Aviation Weather  
![](https://www.ctsys.com/wp-content/uploads/2020/06/Aviation-Weather-3.jpg)

In [1]:
!pip install jovian --upgrade --quiet



In [2]:
import jovian


In [3]:
jovian.commit(project="project-1-web-scraping-with-python")



## Introduction:  
In this notebook, I design a web scraping tool that downloads real-time weather data for a list of airports and stores it in a CSV file.  
As an air traffic controller, I rely on up-to-date and precise weather data as one of the most important pieces of information during my duty. The same information is also used by pilots, dispatch agents, flight service specialists and other aviation professionals.  
### Aviation weather sequences  
The worldwide standard for aviation weather reports is called **METAR**. A **METAR** is an ordered sequence of information published by a weather station, and weather stations are usually based at airports. In normal operations, **METAR** reports are published every hour, and sometimes more often if weather conditions are changing rapidly. For more information about **METAR**, [read here](https://en.wikipedia.org/wiki/METAR).  
Weather information is available online in a variety of forms. The [GetMetar](https://www.getmetar.com/) website offers an easy way to get a **METAR** using an airport code.  
Every airport in the world has a unique identification code, composed of 4 characters and assigned by [ICAO](https://www.icao.int/Pages/default.aspx). We will get a list of these codes from a **GitHub** dataset.  
Web scraping will be done using the **Beautiful Soup** library. We will use the **Requests** library to get data from the [GetMetar](https://www.getmetar.com/) website.  
We will use the **Pandas** library to load the dataset of airport codes and to manipulate `csv` datasets.

In [4]:
# Installing and importing libraries 
!pip install beautifulsoup4 --upgrade --quiet
!pip install opendatasets --upgrade --quiet
import requests
from bs4 import BeautifulSoup
from tqdm.notebook import tqdm
import pandas as pd



## 1 - Downloading airport codes

In [5]:
icao_list_url = 'https://raw.githubusercontent.com/datasets/airport-codes/master/data/airport-codes.csv'

In [6]:
airports_df = pd.read_csv(icao_list_url)

Let's explore this dataframe:


In [7]:
airports_df.columns

Index(['ident', 'type', 'name', 'elevation_ft', 'continent', 'iso_country',
       'iso_region', 'municipality', 'gps_code', 'iata_code', 'local_code',
       'coordinates'],
      dtype='object')

There are many types of airports. Using the column `type` we can list these classes:

In [8]:
airports_df['type'].unique()

array(['heliport', 'small_airport', 'closed', 'seaplane_base',
       'balloonport', 'medium_airport', 'large_airport'], dtype=object)

We will consider only the 'large_airport' category. We can also specify continents, countries etc.

In [9]:
large_airports = airports_df[airports_df['type']=='large_airport']
large_airports.ident.count()

613

We have a list of 613 large airports. Now we will request weather data for one sample of those airports:

In [10]:
sample_airport = large_airports.iloc[1]
sample_airport['ident'],sample_airport['name']

('BIKF', 'Keflavik International Airport')

The second airport in our filtered dataframe (position 1) is Keflavik International Airport, located in Iceland.

## 2 - Using GetMetar website to download weather data  
First, we will retrieve weather data for our sample airport.

In [11]:
base_url = 'https://www.getmetar.com/'
sample_airport_wx = requests.get(base_url+sample_airport['ident'])

In [12]:
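# Name the parser explicitly so the result does not depend on which parsers are installed
sample_airport_data = BeautifulSoup(sample_airport_wx.text, 'html.parser')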

By inspecting the result, we can determine where to get weather data.

In [13]:
# here is the METAR of this airport
sample_airport_data.find('h4', class_='text-white font-weight-bold').text.strip()

'BIKF 252200Z 01014KT CAVOK 05/03 Q1028'
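To make the structure of this coded string more concrete, here is a minimal sketch that decodes only the groups present in the sample above (station, day and time, wind, `CAVOK`, temperature and dew point, QNH). The helper name `decode_simple_metar` and the output keys are assumptions made for illustration; a real METAR can contain many more groups, which this sketch simply ignores.

```python
import re

# Wind group: direction (or VRB), speed, optional gust, then KT
WIND = re.compile(r'(?P<direction>\d{3}|VRB)(?P<speed>\d{2,3})(?:G\d{2,3})?KT')
# Temperature/dew point group: a leading M means a negative value
TEMPERATURE = re.compile(r'(?P<temp>M?\d{2})/(?P<dew>M?\d{2})')


def _signed(value):
    """Convert a METAR temperature such as '05' or 'M03' to an integer."""
    return -int(value[1:]) if value.startswith('M') else int(value)


def decode_simple_metar(metar):
    """Decode a few common groups of a METAR string (hypothetical helper)."""
    tokens = metar.split()
    decoded = {'station': tokens[0]}
    for token in tokens[1:]:
        wind = WIND.fullmatch(token)
        temperature = TEMPERATURE.fullmatch(token)
        if re.fullmatch(r'\d{6}Z', token):
            # ddhhmmZ: day of month followed by the UTC time
            decoded['day_of_month'] = int(token[:2])
            decoded['time_utc'] = f'{token[2:4]}:{token[4:6]}'
        elif wind:
            decoded['wind_direction'] = wind['direction']
            decoded['wind_speed_kt'] = int(wind['speed'])
        elif token == 'CAVOK':
            decoded['visibility'] = 'CAVOK'
        elif temperature:
            decoded['temperature_c'] = _signed(temperature['temp'])
            decoded['dew_point_c'] = _signed(temperature['dew'])
        elif re.fullmatch(r'Q\d{4}', token):
            # Q followed by the pressure in hectopascals
            decoded['qnh_hpa'] = int(token[1:])
    return decoded


sample = decode_simple_metar('BIKF 252200Z 01014KT CAVOK 05/03 Q1028')
print(sample)
```

The decoded values (wind from 010 degrees at 14 knots, 5 °C, dew point 3 °C, 1028 hPa) match the user-friendly fields that the website displays for the same report.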

The METAR is coded using the international standard. However, the same webpage also offers a more user-friendly presentation of the report:

In [14]:
weather_list = sample_airport_data.find('ul', class_='list-group mt-4')
# Each <li> holds one weather field, in a fixed order
items = weather_list.find_all('li')
# The slices strip the fixed label (and unit suffix) around each value
time_observation = items[0].text[27:44]
present_condition = items[1].text[20:].replace(',', ' ')
clouds = items[2].text[8:].replace(',', ' ')
visibility = items[3].text[12:]
winds = items[4].text[6:]
temperature = items[5].text[13:-8]
dew_point = items[6].text[11:-8]
pressure = items[7].text[10:]
print(time_observation)
print(present_condition)
print(clouds)
print(visibility)
print(winds)
print(temperature)
print(dew_point)
print(pressure)

25 Apr 2021 22:00
Dry
Clear skies
Greater than 10 km
10 degrees at 14 knots
5
3
1028 millibars 


Now let's define functions to execute these steps for every airport in our list. Before that, we create a list of tuples (ident, name), one for each airport:

In [15]:
# Keep only 4-character identifiers, the standard length of an ICAO airport code
airport_list = [(ident, name)
                for ident, name in zip(large_airports['ident'], large_airports['name'])
                if len(ident) == 4]
airport_list  # list of (ident, name) tuples

 'Lovell Field'),
 ('KCHS', 'Charleston Air Force Base-International Airport'),
 ('KCID', 'The Eastern Iowa Airport'),
 ('KCLE', 'Cleveland Hopkins International Airport'),
 ('KCLT', 'Charlotte Douglas International Airport'),
 ('KCMH', 'John Glenn Columbus International Airport'),
 ('KCOS', 'City of Colorado Springs Municipal Airport'),
 ('KCRP', 'Corpus Christi International Airport'),
 ('KCRW', 'Yeager Airport'),
 ('KCVG', 'Cincinnati Northern Kentucky International Airport'),
 ('KCVS', 'Cannon Air Force Base'),
 ('KDAB', 'Daytona Beach International Airport'),
 ('KDAL', 'Dallas Love Field'),
 ('KDAY', 'James M Cox Dayton International Airport'),
 ('KDBQ', 'Dubuque Regional Airport'),
 ('KDCA', 'Ronald Reagan Washington National Airport'),
 ('KDEN', 'Denver International Airport'),
 ('KDFW', 'Dallas Fort Worth International Airport'),
 ('KDLF', 'DLF Airport'),
 ('KDLH', 'Duluth International Airport'),
 ('KDOV', 'Dover Air Force Base'),
 ('KDSM', 'Des Moines International Airport'),

This function will return a `BeautifulSoup` object with the **GetMetar** webpage data. 

In [16]:
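def get_wx(airport_code):
    """
    Returns a BeautifulSoup object of the GetMetar page associated with the airport code (ident),
    or None if the request was not successful.
    """
    base_url = 'https://www.getmetar.com/'
    airport_wx = requests.get(base_url + airport_code)
    if airport_wx.ok:
        # Name the parser explicitly so the result does not depend on which parsers are installed
        return BeautifulSoup(airport_wx.text, 'html.parser')
    return None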
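When hundreds of pages are requested in a row, a request can hang or fail with a network error, which `get_wx` does not handle. The following is a minimal sketch of a more defensive variant, not a replacement prescribed by this notebook. The name `get_wx_safe`, the optional `session` argument and the 10-second default timeout are assumptions chosen for illustration.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = 'https://www.getmetar.com/'


def get_wx_safe(airport_code, session=None, timeout=10):
    """
    Hypothetical variant of get_wx: returns a BeautifulSoup object,
    or None on a network error, a timeout or an unsuccessful status code.
    """
    # A requests.Session can be passed in to reuse connections between calls
    http = requests if session is None else session
    try:
        response = http.get(BASE_URL + airport_code, timeout=timeout)
    except requests.RequestException:
        # Covers connection errors and timeouts raised by requests
        return None
    if not response.ok:
        return None
    return BeautifulSoup(response.text, 'html.parser')
```

A call such as `get_wx_safe('BIKF', session=requests.Session())` would then behave like `get_wx('BIKF')` when the request succeeds, and return `None` instead of raising when it does not.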

The following function will extract the weather data from the `BeautifulSoup` object.

In [17]:
def get_wx_informations(data):
    """
    Extract time, condition, clouds, visibility, winds, temperature, dew point and pressure
    from a BeautifulSoup object. Returns a dict of all the extracted data.
    """
    if data is None:
        return None
    # Each <li> of this list holds one weather field, in a fixed order
    items = data.find('ul', class_='list-group mt-4').find_all('li')
    # The slices strip the fixed label (and unit suffix) around each value
    return {'time': items[0].text[27:44],
            'condition': items[1].text[20:].replace(',', ' '),
            'clouds': items[2].text[8:].replace(',', ' '),
            'visibility': items[3].text[12:],
            'winds': items[4].text[6:],
            'temperature': items[5].text[13:-8],
            'dew_point': items[6].text[11:-8],
            'pressure': items[7].text[10:14]}

Let's test the functions for one airport. 

In [18]:
# test functions on one sample
data = get_wx(airport_list[0][0])
weather_dic = get_wx_informations(data)
weather_dic

{'time': '25 Apr 2021 22:00',
 'condition': '',
 'clouds': '',
 'visibility': 'Greater than 10 km',
 'winds': 'Calm',
 'temperature': '',
 'dew_point': '',
 'pressure': ' mil'}
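The empty fields and the `' mil'` pressure above suggest that fixed character offsets are fragile when a value is missing from the page. One possible alternative, shown here as a minimal sketch, is to split each list item on its first colon instead of slicing at fixed positions. The helper `parse_labelled_items` and the sample strings are hypothetical: they assume each item has the form `Label: value`, which should be checked against the actual page.

```python
def parse_labelled_items(item_texts):
    """Split 'Label: value' strings into a dict (hypothetical helper)."""
    parsed = {}
    for text in item_texts:
        # partition splits on the first colon only, so times such as 22:00 stay intact
        label, separator, value = text.partition(':')
        if separator:
            parsed[label.strip().lower()] = value.strip()
    return parsed


# Hypothetical item texts, standing in for the .text of each <li> element
sample_items = [
    'Observation time: 25 Apr 2021 22:00',
    'Clouds: Clear skies',
    'Temperature:',
    'Pressure: 1028 millibars',
]
parsed_sample = parse_labelled_items(sample_items)
print(parsed_sample)
```

With this approach a missing value becomes an empty string under the right key, rather than a fragment of the unit text.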

## 3 - Save results to `csv` file

We have to specify headers for our `csv` file:

In [19]:
headers = ['ident','name']+list(weather_dic.keys())
headers

['ident',
 'name',
 'time',
 'condition',
 'clouds',
 'visibility',
 'winds',
 'temperature',
 'dew_point',
 'pressure']

In [20]:
jovian.commit()



Downloading and saving data for around 600 airports can take some time. We will use the `tqdm` library to display progress during the whole process.

In [21]:
results = []  # this list will contain the weather data for all airports
for airport_ident, airport_name in tqdm(airport_list):
    data = get_wx(airport_ident)
    # Skip airports whose page could not be downloaded
    if data is not None:
        weather = list(get_wx_informations(data).values())
        results.append([airport_ident, airport_name] + weather)

  0%|          | 0/595 [00:00<?, ?it/s]
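The loop above stops at the first page that cannot be parsed, and it sends requests as fast as it can. A minimal sketch of a more tolerant loop is shown below. The function `collect_weather`, its `fetch` and `parse` parameters and the one-second default delay are assumptions for illustration; in this notebook `fetch` would be `get_wx` and `parse` would be `get_wx_informations`.

```python
import time


def collect_weather(airports, fetch, parse, delay_seconds=1.0):
    """
    Hypothetical helper: download and parse weather for each (ident, name) pair.
    Returns (results, failures), where failures lists (ident, reason) pairs.
    """
    results, failures = [], []
    for ident, name in airports:
        try:
            weather = parse(fetch(ident))
        except Exception as error:
            # Record the problem and continue with the next airport
            failures.append((ident, repr(error)))
        else:
            if weather is None:
                failures.append((ident, 'no data'))
            else:
                results.append([ident, name] + list(weather.values()))
        # Pause between requests to avoid overloading the website
        time.sleep(delay_seconds)
    return results, failures
```

It could be called as `results, failures = collect_weather(tqdm(airport_list), get_wx, get_wx_informations)`, after which `failures` shows which airports need a second look.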

And finally, we save the results to a `csv` file.

In [27]:
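import csv

path = 'airport_weather.csv'
# newline='' lets the csv module control line endings on every platform
with open(path, 'w', encoding='utf8', newline='') as f:
    writer = csv.writer(f)
    # Write the headers in the first line
    writer.writerow(headers)
    # Write one airport per line; fields containing commas are quoted automatically
    writer.writerows(results)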

We open our newly-created `csv` file using pandas, and display some rows in order to verify that our functions work properly.

In [28]:
pd.read_csv('airport_weather.csv')

| | ident | name | time | condition | clouds | visibility | winds | temperature | dew_point | pressure |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AYPY | Port Moresby Jacksons International Airport | 25 Apr 2021 22:00 | | | Greater than 10 km | Calm | | | mil |
| 1 | BIKF | Keflavik International Airport | 25 Apr 2021 22:00 | Dry | Clear skies | Greater than 10 km | 10 degrees at 14 knots | 5.0 | 3.0 | 1028 |
| 2 | BKPR | Priština International Airport | 25 Apr 2021 22:00 | Dry | Clear skies | Greater than 10 km | Variable at 1 knots | 9.0 | -1.0 | 1017 |
| 3 | CYEG | Edmonton International Airport | 25 Apr 2021 22:00 | Dry | A few at 2591 meters; a few at 7620 meters | 32187 meters | 290 degrees at 11 knots | 9.0 | -12.0 | 29.6 |
| 4 | CYHZ | Halifax / Stanfield International Airport | 25 Apr 2021 22:00 | Dry | Overcast sky at 3658 meters | 24140 meters | 140 degrees at 8 knots | 10.0 | 4.0 | 29.8 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 571 | ZUUU | Chengdu Shuangliu International Airport | 25 Apr 2021 22:00 | Dry | Broken sky at 1006 meters | Greater than 10 km | 50 degrees at 2 knots | 17.0 | 14.0 | 1017 |
| 572 | ZWWW | Ürümqi Diwopu International Airport | 25 Apr 2021 22:30 | Dry | Clear skies | Greater than 10 km | 150 degrees at 1 knots | 4.0 | -9.0 | 1026 |
| 573 | ZYHB | Taiping Airport | 25 Apr 2021 22:00 | Dry | Clear skies | Greater than 10 km | 130 degrees at 2 knots | 5.0 | -3.0 | 1019 |
| 574 | ZYTL | Zhoushuizi Airport | 25 Apr 2021 22:30 | Dry | No significant clouds are observed | 8000 meters | 280 degrees at 1 knots | 9.0 | 4.0 | 1018 |
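The rows above appear to mix two pressure conventions: values around 1000 look like hectopascals (millibars), while values such as 29.6 and 29.8 look like inches of mercury. The following is a minimal sketch of one way to bring them to a single unit. The helper `pressure_to_hpa` is hypothetical, and the rule "any value below 100 is in inches of mercury" is a heuristic assumed here, not something stated by the website.

```python
INHG_TO_HPA = 33.8639  # hectopascals per inch of mercury


def pressure_to_hpa(value):
    """Return the pressure in hPa, or None if the value is not a number."""
    try:
        pressure = float(value)
    except (TypeError, ValueError):
        # Covers leftovers such as 'mil' from pages with a missing value
        return None
    # Assumption: realistic hPa values are far above 100, inHg values far below
    if pressure < 100:
        return round(pressure * INHG_TO_HPA, 1)
    return pressure


for raw in ['1028', '29.6', 'mil']:
    print(raw, '->', pressure_to_hpa(raw))
```

The same function could be applied to the `pressure` column of the dataframe with `Series.map`, once the heuristic has been checked against a few known reports.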


Let's commit our work, including the `csv` file.

In [None]:
jovian.commit(files=['airport_weather.csv'])

# Summary  
This notebook describes the process of downloading and storing weather data for a set of airports.  
- First, we downloaded a list of airport codes from a dataset hosted on GitHub.  
- The second step was to request weather data for each airport in our list using `requests`.
- Using the `BeautifulSoup` library, we scraped data from the website's HTML response.
- Finally, the data was saved to a `csv` file.

# Future Work  
The same structure used in this notebook could be used for different tasks:
* Instead of choosing only the `large_airport` type, we could specify a country, a continent or another category of airports.
* Our script could generate an email (once a day, for example) including weather data of interest.
* Using libraries like `Streamlit`, `Django` or `Flask`, we could design a user-friendly interface to make this project easier to interact with.
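The first idea in the list above could be sketched as follows. The function `select_airports` is hypothetical, and the small CSV text is a hand-made stand-in that only reuses the column names of the airport dataset (the `ZZZZ` row is invented). One detail worth noting: by default `pd.read_csv` treats the string `NA` as a missing value, so the sketch reads the data with `keep_default_na=False` to keep `NA` usable as the continent code for North America.

```python
import io
import pandas as pd

# Hand-made sample with the same column names as the airport dataset
sample_csv = io.StringIO(
    "ident,type,name,continent,iso_country\n"
    "KDEN,large_airport,Denver International Airport,NA,US\n"
    "CYHZ,large_airport,Halifax / Stanfield International Airport,NA,CA\n"
    "BIKF,large_airport,Keflavik International Airport,EU,IS\n"
    "ZZZZ,small_airport,Invented Airfield,EU,IS\n"
)
# Only empty cells count as missing, so the continent code 'NA' stays a string
sample_df = pd.read_csv(sample_csv, keep_default_na=False, na_values=[''])


def select_airports(df, airport_type='large_airport', country=None, continent=None):
    """Filter airports by type and, optionally, by country or continent code."""
    mask = df['type'] == airport_type
    if country is not None:
        mask &= df['iso_country'] == country
    if continent is not None:
        mask &= df['continent'] == continent
    return df[mask]


print(select_airports(sample_df, continent='NA')['ident'].tolist())
print(select_airports(sample_df, country='IS')['ident'].tolist())
```

The filtered dataframe can then be turned into the list of (ident, name) tuples exactly as was done for the large airports.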

In [None]:
jovian.submit(assignment="zerotoanalyst-project1")