In [1]:
import pandas as pd

In [2]:
air_quality_no2 = pd.read_csv("data/air_quality_no2_long.csv", parse_dates=["date.utc"])
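In `read_csv`, `parse_dates=True` only tries to parse the index. With the default integer index, `date.utc` would then stay as text unless the column is named explicitly. The sketch below shows the difference on a hypothetical two-row CSV string (`csv_text`, `as_text` and `as_dates` are illustrative names) that mimics the layout of the long-format file.

```python
import io

import pandas as pd

# Hypothetical two-row sample mimicking the layout of the long-format file
csv_text = """date.utc,location,parameter,value
2019-06-21 00:00:00+00:00,FR04014,no2,20.0
2019-06-20 23:00:00+00:00,FR04014,no2,21.8
"""

# parse_dates=True only applies to the index; with the default integer
# index, the "date.utc" column is left as plain text
as_text = pd.read_csv(io.StringIO(csv_text), parse_dates=True)

# Naming the column converts it to timezone-aware datetimes (offset +00:00)
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date.utc"])

print(as_text["date.utc"].dtype)
print(as_dates["date.utc"].dtype)
```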

In [3]:
air_quality_no2 = air_quality_no2[
    ["date.utc", "location", "parameter", "value"]]

In [4]:
air_quality_no2.head()

                    date.utc  location  parameter  value
0  2019-06-21 00:00:00+00:00   FR04014        no2   20.0
1  2019-06-20 23:00:00+00:00   FR04014        no2   21.8
2  2019-06-20 22:00:00+00:00   FR04014        no2   26.5
3  2019-06-20 21:00:00+00:00   FR04014        no2   24.9
4  2019-06-20 20:00:00+00:00   FR04014        no2   21.4


In [5]:
air_quality_pm25 = pd.read_csv("data/air_quality_pm25_long.csv", parse_dates=["date.utc"])

In [6]:
air_quality_pm25 = air_quality_pm25[
    ["date.utc", "location", "parameter", "value"]]

In [7]:
air_quality_pm25.head()

                    date.utc  location  parameter  value
0  2019-06-18 06:00:00+00:00   BETR801       pm25   18.0
1  2019-06-17 08:00:00+00:00   BETR801       pm25    6.5
2  2019-06-17 07:00:00+00:00   BETR801       pm25   18.5
3  2019-06-17 06:00:00+00:00   BETR801       pm25   16.0
4  2019-06-17 05:00:00+00:00   BETR801       pm25    7.5


The `concat()` function concatenates multiple tables along one of the axes (row-wise or column-wise). A `DataFrame` has two axes: the first runs vertically downwards across the rows (axis 0), and the second runs horizontally across the columns (axis 1). By default, concatenation is along axis 0, so the resulting table combines the rows of the input tables.
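To make the two axes concrete, here is a small sketch on two hypothetical tables (`first` and `second` are illustrative names, not part of the air quality data). Along axis 0 the rows are stacked and columns are matched by name; along axis 1 the tables are placed side by side and rows are matched by index label, with missing entries filled with `NaN`.

```python
import pandas as pd

# Two small hypothetical tables with the same columns
first = pd.DataFrame({"location": ["A", "B"], "value": [1.0, 2.0]})
second = pd.DataFrame({"location": ["C"], "value": [3.0]})

# axis=0 (the default): stack the rows; columns are aligned by name
rows = pd.concat([first, second], axis=0)
print(rows.shape)  # (3, 2)

# axis=1: put the tables side by side; rows are aligned by index label,
# so row 1 (absent from `second`) gets NaN in the columns from `second`
side_by_side = pd.concat([first, second], axis=1)
print(side_by_side.shape)  # (2, 4)
```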

In [8]:
air_quality = pd.concat(
    [air_quality_pm25, air_quality_no2], axis=0)

In [9]:
air_quality_pm25.shape

(1110, 4)

In [10]:
air_quality_no2.shape

(2068, 4)

In [11]:
air_quality.shape

(3178, 4)
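The shapes are consistent: 1110 + 2068 = 3178 rows, and the 4 columns are unchanged. By default, `concat` also keeps each input's own row labels, so the combined index contains repeated labels. The sketch below, using hypothetical miniature tables `pm25` and `no2`, shows this and two `concat` options that address it: `ignore_index=True` renumbers the rows, and `keys=` adds an outer index level recording which table each row came from.

```python
import pandas as pd

# Hypothetical miniature versions of the two measurement tables
pm25 = pd.DataFrame({"parameter": ["pm25", "pm25"], "value": [18.0, 6.5]})
no2 = pd.DataFrame({"parameter": ["no2", "no2", "no2"], "value": [20.0, 21.8, 26.5]})

combined = pd.concat([pm25, no2], axis=0)
# Row-wise concatenation adds up the row counts: 2 + 3 = 5
assert combined.shape[0] == pm25.shape[0] + no2.shape[0]
# ...but the original labels are kept, so 0 and 1 now appear twice
print(combined.index.is_unique)  # False

# ignore_index=True builds a fresh 0..n-1 index instead
renumbered = pd.concat([pm25, no2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]

# keys= adds an outer index level naming the source table of each row
labelled = pd.concat([pm25, no2], keys=["pm25", "no2"])
print(labelled.loc["no2"].shape)  # (3, 2)
```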

The stations used in this example (FR04014, BETR801 and London Westminster) are only three of the entries listed in the station metadata table.

In [12]:
stations_coord = pd.read_csv("data/air_quality_stations.csv")

In [13]:
stations_coord.head()

   location  coordinates.latitude  coordinates.longitude
0   BELAL01              51.23619                4.38522
1   BELHB23              51.17030                4.34100
2   BELLD01              51.10998                5.00486
3   BELLD02              51.12038                5.02155
4   BELR833              51.32766                4.36226


In [14]:
air_quality.head()

                    date.utc  location  parameter  value
0  2019-06-18 06:00:00+00:00   BETR801       pm25   18.0
1  2019-06-17 08:00:00+00:00   BETR801       pm25    6.5
2  2019-06-17 07:00:00+00:00   BETR801       pm25   18.5
3  2019-06-17 06:00:00+00:00   BETR801       pm25   16.0
4  2019-06-17 05:00:00+00:00   BETR801       pm25    7.5


We only want to add the coordinates of these three stations to the measurements table, each on the corresponding rows of the `air_quality` table. Using the `merge()` function, the matching coordinates from the `stations_coord` table are added to each row of the `air_quality` table. Both tables have the column `location` in common, which is used as the key to combine the information. Because a `left` join is used, only the locations present in the `air_quality` (left) table, i.e. FR04014, BETR801 and London Westminster, end up in the resulting table. The `merge` function supports several database-style join types through its `how` parameter, such as `left`, `right`, `inner` and `outer`.
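A minimal sketch of how the `how` parameter changes the result, using hypothetical tables: `measurements` contains a station (`XX0000`) that is missing from the metadata, and `stations` contains a station (`YY1111`) that was never measured. The coordinates are illustrative values only. `indicator=True` adds a `_merge` column showing where each row found a match.

```python
import pandas as pd

# Hypothetical measurements: station XX0000 has no metadata entry
measurements = pd.DataFrame({
    "location": ["BETR801", "FR04014", "XX0000"],
    "value": [18.0, 20.0, 5.0],
})
# Hypothetical metadata with illustrative coordinates and an unused station
stations = pd.DataFrame({
    "location": ["BETR801", "FR04014", "YY1111"],
    "coordinates.latitude": [51.2, 48.8, 50.0],
    "coordinates.longitude": [4.4, 2.4, 3.0],
})

# Left join: keep every measurement row; unmatched rows get NaN coordinates,
# and metadata-only stations (YY1111) are dropped
left = pd.merge(measurements, stations, how="left", on="location", indicator=True)
print(left["_merge"].tolist())  # ['both', 'both', 'left_only']

# Inner join: keep only locations present in both tables
inner = pd.merge(measurements, stations, how="inner", on="location")
print(inner["location"].tolist())  # ['BETR801', 'FR04014']
```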

In [15]:
air_quality = pd.merge(air_quality, stations_coord, how="left", on="location")

In [16]:
air_quality.head()

                    date.utc  location  parameter  value  coordinates.latitude  coordinates.longitude
0  2019-06-18 06:00:00+00:00   BETR801       pm25   18.0              51.20966                4.43182
1  2019-06-17 08:00:00+00:00   BETR801       pm25    6.5              51.20966                4.43182
2  2019-06-17 07:00:00+00:00   BETR801       pm25   18.5              51.20966                4.43182
3  2019-06-17 06:00:00+00:00   BETR801       pm25   16.0              51.20966                4.43182
4  2019-06-17 05:00:00+00:00   BETR801       pm25    7.5              51.20966                4.43182
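A left join only keeps one output row per measurement if each `location` appears at most once in the right-hand table; if a station were listed twice in the metadata, every matching measurement would be duplicated. The `validate` parameter of `merge` can guard against this. The sketch below uses a hypothetical metadata table (`duplicated_meta`) in which BETR801 is accidentally listed twice.

```python
import pandas as pd

measurements = pd.DataFrame({"location": ["BETR801", "BETR801"], "value": [18.0, 6.5]})
# Hypothetical metadata in which BETR801 is (accidentally) listed twice
duplicated_meta = pd.DataFrame({
    "location": ["BETR801", "BETR801"],
    "coordinates.latitude": [51.20966, 51.20966],
})

# Without validation the left join silently multiplies rows: 2 x 2 = 4
unchecked = pd.merge(measurements, duplicated_meta, how="left", on="location")
print(len(unchecked))  # 4

# validate="many_to_one" requires the key to be unique in the right table
rejected = False
try:
    pd.merge(measurements, duplicated_meta, how="left", on="location",
             validate="many_to_one")
except pd.errors.MergeError as err:
    rejected = True
    print("merge rejected:", err)

# Dropping duplicate stations first restores one row per measurement
checked = pd.merge(measurements, duplicated_meta.drop_duplicates("location"),
                   how="left", on="location", validate="many_to_one")
print(len(checked))  # 2
```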
