# Downloading PM2.5 Data from the NAPS Data Mart for Analysis

## Table of Contents

1. [Download data files](#download_data_files)
2. [Download station data](#download_station_data)
3. [Unzip the data files](#unzip_data_files)

## 1. Download data files <a name="download_data_files"></a>

First, the following code adds the project root to the system path, sets up a logger, and imports the required modules and variables. Log files are written to the `logs/` directory.


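```python
import sys
from pathlib import Path

# The notebook lives in notebooks/, so the project root is its parent directory
project_root = Path.cwd().parents[0]
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Brings download_integrated_dataset(), download_continuous_dataset(), etc. into scope
from src.data.download_data import *
```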
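The logging setup happens behind the import above and is not shown in this notebook. As a rough illustration only, a module can send its log records to a file under `logs/` with the standard `logging` package as sketched below. The helper name `setup_file_logger` and the `<name>.log` file naming are hypothetical, not necessarily what the project uses.

```python
import logging
from pathlib import Path


def setup_file_logger(name: str, log_dir: Path) -> logging.Logger:
    """Hypothetical helper: return a logger that appends to log_dir/<name>.log."""
    log_dir.mkdir(parents=True, exist_ok=True)  # create logs/ if it is missing
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Re-running a notebook cell must not stack duplicate handlers
    if not logger.handlers:
        handler = logging.FileHandler(log_dir / f"{name}.log", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

For example, `setup_file_logger("download_data", project_root / "logs")` would write records to `logs/download_data.log`.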


The following function creates the default download directory, `data/raw`. It then downloads the zip files of the integrated PM$_{2.5}$ data sets from [the NAPS Data Mart](https://data-donnees.ec.gc.ca/data/air/monitor/national-air-pollution-surveillance-naps-program/Data-Donnees/) and stores them in per-year directories such as `data/raw/2010`, `data/raw/2011`, and so on. Depending on your network speed, the download may take a few minutes.

<div class="alert alert-block alert-info">
<b>For maintenance:</b> If the download fails, the API or the file locations may have changed. You can update the URLs in `data/config/data_urls.csv`.
</div>

```python
download_integrated_dataset()
```
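The download logic itself is internal to `download_integrated_dataset()`. A minimal sketch of the idea, assuming each archive is reachable at a plain HTTPS URL and is saved in a per-year directory under the last segment of its URL: `destination_for` and `download_zip` are hypothetical names, not functions from `src.data.download_data`.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def destination_for(url: str, year: int, raw_dir: Path) -> Path:
    """Hypothetical helper: map an archive URL to raw_dir/<year>/<file name>."""
    # Take the last segment of the URL path and decode %-escapes such as %20
    file_name = unquote(PurePosixPath(urlparse(url).path).name)
    return raw_dir / str(year) / file_name


def download_zip(url: str, year: int, raw_dir: Path, timeout: float = 60.0) -> Path:
    """Hypothetical helper: fetch one archive unless it is already on disk."""
    target = destination_for(url, year, raw_dir)
    if target.exists():
        return target  # skip files fetched by an earlier run
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk instead of into memory
    partial.replace(target)  # only a complete download receives the final name
    return target
```

Writing to a `.part` file first and renaming it afterwards means an interrupted download is not mistaken for a finished one, so the skip check on the next run stays trustworthy.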


Similarly, the next function downloads the continuous PM$_{2.5}$ data and saves it locally under `data/raw/continuous_pm25/`.

```python
download_continuous_dataset()
```
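The continuous files follow a `PM25_{YEAR}.csv` naming pattern (see the directory tree later in this notebook), so a quick sanity check after this step is to list any expected years that did not arrive. A minimal sketch: the helper `missing_continuous_years` and its default range of 2003 to 2019 are assumptions taken from that tree, not from the download module.

```python
from __future__ import annotations

from pathlib import Path


def missing_continuous_years(continuous_dir: Path, first: int = 2003, last: int = 2019) -> list[int]:
    """Hypothetical check: years whose PM25_<year>.csv is absent from continuous_dir."""
    return [
        year
        for year in range(first, last + 1)
        if not (continuous_dir / f"PM25_{year}.csv").is_file()
    ]
```

For example, `missing_continuous_years(project_root / "data/raw/continuous_pm25")` returns an empty list when every expected file is present.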


## 2. Download station data <a name="download_station_data"></a>

Next, the following function downloads the station information file, `StationsNAPS-StationsSNPA.csv`, from the NAPS website and saves it as `data/raw/stations.csv`.

<div class="alert alert-block alert-info">
<b>For maintenance:</b> If the download fails, the API or the file location may have changed. You can update the URL in the row of `data/config/info_urls.csv` whose `type` is `stations`.
</div>
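The maintenance note implies that `data/config/info_urls.csv` has a `type` column with one row per resource. A minimal sketch of that lookup with the standard `csv` module follows. The `url` column name and the helpers `url_for_type` and `station_url` are assumptions, so check them against the actual file.

```python
import csv
from pathlib import Path
from typing import Iterable, Mapping


def url_for_type(rows: Iterable[Mapping[str, str]], wanted: str, url_column: str = "url") -> str:
    """Hypothetical lookup: URL from the first row whose `type` equals `wanted`."""
    for row in rows:
        if row.get("type") == wanted:
            return row[url_column]
    raise KeyError(f"no row with type == {wanted!r}")


def station_url(config_path: Path) -> str:
    """Hypothetical helper: return the URL registered for `stations`."""
    # utf-8-sig drops a byte-order mark that spreadsheet tools sometimes add,
    # which would otherwise corrupt the first header name ("type")
    with open(config_path, newline="", encoding="utf-8-sig") as f:
        return url_for_type(csv.DictReader(f), "stations")
```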

```python
download_station_data()
```


At this point, the directory tree should look like the following. (The file names are inconsistent across years because they are kept exactly as published.)

```
{project_root}
  │
  ├── data/raw/continuous_pm25/
  │             ├── PM25_2003.csv
  │             ├── ...
  │             └── PM25_2019.csv
  │
  ├── data/raw/integrated_pm25/
  │             ├── 2003/
  │             │    ├── 2003PAH.zip
  │             │    ├── 2003PMSPECIATION.zip
  │             │    └── 2003VOC.zip
  │             ├── ...
  │             │
  │             ├── 2010/
  │             │    ├── 2010_IntegratedPM2.5.zip
  │             │    ├── 2010_PAH.zip
  │             │    └── 2010VOC.zip
  │             ├── ...
  │             │
  │             ├── 2014/
  │             │    ├── 2014_CARBONYLS.zip
  │             │    ├── 2014_IntegratedPM2.5.zip
  │             │    ├── 2014_PAH.zip
  │             │    └── 2014_VOC.zip
  │             ├── ...
  │             │
  │             ├── 2016/
  │             │    ├── 2016_CARBONYLS-CARBONYLES.zip
  │             │    ├── 2016_IntegratedPM2.5-PM2.5Ponctuelles.zip
  │             │    ├── 2016_PAH-HAP.zip
  │             │    └── 2016_VOC-COV.zip
  │             ├── ...
  │             │
  │             ├── 2019/
  │             │    └── ...
  │             │
  │             └── stations.csv
  │
  └── notebooks/
            ├── download_PM25_data.ipynb (this notebook)
            └── ...
```
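To compare your local layout with the tree above, you can list the files under a directory with `pathlib`. A minimal sketch: `relative_files` is a hypothetical helper, and it prints plain relative paths rather than drawing the box characters.

```python
from __future__ import annotations

from pathlib import Path


def relative_files(root: Path) -> list[str]:
    """List every file under root as a sorted, POSIX-style path relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
```

For example, `for path in relative_files(project_root / "data/raw"): print(path)` prints one line per downloaded file.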

## 3. Unzip the data files <a name="unzip_data_files"></a>

Finally, the following function unzips the integrated data files downloaded above. The extracted files are stored under the `data/raw/{YEAR}/` directories.

```python
unzip_integrated_dataset()
```
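For reference, the extraction step can be sketched with the standard `zipfile` module. This assumes each `.zip` is extracted into the same year directory that holds it. `unzip_all` is a hypothetical name, and `unzip_integrated_dataset()` may place or filter files differently.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def unzip_all(integrated_dir: Path) -> list[Path]:
    """Hypothetical helper: extract each <year>/*.zip into its own <year>/ directory."""
    extracted: list[Path] = []
    for archive in sorted(integrated_dir.glob("*/*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)  # extract alongside the archive
            extracted.extend(archive.parent / name for name in zf.namelist())
    return extracted
```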
