# Air Quality Data Analysis

## Data source
[Air Quality Open Data Platform](https://aqicn.org/data-platform/covid19/)

The data for each major city is based on the median of several stations. The data set provides the min, max, median, and standard deviation for each air pollutant species (PM2.5, PM10, ozone, ...) as well as meteorological data (wind, temperature, ...). All air pollutant species are converted to the US EPA standard (i.e. no raw concentrations). All dates are UTC-based. The count column is the number of samples used to calculate the median and standard deviation.
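
As a rough illustration of how per-city daily statistics like those described above could be reduced to one value per country, year, and pollutant, here is a minimal sketch. The column names (`Date`, `Country`, `City`, `Specie`, `count`, `median`) and all sample values are assumptions made for illustration, and the plain mean of daily medians is only one plausible aggregation, not necessarily the one used to build the files loaded in this notebook.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
raw = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-01", "2021-01-01"],
    "Country": ["AT", "AT", "AT", "AT"],
    "City": ["Vienna", "Vienna", "Graz", "Vienna"],
    "Specie": ["pm25", "pm25", "pm25", "pm25"],
    "count": [24, 24, 20, 24],
    "median": [50.0, 60.0, 40.0, 30.0],
})

# Extract the calendar year from the (UTC-based) date.
raw["DATE"] = pd.to_datetime(raw["Date"]).dt.year

# Average the daily city medians per year, country, and pollutant.
yearly = (
    raw.groupby(["DATE", "Country", "Specie"], as_index=False)["median"]
       .mean()
       .rename(columns={"median": "MEAN_BY_COUNTRY"})
)
print(yearly)
```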

In [1]:
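# Import the libraries used in this notebook
import pandas as pd
import matplotlib.pyplot as plt
import os
import glob
import json
# Show every column when displaying a DataFrame
pd.set_option('display.max_columns', None)

# Yearly mean per country and pollutant
df = pd.read_csv('../data/air-quality/results/iqar-countries.csv')
# Upper threshold of each quality class, per pollutant
df_index = pd.read_csv('../data/air-quality/classification-iqar.csv')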

In [2]:
df_index

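| | INDEX | QUALITY | pm25 | pm10 | so2 | no2 | co | o3 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | BOA | 25 | 50 | 20 | 200 | 9.0 | 100 |
| 1 | 1 | REGULAR | 60 | 120 | 125 | 260 | 9.05 | 140 |
| 2 | 2 | INADEQUADA | 124 | 249 | 799 | 1129 | 14.9 | 199 |
| 3 | 3 | RUIM | 209 | 419 | 1599 | 2259 | 29.9 | 399 |
| 4 | 4 | PESSIMO | 249 | 499 | 2099 | 2999 | 39.9 | 599 |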


In [3]:
df

| | DATE | NAME | ALPHA-2 | ALPHA-3 | REGION | SPECIE | MEAN_BY_COUNTRY |
|---|---|---|---|---|---|---|---|
| 0 | 2014 | Austria | AT | AUT | Europe | co | 0.100000 |
| 1 | 2014 | Austria | AT | AUT | Europe | no2 | 17.886667 |
| 2 | 2014 | Austria | AT | AUT | Europe | pm10 | 24.200000 |
| 3 | 2014 | Austria | AT | AUT | Europe | pm25 | 83.000000 |
| 4 | 2014 | Austria | AT | AUT | Europe | so2 | 2.100000 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3146 | 2020 | Ghana | GH | GHA | Africa | pm25 | 65.369637 |
| 3147 | 2021 | Ghana | GH | GHA | Africa | pm25 | 78.184066 |
| 3148 | 2022 | Ghana | GH | GHA | Africa | pm25 | 75.008333 |
| 3149 | 2020 | Guinea | GN | GIN | Africa | pm25 | 57.564014 |


In [4]:
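# Exploratory check: classes whose no2 threshold exceeds 2.633333 (result is not displayed)
df_index[df_index['no2'] > 2.633333].head(1)
# No class has an o3 threshold >= 700, so this lookup returns an empty Series
df_index[df_index['o3'] >= 700].head(1)["INDEX"]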

Series([], Name: INDEX, dtype: int64)

In [5]:
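def search_quality(row):
    """Return the INDEX of the first quality class whose threshold covers the row's mean."""
    specie = row["SPECIE"]
    mean = row["MEAN_BY_COUNTRY"]
    # Classes whose upper threshold for this pollutant is at least the mean
    filtered = df_index[df_index[specie] >= mean]
    # Above every threshold: return 5, one step beyond the worst class (4)
    if filtered.empty:
        return 5
    return filtered.iloc[0]['INDEX']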

df["CLASSIFICATION"] = df.apply(search_quality, axis=1)
df.head(10)

| | DATE | NAME | ALPHA-2 | ALPHA-3 | REGION | SPECIE | MEAN_BY_COUNTRY | CLASSIFICATION |
|---|---|---|---|---|---|---|---|---|
| 0 | 2014 | Austria | AT | AUT | Europe | co | 0.1 | 0 |
| 1 | 2014 | Austria | AT | AUT | Europe | no2 | 17.886667 | 0 |
| 2 | 2014 | Austria | AT | AUT | Europe | pm10 | 24.2 | 0 |
| 3 | 2014 | Austria | AT | AUT | Europe | pm25 | 83.0 | 2 |
| 4 | 2014 | Austria | AT | AUT | Europe | so2 | 2.1 | 0 |
| 5 | 2015 | Austria | AT | AUT | Europe | co | 0.1 | 0 |
| 6 | 2015 | Austria | AT | AUT | Europe | no2 | 14.367834 | 0 |
| 7 | 2015 | Austria | AT | AUT | Europe | pm10 | 19.525478 | 0 |
| 8 | 2015 | Austria | AT | AUT | Europe | pm25 | 52.685333 | 1 |
| 9 | 2015 | Austria | AT | AUT | Europe | so2 | 1.631425 | 0 |
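
The row-by-row lookup in `search_quality` can also be written as a sorted search. The sketch below is an alternative formulation, not the notebook's own method: it assumes the thresholds for each pollutant are sorted in ascending order and that each class `INDEX` equals its row position, both of which hold for the threshold table shown earlier. The `thresholds` dictionary and the `classify` helper are hypothetical names introduced for illustration, and only two pollutants are copied over to keep it short.

```python
import numpy as np

# Upper bounds per class (0 = BOA ... 4 = PESSIMO), copied from the threshold table.
thresholds = {
    "pm25": [25, 60, 124, 209, 249],
    "o3": [100, 140, 199, 399, 599],
}

def classify(specie: str, mean: float) -> int:
    """Return the position of the first upper bound >= mean, or 5 if there is none."""
    # side="left" yields the first index i with thresholds[specie][i] >= mean.
    return int(np.searchsorted(thresholds[specie], mean, side="left"))

print(classify("pm25", 83.0))       # 2, matches row 3 of the output above
print(classify("pm25", 52.685333))  # 1, matches row 8 of the output above
print(classify("o3", 700))          # 5, above every o3 threshold
```

Because `np.searchsorted` also accepts arrays, the same call could classify a whole column for one pollutant at once instead of applying a function to every row.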


In [6]:
df[df["CLASSIFICATION"] != 0]

| | DATE | NAME | ALPHA-2 | ALPHA-3 | REGION | SPECIE | MEAN_BY_COUNTRY | CLASSIFICATION |
|---|---|---|---|---|---|---|---|---|
| 3 | 2014 | Austria | AT | AUT | Europe | pm25 | 83.000000 | 2 |
| 8 | 2015 | Austria | AT | AUT | Europe | pm25 | 52.685333 | 1 |
| 13 | 2016 | Austria | AT | AUT | Europe | pm25 | 45.827586 | 1 |
| 18 | 2017 | Austria | AT | AUT | Europe | pm25 | 53.013661 | 1 |
| 23 | 2018 | Austria | AT | AUT | Europe | pm25 | 54.172131 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3146 | 2020 | Ghana | GH | GHA | Africa | pm25 | 65.369637 | 2 |
| 3147 | 2021 | Ghana | GH | GHA | Africa | pm25 | 78.184066 | 2 |
| 3148 | 2022 | Ghana | GH | GHA | Africa | pm25 | 75.008333 | 2 |
| 3149 | 2020 | Guinea | GN | GIN | Africa | pm25 | 57.564014 | 1 |


In [7]:
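# Average the numeric columns per year and country; numeric_only=True skips the text columns (pandas >= 2.0 raises otherwise)
df = df.groupby(['DATE', 'ALPHA-3', 'ALPHA-2']).mean(numeric_only=True).reset_index()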

In [8]:
df.columns = ['DATE','CODE', 'CODE-2', 'MEAN','CLASSIFICATION']
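
The grouping and renaming steps above can also be expressed with named aggregation, which ties each output column to its source column explicitly instead of relying on column order. This is a sketch on hypothetical sample rows (the values are illustrative, not taken from the data set), and the `sample` and `summary` names are introduced only for this example.

```python
import pandas as pd

# Hypothetical rows shaped like the classified table; values are illustrative.
sample = pd.DataFrame({
    "DATE": [2014, 2014, 2015],
    "NAME": ["Austria", "Austria", "Austria"],
    "ALPHA-2": ["AT", "AT", "AT"],
    "ALPHA-3": ["AUT", "AUT", "AUT"],
    "SPECIE": ["pm10", "pm25", "pm25"],
    "MEAN_BY_COUNTRY": [24.0, 83.0, 52.0],
    "CLASSIFICATION": [0, 2, 1],
})

summary = (
    sample.groupby(["DATE", "ALPHA-3", "ALPHA-2"], as_index=False)
          # Each keyword names an output column and its (source column, function).
          .agg(MEAN=("MEAN_BY_COUNTRY", "mean"),
               CLASSIFICATION=("CLASSIFICATION", "mean"))
          .rename(columns={"ALPHA-3": "CODE", "ALPHA-2": "CODE-2"})
)
print(summary)
```

Note that averaging `CLASSIFICATION` across pollutants can produce fractional values; if a single class per country and year is wanted, replacing `"mean"` with `"max"` for that column would keep the worst class instead. That is a design choice this sketch leaves open.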

In [9]:
df.to_csv('../data/air-quality/results/iqar-countries-class-after.csv',index=False)
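
The `index=False` argument matters when the file is read back later. The following sketch uses an in-memory buffer and a hypothetical one-row frame to show the difference: without it, pandas writes the index as an extra unnamed column, which `read_csv` then reports as `Unnamed: 0`.

```python
import io
import pandas as pd

# Hypothetical one-row frame, for illustration only.
out = pd.DataFrame({"DATE": [2014], "CODE": ["AUT"], "MEAN": [53.5]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
out.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
out.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)     # ['Unnamed: 0', 'DATE', 'CODE', 'MEAN']
print(cols_without)  # ['DATE', 'CODE', 'MEAN']
```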