# Air Quality Data Analysis

## Data Source
[Air Quality Open Data Platform](https://aqicn.org/data-platform/covid19/)

The data for each major city is based on the median of several monitoring stations. The data set provides the min, max, median and standard deviation for each air pollutant species (PM2.5, PM10, ozone, ...) as well as meteorological data (wind, temperature, ...). All air pollutant species are converted to the US EPA standard (i.e., no raw concentrations are reported). All dates are in UTC. The `count` column is the number of samples used to calculate the median and standard deviation.
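Because each row summarizes one species for one city on one day, the table is in a long layout. When species need to be compared side by side, a wide table (one column per species) is often more convenient. The following is a minimal sketch using pandas `pivot_table`; the helper name `to_wide` and the toy values are hypothetical and only mimic the column names of the real files.

```python
import pandas as pd

def to_wide(df: pd.DataFrame, value: str = "median") -> pd.DataFrame:
    """Pivot the long layout (one row per Date/Country/City/Specie) so that
    each species becomes its own column holding the chosen statistic."""
    return df.pivot_table(
        index=["Date", "Country", "City"],
        columns="Specie",
        values=value,
        aggfunc="first",  # assumes each (Date, City, Specie) appears only once
    )

# Toy rows shaped like the dataset (hypothetical values, for illustration only)
toy = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "Country": ["KR", "KR", "KR"],
    "City": ["Jeonju", "Jeonju", "Jeonju"],
    "Specie": ["co", "pm25", "co"],
    "count": [120, 118, 116],
    "min": [0.1, 10.0, 4.5],
    "max": [12.3, 80.0, 10.0],
    "median": [4.5, 35.0, 6.7],
    "variance": [55.74, 300.0, 16.09],
})
wide = to_wide(toy)
print(wide)
```

Days on which a species was not reported show up as `NaN` in the wide table rather than as missing rows.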

In [1]:
```python
# import the necessary libraries
import glob
import json
import os

import pandas as pd

# use glob to list every CSV file in the data folder
path = os.path.abspath('../data/air-quality')
csv_files = glob.glob(os.path.join(path, "*.csv"))

# read each file (skipping lines that start with '#') and concatenate once,
# which avoids the quadratic cost of calling pd.concat inside the loop
frames = [pd.read_csv(f, comment='#') for f in csv_files]
df = pd.concat(frames)
```

Loading information about the cities (names and coordinates)

In [3]:
```python
# load the list of cities and their metadata from the JSON file
with open('../data/air-quality/airquality-covid19-cities.json', encoding="utf-8") as json_file:
    cities_data = json.load(json_file)['data']
```

In [4]:
```python
print(len(df), "Rows")
print(len(cities_data), "Cities")
```

```
10124378 Rows
618 Cities
```
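The JSON file lists 618 cities, while `df.describe` reports 616 unique values in `City`, so some names do not line up between the two sources. A minimal sketch for finding them, assuming (as the lookup code later in this notebook does) that each JSON entry stores the city name under `entry['Place']['name']`; the helper `compare_city_names` and the toy inputs are hypothetical.

```python
import pandas as pd

def compare_city_names(df: pd.DataFrame, cities_data: list) -> dict:
    """Return the city names that appear in only one of the two sources."""
    json_names = {entry["Place"]["name"] for entry in cities_data}
    csv_names = set(df["City"].dropna().unique())
    return {
        "only_in_json": json_names - csv_names,
        "only_in_csv": csv_names - json_names,
    }

# Toy inputs (hypothetical) with the same structure as the real data
toy_cities = [
    {"Place": {"name": "Jeonju", "geo": [35.82194, 127.14889]}},
    {"Place": {"name": "Toy City", "geo": [0.0, 0.0]}},
]
toy_df = pd.DataFrame({"City": ["Jeonju", "Other Toy City"]})
result = compare_city_names(toy_df, toy_cities)
print(result)
```

On the real data, `compare_city_names(df, cities_data)` would show which names need fixing before coordinates can be joined by exact name.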


In [5]:
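```python
df.head(3)
```

|   | Date | Country | City | Specie | count | min | max | median | variance |
|---|---|---|---|---|---:|---:|---:|---:|---:|
| 0 | 2015-01-06 | KR | Jeonju | co | 124 | 0.1 | 12.3 | 4.5 | 55.74 |
| 1 | 2015-01-22 | KR | Jeonju | co | 116 | 4.5 | 10.0 | 6.7 | 16.09 |
| 2 | 2015-03-30 | KR | Jeonju | co | 118 | 1.2 | 11.2 | 5.6 | 35.98 |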


In [6]:
```python
df.describe(include="all")
```

|   | Date | Country | City | Specie | count | min | max | median | variance |
|---|---|---|---|---|---:|---:|---:|---:|---:|
| count | 10124378 | 10124378 | 10124378 | 10124378 | 10124380.0 | 10124380.0 | 10124380.0 | 10124380.0 | 10124380.0 |
| unique | 2106 | 95 | 616 | 24 | | | | | |
| top | 2020-01-05 | CN | London | pm25 | | | | | |
| freq | 14200 | 1049342 | 36307 | 992557 | | | | | |
| mean | | | | | 127.7638 | 89.98309 | 123.5257 | 102.9449 | 6404.836 |
| std | | | | | 180.9249 | 271.5001 | 275.0709 | 270.4365 | 206343.4 |
| min | | | | | 2.0 | -3276.6 | -3065.6 | -3065.6 | 0.0 |
| 25% | | | | | 44.0 | 1.0 | 10.6 | 4.2 | 22.25 |
| 50% | | | | | 72.0 | 4.5 | 28.5 | 14.0 | 142.93 |
| 75% | | | | | 144.0 | 20.0 | 78.9 | 39.6 | 882.67 |
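The `min` row above contains large negative values (-3276.6 in `min`, -3065.6 in `max` and `median`). Negative readings can be plausible for some meteorological species (e.g., temperature), but values this extreme look suspicious, so it may be worth checking which species produce them. A minimal sketch; the helper name `species_with_negative_values` and the toy rows are hypothetical.

```python
import pandas as pd

def species_with_negative_values(df: pd.DataFrame, column: str = "median") -> pd.Series:
    """Count the rows with a negative value in `column`, per species, largest first."""
    negative = df[df[column] < 0]
    return negative.groupby("Specie").size().sort_values(ascending=False)

# Toy rows (hypothetical values)
toy = pd.DataFrame({
    "Specie": ["pm25", "temperature", "pm25", "co"],
    "median": [-3065.6, -5.0, 12.0, 4.5],
})
result = species_with_negative_values(toy)
print(result)
```

Whether such rows should be dropped or masked depends on the species; this sketch only locates them.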


In [7]:
```python
df.dtypes
```

```
Date         object
Country      object
City         object
Specie       object
count         int64
min         float64
max         float64
median      float64
variance    float64
dtype: object
```
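`Country`, `City` and `Specie` are stored as `object` columns, yet `describe` shows only 95, 616 and 24 distinct values across about 10 million rows. Converting such low-cardinality columns to pandas' `category` dtype typically reduces memory use considerably. A minimal sketch; the helper name `compact_dtypes` is hypothetical.

```python
import pandas as pd

def compact_dtypes(df: pd.DataFrame, columns=("Country", "City", "Specie")) -> pd.DataFrame:
    """Return a copy of df with the given repetitive string columns stored as categoricals."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype("category")
    return out

# Toy frame (hypothetical rows) with repeated labels
toy = pd.DataFrame({
    "Country": ["KR", "KR", "BR"],
    "City": ["Jeonju", "Jeonju", "São Paulo"],
    "Specie": ["co", "pm10", "co"],
    "median": [4.5, 19.0, 3.7],
})
compact = compact_dtypes(toy)
print(compact.dtypes)
```

On the real data, comparing `df.memory_usage(deep=True).sum()` before and after the conversion shows the actual saving.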

In [8]:
```python
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
```
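The data source states that all dates are UTC based, but the conversion above produces time-zone-naive timestamps. Passing `utc=True` to `pd.to_datetime` records that fact explicitly, which can help if the dates are later compared with local-time data. A minimal sketch on two date strings taken from the outputs in this notebook:

```python
import pandas as pd

dates = pd.Series(["2015-01-06", "2020-01-05"])
# utc=True makes the resulting timestamps time-zone aware (UTC)
utc_dates = pd.to_datetime(dates, format="%Y-%m-%d", utc=True)
print(utc_dates.dt.tz)
```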

In [9]:
```python
species_list = df['Specie'].unique()
print(species_list)
```

```
['co' 'pm10' 'o3' 'so2' 'no2' 'pm25' 'psi' 'uvi' 'neph' 'aqi' 'mepaqi'
 'pol' 'temperature' 'humidity' 'pressure' 'wd' 'wind-speed' 'd' 'pm1'
 'wind-gust' 'precipitation' 'dew' 'wind speed' 'wind gust']
```
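The list contains apparent spelling variants: `'wind-speed'` and `'wind speed'`, and `'wind-gust'` and `'wind gust'`. If they turn out to be the same measurement under two labels (an assumption worth checking, e.g., by comparing their cities and date ranges), they can be merged by normalizing the names. A minimal sketch; the helper name `normalize_species` is hypothetical.

```python
import pandas as pd

def normalize_species(species: pd.Series) -> pd.Series:
    """Map space-separated variants such as 'wind speed' onto the hyphenated form."""
    return species.str.strip().str.replace(" ", "-", regex=False)

toy = pd.Series(["wind-speed", "wind speed", "wind gust", "pm25"])
print(sorted(normalize_species(toy).unique()))
```

Applied to the real data, this would be `df['Specie'] = normalize_species(df['Specie'])`, reducing the 24 labels to 22.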


## Aggregating countries and geolocation

In [23]:
```python
df.query('Country == "BR"')
```

|   | Date | Country | City | Specie | count | min | max | median | variance |
|---|---|---|---|---|---:|---:|---:|---:|---:|
| 293819 | 2015-01-13 | BR | São José dos Campos | pm10 | 15 | 1.0 | 55.0 | 19.0 | 3601.43 |
| 293820 | 2015-04-08 | BR | São José dos Campos | pm10 | 23 | 6.0 | 23.0 | 12.0 | 134.47 |
| 293821 | 2015-04-21 | BR | São José dos Campos | pm10 | 23 | 1.0 | 47.0 | 18.0 | 1838.77 |
| 293822 | 2015-06-07 | BR | São José dos Campos | pm10 | 23 | 10.0 | 28.0 | 18.0 | 305.73 |
| 293823 | 2015-06-10 | BR | São José dos Campos | pm10 | 23 | 11.0 | 78.0 | 30.0 | 3854.74 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 54641 | 2022-05-17 | BR | São Paulo | co | 229 | 1.0 | 11.8 | 3.7 | 39.22 |
| 54642 | 2022-06-25 | BR | São Paulo | co | 252 | 1.0 | 36.2 | 10.9 | 686.30 |
| 54643 | 2022-02-22 | BR | São Paulo | co | 311 | 1.0 | 17.2 | 4.6 | 77.86 |
| 54644 | 2022-04-14 | BR | São Paulo | co | 203 | 1.0 | 16.3 | 2.8 | 144.20 |
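To aggregate to the country level, one option is to group by country and date for a single species. A minimal sketch that takes the median across cities of each city's daily median; note that this is a median of medians (each city value is already a median over stations), not the median over all stations in the country. The helper name `country_daily` and the toy values are hypothetical.

```python
import pandas as pd

def country_daily(df: pd.DataFrame, specie: str, stat: str = "median") -> pd.Series:
    """Median across cities of the chosen daily statistic, per (Country, Date)."""
    subset = df[df["Specie"] == specie]
    return subset.groupby(["Country", "Date"])[stat].median()

# Toy rows (hypothetical values)
toy = pd.DataFrame({
    "Date": ["2022-05-17"] * 3,
    "Country": ["BR", "BR", "BR"],
    "City": ["City A", "City B", "City B"],
    "Specie": ["co", "co", "pm10"],
    "median": [3.0, 5.0, 20.0],
})
daily = country_daily(toy, "co")
print(daily)
```

A median is used rather than a mean so that a single city with extreme readings does not dominate the country value.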


In [122]:
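```python
def get_lat_lng(city):
    """Return the (lat, lng) pair for a city name, or None if it is not in cities_data."""
    for entry in cities_data:
        if entry['Place']['name'] == str(city):
            lat, lng = entry['Place']['geo']
            return lat, lng
    return None  # no match: callers must handle this case

print(get_lat_lng('Jeonju'))

# x['City'] is a one-element Series; str() of a Series never equals a city name,
# so the lookup returned None and unpacking raised TypeError. Pass the scalar.
x = df.head(1).copy()  # copy to avoid SettingWithCopyWarning on the slice
x['lat'], x['lng'] = get_lat_lng(x['City'].iloc[0])
```

```
(35.82194, 127.14889)
```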
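Calling `get_lat_lng` row by row scans the whole city list for each of the ~10 million rows. A faster approach is to build a name-to-coordinate dictionary once and use `Series.map`, which leaves `NaN` where a city has no metadata instead of raising. A minimal sketch, assuming the same JSON structure as above (`entry['Place']['name']` and `entry['Place']['geo']`); the helper name `add_lat_lng` is hypothetical. If two JSON entries share a name, the dictionary keeps the last one.

```python
import pandas as pd

def add_lat_lng(df: pd.DataFrame, cities_data: list) -> pd.DataFrame:
    """Return a copy of df with 'lat' and 'lng' columns looked up by city name."""
    lat_by_city = {e["Place"]["name"]: e["Place"]["geo"][0] for e in cities_data}
    lng_by_city = {e["Place"]["name"]: e["Place"]["geo"][1] for e in cities_data}
    out = df.copy()
    out["lat"] = out["City"].map(lat_by_city)  # NaN when the name is not found
    out["lng"] = out["City"].map(lng_by_city)
    return out

# Toy inputs: Jeonju's coordinates come from the output above; "Unknown" is hypothetical
toy_cities = [{"Place": {"name": "Jeonju", "geo": [35.82194, 127.14889]}}]
toy_df = pd.DataFrame({"City": ["Jeonju", "Unknown"]})
geo = add_lat_lng(toy_df, toy_cities)
print(geo)
```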