### Ex.2


The file limits_IT_provinces.geojson contains the boundaries of all the Italian administrative districts called "province".

The file polveri.csv contains pollution data recorded by different sensors in the provinces of Veneto (the region that includes Verona) over several years: for each year, the number of days on which the fine dust value exceeded the limit.

Task:
- compute the average of the values measured by the different sensors in each “provincia” for 2022 and 2012
- create a choropleth map in which each “provincia” is represented with a categorical color
- add a symbol map (scatter_geo) with dot size representing the average number of days over the limit in 2022 and color representing the increase/decrease with respect to 2012 (think of an optimal colormap to highlight improvements/deteriorations)

In [61]:
import pandas as pd
import plotly.express as px
import numpy as np

pm25 = pd.read_csv("polveri.csv")

pm25

Unnamed: 0,Provincia,Comune,Stazionediriferimento,CodiceStazione,Tipologiastazione,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
0,Belluno,Belluno,BL_ParcoCittàdiBologna,IT1594A,FU,22.0,19.0,17.0,16.0,16.0,14.0,15.0,13.0,15.0,14.0,13.0,13.0,13.0,14.0
1,Belluno,Feltre,AreaFeltrina,IT1619A,FU/FS,27.0,24.0,25.0,23.0,22.0,18.0,21.0,20.0,21.0,18.0,18.0,19.0,16.0,16.0
2,Padova,Padova,PD_Mandria,IT1453A,FU,32.0,31.0,34.0,32.0,28.0,24.0,31.0,30.0,34.0,27.0,24.0,25.0,21.0,23.0
3,Padova,Padova,PD_aps1,99902,IU,32.0,33.0,37.0,29.0,27.0,23.0,28.0,25.0,29.0,26.0,26.0,28.0,24.0,25.0
4,Padova,Padova,PD_aps2,99903,IU,29.0,26.0,29.0,28.0,26.0,22.0,28.0,24.0,26.0,24.0,24.0,25.0,22.0,24.0
5,Padova,Este,Este,IT1871A,IS,,,,,,18.0,23.0,20.0,22.0,19.0,19.0,20.0,15.0,17.0
6,Padova,Monselice,Monselice,99910,IU/FU,24.0,21.0,26.0,,,,,,22.0,19.0,19.0,21.0,17.0,17.0
7,Rovigo,PortoTolle,PortoTolle,IT1212A,FS,,21.0,22.0,19.0,,,,,,,,,,
8,Rovigo,Rovigo,RO_LargoMartiri,IT1215A,TU,,,31.0,29.0,25.0,21.0,28.0,24.0,28.0,25.0,24.0,23.0,20.0,23.0
9,Rovigo,PortoViro,GNLPortoLevante,99907,IS,,,24.0,18.0,15.0,13.0,18.0,16.0,18.0,,,,,


### Compute the average values measured by the different sensors for each “provincia” in 2022 and 2012


In [None]:
# Compute the average values for each provincia in 2022
avg_2022 = pm25.groupby('Provincia')['2022'].mean().reset_index()
avg_2022.columns = ['Provincia', 'Avg_2022']

# Compute the average values for each provincia in 2012
avg_2012 = pm25.groupby('Provincia')['2012'].mean().reset_index()
avg_2012.columns = ['Provincia', 'Avg_2012']

# Merge the two dataframes
avg_values = pd.merge(avg_2022, avg_2012, on='Provincia')

avg_values

Unnamed: 0,Provincia,Avg_2022,Avg_2012
0,Belluno,15.0,19.5
1,Padova,21.2,29.666667
2,Rovigo,23.0,22.0
3,Treviso,18.0,26.0
4,Venezia,21.333333,30.0
5,Verona,18.0,24.0
6,Vicenza,19.75,24.5
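
The two per-year averages can also be obtained with a single `groupby` over both year columns. The sketch below is only an illustration: `sample` and `avg_sample` are names introduced here, and the rows are the Belluno and Padova stations copied from the preview of polveri.csv above. It also shows that `mean()` skips missing readings, which is why Padova's 2012 average is computed over three stations only.

```python
import pandas as pd

nan = float("nan")

# Rows copied from the polveri.csv preview (Belluno and Padova stations only)
sample = pd.DataFrame({
    "Provincia": ["Belluno", "Belluno", "Padova", "Padova", "Padova", "Padova", "Padova"],
    "2012": [16.0, 23.0, 32.0, 29.0, 28.0, nan, nan],
    "2022": [14.0, 16.0, 23.0, 25.0, 24.0, 17.0, 17.0],
})

# One groupby over both year columns; mean() ignores NaN by default
avg_sample = (
    sample.groupby("Provincia")[["2022", "2012"]]
    .mean()
    .rename(columns={"2022": "Avg_2022", "2012": "Avg_2012"})
    .reset_index()
)

print(avg_sample)
```

The values for these two provinces agree with the corresponding rows of the table above.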


### Create a choropleth map in which each “provincia” is represented with a categorical color

In [None]:
import json



In [75]:
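# Load the GeoJSON file into a dict, the form documented for px.choropleth
with open("limits_IT_provinces.geojson") as f:
    province = json.load(f)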

fig = px.choropleth(data_frame=avg_values,
                    geojson=province, 
                    locations='Provincia',
                    featureidkey="properties.prov_name",
                    color='Provincia',
                    scope="europe",
                    )

fig.update_geos(showcountries=False, showcoastlines=False, showland=False, fitbounds="locations")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

What's wrong with the previous visualization?

The color only distinguishes one province from another; it carries no information about the measured values, so the provinces cannot be compared. A continuous color scale that encodes the average values measured by the sensors should be used instead.
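
Both maps join the `Provincia` column to the GeoJSON features through `featureidkey`, so a province whose name has no matching `prov_name` value will usually just be missing from the map. A quick check along the following lines can catch that early. This is a minimal sketch: the inline GeoJSON is a hypothetical, heavily trimmed stand-in for the real file (geometries omitted), and `unmatched_locations` is a helper name introduced here.

```python
import json

# Hypothetical stand-in for limits_IT_provinces.geojson (geometries omitted)
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"prov_name": "Verona"}, "geometry": null},
  {"type": "Feature", "properties": {"prov_name": "Padova"}, "geometry": null}
]}
"""
province_geo = json.loads(geojson_text)


def unmatched_locations(names, geojson, prop="prov_name"):
    """Return the names that match no feature's `prop` property."""
    known = {feature["properties"].get(prop) for feature in geojson["features"]}
    return sorted(set(names) - known)


missing = unmatched_locations(["Verona", "Padova", "Rovigo"], province_geo)
print(missing)
```

With the real data, `names` would be `avg_values['Provincia']` and `geojson` the dict loaded from the file; an empty result means every province can be drawn.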

### Add a symbol map (scatter_geo) with dot size representing the average number of days over the limit in 2022 and color representing the increase/decrease with respect to 2012 (think of an optimal colormap to highlight improvements/deteriorations)

In [32]:
# Calculate the difference between 2022 and 2012
avg_values['Difference'] = avg_values['Avg_2022'] - avg_values['Avg_2012']
avg_values

Unnamed: 0,Provincia,Avg_2022,Avg_2012,Difference
0,Belluno,15.0,19.5,-4.5
1,Padova,21.2,29.666667,-8.466667
2,Rovigo,23.0,22.0,1.0
3,Treviso,18.0,26.0,-8.0
4,Venezia,21.333333,30.0,-8.666667
5,Verona,18.0,24.0,-6.0
6,Vicenza,19.75,24.5,-4.75
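
Because `Difference` has a meaningful zero (no change), a diverging colormap centered on 0 is a natural fit. One way to center it is to make the color range symmetric around zero, as in the sketch below. The values are copied from the table above; `diff`, `limit`, `color_range` and `trend` are names introduced here for illustration.

```python
import pandas as pd

# Difference values copied from the avg_values table
diff = pd.Series(
    [-4.5, -8.466667, 1.0, -8.0, -8.666667, -6.0, -4.75],
    index=["Belluno", "Padova", "Rovigo", "Treviso", "Venezia", "Verona", "Vicenza"],
    name="Difference",
)

# Symmetric limits so that 0 falls exactly in the middle of the color scale
limit = diff.abs().max()
color_range = [-limit, limit]

# Label each province by the sign of its change
trend = diff.apply(
    lambda d: "improved" if d < 0 else ("worsened" if d > 0 else "unchanged")
)

print(color_range)
print(trend)
```

In Plotly Express, such a range could be passed as `range_color=color_range`; setting `color_continuous_midpoint=0` is an alternative that achieves the same centering.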


In [89]:
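# Choropleth layer: color encodes the change between 2012 and 2022.
# A diverging scale centered on 0 separates improvements (negative, blue)
# from deteriorations (positive, red).
fig_choropleth = px.choropleth(data_frame=avg_values,
                               geojson=province,
                               locations='Provincia',
                               featureidkey="properties.prov_name",
                               color='Difference',
                               color_continuous_scale="RdBu_r",
                               color_continuous_midpoint=0,
                               )
fig_choropleth.update_geos(showcountries=False, showcoastlines=False, showland=False, fitbounds="locations")
fig_choropleth.update_layout(margin={"r":0,"t":0,"l":0,"b":0})

# Symbol layer: dot size encodes the 2022 average, color encodes the change
fig_scatter_geo = px.scatter_geo(avg_values,
                                 geojson=province,
                                 locations='Provincia',
                                 featureidkey="properties.prov_name",
                                 size='Avg_2022',
                                 color='Difference',
                                 size_max=12,
                                 projection="mercator"
                                 )
# Outline the dots so they stay visible on top of the filled provinces
fig_scatter_geo.update_traces(marker_line_color="black", marker_line_width=1)

# The added trace uses the color axis of the choropleth figure,
# so both layers share the same diverging scale
fig_choropleth.add_trace(fig_scatter_geo.data[0])
fig_choropleth.show()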