# 00. Table of contents

Chosen map: a choropleth map showing the daily mean of cannabis consumption in Europe in 2021.

1. Importing data and libraries
2. Data wrangling
3. Data cleaning
4. Plotting a choropleth map

# 01. Importing data and libraries

In [11]:
#importing libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
import os
import folium
import json

In [12]:
# This command prompts matplotlib visuals to appear in the notebook

%matplotlib inline

In [13]:
#importing data
path = r'C:\Users\viki\Documents\Data Analytics\Immersion\Achievement 6\02_Data'
df= pd.read_csv(os.path.join(path, 'Prepared_Data', 'merged_cleaned_dataset.csv'))

In [14]:
#importing geojson file of the countries
country_geo = r'C:\Users\viki\Documents\Data Analytics\Immersion\Achievement 6\02_Data\Original_Data\europe.geo.json'

In [15]:
#checking content of the geo data
# opening the file in a with-block so it is closed again after reading
with open(country_geo) as f:
    # returning JSON object as a dictionary
    data = json.load(f)

# Iterating through the list of features
for i in data['features']:
    print(i)

{'type': 'Feature', 'properties': {'scalerank': 1, 'featurecla': 'Admin-0 country', 'labelrank': 4, 'sovereignt': 'Belarus', 'sov_a3': 'BLR', 'adm0_dif': 0, 'level': 2, 'type': 'Sovereign country', 'admin': 'Belarus', 'adm0_a3': 'BLR', 'geou_dif': 0, 'geounit': 'Belarus', 'gu_a3': 'BLR', 'su_dif': 0, 'subunit': 'Belarus', 'su_a3': 'BLR', 'brk_diff': 0, 'name': 'Belarus', 'name_long': 'Belarus', 'brk_a3': 'BLR', 'brk_name': 'Belarus', 'brk_group': None, 'abbrev': 'Bela.', 'postal': 'BY', 'formal_en': 'Republic of Belarus', 'formal_fr': None, 'note_adm0': None, 'note_brk': None, 'name_sort': 'Belarus', 'name_alt': None, 'mapcolor7': 1, 'mapcolor8': 1, 'mapcolor9': 5, 'mapcolor13': 11, 'pop_est': 9648533, 'gdp_md_est': 114100, 'pop_year': -99, 'lastcensus': 2009, 'gdp_year': -99, 'economy': '6. Developing region', 'income_grp': '3. Upper middle income', 'wikipedia': -99, 'fips_10': None, 'iso_a2': 'BY', 'iso_a3': 'BLR', 'iso_n3': '112', 'un_a3': '112', 'wb_a2': 'BY', 'wb_a3': 'BLR', 'woe_
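
Printing every feature produces very long output. A more compact check, sketched here under the assumption that every feature carries `name` and `iso_a2` properties (as the Belarus record above does), lists only the fields that matter for joining the map to the data. The `sample_geo` dictionary and the `country_codes` helper are made up for illustration; in the notebook the loaded `data` dictionary would take the place of `sample_geo`.

```python
# Hypothetical two-feature stand-in for the loaded GeoJSON dictionary (`data`)
sample_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Belarus", "iso_a2": "BY"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Austria", "iso_a2": "AT"}, "geometry": None},
    ],
}


def country_codes(geo):
    """Map each feature's ISO alpha-2 code to its country name."""
    return {
        feature["properties"]["iso_a2"]: feature["properties"]["name"]
        for feature in geo["features"]
    }


codes = country_codes(sample_geo)
print(codes)
```

The resulting codes are the values that a `key_on` setting of `feature.properties.iso_a2` has to match against the `Country` column.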

# 02. Data wrangling

In [16]:
df.head()

Unnamed: 0,year,metabolite,SiteID,Country,City,Wednesday,Thursday,Friday,Saturday,Sunday,Monday,Tuesday,Weekday mean,Weekend mean,Daily mean,latitude,longitude,population
0,2021,amphetamine,AT001,AT,Graz,47.15,37.48,37.95,38.02,38.14,35.82,35.45,40.03,37.48,38.57,47.070713,15.439504,487040.0
1,2021,cannabis,AT001,AT,Graz,54.77,80.42,60.1,53.88,50.46,77.04,233.51,122.9,60.37,87.17,47.070713,15.439504,487040.0
2,2021,cocaine,AT001,AT,Graz,127.6,121.48,137.75,174.94,179.55,117.31,111.91,120.33,152.39,138.65,47.070713,15.439504,487040.0
3,2021,MDMA,AT001,AT,Graz,5.82,4.53,5.27,16.56,17.37,10.14,7.83,6.06,12.33,9.64,47.070713,15.439504,487040.0
4,2021,methamphetamine,AT001,AT,Graz,12.44,11.24,14.99,9.33,18.66,9.33,15.66,13.11,13.08,13.09,47.070713,15.439504,487040.0


In [22]:
#creating a subset of cannabis consumption in 2021
sub = df[(df['metabolite'] == 'cannabis') & (df['year'] == 2021)]

In [23]:
sub.head()

Unnamed: 0,year,metabolite,SiteID,Country,City,Wednesday,Thursday,Friday,Saturday,Sunday,Monday,Tuesday,Weekday mean,Weekend mean,Daily mean,latitude,longitude,population
1,2021,cannabis,AT001,AT,Graz,54.77,80.42,60.1,53.88,50.46,77.04,233.51,122.9,60.37,87.17,47.070713,15.439504,487040.0
16,2021,cannabis,AT002,AT,Hall-Wattens,44.93,19.69,34.03,38.92,25.62,40.21,50.06,38.23,34.7,36.21,47.29168,11.59284,78180.0
36,2021,cannabis,AT004,AT,Innsbruck,87.9,81.77,117.8,102.25,93.93,88.05,112.37,94.01,100.51,97.72,47.269212,11.404102,251656.0
64,2021,cannabis,AT005,AT,Kapfenberg,73.92,101.08,81.72,82.54,15.45,121.91,59.07,78.02,75.41,76.53,47.443562,15.2901,31032.0
79,2021,cannabis,AT007,AT,Kufstein,78.22,78.77,100.19,72.26,115.0,94.57,99.68,85.55,95.5,91.24,47.582958,12.17077,41688.0


In [28]:
sub.shape

(45, 18)

In [24]:
#creating subset of the columns we want to plot
columns= ['Country', 'Daily mean']
plot= sub[columns]

In [26]:
#checking new df we want to plot
plot.shape

(45, 2)

In [27]:
plot.head()

Unnamed: 0,Country,Daily mean
1,AT,87.17
16,AT,36.21
36,AT,97.72
64,AT,76.53
79,AT,91.24


# 03. Data cleaning

In [30]:
#data cleaning has already been conducted, but we check the subset for missing values
# because some measurement values were initially missing
plot.isnull().sum() #no missing values

Country       0
Daily mean    0
dtype: int64

In [34]:
plot.Country.value_counts()

AT    9
SI    6
CZ    5
ES    4
NL    3
PT    3
SE    3
IT    2
SK    2
TR    2
EE    2
FR    1
GR    1
HR    1
PL    1
Name: Country, dtype: int64
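
The counts show that several countries contribute more than one measurement site, so `plot` holds multiple `Daily mean` values for the same country code. With several rows per key, it is not obvious which of a country's site values ends up on a choropleth map. One option, which is an assumption and not a step taken in this notebook, is to collapse the sites to a single unweighted mean per country before mapping. The sketch below uses the five Austrian rows from `plot.head()` plus one made-up `XX` row.

```python
import pandas as pd

# The five Austrian rows shown by plot.head(), plus one made-up "XX" row
sample = pd.DataFrame({
    "Country": ["AT", "AT", "AT", "AT", "AT", "XX"],
    "Daily mean": [87.17, 36.21, 97.72, 76.53, 91.24, 10.0],
})

# One unweighted mean per country; as_index=False keeps 'Country' as a column
country_mean = sample.groupby("Country", as_index=False)["Daily mean"].mean()
print(country_mean)
```

An unweighted mean gives a small town the same influence as a large city. Weighting by the `population` column would be another possible choice.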

# 04. Plotting choropleth map

In [47]:
# setting up a folium map (named europe_map to avoid shadowing Python's built-in map)
europe_map = folium.Map(location = [50,10], zoom_start = 4)

# building Choropleth map for cannabis usage in 2021 in Europe
folium.Choropleth(
    geo_data = country_geo, 
    data = sub,
    columns = ['Country', 'Daily mean'],
    key_on = 'feature.properties.iso_a2', 
    fill_color = 'YlOrBr', fill_opacity=0.6, line_opacity=0.1,
    legend_name = "Daily mean of cannabis consumption in 2021").add_to(europe_map)
folium.LayerControl().add_to(europe_map)

europe_map

The map shows that, among the countries for which we have data, France and Croatia had the highest daily mean cannabis consumption in Europe in 2021. Both countries are represented by a single measurement site, so these values reflect one city each and not a national average.
Countries colored in black have no measurement data available.
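
To list which countries fall into the no-data group, the country codes in the GeoJSON file can be compared with the codes in the plotted data. The following is a minimal sketch with made-up code sets; in the notebook they could be built from the `iso_a2` property of `data['features']` and from `plot['Country'].unique()`.

```python
# Hypothetical inputs: codes found in the GeoJSON file and in the plotted data
geo_codes = {"AT", "BY", "FR", "NO"}
data_codes = {"AT", "FR"}

# Countries drawn on the map without a matching measurement
missing = sorted(geo_codes - data_codes)

# Codes in the data that the map cannot place (ideally empty)
unmatched = sorted(data_codes - geo_codes)

print(missing, unmatched)
```

A non-empty `unmatched` list would point to a mismatch between the codes in the data and those in the map file, not to missing measurements.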

Because the data covers several metabolite types and years, the final dashboard should include a choropleth map with built-in filters for these two variables. This gives an interactive map that the user can customize by selecting a metabolite type and a year.
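
The filtering behind such a map could be wrapped in one function that repeats the subsetting steps from section 02 for any metabolite and year. This is only a sketch: `filter_for_map` is a hypothetical helper name, and the sample rows are made up in the shape of the notebook's DataFrame.

```python
import pandas as pd


def filter_for_map(df, metabolite, year):
    """Return the 'Country' and 'Daily mean' columns for one metabolite and year."""
    mask = (df["metabolite"] == metabolite) & (df["year"] == year)
    return df.loc[mask, ["Country", "Daily mean"]]


# Made-up rows in the shape of the notebook's DataFrame
sample = pd.DataFrame({
    "year": [2021, 2021, 2021, 2020],
    "metabolite": ["cannabis", "cannabis", "cocaine", "cannabis"],
    "Country": ["AT", "HR", "AT", "FR"],
    "Daily mean": [80.0, 100.0, 138.65, 50.0],
})

result = filter_for_map(sample, "cannabis", 2021)
print(result)
```

The returned table could then be passed as `data` to `folium.Choropleth`, with the dashboard's filter controls supplying the `metabolite` and `year` arguments.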