In [2]:
import pandas as pd
import json
import folium
import os
import requests
from bs4 import BeautifulSoup

## European unemployment

We obtained the data from [this Eurostat page](http://ec.europa.eu/eurostat/data/database) by navigating through the following directory hierarchy:

 - Tables by theme
 - Population and social conditions
 - Labour market
 - Employment and unemployment (Labour force survey)
 - LFS Main indicators
 - Unemployment - LFS adjusted series
 - Total unemployment rate

[Statistical description of the dataset](http://ec.europa.eu/eurostat/cache/metadata/en/tsdec450_esmsip.htm)

A first look at the data shows that the first column packs four fields (age, unit, sex and country code) into a single comma-separated string, so we have to split it.

In [3]:
euro = pd.read_csv('Homework3-data/european_unemployment.tsv',
                   sep='\t',
                   usecols=['age,unit,sex,geo\\time', '2016 '])
euro.head()

Unnamed: 0,"age,unit,sex,geo\time",2016
0,"TOTAL,PC_ACT,T,AT",6.0
1,"TOTAL,PC_ACT,T,BE",7.8
2,"TOTAL,PC_ACT,T,BG",7.6
3,"TOTAL,PC_ACT,T,CY",13.0
4,"TOTAL,PC_ACT,T,CZ",4.0


In [4]:
countries = pd.DataFrame(euro['age,unit,sex,geo\\time'].str.split(',', expand=True))
countries.head()

Unnamed: 0,0,1,2,3
0,TOTAL,PC_ACT,T,AT
1,TOTAL,PC_ACT,T,BE
2,TOTAL,PC_ACT,T,BG
3,TOTAL,PC_ACT,T,CY
4,TOTAL,PC_ACT,T,CZ


In [5]:
countries[0].unique()

array(['TOTAL'], dtype=object)

In [6]:
countries[1].unique()

array(['PC_ACT', 'PC_POP', 'THS_PER'], dtype=object)

As indicated [here](https://www.google.ch/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0ahUKEwiFm4i2-pzXAhUGZlAKHRd_BmQQFggmMAA&url=http%3A%2F%2Fec.europa.eu%2Feurostat%2Fdocuments%2F10186%2F6246844%2FChanges%2Bto%2Bthe%2Bunemployment%2Bdata%2Btables.pdf&usg=AOvVaw10j2p5RgC8vFvpEuvBX2Jn), we only keep `PC_ACT`, the rate expressed as a percentage of the active population.
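The three unit codes seen above are Eurostat abbreviations: `PC_ACT` is a percentage of the active population, `PC_POP` a percentage of the total population, and `THS_PER` a count in thousands of persons. As a minimal sketch of the same filter with named columns instead of positional ones, assuming a small hypothetical sample (the `PC_POP` and `THS_PER` values below are made up for illustration):

```python
import pandas as pd

# Hypothetical sample mimicking the compound Eurostat key.
# Only the PC_ACT value (6.0 for AT) comes from the notebook; the others are made up.
sample = pd.DataFrame({
    'key': ['TOTAL,PC_ACT,T,AT', 'TOTAL,PC_POP,T,AT', 'TOTAL,THS_PER,T,AT'],
    'value': [6.0, 4.5, 270.0],
})

# Split the key into named columns rather than 0..3
parts = sample['key'].str.split(',', expand=True)
parts.columns = ['age', 'unit', 'sex', 'geo']

# Keep only the rows expressed as a percentage of the active population
tidy = parts.join(sample['value'])
rates = tidy[tidy['unit'] == 'PC_ACT'][['geo', 'value']].reset_index(drop=True)
print(rates)
```

Naming the columns makes the filter on `unit` self-documenting, which helps when the three units are mixed in the same file.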

In [7]:
countries = countries[countries[1] == 'PC_ACT'][[3]]
countries.columns = ['country']
countries.head()

Unnamed: 0,country
0,AT
1,BE
2,BG
3,CY
4,CZ


In [8]:
euro = countries.merge(euro, left_index=True, right_index=True)\
                .drop('age,unit,sex,geo\\time', axis=1)
euro.columns = ['country', '2016']

In [9]:
euro.head()

Unnamed: 0,country,2016
0,AT,6.0
1,BE,7.8
2,BG,7.6
3,CY,13.0
4,CZ,4.0


In [10]:
euro['2016'].describe()

count    37.000000
mean      8.286486
std       4.201201
min       3.000000
25%       5.900000
50%       7.800000
75%      10.000000
max      23.600000
Name: 2016, dtype: float64

In [11]:
KIEV = [50.450, 30.5234]

In [12]:
euromap = folium.Map(location=KIEV, zoom_start=4)
euromap

In [13]:
with open(r'Homework3-data/topojson/europe.topojson.json') as f:
    geo_data_bis = json.load(f)

In [14]:
map_id, map_name = zip(*(map(lambda obj: (obj['id'], obj['properties']['NAME']),
                        geo_data_bis['objects']['europe']['geometries'])))

In [15]:
map_data = pd.DataFrame({'id': map_id, 'name': map_name})
map_data.head()

Unnamed: 0,id,name
0,AZ,Azerbaijan
1,AL,Albania
2,AM,Armenia
3,BA,Bosnia and Herzegovina
4,BG,Bulgaria


In [16]:
euro = euro.merge(map_data, left_on='country', right_on='id', how='right').drop('id', axis=1)

In [17]:
euro = euro[['country', 'name', '2016']]
euro.head()

Unnamed: 0,country,name,2016
0,AT,Austria,6.0
1,BE,Belgium,7.8
2,BG,Bulgaria,7.6
3,CY,Cyprus,13.0
4,CZ,Czech Republic,4.0


In [18]:
euromap.choropleth(
    geo_data=geo_data_bis,
    data=euro,
    topojson="objects.europe",
    columns=['country', '2016'],
    key_on='feature.id',
    fill_color='BuPu', fill_opacity=0.7, line_opacity=0.2,
    legend_name='Unemployment rate'
)

In [19]:
euromap

## Swiss unemployment

We obtained the data from the amstat.ch website.
We are looking for a dataset that contains recent Swiss unemployment rates.
The website provides datasets covering various employment indicators.
We navigate through the following directories:

- Details
- Chomeurs et demandeurs d'emplois (unemployed and job seekers)
- Taux de chomage (unemployment rate)

We then select the following categories, which we will use to create our maps:

- month: Mois sous revue (month under review, September 2017)
- Taux de chomage (unemployment rate)
- Chomeurs inscrits (registered unemployed)
- Demandeurs d'emplois (job seekers)
- by canton

Once the dataset file is created, we open it and clean the data.

In [20]:
globalUn = pd.read_csv('Homework3-data/totalTaux.csv', delimiter=',', encoding = "ISO-8859-1", skiprows=[1,28])
globalUn.head()

Unnamed: 0,"Canton,""Mois"",""Septembre 2017"",""Septembre 2017"",""Septembre 2017"",""Total"",""Total"",""Total"""
0,"Zurich,,""3.3"",""27'225"",""34'156"",""3.3"",""27'225""..."
1,"Berne,,""2.4"",""13'658"",""18'385"",""2.4"",""13'658"",..."
2,"Lucerne,,""1.7"",""3'885"",""6'756"",""1.7"",""3'885"",""..."
3,"Uri,,""0.6"",""112"",""257"",""0.6"",""112"",""257"""
4,"Schwyz,,""1.7"",""1'455"",""2'229"",""1.7"",""1'455"",""2..."


We now create the columns by splitting the single initial column, and name them.

In [21]:
Cantons = pd.DataFrame(globalUn['Canton,"Mois","Septembre 2017","Septembre 2017","Septembre 2017","Total","Total","Total"'].str.split(',', expand=True))
Cantons.drop(Cantons.columns[[1,5,6,7]], axis=1, inplace=True)
Cantons.columns = ['canton', 'unemployment rate', 'Nb unemployed', 'searching job']
Cantons = Cantons.replace({'"':''}, regex=True)
Cantons = Cantons.replace({'\'':''}, regex=True)
Cantons.head()

Unnamed: 0,canton,unemployment rate,Nb unemployed,searching job
0,Zurich,3.3,27225,34156
1,Berne,2.4,13658,18385
2,Lucerne,1.7,3885,6756
3,Uri,0.6,112,257
4,Schwyz,1.7,1455,2229
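
As an alternative to the chained `replace` calls, each raw line could be parsed with the standard `csv` module, which handles the double quotes, followed by an explicit removal of the apostrophe used as thousands separator. This is only a sketch: the two sample strings are copied from the raw output shown earlier (the Zurich row is limited to its first five fields, since the rest is truncated in the display), and `parse_row` and `to_int` are hypothetical helper names.

```python
import csv

# Raw cell values as displayed in the notebook output
# (Zurich is cut to its first five fields because the display truncates the rest)
raw_rows = [
    'Zurich,,"3.3","27\'225","34\'156"',
    'Uri,,"0.6","112","257","0.6","112","257"',
]

def to_int(text):
    """Convert a count such as 27'225 to the integer 27225."""
    return int(text.replace("'", ""))

def parse_row(raw):
    """Split one raw line and convert the three indicators to numbers."""
    fields = next(csv.reader([raw]))  # csv.reader strips the double quotes
    canton, _month, rate, unemployed, seekers = fields[:5]
    return canton, float(rate), to_int(unemployed), to_int(seekers)

parsed = [parse_row(raw) for raw in raw_rows]
print(parsed)
```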


To match our dataframe with the TopoJSON file, we add a column containing the canton ids.
These ids can be found on Wikipedia.

In [22]:
# Download Canton codes and french names from wikipedia to match with topojson
page = requests.get('https://fr.wikipedia.org/wiki/Canton_(Suisse)')
soup = BeautifulSoup(page.text, 'html.parser')
table_rows = soup.find('table').findAll('tr')
canton, canton_id = zip(*[(row.find('td').find('a').text, row.find('th').text)
                          for row in table_rows[1:-1]])

In [23]:
cantons_ids = pd.DataFrame({'canton': canton, 'id': canton_id})

In [24]:
Cantons = Cantons.merge(cantons_ids, left_on='canton', right_on='canton')
Cantons = Cantons[['id', 'canton', 'unemployment rate', 'Nb unemployed', 'searching job']]
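
The merge on the canton name is an inner join, so any canton whose name is spelled differently in the two sources is silently dropped. The `describe()` output at the end of this notebook reports 24 rows, while Switzerland has 26 cantons, which suggests this may have happened here. A minimal sketch of how such rows could be listed, using hypothetical data (the spelling mismatch and the third rate are invented for illustration):

```python
import pandas as pd

# Hypothetical data: the third name is spelled differently in the two sources
stats = pd.DataFrame({'canton': ['Zurich', 'Berne', 'Bale-Ville'],
                      'unemployment rate': [3.3, 2.4, 3.5]})
ids = pd.DataFrame({'canton': ['Zurich', 'Berne', 'Bale Ville'],
                    'id': ['ZH', 'BE', 'BS']})

# An outer merge with indicator=True keeps unmatched rows and labels their origin
check = stats.merge(ids, on='canton', how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched[['canton', '_merge']])
```

Rows marked `left_only` exist only in the statistics, rows marked `right_only` only in the id table; fixing the spelling on one side before the inner merge would keep them.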

To draw choropleths from our data in the following maps, we convert the rates and counts from strings to numeric types.
We finally add one last column to our dataframe: a searching rate, representing the share of people looking for a job within the active population.

Note: According to the definitions provided by the amstat website, the searching job category (demandeurs d'emplois) is the disjoint union of the registered unemployed (chomeurs inscrits) and the people looking for a job who are not unemployed (demandeurs d'emplois non chomeurs).
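The active population is not part of the dataset, so the searching rate has to be derived from the published unemployment rate. A worked sketch of that derivation for Zurich, using the figures from the table above (`searching_rate` is a hypothetical helper name, not part of the notebook):

```python
def searching_rate(job_seekers, unemployed, unemployment_rate_pct):
    """Job seekers as a percentage of the active population.

    The active population is inferred from the definition
    unemployment_rate_pct = unemployed / active_population * 100.
    """
    active_population = unemployed / (unemployment_rate_pct / 100)
    return job_seekers / active_population * 100

# Zurich, September 2017
zurich = searching_rate(job_seekers=34156, unemployed=27225, unemployment_rate_pct=3.3)
print(round(zurich, 6))
```

Because the published unemployment rate is rounded to one decimal, the inferred active population, and therefore the searching rate, is only approximate, especially for small cantons.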

In [25]:
numeric_cols = ['unemployment rate', 'Nb unemployed', 'searching job']
# convert_objects is deprecated; pd.to_numeric parses the strings, turning bad values into NaN
Cantons[numeric_cols] = Cantons[numeric_cols].apply(pd.to_numeric, errors='coerce')
Cantons['searching rate'] = Cantons['searching job']/(Cantons['Nb unemployed']/Cantons['unemployment rate'])
Cantons


Unnamed: 0,id,canton,unemployment rate,Nb unemployed,searching job,searching rate
0,ZH,Zurich,3.3,27225,34156,4.140121
1,BE,Berne,2.4,13658,18385,3.230634
2,LU,Lucerne,1.7,3885,6756,2.956293
3,UR,Uri,0.6,112,257,1.376786
4,OW,Obwald,0.7,153,319,1.459477
5,NW,Nidwald,1.0,248,436,1.758065
6,GL,Glaris,1.8,416,713,3.085096
7,ZG,Zoug,2.3,1543,2615,3.897926
8,FR,Fribourg,2.7,4466,7837,4.737998
9,SO,Soleure,2.6,3801,6628,4.533754


In [26]:
Cantons.dtypes

id                    object
canton                object
unemployment rate    float64
Nb unemployed          int64
searching job          int64
searching rate       float64
dtype: object

We build a blank map of Switzerland.
We then overlay the canton boundaries on this map.

In [29]:
with open(r'Homework3-data/topojson/ch-cantons.topojson.json') as f:
    geo_data_swiss = json.load(f)

def create_swissmap_with_cantons(line_color="blue", line_width=2):
    swissmap = folium.Map(location=[46.8,8.33], zoom_start=8)
    folium.TopoJson(
        geo_data_swiss,
        'objects.cantons',
        name='topojsonBis',
        style_function=lambda feature:{
            'color'  : line_color,
            'weight' : line_width
        }
    ).add_to(swissmap)
    return swissmap

In [30]:
swissmap = create_swissmap_with_cantons()

In [31]:
folium.TopoJson(
    geo_data_swiss,
    'objects.cantons',
    name='topojsonBis'
).add_to(swissmap)

<folium.features.TopoJson at 0x7f8865b1ab00>

In [32]:
swissmap

The first rate we map is the share of people looking for a job. This category contains both people who already have a job and those who do not.

In [33]:
def create_swissmap_with_choropleth(df, columns, legend_name, fill_color='BuPu', line_color='blue', line_width=2):
    swissmap = create_swissmap_with_cantons(line_color, line_width)
    swissmap.choropleth(
        geo_data=geo_data_swiss,
        data=df,
        topojson="objects.cantons",
        columns=columns,
        key_on='feature.id',
        fill_color=fill_color, fill_opacity=0.7, line_opacity=0.2,
        legend_name=legend_name
    )
    return swissmap

In [34]:
create_swissmap_with_choropleth(Cantons, ['id', 'searching rate'], 'Searching rate')

We see that the cantons of Geneva, Vaud, Neuchatel and Jura have the highest rates: over 5.9%.


We now want to generate the map representing the unemployment rate.

In [39]:
create_swissmap_with_choropleth(Cantons, ['id', 'unemployment rate'], 'Unemployment rate')

Here the highest rates are found in the cantons of Geneva and Neuchatel.

We now want to show explicitly, among all people looking for a job, the share of those who do not have one compared with those who do.
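As a small worked example of this ratio, here is a sketch computing it for two cantons from the counts in the canton table shown earlier (`unemployed_share` is a hypothetical helper name):

```python
def unemployed_share(unemployed, job_seekers):
    """Percentage of job seekers who are registered as unemployed."""
    return unemployed / job_seekers * 100

# Counts taken from the canton table shown earlier in the notebook
for canton, unemployed, seekers in [('Zurich', 27225, 34156), ('Uri', 112, 257)]:
    share = unemployed_share(unemployed, seekers)
    print(f"{canton}: {share:.1f}% unemployed, {100 - share:.1f}% not unemployed")
```

A value above 50 means that most job seekers in the canton are unemployed; a value below 50, as for Uri, means that most of them are not.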

In [40]:
Cantons['job differential'] = Cantons['Nb unemployed']/Cantons['searching job']*100
Cantons.head()

Unnamed: 0,id,canton,unemployment rate,Nb unemployed,searching job,searching rate,job differential
0,ZH,Zurich,3.3,27225,34156,4.140121,79.707811
1,BE,Berne,2.4,13658,18385,3.230634,74.288822
2,LU,Lucerne,1.7,3885,6756,2.956293,57.50444
3,UR,Uri,0.6,112,257,1.376786,43.579767
4,OW,Obwald,0.7,153,319,1.459477,47.962382


In [44]:
create_swissmap_with_choropleth(Cantons, ['id', 'job differential'], 'Differential looking with/without job')

In [376]:
Cantons['job differential'].describe()

count    24.000000
mean     62.076880
std       9.988869
min      43.579767
25%      56.959752
50%      59.981463
75%      69.834041
max      79.707811
Name: job differential, dtype: float64

These results show that most people looking for a job do not have one: 62% on average across cantons.
This is particularly the case in the cantons where the unemployment rates are the highest: Geneva, Vaud, Neuchatel and Jura.
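
One way to check the link between the unemployment rate and the job differential numerically, rather than by comparing maps, is a correlation coefficient. The sketch below computes Pearson's r for the five cantons displayed in the `head()` output above; it is only an illustration on that subset, not a result for all cantons, and `pearson` is a hypothetical helper.

```python
from math import sqrt

# The five cantons displayed in the head() output:
# id -> (unemployment rate, job differential)
rows = {
    'ZH': (3.3, 79.707811),
    'BE': (2.4, 74.288822),
    'LU': (1.7, 57.50444),
    'UR': (0.6, 43.579767),
    'OW': (0.7, 47.962382),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rates, differentials = zip(*rows.values())
r = pearson(rates, differentials)
print(f"Pearson r on the five displayed cantons: {r:.2f}")
```

On the full dataframe, `Cantons['unemployment rate'].corr(Cantons['job differential'])` would give the same statistic over all cantons.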