# Objectives
Analyse the number of cyclists in Paris

* Quantify the rise of cyclists in Paris
    * Get traffic and accident data
    * Bike lane construction
* Correlate accidents with time of day, road conditions, and gender
* Is the increase in car traffic leading to more bike accidents?
* Are new bike lanes helping to decrease bike accidents?
* Is the increase in bike traffic leading to more bike lanes? If so, in which areas?

# Data that we might need (to scrape)
* Public investment (bike lane construction, public incentives to buy bikes)
* Car traffic in Paris
* Number of bicycles sold in Paris (we might find data on electric bike sales in Paris)
* Average salary in Paris
* Average bike prices

In [1]:
import pandas as pd

In [2]:
# Importing the accidents dataset - Imports with no issues
accidents = pd.read_csv('https://www.data.gouv.fr/en/datasets/r/3d5f2317-5afd-4a9f-a9c5-bd4fe0113f39', low_memory=False)

In [3]:
# Importing the bike lanes dataset
# At first the file looked malformed: read_csv failed on some rows, and the resulting dataframe was in the wrong format.
# Skipping the unreadable rows (error_bad_lines=False, replaced by on_bad_lines='skip' in pandas >= 1.3)
# only hid the problem, because the rows that were imported were still mis-parsed.
# Solved: the file uses ';' as its separator instead of a comma.
bike_lanes = pd.read_csv('https://www.data.gouv.fr/en/datasets/r/1211e838-4b77-4ee4-9567-03d78d55f0bf', sep=';')
bike_lanes

Unnamed: 0,Typologie,Aménagement bidirectionnel,Régime de vitesse,Sens vélo,Voie,Arrondissement,Bois,Longueur du tronçon en m,Longueur du tronçon en km,Position aménagement,Circulation générale interdite,Piste,Couloir bus,Continuité cyclable,Réseau cyclable,Date de livraison,geo_shape,geo_point_2d
0,Pistes cyclables,Non,Voie 50,Sens de circulation générale,BOULEVARD VINCENT AURIOL,13.0,Non,331.982187,0.331982,Latéral,,Niveau chaussée,,,,2019-09-15,"{""type"": ""LineString"", ""coordinates"": [[2.3670...","48.8354585987,2.36903535246"
1,Couloirs de bus ouverts aux vélos,Oui,Voie 50,Sens de circulation générale,PONT D AUSTERLITZ,13.0,Non,87.569734,0.087570,,,,Protégé,,,2008-12-31,"{""type"": ""LineString"", ""coordinates"": [[2.3651...","48.8446462139,2.36561303618"
2,Pistes cyclables,Non,Voie 50,Sens de circulation générale,AVENUE D ITALIE,13.0,Non,21.056739,0.021057,,,Niveau chaussée,,,,2005-12-31,"{""type"": ""LineString"", ""coordinates"": [[2.3572...","48.8258206727,2.35725633367"
3,Pistes cyclables,Non,Voie 50,Sens de circulation générale,AVENUE D ITALIE,13.0,Non,26.100499,0.026100,,,Niveau chaussée,,,,2005-12-31,"{""type"": ""LineString"", ""coordinates"": [[2.3570...","48.8264179159,2.35707237369"
4,Couloirs de bus ouverts aux vélos,Non,Voie 50,Sens de circulation générale,RUE TRONCHET,8.0,Non,213.354024,0.213354,,,,Marqué,,,,"{""type"": ""LineString"", ""coordinates"": [[2.3252...","48.8718082566,2.32585179311"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11949,Autres itinéraires cyclables (ex : Aires piéto...,Oui,,Sens de circulation générale,Route de Suresnes,16.0,,569.611903,0.569612,,Oui,,,,,2019-07-20,"{""type"": ""LineString"", ""coordinates"": [[2.2723...","48.869946602,2.26905428583"
11950,Autres itinéraires cyclables (ex : Aires piéto...,,Zone 30,,Rue des Frigos,13.0,,97.550409,0.097550,,,,,,,2020-03-31,"{""type"": ""LineString"", ""coordinates"": [[2.3798...","48.8314530219,2.37948830922"
11951,Autres itinéraires cyclables (ex : Aires piéto...,,Zone 30,Contresens,Rue du Chevaleret,13.0,,135.890653,0.135891,,,,,,,2020-03-31,"{""type"": ""LineString"", ""coordinates"": [[2.3765...","48.82908925,2.37611568574"
11952,Autres itinéraires cyclables (ex : Aires piéto...,,Zone 30,Contresens,Rue Cantagrel,13.0,,252.100429,0.252100,,,,,,,2020-03-31,"{""type"": ""LineString"", ""coordinates"": [[2.3776...","48.8267036506,2.37606943859"
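Instead of finding the separator by trial and error, pandas can detect it: with `sep=None` and the Python parsing engine, `read_csv` runs the standard library's `csv.Sniffer` on the first line. Below is a minimal sketch on a small in-memory sample that mimics the bike lanes header. The sample rows are made up for illustration.

```python
import io

import pandas as pd

# Made-up sample with the same ';' layout as the bike lanes file
sample = (
    "Typologie;Voie;Arrondissement\n"
    "Pistes cyclables;AVENUE D ITALIE;13\n"
    "Couloirs de bus ouverts aux vélos;RUE TRONCHET;8\n"
)

# sep=None asks pandas to sniff the delimiter; this requires engine='python'
df = pd.read_csv(io.StringIO(sample), sep=None, engine="python")
print(df.columns.tolist())  # ['Typologie', 'Voie', 'Arrondissement']
```

On the real file, the same call would be `pd.read_csv(url, sep=None, engine='python')`. The Python engine is slower than the default C engine, so once the separator is known, passing `sep=';'` explicitly (as above) is still the better choice.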


In [4]:
# Importing the traffic dataset
# This file has the same issue as the bike lanes dataset: read_csv raises no error,
# but the dataframe comes out in the wrong format. The fix is also the same: sep=';'.
traffic = pd.read_csv('https://www.data.gouv.fr/en/datasets/r/237382af-0e7a-4ef8-9508-b3e9e78adcfd', sep=';')

In [5]:
traffic

Unnamed: 0,Identifiant du compteur,Nom du compteur,Identifiant du site de comptage,Nom du site de comptage,Comptage horaire,Date et heure de comptage,Date d'installation du site de comptage,Lien vers photo du site de comptage,Coordonnées géographiques
0,100003096-SC,97 avenue Denfert Rochereau SO-NE,100003096,97 avenue Denfert Rochereau SO-NE,1,2019-08-01T04:00:00+02:00,2012-02-22,https://www.eco-visio.net/Photos/100003096/157...,"48.83511,2.33338"
1,100003096-SC,97 avenue Denfert Rochereau SO-NE,100003096,97 avenue Denfert Rochereau SO-NE,0,2019-08-01T06:00:00+02:00,2012-02-22,https://www.eco-visio.net/Photos/100003096/157...,"48.83511,2.33338"
2,100003096-SC,97 avenue Denfert Rochereau SO-NE,100003096,97 avenue Denfert Rochereau SO-NE,0,2019-08-01T03:00:00+02:00,2012-02-22,https://www.eco-visio.net/Photos/100003096/157...,"48.83511,2.33338"
3,100003096-SC,97 avenue Denfert Rochereau SO-NE,100003096,97 avenue Denfert Rochereau SO-NE,6,2019-08-01T07:00:00+02:00,2012-02-22,https://www.eco-visio.net/Photos/100003096/157...,"48.83511,2.33338"
4,100003096-SC,97 avenue Denfert Rochereau SO-NE,100003096,97 avenue Denfert Rochereau SO-NE,21,2019-08-01T08:00:00+02:00,2012-02-22,https://www.eco-visio.net/Photos/100003096/157...,"48.83511,2.33338"
...,...,...,...,...,...,...,...,...,...
764585,100063173-SC,74 Boulevard Ornano S-N,100063173,74 Boulevard Ornano S-N,0,2020-09-18T15:00:00+02:00,2020-07-22,https://www.eco-visio.net/Photos/100063173/159...,"48.896825,2.345648"
764586,100063173-SC,74 Boulevard Ornano S-N,100063173,74 Boulevard Ornano S-N,0,2020-09-18T19:00:00+02:00,2020-07-22,https://www.eco-visio.net/Photos/100063173/159...,"48.896825,2.345648"
764587,100063173-SC,74 Boulevard Ornano S-N,100063173,74 Boulevard Ornano S-N,0,2020-09-18T20:00:00+02:00,2020-07-22,https://www.eco-visio.net/Photos/100063173/159...,"48.896825,2.345648"
764588,100063173-SC,74 Boulevard Ornano S-N,100063173,74 Boulevard Ornano S-N,0,2020-09-18T21:00:00+02:00,2020-07-22,https://www.eco-visio.net/Photos/100063173/159...,"48.896825,2.345648"
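For the first objective (quantifying the rise of cyclists), the hourly counts in `Comptage horaire` can be summed per month. This is a minimal sketch. It assumes the timestamps in `Date et heure de comptage` keep the ISO 8601 format with a UTC offset shown above, and converts them to Paris local time so that each hour lands in the right calendar month. The helper name `monthly_totals` is hypothetical, and the sample rows are illustrative only.

```python
import pandas as pd


def monthly_totals(counts: pd.DataFrame) -> pd.Series:
    """Sum the hourly bike counts per calendar month (Paris local time)."""
    # Parse the offset-aware timestamps as UTC, then convert to Paris time
    timestamps = pd.to_datetime(counts["Date et heure de comptage"], utc=True)
    local = timestamps.dt.tz_convert("Europe/Paris")
    # Group the hourly counts by "YYYY-MM" and add them up
    return counts["Comptage horaire"].groupby(local.dt.strftime("%Y-%m")).sum()


# Illustrative rows in the same format as the traffic dataset
sample = pd.DataFrame({
    "Comptage horaire": [1, 21, 5],
    "Date et heure de comptage": [
        "2019-08-01T04:00:00+02:00",
        "2019-08-01T08:00:00+02:00",
        "2019-08-31T23:30:00+00:00",  # 01:30 on 1 September in Paris
    ],
})
print(monthly_totals(sample))
```

On the full dataset, `monthly_totals(traffic)` would give one total per month across all counters. Counters installed at different dates (for example, the Boulevard Ornano site in July 2020) would need to be accounted for before reading the totals as a trend.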


## Columns that we don't need from accidents
* Circulation (143 non-missing values)
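Sparse columns like this can also be found programmatically. `DataFrame.notna().sum()` counts the non-missing values per column, and `DataFrame.dropna(axis=1, thresh=n)` keeps only the columns with at least `n` non-missing values. The sketch below uses a toy frame. The column names other than `Circulation`, and the threshold, are made up; on the real accidents table, the threshold would have to be chosen after inspecting `accidents.notna().sum()`.

```python
import numpy as np
import pandas as pd

# Toy frame: 'Circulation' is almost entirely missing (other columns are made up)
toy = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"],
    "Gravite": ["leger", "grave", "leger", "leger"],
    "Circulation": [np.nan, np.nan, "bidirectionnelle", np.nan],
})

# Count non-missing values per column to spot sparse columns
print(toy.notna().sum())

# Keep only columns with at least 3 non-missing values (threshold chosen for the toy data)
cleaned = toy.dropna(axis=1, thresh=3)
print(cleaned.columns.tolist())  # ['Date', 'Gravite']
```

To drop only the column listed above, `accidents.drop(columns=['Circulation'])` is enough, assuming that is its exact label in the dataset.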


# Get the postal code of every counter ('compteur')

In [22]:
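# Coordinates of each counter. Note: aggfunc='head' keeps several rows per counter rather than one (see the output below)
qqq=traffic.pivot_table(index='Identifiant du compteur', values='Coordonnées géographiques', aggfunc='head')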

Unnamed: 0,Coordonnées géographiques,Identifiant du compteur
0,"48.83511,2.33338",100003096-SC
1,"48.83511,2.33338",100003096-SC
2,"48.83511,2.33338",100003096-SC
3,"48.83511,2.33338",100003096-SC
4,"48.83511,2.33338",100003096-SC
...,...,...
761705,"48.896825,2.345648",100063173-SC
761708,"48.896825,2.345648",100063173-SC
761711,"48.896825,2.345648",100063173-SC
761714,"48.896825,2.345648",100063173-SC


In [23]:
# Split the "lat,lon" strings into two float columns (0 = latitude, 1 = longitude)
qqq = qqq['Coordonnées géographiques'].str.split(',', expand=True)
qqq = qqq.astype(float)
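Because each counter appears in many rows, geocoding every row repeats the same API request for the same location. The sketch below shows an alternative that keeps one row per counter with `drop_duplicates` before splitting the coordinates. The `lat`/`lon` column names and the `traffic_sample` frame are our own, and the sample values are taken from the output above.

```python
import pandas as pd

# A few rows in the same shape as `traffic` (values taken from the output above)
traffic_sample = pd.DataFrame({
    "Identifiant du compteur": ["100003096-SC", "100003096-SC", "100063173-SC"],
    "Coordonnées géographiques": ["48.83511,2.33338", "48.83511,2.33338", "48.896825,2.345648"],
})

# Keep the first row of each counter, so each location is geocoded only once
counters = (
    traffic_sample.drop_duplicates("Identifiant du compteur")
    .set_index("Identifiant du compteur")["Coordonnées géographiques"]
)

# Split "lat,lon" into two float columns
coords = counters.str.split(",", expand=True).astype(float)
coords.columns = ["lat", "lon"]
print(coords)
```

Indexing by `Identifiant du compteur` also makes the later join back to `traffic` explicit, instead of relying on row positions.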

In [24]:
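import os
import googlemaps

# Read the API key from an environment variable instead of hard-coding it in the notebook
# (GOOGLE_MAPS_API_KEY is an assumed variable name)
gmaps = googlemaps.Client(key=os.environ['GOOGLE_MAPS_API_KEY'])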

In [49]:
# Reverse-geocode each (lat, lon) row; assumes the last address component is the postal code
postal_codes=qqq.apply(lambda x: gmaps.reverse_geocode((x[0],x[1]))[0]['address_components'][-1]['long_name'], axis=1)
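Taking the last address component works when the postal code happens to come last. However, each component in a reverse-geocoding result carries a `types` list, so the postal code can be looked up explicitly. Below is a sketch of such a helper. `extract_postal_code` is a hypothetical name, and `fake_result` is a hand-written dict that only mimics the shape of one result (`address_components` entries with `long_name` and `types`); it is not real API output.

```python
from typing import Optional


def extract_postal_code(result: dict) -> Optional[str]:
    """Return the long_name of the first 'postal_code' component, or None."""
    for component in result.get("address_components", []):
        if "postal_code" in component.get("types", []):
            return component["long_name"]
    return None


# Hand-written dict shaped like one reverse-geocoding result (not real API output);
# the postal code is deliberately not the last component here
fake_result = {
    "address_components": [
        {"long_name": "97", "short_name": "97", "types": ["street_number"]},
        {"long_name": "Avenue Denfert-Rochereau", "short_name": "Av. Denfert-Rochereau", "types": ["route"]},
        {"long_name": "75014", "short_name": "75014", "types": ["postal_code"]},
        {"long_name": "France", "short_name": "FR", "types": ["country", "political"]},
    ]
}
print(extract_postal_code(fake_result))  # 75014
```

In the cell above, this could replace the indexing chain, for example `qqq.apply(lambda x: extract_postal_code(gmaps.reverse_geocode((x[0], x[1]))[0]), axis=1)`. It still assumes the first result is the relevant one, and it returns `None` instead of a wrong value when no postal code is present.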

In [51]:
postal_codes=pd.DataFrame(postal_codes, columns=['Postal code'])

Unnamed: 0,Postal code
0,75014
1,75014
2,75014
3,75014
4,75014
...,...
761705,75018
761708,75018
761711,75018
761714,75018


In [54]:
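# Attach the postal codes to the traffic rows; qqq kept traffic's row index, so an index-aligned merge works
localisations=traffic.merge(postal_codes,left_index=True, right_index=True)
localisations=localisations[['Identifiant du compteur','Nom du compteur','Coordonnées géographiques','Postal code']]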
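The index-aligned merge works here only because `qqq` kept the row index of `traffic`. If postal codes are computed once per counter, they can instead be joined on the counter identifier, which does not depend on row positions. Below is a sketch with small made-up frames. `codes_per_counter` and `localisations_sample` are hypothetical names, and the postal codes are the values seen in the output above.

```python
import pandas as pd

# Made-up rows shaped like `traffic`
traffic_sample = pd.DataFrame({
    "Identifiant du compteur": ["100003096-SC", "100003096-SC", "100063173-SC"],
    "Nom du compteur": ["97 avenue Denfert Rochereau SO-NE"] * 2 + ["74 Boulevard Ornano S-N"],
    "Coordonnées géographiques": ["48.83511,2.33338"] * 2 + ["48.896825,2.345648"],
})

# One postal code per counter (as if each counter had been geocoded once)
codes_per_counter = pd.DataFrame({
    "Identifiant du compteur": ["100003096-SC", "100063173-SC"],
    "Postal code": ["75014", "75018"],
})

# One row per counter, joined on the counter ID;
# validate='many_to_one' raises if a counter appears twice in codes_per_counter
localisations_sample = (
    traffic_sample[["Identifiant du compteur", "Nom du compteur", "Coordonnées géographiques"]]
    .drop_duplicates("Identifiant du compteur")
    .merge(codes_per_counter, on="Identifiant du compteur", how="left", validate="many_to_one")
)
print(localisations_sample)
```

With `how="left"`, a counter that could not be geocoded keeps its row with a missing postal code instead of silently disappearing.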

In [58]:
localisations.to_csv('localisations.csv',sep=',',index=False)