# Objectives
Analyse the number of cyclists in Paris

* Quantify the rise in the number of cyclists in Paris
    * Get data on traffic and accidents
    * Get data on bike lane construction
* Correlate accidents with time of day, road conditions, and gender
* Is the increase in car traffic leading to more bike accidents?
* Is the increase in bike lanes helping to decrease bike accidents?
* Is the increase in bike traffic leading to more bike lanes? If so, in which areas?

# Data that we might need (to scrape)
* Public investment (bike lane construction, public incentives to buy bikes)
* Car traffic in Paris
* Number of bicycles sold in Paris (we might find sales figures for electric bikes in Paris)
* Average salary in Paris
* Average bike prices

In [5]:
import pandas as pd

In [19]:
# Importing the accidents dataset - Imports with no issues
accidents = pd.read_csv('https://www.data.gouv.fr/en/datasets/r/3d5f2317-5afd-4a9f-a9c5-bd4fe0113f39', low_memory=False)

In [35]:
# Importing the bike lanes dataset
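# read_csv assumes a comma delimiter by default, but the header in the output below
# shows the fields are separated by semicolons. Each row is therefore split on the
# commas inside the geo_shape coordinates, so the number of fields varies from row to row.
# error_bad_lines=False skips the rows that pandas cannot parse, so those rows are not imported
# (the parameter was deprecated in pandas 1.3 and removed in 2.0, where on_bad_lines='skip' replaces it).
# Even after the import, the dataframe is in the wrong format.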
bike_lanes = pd.read_csv('https://www.data.gouv.fr/en/datasets/r/1211e838-4b77-4ee4-9567-03d78d55f0bf', error_bad_lines=False)
bike_lanes

b'Skipping line 3: expected 10 fields, saw 12\nSkipping line 4: expected 10 fields, saw 30\nSkipping line 14: expected 10 fields, saw 16\nSkipping line 17: expected 10 fields, saw 14\nSkipping line 20: expected 10 fields, saw 14\nSkipping line 30: expected 10 fields, saw 16\nSkipping line 31: expected 10 fields, saw 18\nSkipping line 38: expected 10 fields, saw 28\nSkipping line 41: expected 10 fields, saw 28\nSkipping line 43: expected 10 fields, saw 12\nSkipping line 46: expected 10 fields, saw 20\nSkipping line 57: expected 10 fields, saw 20\nSkipping line 62: expected 10 fields, saw 52\nSkipping line 63: expected 10 fields, saw 36\nSkipping line 65: expected 10 fields, saw 88\nSkipping line 66: expected 10 fields, saw 30\nSkipping line 67: expected 10 fields, saw 12\nSkipping line 68: expected 10 fields, saw 14\nSkipping line 70: expected 10 fields, saw 22\nSkipping line 73: expected 10 fields, saw 12\nSkipping line 75: expected 10 fields, saw 32\nSkipping line 76: expected 10 fiel

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Typologie;Aménagement bidirectionnel;Régime de vitesse;Sens vélo;Voie;Arrondissement;Bois;Longueur du tronçon en m;Longueur du tronçon en km;Position aménagement;Circulation générale interdite;Piste;Couloir bus;Continuité cyclable;Réseau cyclable;Date de livraison;geo_shape;geo_point_2d
"Pistes cyclables;Non;Voie 50;Sens de circulation générale;BOULEVARD DE CLICHY;18;Non;180.64181506;0.18064181506;Axial;;Niveau trottoir;;;;2008-12-31;""{""""type"""": """"LineString""""","""""coordinates"""": [[2.332159877022311",48.88398560484707],[2.332092567864132,48.88397864357834],[2.32989326709266,48.884493815552354],[2.32985417476428,"48.88452487546856]]}"";48.8842347322",2.331002
"Pistes cyclables;Non;Zone 30;Sens de circulation générale;AVENUE EDISON;13;Non;34.13963197;0.03413963197;;;Niveau chaussée;;;;2010-12-31;""{""""type"""": """"LineString""""","""""coordinates"""": [[2.357357688961302",48.82991912797357],[2.357772560325224,"48.83005785241381]]}"";48.8299884902",2.35756512464,,,,
"Couloirs de bus ouverts aux vélos;Non;Voie 50;Sens de circulation générale;RUE AUBER;9;Non;97.62376718;0.09762376718;;;;Marqué;;;;""{""""type"""": """"LineString""""","""""coordinates"""": [[2.330801362199802",48.87160065081683],[2.3318071671570753,"48.871025719886866]]}"";48.8713131854",2.33130426468,,,,
"Couloirs de bus ouverts aux vélos;Non;Voie 50;Sens de circulation générale;RUE AUBER;9;Non;73.24238005;0.07324238005;;;;Marqué;;;;""{""""type"""": """"LineString""""","""""coordinates"""": [[2.327343317178589",48.87353490716095],[2.328107874434441,"48.87311123581174]]}"";48.8733230715",2.32772559581,,,,
"Couloirs de bus ouverts aux vélos;Non;Voie 30;Sens de circulation générale;RUE DE TOLBIAC;13;Non;124.68220166;0.12468220166;;;;Marqué;;;;""{""""type"""": """"LineString""""","""""coordinates"""": [[2.3570270706684973",48.82614657008793],[2.355334714610843,"48.82605298256184]]}"";48.8260997763",2.35618089264,,,,
...,...,...,...,...,...,...,...,...,...
"Couloirs de bus ouverts aux vélos;Non;Voie 50;Sens de circulation générale;Boulevard Malesherbes;17;Non;39.27490614;0.039274906139999996;;;;Marqué;;;;""{""""type"""": """"LineString""""","""""coordinates"""": [[2.303996917924316",48.889461820604865],[2.304100050407964,"48.88911523309067]]}"";48.8892885268",2.30404848417,,,,
"Pistes cyclables;Oui;Voie 50;Contresens;Quai de la Tournelle;5;Non;9.80468015;0.00980468015;Latéral;Non;Niveau chaussée;;;REVe;2019-03-22;""{""""type"""": """"LineString""""","""""coordinates"""": [[2.356651794522714",48.84947663265006],[2.35653094669135,"48.84951423869951]]}"";48.8494954357",2.35659137061,,,,
"Autres itinéraires cyclables (ex : Aires piétonnes - Contre-sens cyclables);;Zone 30;Contresens;Rue de l'Amiral Mouchez;14;;33.98472836;0.03398472836;;;;;;;2020-03-31;""{""""type"""": """"LineString""""","""""coordinates"""": [[2.341877331710453",48.82228318367873],[2.34190815697923,48.82226788579296],[2.342062805477563,"48.82200697946991]]}"";48.8221515053",2.34197603132,,
"Autres itinéraires cyclables (ex : Aires piétonnes - Contre-sens cyclables);Non;Voie 30;Contresens;Rue Danielle Casanova;1;Non;252.3726255;0.2523726255;;Non;;;;;2017-05-30;""{""""type"""": """"LineString""""","""""coordinates"""": [[2.330259137482241",48.868303833220686],[2.330414982069848,48.868245692929705],[2.33240461094887,48.867790897519434],[2.333497068905588,"48.86754237724207]]}"";48.8679127173",2.331875

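The header row in the output above is a single field joined by semicolons, which suggests the file is semicolon-delimited. A minimal sketch, assuming that is the cause: passing `sep=';'` to `read_csv` should split the columns properly. The sample below is a shortened, hand-built stand-in (a few columns and two rows taken from the output above), not the real file.

```python
import io

import pandas as pd

# Shortened stand-in for the bike lanes file: same delimiter, fewer columns.
# The values are copied from the first two rows of the output above.
sample = (
    "Typologie;Voie;Arrondissement;Longueur du tronçon en m;geo_point_2d\n"
    "Pistes cyclables;BOULEVARD DE CLICHY;18;180.64181506;48.8842347322,2.331002\n"
    "Pistes cyclables;AVENUE EDISON;13;34.13963197;48.8299884902,2.35756512464\n"
)

# With sep=';' the commas inside geo_point_2d stay within a single field.
bike_lanes_sample = pd.read_csv(io.StringIO(sample), sep=";")
print(bike_lanes_sample.dtypes)
```

If the assumption holds for the real file, adding the same `sep=';'` argument to the original call would likely make skipping bad lines unnecessary.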

In [30]:
# Importing the traffic dataset
# Same problem as the bike lanes dataset: read_csv raises no error here,
# but the resulting dataframe is in the wrong format as well.
traffic = pd.read_csv('https://www.data.gouv.fr/en/datasets/r/237382af-0e7a-4ef8-9508-b3e9e78adcfd')
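
One way to check for a delimiter problem before loading a whole file is to sniff the delimiter from its first few lines. A sketch using the standard library's `csv.Sniffer`: the sample text and its column names are hypothetical, and whether the traffic file is actually semicolon-delimited is an assumption, not something confirmed here.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for the first lines of the traffic file.
sample_text = (
    "counter_id;date;vehicle_count\n"
    "A1;2020-01-01;120\n"
    "A2;2020-01-01;95\n"
)

# Restrict the candidates to delimiters that are plausible for a CSV export.
dialect = csv.Sniffer().sniff(sample_text, delimiters=";,\t")

# Reuse the detected delimiter when parsing.
traffic_sample = pd.read_csv(io.StringIO(sample_text), sep=dialect.delimiter)
print(repr(dialect.delimiter), traffic_sample.shape)
```

For the real file, the same idea would apply to the first few kilobytes of the downloaded text rather than to a hand-written string.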

## Columns that we don't need from accidents
* Circulation (143 non-missing values)
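
Sparse columns such as Circulation can be found and dropped programmatically. A minimal sketch on a toy dataframe: the `Date` column, the values, and the 50% threshold are hypothetical choices for illustration; only the `Circulation` name comes from the list above.

```python
import pandas as pd

# Toy stand-in for the accidents dataframe (hypothetical values).
accidents_sample = pd.DataFrame(
    {
        "Date": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"],
        "Circulation": [None, "A", None, None],
    }
)

# Count the non-missing values in each column.
non_missing = accidents_sample.notna().sum()

# Flag columns where fewer than half of the rows are filled (assumed threshold).
fill_ratio = non_missing / len(accidents_sample)
sparse_cols = fill_ratio[fill_ratio < 0.5].index.tolist()

# Drop the flagged columns and keep the rest.
trimmed = accidents_sample.drop(columns=sparse_cols)
print(sparse_cols, list(trimmed.columns))
```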
