# Goals
This project focuses on four transportation modes: metro, RER, train, tramway.

The project needs three main things:

* A database of all stops for each line. We only consider lines for the four aforementioned transportation modes. The database should include: ID, name, position, associated line.
* A database of all lines for the four aforementioned transportation modes. The database should include: ID, name, color in the network map, company, pictogram, geospatial data. The geospatial data should be a contiguous MultiLine. An analysis of the raw data showed that the graph of some lines is not fully connected, which is a problem because paths cannot be computed across the gaps.
* Shortest paths for each pair of stops belonging to the same line. This shortest-path database will help build the exact path of a route (a route being an ordered sequence of stops).
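
To make the third goal concrete, here is a minimal sketch of how pairwise shortest paths on one line could be precomputed and then chained into the path of a route. The toy graph, the stop names `A` to `D`, the `length` attribute, and the helper `build_route_path` are all hypothetical and only illustrate the idea; they are not taken from the data.

```python
from itertools import combinations

import networkx as nx

# Hypothetical toy line: four stops in a row, edge weights are segment lengths.
G = nx.Graph()
G.add_weighted_edges_from(
    [("A", "B", 1.0), ("B", "C", 2.0), ("C", "D", 1.5)], weight="length"
)

# One shortest path per unordered pair of stops on the line.
shortest_paths = {
    (u, v): nx.shortest_path(G, u, v, weight="length")
    for u, v in combinations(sorted(G.nodes), 2)
}


def build_route_path(stops_sequence):
    """Concatenate pairwise shortest paths along an ordered list of stops."""
    path = [stops_sequence[0]]
    for u, v in zip(stops_sequence, stops_sequence[1:]):
        # Paths are stored once per pair, so reverse the stored one if needed.
        leg = shortest_paths.get((u, v)) or shortest_paths[(v, u)][::-1]
        path.extend(leg[1:])  # skip the first node, already in the path
    return path


print(build_route_path(["D", "B", "A"]))
```

Storing one path per unordered pair halves the size of the database; the reverse direction is obtained by reversing the stored path, which assumes the line graph is undirected.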

# Download data from IDFM
We use [PRIM](https://prim.iledefrance-mobilites.fr) data. PRIM is the data hub of IDFM (Île-de-France Mobilités), the authority in charge of public transportation in the Paris region.

### Stops and associated lines
This dataset lists all the lines in the Île-de-France network and, for each line, the stops it serves, as in GTFS.

Found here: https://prim.iledefrance-mobilites.fr/en/jeux-de-donnees/arrets-lignes?staticDataSlug=arrets-lignes

### Alignments of the Ile-de-France rail network
This dataset is a geographical representation of the rail network (metro, RER, trains, tramway, etc).

In [2]:
RAW_DATA_PATH="raw_data"

# Data on stops
STOPS_DATA_URL="https://data.iledefrance-mobilites.fr/explore/dataset/arrets-lignes/download/\?format=json"
STOPS_DATA_FILE_PATH="raw_data/stops.json"

# Data on network (GeoJSON routes)
NETWORK_DATA_URL="https://data.iledefrance-mobilites.fr/explore/dataset/traces-du-reseau-ferre-idf/download/\?format=json"
NETWORK_DATA_FILE_PATH="raw_data/network.json"

!mkdir -p $RAW_DATA_PATH
!wget $STOPS_DATA_URL -O $STOPS_DATA_FILE_PATH
!wget $NETWORK_DATA_URL -O $NETWORK_DATA_FILE_PATH


--2023-12-16 11:50:22--  https://data.iledefrance-mobilites.fr/explore/dataset/arrets-lignes/download/?format=json
Resolving data.iledefrance-mobilites.fr (data.iledefrance-mobilites.fr)... 34.249.199.226, 34.248.20.69
Connecting to data.iledefrance-mobilites.fr (data.iledefrance-mobilites.fr)|34.249.199.226|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: ‘raw_data/stops.json’

raw_data/stops.json     [    <=>             ]  76,01M  4,51MB/s    in 17s     

2023-12-16 11:50:40 (4,49 MB/s) - ‘raw_data/stops.json’ saved [79698225]

--2023-12-16 11:50:40--  https://data.iledefrance-mobilites.fr/explore/dataset/traces-du-reseau-ferre-idf/download/?format=json
Resolving data.iledefrance-mobilites.fr (data.iledefrance-mobilites.fr)... 34.248.20.69, 34.249.199.226
Connecting to data.iledefrance-mobilites.fr (data.iledefrance-mobilites.fr)|34.248.20.69|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: u

# Load data

In [139]:
import geopandas as gpd
import pandas as pd
import json
from shapely import LineString, Point, MultiPoint

### Network

In [140]:
# Load network json data into a Pandas dataframe
with open(NETWORK_DATA_FILE_PATH, 'r') as data:
    network = pd.json_normalize(json.load(data))
network.head()

Unnamed: 0,datasetid,recordid,record_timestamp,fields.geo_point_2d,fields.shape_leng,fields.reseau,fields.picto_final,fields.idrefliga,fields.colourweb_hexa,fields.mode,...,fields.train,fields.tramway,fields.indice_lig,fields.val,fields.extcode,fields.metro,fields.exploitant,fields.date_mes,geometry.type,geometry.coordinates
0,traces-du-reseau-ferre-idf,ab5cf61b212d033315c862b98c70d6fadd65d59b,2023-12-06T15:35:03.253Z,"[48.89343780342383, 2.4849705516729346]",438.626918,TRAMWAY,https://data.iledefrance-mobilites.fr/explore/...,A01761,dfaf47,TRAMWAY,...,0,1,4,0,800:T4,0,SNCF,2006-11-20T00:00:00Z,Point,"[2.4849705516729346, 48.89343780342383]"
1,traces-du-reseau-ferre-idf,ada7234fa14e174322d0bf3e40918c9563fc5117,2023-12-06T15:35:03.253Z,"[48.64144963803747, 2.444004269027656]",2029.886681,RER D,https://data.iledefrance-mobilites.fr/explore/...,A01842,008b5b,RER,...,0,0,D,0,800:D,0,SNCF,1995-09-24T00:00:00Z,Point,"[2.444004269027656, 48.64144963803747]"
2,traces-du-reseau-ferre-idf,8cab5aae2c1a34085ef1af5a446aef12fb5e9509,2023-12-06T15:35:03.253Z,"[48.65419193389779, 2.425509025813071]",1894.845263,RER D,https://data.iledefrance-mobilites.fr/explore/...,A01842,008b5b,RER,...,0,0,D,0,800:D,0,SNCF,1995-09-24T00:00:00Z,Point,"[2.425509025813071, 48.65419193389779]"
3,traces-du-reseau-ferre-idf,393927eaf13eef99d4efe8ea52561c6a880134b0,2023-12-06T15:35:03.253Z,"[48.83555894233859, 2.2225906308238206]",1658.124659,TRAMWAY,https://data.iledefrance-mobilites.fr/explore/...,A01192,cf009e,TRAMWAY,...,0,1,2,0,100112012:T2,0,RATP,1997-07-02T00:00:00Z,Point,"[2.2225906308238206, 48.83555894233859]"
4,traces-du-reseau-ferre-idf,56d1c4e6f05d4650eed6278145c12d4ac10f5365,2023-12-06T15:35:03.253Z,"[48.72106384794022, 2.252748930215579]",1447.235934,RER B,https://data.iledefrance-mobilites.fr/explore/...,A01857,5091cb,RER,...,0,0,B,0,810:B,0,RATP,1854-07-28T00:00:00Z,Point,"[2.252748930215579, 48.72106384794022]"
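
The dotted column names above (`fields.mode`, `geometry.type`, and so on) come from `pd.json_normalize`, which flattens nested dictionaries into one column per leaf key. A minimal sketch on a single hypothetical record, shaped like the export but with made-up values:

```python
import pandas as pd

# Hypothetical record mimicking the nested layout of the JSON export.
records = [
    {
        "recordid": "abc",
        "fields": {"mode": "RER", "exploitant": "SNCF"},
        "geometry": {"type": "Point", "coordinates": [2.44, 48.64]},
    }
]

# Nested dicts are flattened with "." as separator; lists are kept as-is.
flat = pd.json_normalize(records)
print(sorted(flat.columns))
```

Lists such as the coordinates stay in a single cell, which is why the geometry still has to be converted explicitly later on.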


In [141]:
# Select relevant fields
network_relevant_fields = {
    "fields.idrefligc": 'short_id',
    "fields.geo_shape.coordinates": 'geometry',
    "fields.res_com": 'name',
    "fields.exploitant": 'company',
    "fields.mode": 'transportation_type',
    "fields.colourweb_hexa": 'color',
    "fields.idf": 'in_idf',
    "fields.picto_final": 'picture_url'
}
network = network[list(network_relevant_fields.keys())]
network = network.rename(columns=network_relevant_fields)

network['short_id'] = network['short_id'].astype('string')
network['name'] = network['name'].astype('string')
network['company'] = network['company'].astype('string')
network['transportation_type'] = network['transportation_type'].astype('string')
network['color'] = network['color'].astype('string')
network['in_idf'] = network['in_idf'].astype('bool')
network['picture_url'] = network['picture_url'].astype('string')

network.head()

Unnamed: 0,short_id,geometry,name,company,transportation_type,color,in_idf,picture_url
0,C01843,"[[2.482032922407471, 48.893805859684164], [2.4...",TRAM 4,SNCF,TRAMWAY,dfaf47,True,https://data.iledefrance-mobilites.fr/explore/...
1,C01728,"[[2.452573392272449, 48.63426831143351], [2.45...",RER D,SNCF,RER,008b5b,True,https://data.iledefrance-mobilites.fr/explore/...
2,C01728,"[[2.435330530181304, 48.64848911165158], [2.43...",RER D,SNCF,RER,008b5b,True,https://data.iledefrance-mobilites.fr/explore/...
3,C01390,"[[2.22531285326707, 48.82901050275979], [2.225...",TRAM 2,RATP,TRAMWAY,cf009e,True,https://data.iledefrance-mobilites.fr/explore/...
4,C01743,"[[2.245672514581905, 48.716946286650035], [2.2...",RER B,RATP,RER,5091cb,True,https://data.iledefrance-mobilites.fr/explore/...


In [142]:
# Keep only the selected transportation modes (copy to avoid modifying a view of the original frame)
network = network[network.transportation_type.isin(['TRAMWAY', 'RER', 'METRO', 'TRAIN'])].copy()

In [143]:
# Convert to GeoDataFrame
network['geometry'] = network['geometry'].apply(LineString)
network = gpd.GeoDataFrame(network, geometry='geometry')
network

Unnamed: 0,short_id,geometry,name,company,transportation_type,color,in_idf,picture_url
0,C01843,"LINESTRING (2.48203 48.89381, 2.48369 48.89357...",TRAM 4,SNCF,TRAMWAY,dfaf47,True,https://data.iledefrance-mobilites.fr/explore/...
1,C01728,"LINESTRING (2.45257 48.63427, 2.45168 48.63494...",RER D,SNCF,RER,008b5b,True,https://data.iledefrance-mobilites.fr/explore/...
2,C01728,"LINESTRING (2.43533 48.64849, 2.43430 48.64919...",RER D,SNCF,RER,008b5b,True,https://data.iledefrance-mobilites.fr/explore/...
3,C01390,"LINESTRING (2.22531 48.82901, 2.22524 48.82939...",TRAM 2,RATP,TRAMWAY,cf009e,True,https://data.iledefrance-mobilites.fr/explore/...
4,C01743,"LINESTRING (2.24567 48.71695, 2.24603 48.71739...",RER B,RATP,RER,5091cb,True,https://data.iledefrance-mobilites.fr/explore/...
...,...,...,...,...,...,...,...,...
1638,C02528,"LINESTRING (2.27275 48.76379, 2.26747 48.76450...",TRAM 10,RD Bièvre,TRAMWAY,6e6e00,True,https://data.iledefrance-mobilites.fr/explore/...
1639,C02528,"LINESTRING (2.25317 48.78721, 2.25295 48.78826...",TRAM 10,RD Bièvre,TRAMWAY,6e6e00,True,https://data.iledefrance-mobilites.fr/explore/...
1640,C02529,"LINESTRING (2.35474 48.66789, 2.35571 48.66771...",TRAM 12,SNCF,TRAMWAY,a50034,True,https://data.iledefrance-mobilites.fr/explore/...
1641,C02529,"LINESTRING (2.38340 48.65213, 2.38591 48.64953...",TRAM 12,SNCF,TRAMWAY,a50034,True,https://data.iledefrance-mobilites.fr/explore/...
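
As a reminder of what the conversion does, each geometry cell is a list of `[longitude, latitude]` pairs, and `LineString` turns it into a geometry object. The sketch below uses hypothetical coordinates and shows that lengths are then expressed in coordinate units (degrees here), not in metres.

```python
from shapely.geometry import LineString

# Hypothetical [longitude, latitude] pairs, same layout as the raw geometry cells.
coords = [[2.0, 48.0], [2.0, 48.5], [2.5, 48.5]]
line = LineString(coords)

# Planar length in coordinate units (degrees), not metres.
print(line.length)
```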


In [144]:
network.info(memory_usage="deep")

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 1231 entries, 0 to 1642
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype   
---  ------               --------------  -----   
 0   short_id             1231 non-null   string  
 1   geometry             1231 non-null   geometry
 2   name                 1231 non-null   string  
 3   company              1231 non-null   string  
 4   transportation_type  1231 non-null   string  
 5   color                1230 non-null   string  
 6   in_idf               1231 non-null   bool    
 7   picture_url          1231 non-null   string  
dtypes: bool(1), geometry(1), string(6)
memory usage: 608.1 KB


### Stops

In [145]:
# Load stops json data into a Pandas dataframe
with open(STOPS_DATA_FILE_PATH, 'r') as data:
    stops = pd.json_normalize(json.load(data))
stops.head()

Unnamed: 0,datasetid,recordid,fields.operatorname,fields.stop_id,fields.pointgeo,fields.schedules,fields.stop_name,fields.code_insee,fields.id,fields.route_long_name,fields.stop_lat,fields.nom_commune,fields.plans,fields.stop_lon,geometry.type,geometry.coordinates
0,arrets-lignes,9cc2be6bc3b155df2fabfe584654f693bb7431d5,Transdev Cœur Essonne,IDFM:3132,"[48.615568846876435, 2.258913518919905]","[{""routeId"": ""route:IDFM:TRANSDEV_COEUR_ESSONN...",Pierre Curie,91339,IDFM:C00681,DM19,48.61556884687644,Linas,"[{""link"": ""https://prismic-io.s3.amazonaws.com...",2.258913518919905,Point,"[2.258913518919905, 48.615568846876435]"
1,arrets-lignes,9168f84be9233b164e21b489d41fa6f19914eebf,Transdev Cœur Essonne,IDFM:3130,"[48.6098847621596, 2.256199530736612]","[{""routeId"": ""route:IDFM:TRANSDEV_COEUR_ESSONN...",La Ferme,91552,IDFM:C00681,DM19,48.6098847621596,Saint-Germain-lès-Arpajon,"[{""link"": ""https://prismic-io.s3.amazonaws.com...",2.256199530736612,Point,"[2.256199530736612, 48.6098847621596]"
2,arrets-lignes,ec4acc2bf26c3fa2408e2aebf2bd4a8fb1d66d3e,Transdev Cœur Essonne,IDFM:3380,"[48.607236157719356, 2.2555724900803806]","[{""routeId"": ""route:IDFM:TRANSDEV_COEUR_ESSONN...",Tuileries,91552,IDFM:C00681,DM19,48.60723615771936,Saint-Germain-lès-Arpajon,"[{""link"": ""https://prismic-io.s3.amazonaws.com...",2.2555724900803806,Point,"[2.2555724900803806, 48.607236157719356]"
3,arrets-lignes,396089a2bfc73f9322a7d715ffd01551fd7d04a7,Transdev Cœur Essonne,IDFM:3366,"[48.59594770602546, 2.255732937019191]","[{""routeId"": ""route:IDFM:TRANSDEV_COEUR_ESSONN...",Louis Babin,91552,IDFM:C00681,DM19,48.59594770602546,Saint-Germain-lès-Arpajon,"[{""link"": ""https://prismic-io.s3.amazonaws.com...",2.255732937019191,Point,"[2.255732937019191, 48.59594770602546]"
4,arrets-lignes,7d7259fe0617583406f809281ddaa6cfe1ccd291,Transdev Cœur Essonne,IDFM:3363,"[48.59629993737203, 2.2793524084064685]","[{""routeId"": ""route:IDFM:TRANSDEV_COEUR_ESSONN...",Jules Vallès,91552,IDFM:C00681,DM19,48.59629993737203,Saint-Germain-lès-Arpajon,"[{""link"": ""https://prismic-io.s3.amazonaws.com...",2.2793524084064685,Point,"[2.2793524084064685, 48.59629993737203]"


In [146]:
# Select relevant fields
stops_relevant_fields = {
    "fields.stop_id": 'id',
    "fields.stop_lon": 'longitude',
    "fields.stop_lat": 'latitude',
    "fields.stop_name": 'name',
    "fields.id": 'line_id',
    "fields.operatorname": 'company',
}
stops = stops[list(stops_relevant_fields.keys())]
stops = stops.rename(columns=stops_relevant_fields)

stops['id'] = stops['id'].astype("string")
stops['name'] = stops['name'].astype("string")
stops['company'] = stops['company'].astype("string")
stops['line_id'] = stops['line_id'].astype("string")
stops['longitude'] = stops['longitude'].astype(float)
stops['latitude'] = stops['latitude'].astype(float)

stops.head()


Unnamed: 0,id,longitude,latitude,name,line_id,company
0,IDFM:3132,2.258914,48.615569,Pierre Curie,IDFM:C00681,Transdev Cœur Essonne
1,IDFM:3130,2.2562,48.609885,La Ferme,IDFM:C00681,Transdev Cœur Essonne
2,IDFM:3380,2.255572,48.607236,Tuileries,IDFM:C00681,Transdev Cœur Essonne
3,IDFM:3366,2.255733,48.595948,Louis Babin,IDFM:C00681,Transdev Cœur Essonne
4,IDFM:3363,2.279352,48.5963,Jules Vallès,IDFM:C00681,Transdev Cœur Essonne


In [147]:
# Match line ID format with network dataframe
stops['line_short_id'] = stops['line_id'].apply(lambda x: x.split(":")[-1])
stops['line_short_id'] = stops['line_short_id'].astype("string")

In [148]:
# Remove prefix from stop ID
stops['short_id'] = stops['id'].apply(lambda x: x.split(":")[-1])
stops['short_id'] = stops['short_id'].astype("string")
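
The two prefix-stripping cells above can also be written with pandas' vectorized string accessor instead of `apply`. A minimal sketch on a hypothetical two-element series:

```python
import pandas as pd

# Hypothetical IDs using the same "IDFM:<id>" layout as the stops dataset.
ids = pd.Series(["IDFM:C00681", "IDFM:3132"], dtype="string")

# Split on ":" and keep the last piece, without a Python-level lambda.
short = ids.str.split(":").str[-1].astype("string")
print(short.tolist())
```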

In [149]:
# Only keep stops of the rail network (metro, RER, train, tramway)
stops = stops[stops.line_short_id.isin(network.short_id)]

In [150]:
# Convert stops dataframe to a geodataframe
stops = gpd.GeoDataFrame(stops, geometry=gpd.points_from_xy(stops.longitude, stops.latitude))

In [151]:
stops.info(memory_usage="deep")

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 1834 entries, 3620 to 74350
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype   
---  ------         --------------  -----   
 0   id             1834 non-null   string  
 1   longitude      1834 non-null   float64 
 2   latitude       1834 non-null   float64 
 3   name           1834 non-null   string  
 4   line_id        1834 non-null   string  
 5   company        1834 non-null   string  
 6   line_short_id  1834 non-null   string  
 7   short_id       1834 non-null   string  
 8   geometry       1834 non-null   geometry
dtypes: float64(2), geometry(1), string(6)
memory usage: 785.0 KB


# Geospatial analysis

### Compute line graphs

In [152]:
import networkx as nx
import momepy

In [153]:
# Index line metadata by short ID (one entry per distinct line)
lines = {a[0]: {'short_id': a[0], 'name': a[1], 'transportation_type': a[2]}
         for a in network[['short_id', 'name', 'transportation_type']].drop_duplicates().values}

# Compute network graph from geospatial data
for line_id in lines:
    line_gdf = network[network.short_id == line_id]
    G = momepy.gdf_to_nx(line_gdf, approach="primal")

    lines[line_id]['graph'] = G
    lines[line_id]['nodes'] = list(G.nodes)

lines


{'C01843': {'short_id': 'C01843',
  'name': 'TRAM 4',
  'transportation_type': 'TRAMWAY',
  'graph': <networkx.classes.multigraph.MultiGraph at 0x182df0bd0>,
  'nodes': [(2.482032922407471, 48.893805859684164),
   (2.487910986261172, 48.893094627698034),
   (2.516509893394615, 48.90727258612873),
   (2.524048525746326, 48.906628324910955),
   (2.555014267014263, 48.904573065543026),
   (2.547921374008685, 48.90473884378429),
   (2.57084155460068, 48.9009555366144),
   (2.561714304042606, 48.897923572022464),
   (2.539173330939577, 48.90841688545126),
   (2.532287442185067, 48.908527016896606),
   (2.546380399428009, 48.907868478658806),
   (2.556069026584471, 48.900919679716836),
   (2.515061446192154, 48.916096503158485),
   (2.516333391307286, 48.921804703583135),
   (2.511939934333314, 48.9026781018502),
   (2.499757185239508, 48.89259977921381),
   (2.506408247006797, 48.89714381844798),
   (2.519119485424034, 48.92650393687579),
   (2.514884565892147, 48.931005670038964),
   (2.49
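
In a primal graph, the endpoints of each segment become nodes and each segment becomes an edge, which is why the nodes above are coordinate tuples. The sketch below reproduces that idea with shapely and networkx only, on hypothetical segments; it is an illustration of the concept, not momepy's actual implementation.

```python
import networkx as nx
from shapely.geometry import LineString

# Hypothetical segments: two that touch and one isolated piece.
segments = [
    LineString([(0, 0), (1, 0)]),
    LineString([(1, 0), (2, 0)]),
    LineString([(5, 5), (6, 5)]),
]

# Endpoints become nodes, each segment becomes one edge.
G = nx.MultiGraph()
for seg in segments:
    start, end = seg.coords[0], seg.coords[-1]
    G.add_edge(start, end, length=seg.length)

print(nx.is_connected(G))
print(nx.number_connected_components(G))
```

Two segments only share a node when their endpoint coordinates are exactly equal, so even a tiny offset in the source data produces a separate component.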

In [154]:
from itertools import combinations
from shapely import LineString, Point, MultiPoint
from shapely.ops import nearest_points, linemerge

In [156]:
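# The network graph of a line is sometimes disconnected because of gaps in the source data.
# The threshold below is expressed in coordinate units (degrees of longitude/latitude).
MAX_DISTANCE_BETWEEN_TWO_SUBGRAPHES = 0.005 # 0.001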

for line_id in lines:
    line_gdf = network[network.short_id == line_id]
    G = lines[line_id]['graph']
    if not nx.is_connected(G):
        print(f"Graph of line {lines[line_id]['name']} not connected!")
        graph_components = list(nx.connected_components(G))

        # Get pairs of disconnected subgraphs
        segments = []
        for pair in combinations(graph_components, 2):
            x = MultiPoint(list(pair[0]))
            y = MultiPoint(list(pair[1]))
            distance = x.distance(y)

            # Create the shortest segment linking the closest nodes of the two subgraphs
            if distance > 0.0 and distance < MAX_DISTANCE_BETWEEN_TWO_SUBGRAPHES:
                node1, node2 = nearest_points(x, y)
                segment = LineString([node1, node2])
                segments.append(segment)

        # Copy existing rows of the line so the new segments inherit its attributes
        new_rows = line_gdf.head(len(segments)).copy()
        new_rows.geometry = segments

        network = pd.concat([network, new_rows])
        line_gdf = network[network.short_id == line_id]

        # Recompute network graph
        G = momepy.gdf_to_nx(line_gdf, approach="primal")
        if nx.is_connected(G):
            print("--- Network graph artificially connected.")
            print(f"--- Added segments: {segments}\n")

            lines[line_id]['graph'] = G
            lines[line_id]['nodes'] = list(G.nodes)

# Show enriched network dataframe
network

Unnamed: 0,short_id,geometry,name,company,transportation_type,color,in_idf,picture_url
0,C01843,"LINESTRING (2.48203 48.89381, 2.48369 48.89357...",TRAM 4,SNCF,TRAMWAY,dfaf47,True,https://data.iledefrance-mobilites.fr/explore/...
1,C01728,"LINESTRING (2.45257 48.63427, 2.45168 48.63494...",RER D,SNCF,RER,008b5b,True,https://data.iledefrance-mobilites.fr/explore/...
2,C01728,"LINESTRING (2.43533 48.64849, 2.43430 48.64919...",RER D,SNCF,RER,008b5b,True,https://data.iledefrance-mobilites.fr/explore/...
3,C01390,"LINESTRING (2.22531 48.82901, 2.22524 48.82939...",TRAM 2,RATP,TRAMWAY,cf009e,True,https://data.iledefrance-mobilites.fr/explore/...
4,C01743,"LINESTRING (2.24567 48.71695, 2.24603 48.71739...",RER B,RATP,RER,5091cb,True,https://data.iledefrance-mobilites.fr/explore/...
...,...,...,...,...,...,...,...,...
1306,C02528,"LINESTRING (2.24965 48.76932, 2.24965 48.76932)",TRAM 10,RD Bièvre,TRAMWAY,6e6e00,True,https://data.iledefrance-mobilites.fr/explore/...
1307,C02528,"LINESTRING (2.25189 48.77984, 2.25003 48.77669)",TRAM 10,RD Bièvre,TRAMWAY,6e6e00,True,https://data.iledefrance-mobilites.fr/explore/...
347,C02529,"LINESTRING (2.29456 48.70224, 2.29456 48.70224)",TRAM 12,SNCF,TRAMWAY,a50034,True,https://data.iledefrance-mobilites.fr/explore/...
348,C02529,"LINESTRING (2.33325 48.67638, 2.33325 48.67638)",TRAM 12,SNCF,TRAMWAY,a50034,True,https://data.iledefrance-mobilites.fr/explore/...
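
The bridging step can be illustrated on a toy graph. The sketch below builds two components separated by a small gap, finds the closest pair of nodes with `nearest_points`, and adds the connecting segment when the gap is below a threshold. The coordinates and the name `MAX_GAP` are hypothetical; only the technique mirrors the cell above.

```python
import networkx as nx
from shapely.geometry import LineString, MultiPoint
from shapely.ops import nearest_points

# Hypothetical line made of two pieces separated by a gap of 0.002 degrees.
G = nx.Graph()
G.add_edge((0.0, 0.0), (1.0, 0.0))
G.add_edge((1.002, 0.0), (2.0, 0.0))

MAX_GAP = 0.005  # hypothetical threshold, in coordinate units
bridge = None

components = [sorted(c) for c in nx.connected_components(G)]
a, b = MultiPoint(components[0]), MultiPoint(components[1])

if 0.0 < a.distance(b) < MAX_GAP:
    # Closest pair of nodes, one in each component.
    p, q = nearest_points(a, b)
    bridge = LineString([(p.x, p.y), (q.x, q.y)])
    G.add_edge((p.x, p.y), (q.x, q.y))

print(nx.is_connected(G))
```

Each pair of components is bridged independently, so a line only becomes fully connected when every component lies within the threshold of at least one other, which is why the cell above re-checks connectivity after adding the segments.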
