### Step 1 - Extract from OSM
The purpose of this script is to show how to go from a .osm.pbf file to a network format that GOSTnets will accept. 
Additional modifications in this script include clipping the roads to an input polygon and slicing the roads DataFrame by a list of accepted road types, as tagged in OSM (see the accepted_road_types list). 

This notebook is slightly messy, as it contains commented-out code used for checking intermediate outputs.

In [1]:
import geopandas as gpd
import pandas as pd
import os, sys
import time  # needed for the time.ctime() calls in the cleaning step

# sys.path.append("../../../GOSTnets/GOSTnets")

import GOSTnets as gn
from GOSTnets.load_osm import *
import importlib
import networkx as nx
import osmnx as ox
from shapely import ops as ops
from shapely.ops import unary_union
from shapely.wkt import loads
from shapely.geometry import LineString, MultiLineString, Point

Define filepaths

In [4]:
input_pth = r'P:\BGD\GEO'
lcl_input_pth = r'inputs'
interm_pth = r'intermediate'
fin_pth = r'final'

osm_fil = r'bangladesh_210329_osm.pbf'

f = os.path.join(input_pth,'OSM',osm_fil) 

Define parameters

In [5]:
simplif_meters = 25 # a bit of a manual process, higher number = simpler but more chance for errors
target_crs = 3106 # using Gulshan 303 / Bangladesh Transverse Mercator as it's a national metric projection with an established EPSG

In [6]:
# Production date for outputs being used

# prod_date = datetime.today().strftime('%y%m%d')
# prod_date = '210312'
prod_date = '210329'
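
The commented-out `datetime` line above relies on an import that this notebook does not make. A minimal sketch of generating the date stamp automatically, assuming the same `'%y%m%d'` format; `raw_name` is a hypothetical variable that only illustrates how the stamp feeds into the output file names used later:

```python
from datetime import datetime

simplif_meters = 25

# Two-digit year, month, day: 29 March 2021 becomes '210329'
prod_date = datetime.today().strftime('%y%m%d')

# Hypothetical name, mirroring the pattern passed to gn.save() later on
raw_name = 'BD_raw_{}m_{}'.format(simplif_meters, prod_date)
print(raw_name)
```

Hard-coding `prod_date`, as done above, is the safer choice when re-running later steps against outputs produced on an earlier day.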


### Load in OSM data

Using the GOSTnets load_osm module, pass the file path of the .osm.pbf to instantiate an OSM_to_network object.

In [6]:
# includeFerries must be set to True

BD = OSM_to_network(f, includeFerries=True)


This generates a property of the class called 'roads_raw': a raw DataFrame of the highway objects extracted from the .osm.pbf. This is the starting point for our network.

In [7]:
BD.roads_raw.infra_type.value_counts()

residential       124210
unclassified       88481
path               35363
track              26752
living_street      22204
tertiary           15674
service             8415
footway             7347
road                4422
secondary           3510
trunk               2222
primary             2037
steps                653
pier                 540
pedestrian           488
ferry                252
trunk_link           185
primary_link         112
secondary_link        76
construction          40
cycleway              36
tertiary_link         35
corridor              15
yes                   14
bridleway             11
services               4
platform               2
motorway               2
rest_area              2
crossing               1
bus                    1
P                      1
Name: infra_type, dtype: int64

In [8]:
# make sure to include ferry and pier values

accepted_road_types = ['pier','ferry','track','unclassified','road','service','residential', 'living_street', \
                       'tertiary','secondary','primary','trunk','motorway',\
                       'tertiary_link','secondary_link','primary_link','trunk_link','motorway_link']

We call the filterRoads method and pass it a list of acceptable road types

In [9]:
BD.filterRoads(acceptedRoads = accepted_road_types)

In [10]:
BD.roads_raw.infra_type.value_counts()

residential       124210
unclassified       88481
track              26752
living_street      22204
tertiary           15674
service             8415
road                4422
secondary           3510
trunk               2222
primary             2037
pier                 540
ferry                252
trunk_link           185
primary_link         112
secondary_link        76
tertiary_link         35
motorway               2
Name: infra_type, dtype: int64
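
As a sanity check on the filter, it can help to list which road types were dropped and which accepted types never occur in the data. The sketch below uses plain Python on a handful of counts copied from the output above; it is an illustration, not how GOSTnets implements filterRoads, and `raw_counts`, `accepted`, `kept`, `dropped` and `unused` are hypothetical names.

```python
from collections import Counter

# A few of the infra_type counts printed above (subset, for illustration only)
raw_counts = Counter({'residential': 124210, 'path': 35363, 'footway': 7347,
                      'ferry': 252, 'motorway': 2})

# Shortened stand-in for accepted_road_types
accepted = ['residential', 'ferry', 'pier', 'motorway', 'motorway_link']

# Types that survive the filter, with their counts
kept = {t: n for t, n in raw_counts.items() if t in accepted}

# Types present in the data but not accepted
dropped = sorted(set(raw_counts) - set(accepted))

# Accepted types with no rows in the data
unused = sorted(set(accepted) - set(raw_counts))

print(kept, dropped, unused)
```

In the full output above, 'motorway_link' is the one accepted type with no rows, so it is absent from the filtered counts.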

With initialReadIn(), we transform this to a graph object

In [11]:
G = BD.initialReadIn()

In [12]:
# Optionally, we save this graph object down to file using gn.save()

gn.save(BD.network,'BD_raw_{}m_{}'.format(simplif_meters,prod_date),'intermediate')

### 2 - Cleaning

This function defines the order of GOSTnets functions we will call on the input network object. The verbose flag causes the process to save down intermediate files, which is helpful for troubleshooting.

In [13]:
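# Note: nx.read_gpickle was removed in NetworkX 3.0, so this call requires an older NetworkX release
G = nx.read_gpickle(os.path.join(interm_pth, 'BD_raw_{}m_{}.pickle'.format(simplif_meters,prod_date)))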

Optional -- filter out zero length edges

In [14]:
# from shapely.geometry import shape
# import json

# for u, v, data in G.edges(data=True):
#     data['real_length'] = data['Wkt'].length
    
# for u, v, data in G.edges(data='real_length'):
#     if data == 0:
#         G.remove_edge(u,v)
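
If the optional filter is used, note that removing edges while iterating over them modifies the graph mid-loop, and that on a multigraph `remove_edge(u, v)` without a key does not say which parallel edge is removed. A minimal sketch of a safer pattern on a toy graph, assuming a numeric `real_length` attribute standing in for `data['Wkt'].length`; `G_demo` and `zero_length` are hypothetical names:

```python
import networkx as nx

# Toy MultiDiGraph; 'real_length' stands in for the geometry length
G_demo = nx.MultiDiGraph()
G_demo.add_edge(1, 2, real_length=12.5)
G_demo.add_edge(2, 3, real_length=0.0)
G_demo.add_edge(2, 3, real_length=4.0)  # parallel edge between the same nodes

# Collect the offending edges first, including their keys ...
zero_length = [(u, v, k)
               for u, v, k, length in G_demo.edges(keys=True, data='real_length')
               if length == 0]

# ... then remove them in one call, after iteration has finished
G_demo.remove_edges_from(zero_length)

print(G_demo.number_of_edges())
```

Carrying the edge key ensures that only the zero-length edge between nodes 2 and 3 is removed, while its parallel edge is kept.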

Now clean

In [None]:
print('start: %s\n' % time.ctime())
G_clean = gn.clean_network(G, \
                               wpath=interm_pth, \
                               output_file_name='BD',\
                               UTM = {'init':'epsg:{}'.format(target_crs)}, \
                               WGS = {'init':'epsg:4326'},\
                               junctdist = simplif_meters)
print('\nend: %s' % time.ctime())
print('\n--- processing complete')

start: Mon Mar 29 16:26:03 2021




In [None]:
gn.save(G_clean,'BD_processed_{}m_{}'.format(simplif_meters,prod_date),interm_pth)

### 3 - Find Largest Graph

Here, we generate shapefiles of the connected network and those roads which are disconnected. Though not necessary for the analysis, this is a useful process to go through to:
1.) visually appraise the quality of the OSM network
2.) identify large subgraphs that need to be manually connected to the main network
3.) support network improvement activities 

Shapefiles manually edited (as per step 2) can be loaded up in a separate optional process outlined in the next notebook.

Import Processed Graph

In [7]:
G_clean = nx.read_gpickle(os.path.join(interm_pth,'BD_processed_{}m_{}.pickle').format(simplif_meters,prod_date))

Add a unique value to each edge

In [8]:
q = 0
for u, v, data in G_clean.edges(data = True):
    data['unique_id'] = q
    q+=1

Identify largest subgraph by making a list of all subgraphs, iterating through them, and setting a variable to the maximum

In [9]:
list_of_graphs = list(G_clean.subgraph(c).copy() for c in nx.strongly_connected_components(G_clean))

In [10]:
max_edges = 0
for q, g in enumerate(list_of_graphs):
    # keep the position of the subgraph with the most edges seen so far
    if g.number_of_edges() > max_edges:
        max_edges = g.number_of_edges()
        t = q
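
The same search can be written more compactly with `max()` and a key function. A minimal sketch on a toy directed graph, assuming (as above) that "largest" means the strongly connected subgraph with the most edges; `G_demo`, `components` and `largest` are hypothetical names:

```python
import networkx as nx

# Toy graph: a 3-node cycle and a 2-node cycle, joined by a one-way link
G_demo = nx.MultiDiGraph()
G_demo.add_edges_from([(1, 2), (2, 3), (3, 1)])
G_demo.add_edges_from([(10, 11), (11, 10)])
G_demo.add_edge(3, 10)  # one-way only, so the two cycles stay separate components

# One subgraph copy per strongly connected component
components = [G_demo.subgraph(c).copy()
              for c in nx.strongly_connected_components(G_demo)]

# Pick the component with the most edges
largest = max(components, key=lambda g: g.number_of_edges())

print(sorted(largest.nodes), largest.number_of_edges())
```

The one-way link from node 3 to node 10 belongs to neither component, which is why edges like it end up among the disconnected roads exported below.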

Print the results of this test

In [None]:
print("The largest graph is the graph in position %s, and has %s edges" % (t, max_edges))

Define largest graph as its own object, save it down

In [None]:
# Export graph

largest_G = list_of_graphs[t]
gn.save(largest_G, 'largest_G_{}m_{}'.format(simplif_meters,prod_date), interm_pth)

# Turn into shapefile

edge_gdf_largest_G = gn.edge_gdf_from_graph(largest_G)
edge_gdf_largest_G = edge_gdf_largest_G.drop('geometry', axis = 1)
edge_gdf_largest_G['Wkt'] = edge_gdf_largest_G['Wkt'].apply(lambda x: gn.unbundle_geometry(x))
edge_gdf_largest_G = edge_gdf_largest_G.set_geometry('Wkt')
edge_gdf_largest_G.to_file(os.path.join(interm_pth,'LargestG.shp'))

In [16]:
# # Export nodes

# largest_G = list_of_graphs[t]

# node_gdf_largest_G = gn.node_gdf_from_graph(largest_G)
# node_gdf_largest_G = node_gdf_largest_G.set_geometry('geometry')
# node_gdf_largest_G.to_file(os.path.join(interm_pth,'LargestG_nodes.shp'))

Create a shapefile of all the edges that aren't in the main graph

In [None]:
edge_gdf = gn.edge_gdf_from_graph(G_clean)

edges_in_largest_G = list(edge_gdf_largest_G.unique_id)

edges_NOT_in_largest_G = edge_gdf.loc[~edge_gdf.unique_id.isin(edges_in_largest_G)]

Save it down

In [None]:
edges_NOT_in_largest_G = edges_NOT_in_largest_G.drop('geometry', axis = 1)
edges_NOT_in_largest_G['Wkt'] = edges_NOT_in_largest_G['Wkt'].apply(lambda x: gn.unbundle_geometry(x))
edges_NOT_in_largest_G = edges_NOT_in_largest_G.set_geometry('Wkt')
edges_NOT_in_largest_G.to_file(os.path.join(interm_pth,'DisconnectedRoads.shp'))

*Optional troubleshooting exports*

Create a shapefile of all edges from the cleaned graph, save it down

In [None]:
# all_edges_G = gn.edge_gdf_from_graph(G_clean)
# all_edges_G = all_edges_G.drop('geometry', axis = 1)
# all_edges_G['Wkt'] = all_edges_G['Wkt'].apply(lambda x: gn.unbundle_geometry(x))
# all_edges_G = all_edges_G.set_geometry('Wkt')
# all_edges_G.to_file(os.path.join(interm_pth,'All_edges_clean.shp'))

Create a shapefile of all edges from the UNcleaned graph, save it down

In [None]:
# all_edges_G = gn.edge_gdf_from_graph(G)
# all_edges_G = all_edges_G.drop('geometry', axis = 1)
# all_edges_G['Wkt'] = all_edges_G['Wkt'].apply(lambda x: gn.unbundle_geometry(x))
# all_edges_G = all_edges_G.set_geometry('Wkt')
# all_edges_G.to_file(os.path.join(interm_pth,'All_edges_Gnormal.shp'))

In [None]:
gn.save(G_clean,'BD_clean_{}m'.format(simplif_meters),interm_pth)

### 4 - Convert to speeds

In [23]:
G_largest = nx.read_gpickle(os.path.join(interm_pth, 'largest_G_{}m_{}.pickle').format(simplif_meters,prod_date))

Next, we convert the network to time, supplying a speed in km/h for each road type

Define a speed dictionary with a value for every unique road type in the above list

In [24]:
# # adjusted for Bangladesh based on Blankespoor and Yoshida (2010)
# # Reduced all by 5 to account for BD traffic congestion and poor road conditions

# speeds = {'ferry':8,
#             'pier':5,
#             'path':5,
#             'track':15,
#             'service':15,
#             'living_street':20,
#             'road':20,
#             'residential':20,
#             'unclassified':20,
#             'tertiary':35,
#             'secondary':45,
#             'primary':55,
#             'trunk':65,
#             'motorway':80,
#             'tertiary_link':25,
#             'secondary_link':35,
#             'primary_link':50,
#             'trunk_link':60,
#             'motorway_link':70}

In [25]:
# Radically reduced speeds based on Dappe et al (2019)
# Ferry speeds are extremely low to account for frequency and length of delays at ferry crossings noted in Dappe et al (2019)

speeds = {  'ferry':2, 
            'pier':5,
            'path':5,
            'track':15,
            'service':15,
            'living_street':15,
            'road':15,
            'residential':15,
            'unclassified':15,
            'tertiary':20,
            'secondary':20,
            'primary':20,
            'trunk':20,
            'motorway':20,
            'tertiary_link':20,
            'secondary_link':20,
            'primary_link':20,
            'trunk_link':20,
            'motorway_link':20}

In [26]:
G_clean_time = gn.convert_network_to_time(G_largest, # G_salted or G_clean
                                      distance_tag = 'length',
                                      graph_type = 'drive',
                                      road_col = 'infra_type',
                                      speed_dict = speeds,
                                      factor = 1000
                                     )
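
The arithmetic behind the time conversion can be sketched as follows. This is an assumption about what the conversion does, not the GOSTnets source: it supposes that `factor = 1000` rescales edge lengths stored in kilometres to metres, and that travel time is expressed in seconds. The function `travel_time_seconds` and the dictionary `speeds_demo` are hypothetical.

```python
def travel_time_seconds(length, road_type, speed_dict, factor=1000):
    """Assumed conversion: length * factor is in metres, speeds are in km/h."""
    metres = length * factor
    hours = (metres / 1000) / speed_dict[road_type]
    return hours * 3600


# Three entries taken from the speeds dictionary above
speeds_demo = {'ferry': 2, 'residential': 15, 'primary': 20}

# A 1 km primary road at 20 km/h takes about 180 seconds
t_primary = travel_time_seconds(1.0, 'primary', speeds_demo)

# The same distance by ferry at 2 km/h takes about 1800 seconds
t_ferry = travel_time_seconds(1.0, 'ferry', speeds_demo)

print(t_primary, t_ferry)
```

Under these assumptions, the ferry speed of 2 km/h makes a ferry crossing ten times slower than a primary road of the same length.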

Finally, before saving down, we reset all node IDs to integers to aid the graphtool step

In [None]:
??nx.convert_node_labels_to_integers


In [27]:
G_clean_time = nx.convert_node_labels_to_integers(G_clean_time) # G_clean vs G_salted
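
Relabelling discards the original node IDs unless they are stored explicitly. A minimal sketch on a toy graph, using the `label_attribute` argument of `nx.convert_node_labels_to_integers` to keep the old labels as a node attribute; the node labels and the attribute name `old_id` are hypothetical:

```python
import networkx as nx

# Toy graph with string node labels (hypothetical IDs)
G_demo = nx.MultiDiGraph()
G_demo.add_edge('new_obj_12', 'new_obj_7', time=30.0)
G_demo.add_edge('new_obj_7', 'new_obj_12', time=30.0)

# Relabel nodes as 0..n-1 and remember each original label under 'old_id'
G_int = nx.convert_node_labels_to_integers(G_demo, label_attribute='old_id')

print(sorted(G_int.nodes))
print(nx.get_node_attributes(G_int, 'old_id'))
```

Keeping the old labels is optional, but it makes it possible to trace integer node IDs back to the earlier intermediate files.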

Save down

In [2]:
gn.save(G_clean_time, f'final_current_G_{simplif_meters}m_{prod_date}', 'final', nodes = True,  edges = True)