### Step 3 - Final Net Preparation

In this script we take the cleaned road network and finish our preparations ahead of the OD matrix generation. We do the following: 
- Subset the graph to the largest two subgraphs (one for the mainland, and one for Socotra)
- Salt the network to 2km (break up edges longer than 2km into 2km chunks)
- Add travel time for each edge
- Add a particular road which is missing but clearly visible from satellite imagery
- Perform snapping of both origins and destinations, and send copies to the graphtool folder

Import the usual suspects

In [2]:
import os, sys, time
sys.path.append(r'C:\Users\charl\Documents\GitHub\GOST_PublicGoods\GOSTNets\GOSTNets')
import importlib
import GOSTnet as gn
importlib.reload(gn)
import pandas as pd
import geopandas as gpd
import numpy as np
from shapely.wkt import loads
from shapely.geometry import Point, LineString, MultiLineString
import networkx as nx
import osmnx as ox

networkx version: 2.1 
osmnx version: 0.7.4 


Set the folder paths and filenames that we will use

In [3]:
# network path and network name. npickle must be a gpickle type object
npath = r'C:\Users\charl\Documents\GOST\Yemen\YEM\Round 3\output'
npickle = 'YEM_processed.pickle'

# basepath
bpath = r'C:\Users\charl\Documents\GOST\Yemen'

# origin path 
opath = os.path.join(bpath, 'origins')

# destinations path
dpath = os.path.join(bpath, 'facility_files')

# write path for outputs
wpath = r'C:\Users\charl\Documents\GOST\Yemen\YEM\Round 3\graphtool'

### Load Saved Graph from Pickle
In this section we load the saved graph, take its largest subgraphs (in practice two: one for the mainland and one for the island of Socotra), and keep only these roads going forward.

In [6]:
# load in our graph as G
G = nx.read_gpickle(os.path.join(npath, npickle))

# make a big list of all the strongly connected components in G, largest first
# (the generator gives no guarantee of ordering by size, so we sort explicitly)
list_of_subgraphs = sorted(nx.strongly_connected_component_subgraphs(G),
                           key = lambda g: g.number_of_nodes(),
                           reverse = True)

# G_2 is our biggest graph. We abstract him separately as our basegraph
G_2 = list_of_subgraphs[0]

# our list of small subgraphs is everything else 
list_of_small_subgraphs = list_of_subgraphs[1:]

# we open some empty lists for the nodes and edges of the small subgraphs that we want to retain
node_bunch = []
edge_bunch = []

# moving through each subgraph, 
for subgraph in list_of_small_subgraphs:
    
    # if the subgraphs have more than 50 nodes (and we aren't for some reason looking at G_2 our big boy)
    if subgraph.number_of_nodes() > 50 and subgraph.number_of_nodes() != G_2.number_of_nodes():
        
        # add the nodes to the nodebunch
        for u, data in subgraph.nodes(data = True):
            node_bunch.append((u, data))
        
        # add the edges to the edgebunch
        for u, v, data in subgraph.edges(data = True):
            edge_bunch.append((u, v, data))

# add to G_2, our big base boy, the additional edges            
G_2.add_nodes_from(node_bunch)
G_2.add_edges_from(edge_bunch)

# reset G, our graph, to our modified G_2
G = G_2

# print edge count and save down
print(G.number_of_edges())
gn.save(G, 'G', wpath)

290862
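
The cell above relies on `nx.strongly_connected_component_subgraphs`, which is available in the networkx 2.1 used here but was removed in networkx 2.4. As a minimal sketch of the same idea for newer versions, the helper below (`keep_large_components` is a hypothetical name, not part of GOSTnets) keeps the largest strongly connected component plus any other component above a node threshold, without edges that run between components. The toy graph is made up for illustration.

```python
import networkx as nx

def keep_large_components(G, min_nodes):
    """Return a graph made of the largest strongly connected component of G
    plus every other strongly connected component with more than min_nodes nodes."""
    # sort the components (sets of nodes) from largest to smallest
    components = sorted(nx.strongly_connected_components(G), key=len, reverse=True)
    kept = [components[0]] + [c for c in components[1:] if len(c) > min_nodes]
    # copy each component separately, so edges between components are left out
    return nx.compose_all([G.subgraph(c).copy() for c in kept])

# toy graph: a 3-node loop, a 2-node loop, and a dead-end node
G_toy = nx.DiGraph()
G_toy.add_edges_from([(0, 1), (1, 2), (2, 0),   # component A
                      (3, 4), (4, 3),           # component B
                      (2, 3), (4, 5)])          # one-way links between components
G_kept = keep_large_components(G_toy, min_nodes=1)
print(G_kept.number_of_nodes(), G_kept.number_of_edges())
```

On the real network the threshold would be 50, as in the cell above.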


### Network Salting
Here, we break up long edges of the graph into sections of length no more than 'thresh' (2000 metres in this case).
Points that lie off the network are snapped to their nearest node, so a long edge with nodes only at its two ends can leave a point snapped far from where it actually meets the road. Adding intermediate nodes reduces that error. The GOSTnets function 'salt_long_lines' makes these changes to the network for us.

In [22]:
# call the salt long lines function which does the hard work for us
G_salty = gn.salt_long_lines(G, 
                             'epsg:4326', 
                             'epsg:32638', 
                             thresh = 2000, 
                             factor = 1000, 
                             attr_list = ['infra_type','id','country','osm_id','Type'])

# reconvert the node IDs to integers (just useful to do from time to time, and v quick)
G_salty = nx.convert_node_labels_to_integers(G_salty)

# this is important. We want the salting process to have not disrupted connectivity! Check that here. 
print('check: salting process has left number of connected components unchanged')
print(len(list(nx.strongly_connected_component_subgraphs(G))),
      ' | ', 
      len(list(nx.strongly_connected_component_subgraphs(G_salty))))

Identified 6063 unique edge(s) longer than 2000. 
Beginning new node creation...
44286 new edges added and 12125 removed to bring total edges to 323023
16080 new nodes added to bring total nodes to 123976
check: salting process has left number of connected components unchanged
2  |  2
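
To make the idea of salting concrete, here is a minimal sketch of cutting one polyline into pieces no longer than a threshold. It is not the GOSTnets implementation: `split_polyline` is a hypothetical helper, and it assumes the coordinates are already projected to metres (as they would be in epsg:32638).

```python
import math

def split_polyline(coords, thresh):
    """Split a polyline, given as a list of (x, y) tuples in metres,
    into consecutive pieces that are each at most thresh metres long."""
    pieces, current, remaining = [], [coords[0]], thresh
    for (x0, y0), (x1, y1) in zip(coords[:-1], coords[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        travelled = 0.0
        # cut this segment every time the current piece reaches thresh
        while seg_len - travelled > remaining:
            travelled += remaining
            t = travelled / seg_len
            cut = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            current.append(cut)
            pieces.append(current)
            current, remaining = [cut], thresh
        remaining -= seg_len - travelled
        current.append((x1, y1))
    pieces.append(current)
    return pieces

def polyline_length(coords):
    """Total length of a polyline in the units of its coordinates."""
    return sum(math.hypot(x1 - x0, y1 - y0)
               for (x0, y0), (x1, y1) in zip(coords[:-1], coords[1:]))

# a straight 5 km edge becomes two 2 km pieces and one 1 km piece
pieces = split_polyline([(0, 0), (5000, 0)], thresh=2000)
print([polyline_length(p) for p in pieces])
```

Each cut point becomes a new node in the salted graph, which is why the node and edge counts both rise in the output above.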


### Adding Travel Time for each edge

Travel time is added to the graph by calling GOSTnets' convert_network_to_time() function with a speed limit dictionary that gives the assumed speed for each type of road. As road conditions in Yemen are poor, these speeds are kept low (all values in km/h).

In [23]:
print('adding traverse time edge property...')
# define speed limit dictionary
speed_dict = {
    'residential':25,
    'unclassified':15,
    'track':15,
    'tertiary':40,
    'secondary':50,
    'primary':60,
    'trunk':50,
    'service':15,
    'road':15,
    'trunk_link':50,
    'secondary_link':50,
    'primary_link':60,
    'tertiary_link':40}

# add traverse time property into 'time' edge attribute 
G_salty_time = gn.convert_network_to_time(G_salty, 
                                          distance_tag = 'length', 
                                          road_col = 'infra_type', 
                                          graph_type = 'drive', 
                                          speed_dict = speed_dict, 
                                          walk_speed = 4,
                                          factor = 1000
                                         )

# print out an example edge to check that it has worked as intended
gn.example_edge(G_salty_time, 2)

# save down the resultant graph with a name that describes the processes that have been done to it
gn.save(G_salty_time, 'G_salty_time', wpath)

adding traverse time edge property...
(0, 35636, {'Wkt': 'LINESTRING (44.2165745 15.3646484, 44.2167032 15.3659494, 44.2167203 15.3663506, 44.2167763 15.3673335, 44.2167973 15.3681592)', 'id': 24386, 'infra_type': 'secondary', 'osm_id': 108470243, 'country': 'YEM', 'key': 'edge_24386', 'length': 389.3715222365589, 'Type': 'legitimate', 'time': 28.03474960103224, 'mode': 'drive'})
(0, 74084, {'Wkt': 'LINESTRING (44.2167973 15.3681592, 44.2168027 15.3690043, 44.2168968 15.3701596)', 'id': 24387, 'infra_type': 'secondary', 'osm_id': 108470243, 'country': 'YEM', 'key': 'edge_24387', 'length': 221.74928659443168, 'Type': 'legitimate', 'time': 15.96594863479908, 'mode': 'drive'})
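
As a sanity check on the example edges above, we can recompute the 'time' values by hand. The sketch below assumes that 'length' is stored in metres and 'time' in seconds, which is our reading of the printed output rather than something taken from the GOSTnets source. `traverse_time_seconds` is a hypothetical helper.

```python
def traverse_time_seconds(length_m, speed_kmh):
    """Time in seconds needed to cover length_m metres at speed_kmh km/h."""
    return length_m / 1000 / speed_kmh * 3600

# both example edges are 'secondary' roads, which speed_dict sets to 50 km/h
t1 = traverse_time_seconds(389.3715222365589, 50)
t2 = traverse_time_seconds(221.74928659443168, 50)
print(t1, t2)
```

The results agree with the 'time' attributes printed above (about 28.03 and 15.97), which supports reading them as seconds.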


### Manual Addition of Missing Roads
In the Yemen context specifically, clipping the network to the border cuts off one crucial road link in the desert near the northernmost border. Here, we manually add this road back to the network.

In [24]:
missed_edges = []

# NOTE: these start and end node IDs were identified by manual observation of the road network in QGIS
st_node = 21793
end_node = 114778
st_point = Point(G_salty_time.nodes()[st_node]['x'],G_salty_time.nodes()[st_node]['y'])
end_point = Point(G_salty_time.nodes()[end_node]['x'],G_salty_time.nodes()[end_node]['y'])
lin = LineString([st_point,end_point])

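# again, a measurement made in QGIS
# NOTE: the unit is not recorded here. The other edges store 'length' in metres and 'time' in seconds
# (see the example edges printed above), so check that this value and the 'time' below are consistent with them
real_length = 115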

data = {'Wkt':lin,
       'id':max(nx.get_edge_attributes(G_salty_time,'id').values())+1,
       'infra_type':'service',
       'country':'YEM',
       'key':'manual_edge_1',
       'length':real_length, 
       'Type':'manual_edge',
       'time':float((real_length / 15)), # we assume a speed of 15km/h across this road
       'mode':'drive'}

# we need to add a bidirectional edge - so we add the same data, but flip the start and end node positions
missed_edges.append((st_node, end_node, data))
missed_edges.append((end_node, st_node, data))

# add on the missed edges
G_salty_time.add_edges_from(missed_edges)

# re-ID the nodes (applied to G_salty_time, the graph we are actually saving)
G_salty_time = nx.convert_node_labels_to_integers(G_salty_time)

# save
gn.save(G_salty_time, 'G_salty_time', wpath)
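
After a manual addition like this, it is worth confirming that the link exists in both directions and actually joins the pieces it was meant to join. A minimal sketch on a toy graph (node IDs and attribute values here are made up for illustration):

```python
import networkx as nx

# two separate two-way road fragments
G_toy = nx.MultiDiGraph()
G_toy.add_edges_from([(0, 1), (1, 0), (2, 3), (3, 2)])
before = nx.number_strongly_connected_components(G_toy)

# add a two-way manual link between the fragments, reusing one attribute dict
attrs = {'Type': 'manual_edge', 'mode': 'drive'}
G_toy.add_edges_from([(1, 2, attrs), (2, 1, attrs)])
after = nx.number_strongly_connected_components(G_toy)

print(G_toy.has_edge(1, 2), G_toy.has_edge(2, 1))
print(before, '->', after)
```

If the component count does not drop, the chosen start or end node is probably not attached to the part of the network we expected.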

### Snap Destination Files to the Network
A standard call of GOSTnets' pandana_snap. Some checks are run on the GPS coordinates first, so that facilities with wildly wrong coordinates are excluded.

In [44]:
dfiles = ['HeRAMS 2018 April.csv']

for dfile in dfiles:
    
    # Read in
    dest_df = pd.read_csv(os.path.join(dpath, dfile), encoding = "ISO-8859-1")
    
    # Ensure coordinates are floats
    dest_df.Longitude = dest_df.Longitude.astype(float)
    dest_df.Latitude = dest_df.Latitude.astype(float)
    
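    # Drop entries with missing or zero coordinates
    # (a comparison with != None never drops NaN values, so we test for nulls explicitly)
    dest_df2 = dest_df.copy()
    dest_df2 = dest_df2.loc[(dest_df2.Longitude != 0)]
    dest_df2 = dest_df2.loc[dest_df2.Longitude.notnull() & dest_df2.Latitude.notnull()]
    # row count before any filtering
    print(len(dest_df))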
    
    # drop all hospitals with GPS coordinates outside Yemen / outside this mortal plane
    dest_df2 = dest_df2.loc[(dest_df2.Longitude <= 60)]
    dest_df2 = dest_df2.loc[(dest_df2.Longitude >= 35)]
    dest_df2 = dest_df2.loc[(dest_df2.Latitude <= 30)]
    dest_df2 = dest_df2.loc[(dest_df2.Latitude >= 5)]
    dest_df = dest_df2
    print(len(dest_df))
    
    # Generate Geometries
    dest_df['geometry'] = list(zip(dest_df.Longitude, dest_df.Latitude))
    dest_df['geometry'] = dest_df['geometry'].apply(Point)
    dest_df = gpd.GeoDataFrame(dest_df, geometry = 'geometry', crs = {'init':'epsg:4326'})
    
    # Perform snap
    time.ctime()
    start = time.time()
    df = gn.pandana_snap(G_salty_time, dest_df, 'epsg:4326','epsg:32638', add_dist_to_node_col = True)
    
    # Save to file
    df.to_csv(os.path.join(wpath, dfile.replace('.csv', '_snapped.csv')))
    df.to_csv(os.path.join(dpath, dfile.replace('.csv', '_snapped.csv')))

    print('time elapsed: %d seconds' % (time.time() - start))

5042
5042
4411
time elapsed: 12 seconds
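
The coordinate checks in the cell above can also be written as a single boolean mask, which makes it easier to see which rule removes which rows. A minimal sketch, using the same bounding box as above; `filter_coordinates` and the sample rows are hypothetical and not taken from the HeRAMS file:

```python
import numpy as np
import pandas as pd

def filter_coordinates(df, lon_range=(35, 60), lat_range=(5, 30)):
    """Keep rows whose Longitude / Latitude are present, non-zero and inside the box."""
    has_coords = df.Longitude.notna() & df.Latitude.notna()
    non_zero = df.Longitude != 0
    in_box = df.Longitude.between(*lon_range) & df.Latitude.between(*lat_range)
    return df.loc[has_coords & non_zero & in_box]

sample = pd.DataFrame({
    'Longitude': [44.2, 0.0, np.nan, 444.2],   # valid, zero, missing, misplaced decimal
    'Latitude':  [15.3, 0.0, 15.0, 15.3]})
kept = filter_coordinates(sample)
print(len(sample), '->', len(kept))
```

Note that `between` is inclusive at both ends, matching the `<=` and `>=` tests used in the cell.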


### Snap Origin Points to Networks
This time, we snap the origins to the network. This takes a lot longer (there are far more origins than destinations!). Be patient - this only has to be done once for each file in the workflow. 

In [11]:
ofile_A = r'origins_1km_2015.csv'
ofile_B = r'origins_1km_2018.csv'
ofiles = [ofile_A,ofile_B]

for ofile in ofiles:
    
    # Read in
    # (the variable is still called dest_df, but here it holds the origin points)
    dest_df = pd.read_csv(os.path.join(opath, ofile), encoding = "ISO-8859-1")
    dest_df['geometry'] = dest_df['geometry'].apply(loads)
    dest_df = gpd.GeoDataFrame(dest_df, geometry = 'geometry', crs = {'init':'epsg:4326'})
    
    # Perform snap
    print('Beginning snap')
    
    time.ctime()
    start = time.time()
    df = gn.pandana_snap(G_salty_time, dest_df, 'epsg:4326','epsg:32638', add_dist_to_node_col = True)
    
    # Save to file
    df.to_csv(os.path.join(wpath, ofile.replace('.csv', '_snapped.csv')))
    df.to_csv(os.path.join(opath, ofile.replace('.csv', '_snapped.csv')))


    print('Time elapsed: %d seconds' % (time.time() - start))

Beginning snap
Time elapsed: 161 seconds
Beginning snap
Time elapsed: 164 seconds
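
For readers unfamiliar with snapping, the sketch below shows the idea in its simplest form: each point is assigned the ID of the nearest network node, along with the distance to it. This is a brute-force illustration only and not how pandana_snap is implemented (a real implementation would use a spatial index to cope with millions of origins). It assumes coordinates already projected to metres, which is why the calls above pass epsg:32638 as the second CRS. `snap_to_nearest_node` and the toy coordinates are hypothetical.

```python
import math

def snap_to_nearest_node(points, nodes):
    """points: {point_id: (x, y)}, nodes: {node_id: (x, y)}, both in metres.
    Returns {point_id: (nearest_node_id, distance_in_metres)}."""
    snapped = {}
    for pid, (px, py) in points.items():
        # straight-line distance from this point to a given node
        def dist(n):
            return math.hypot(nodes[n][0] - px, nodes[n][1] - py)
        nearest = min(nodes, key=dist)
        snapped[pid] = (nearest, dist(nearest))
    return snapped

nodes = {1: (0.0, 0.0), 2: (1000.0, 0.0)}
points = {'a': (900.0, 0.0), 'b': (100.0, 300.0)}
snapped = snap_to_nearest_node(points, nodes)
print(snapped)
```

The distance returned here plays the same role as the column requested with `add_dist_to_node_col = True`: it lets us spot points that sit suspiciously far from any road.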
