# Step 2: Clean Network

This step applies a further post-processing function, now found in the network_clean GOSTnets submodule. It cleans the network by removing excess nodes and ensures that all edges are bi-directional (except for one-way roads).

WARNING: The clean_network function is computationally expensive, so it may take a while to run. It outputs a pickled graph object, a dataframe of the edges, and a dataframe of the nodes. The expectation is that it only has to be run once.

In [None]:
import os
import sys
import time
import networkx as nx

In [None]:
# add to your system path the location of the LoadOSM.py and GOSTnet.py scripts
sys.path.append("../")
import GOSTnets as gn

In [None]:
pth = "./"  # change this path to your working folder
data_pth = os.path.join(pth, "tutorial_outputs")

# read back your graph from step 1 from your saved pickle
G = nx.read_gpickle(os.path.join(data_pth, "iceland_unclean.pickle"))
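
If your installed NetworkX no longer provides `nx.read_gpickle` (it was dropped in the 3.x series), the standard `pickle` module can read the same file. The sketch below is only an illustration: `load_graph` is a hypothetical helper, not part of GOSTnets, and the toy graph stands in for the Step 1 output.

```python
import os
import pickle
import tempfile

import networkx as nx


def load_graph(path):
    """Hypothetical helper: load a pickled graph without nx.read_gpickle."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Toy round trip: write a small graph to disk, then read it back.
toy = nx.MultiDiGraph()
toy.add_edge(1, 2, length=10.0)

tmp_path = os.path.join(tempfile.mkdtemp(), "toy.pickle")
with open(tmp_path, "wb") as f:
    pickle.dump(toy, f)

toy_loaded = load_graph(tmp_path)
print(toy_loaded.number_of_nodes(), toy_loaded.number_of_edges())
```

With the tutorial data, the call would be `G = load_graph(os.path.join(data_pth, "iceland_unclean.pickle"))`. As with any pickle, only load files you created yourself or otherwise trust.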

In [None]:
# inspect the graph
nodes = list(G.nodes(data=True))
edges = list(G.edges(data=True))
print(len(nodes))
print(nodes[0])
print(len(edges))
print(edges[0])

In [None]:
# you can also print general graph information with networkx
print(nx.info(G))
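
`nx.info` is likewise unavailable in the NetworkX 3.x series. A minimal replacement can be assembled from methods that exist in both old and new versions. The helper name `graph_summary` and its output format are assumptions made for this sketch.

```python
import networkx as nx


def graph_summary(G):
    """Hypothetical stand-in for nx.info: graph type, size, and average degree."""
    n = G.number_of_nodes()
    m = G.number_of_edges()
    # For directed graphs, G.degree() counts in-edges plus out-edges per node.
    avg_degree = sum(d for _, d in G.degree()) / n if n else 0.0
    return "%s with %d nodes and %d edges (average degree %.2f)" % (
        type(G).__name__,
        n,
        m,
        avg_degree,
    )


# Toy graph: one two-way road (1 <-> 2) and one one-way road (2 -> 3).
toy = nx.MultiDiGraph()
toy.add_edges_from([(1, 2), (2, 1), (2, 3)])
summary = graph_summary(toy)
print(summary)
```

The same helper can be called on G, G_clean, or G_largest wherever the notebook prints graph information.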

In [None]:
# To become familiar with the function read the doc string
gn.clean_network?

Set up some parameters for the clean_network function:

In [None]:
Iceland_UTMZ = {"init": "epsg:32627"}

WGS = {"init": "epsg:4326"}  # do not adjust. OSM natively comes in EPSG 4326

Run the clean_network function.  
Setting verbose to True writes the outputs to the specified wpath.

In [None]:
print("start: %s\n" % time.ctime())
G_clean = gn.clean_network(
    G, UTM=Iceland_UTMZ, WGS=WGS, junctdist=10, verbose=False
)

# using verbose=True (writes the outputs to wpath):
# G_clean = gn.clean_network(
#     G, wpath=data_pth, output_file_name="iceland_network",
#     UTM=Iceland_UTMZ, WGS=WGS, junctdist=10, verbose=True,
# )
print("\nend: %s" % time.ctime())
print("\n--- processing complete")

In [None]:
# let's print info on our clean version
print(nx.info(G_clean))

The clean_network function helps snap together points that are very close to one another. However, it does not check whether the network is fully connected.
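
A quick connectivity check can be done with plain NetworkX before deciding whether to prune the graph. The sketch below uses a toy graph as a stand-in for G_clean; the node IDs are made up for illustration.

```python
import networkx as nx

# Toy stand-in for G_clean: a two-way chain 1 <-> 2 <-> 3 plus a separate island.
toy = nx.MultiDiGraph()
toy.add_edges_from([(1, 2), (2, 1), (2, 3), (3, 2)])
toy.add_edges_from([(10, 11), (11, 10)])

# True only if every node can reach every other node along directed edges.
fully_connected = nx.is_strongly_connected(toy)

# Number of separate pieces the graph falls into.
n_components = nx.number_strongly_connected_components(toy)

print(fully_connected, n_components)
```

Running the same two calls on G_clean shows whether the optional step that follows is needed at all.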

## Optional step: Only use the largest sub-graph
Network analysis is usually run on connected graphs only, because a disconnected graph can contain origin-destination pairs with no path between them. Identifying the subgraphs also lets you evaluate how connected your network is, so you can go back and make more edits if needed.
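
Because road graphs are directed, a one-way edge is enough to make a destination unreachable in one direction. The toy example below (made-up node IDs, not tutorial data) sketches why this step looks at strongly connected components rather than weakly connected ones.

```python
import networkx as nx

toy = nx.MultiDiGraph()
toy.add_edges_from([(1, 2), (2, 1)])  # two-way road
toy.add_edge(2, 3)  # one-way road into node 3, with no way back

# Ignoring edge direction, the graph looks like a single piece.
weakly_connected = nx.is_weakly_connected(toy)

# Respecting direction, node 3 can be reached but never left.
forward_ok = nx.has_path(toy, 1, 3)
return_ok = nx.has_path(toy, 3, 1)

try:
    nx.shortest_path(toy, 3, 1)
    routed = True
except nx.NetworkXNoPath:
    routed = False

# Node 3 ends up in its own strongly connected component.
strong = sorted(sorted(c) for c in nx.strongly_connected_components(toy))
print(weakly_connected, forward_ok, return_ok, routed, strong)
```

Keeping only the largest strongly connected component removes nodes like 3, so every remaining origin-destination pair has a route in both directions.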

In [None]:
# Identify only the largest graph

# compatible with NetworkX 2.4
list_of_subgraphs = list(
    G_clean.subgraph(c).copy() for c in nx.strongly_connected_components(G_clean)
)
max_graph = None
max_edges = 0
for i in list_of_subgraphs:
    if i.number_of_edges() > max_edges:
        max_edges = i.number_of_edges()
        max_graph = i

# set your graph equal to the largest sub-graph
G_largest = max_graph

In [None]:
# print info about the largest sub-graph
print(nx.info(G_largest))

The majority of the network was captured by the largest subgraph. That's pretty good. It means the quality of OSM data for this area is quite good.
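
One way to put a number on "the majority" is to compare node and edge counts between the full graph and the largest subgraph. The sketch below runs on a toy graph; `coverage` is a hypothetical helper introduced for illustration only.

```python
import networkx as nx


def coverage(full, sub):
    """Hypothetical helper: share of the nodes and edges of `full` kept in `sub`."""
    node_share = sub.number_of_nodes() / full.number_of_nodes()
    edge_share = sub.number_of_edges() / full.number_of_edges()
    return node_share, edge_share


# Toy stand-in for G_clean: a two-way triangle plus a small two-way island.
toy = nx.MultiDiGraph()
toy.add_edges_from([(1, 2), (2, 1), (2, 3), (3, 2), (3, 1), (1, 3)])
toy.add_edges_from([(10, 11), (11, 10)])

# Same selection rule as the loop above: keep the component with the most edges.
toy_largest = max(
    (toy.subgraph(c).copy() for c in nx.strongly_connected_components(toy)),
    key=lambda g: g.number_of_edges(),
)

node_share, edge_share = coverage(toy, toy_largest)
print("%.0f%% of nodes, %.0f%% of edges" % (100 * node_share, 100 * edge_share))
```

With the tutorial graphs, the equivalent call would be `coverage(G_clean, G_largest)`.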

Save this prepared graph in your output folder: 

In [None]:
gn.save(G_largest, "iceland_network_clean", data_pth)

How many subgraphs would you guess there are?

In [None]:
len(list_of_subgraphs)
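
The count alone does not say how the nodes are spread across subgraphs. A small tally of component sizes helps separate "one big network plus a few stray nodes" from "several large disconnected pieces". This is a sketch on a toy graph with made-up node IDs.

```python
from collections import Counter

import networkx as nx

# Toy stand-in for G_clean: a triangle, a two-node island, and two isolated nodes.
toy = nx.MultiDiGraph()
toy.add_edges_from([(1, 2), (2, 1), (2, 3), (3, 2), (3, 1), (1, 3)])
toy.add_edges_from([(10, 11), (11, 10)])
toy.add_nodes_from([98, 99])  # isolated nodes form components of size 1

# Number of nodes in each strongly connected component, largest first.
sizes = sorted((len(c) for c in nx.strongly_connected_components(toy)), reverse=True)

# How many components exist for each size.
size_counts = Counter(sizes)

print(sizes)
print(size_counts)
```

On a real road network, many size-1 components usually point to one-way dead ends or digitizing gaps rather than genuinely separate networks.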

Move on to Step 3 to see how we can use this network for some travel time analysis!

## Optional: Compare networks (original / clean version / largest subgraph)

OSMnx is one of the key libraries that GOSTnets is built on. Here, we load it to access its graph-plotting functions.

In [None]:
import osmnx as ox

In [None]:
# plotting functions only work if the graphs have a name and a crs attribute
G.graph["crs"] = "epsg:32646"
G.graph["name"] = "Iceland"

# original graph
ox.plot_graph(G, fig_width=10, edge_linewidth=1, node_size=7)

In [None]:
G_clean.graph["crs"] = "epsg:32646"
G_clean.graph["name"] = "Iceland"

# cleaned graph
ox.plot_graph(G_clean, fig_width=10, edge_linewidth=1, node_size=7)

In [None]:
G_largest.graph["crs"] = "epsg:32646"
G_largest.graph["name"] = "Iceland"

# largest subgraph
ox.plot_graph(G_largest, fig_width=10, edge_linewidth=1, node_size=7)