<center><img src=img/MScAI_brand.png width=70%></center>

# NetworkX

NetworkX (https://networkx.github.io/) is a Python library of graph data structures and graph algorithms. It can be installed via Anaconda, or with `pip install networkx`.

We previously saw several ways to represent graphs in computer programs:

* Adjacency matrix
* List of edges
* Adjacency lists


In the adjacency list format, we store, for each node, the list of its neighbours:

```python
G = {
    0: [1, 2, 3], 
    1: [0, 3], 
    2: [0], 
    3: [0, 1], 
    4: []
}
```
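To relate the three representations, here is a small plain-Python sketch (no NetworkX) that derives the edge list and the adjacency matrix from the adjacency-list dict above. It assumes an undirected graph whose nodes are the integers 0 to n-1. Notice that node 4 never appears in the edge list: an edge list alone cannot represent isolated nodes.

```python
# Adjacency list from above: node -> list of neighbours (undirected graph)
G = {
    0: [1, 2, 3],
    1: [0, 3],
    2: [0],
    3: [0, 1],
    4: [],
}

# Edge list: record each undirected edge once, as (smaller, larger)
edges = sorted({(min(u, v), max(u, v)) for u in G for v in G[u]})

# Adjacency matrix: A[u][v] == 1 if there is an edge between u and v
n = len(G)  # assumes the nodes are labelled 0..n-1
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = 1
    A[v][u] = 1  # symmetric, because the graph is undirected

print(edges)
for row in A:
    print(row)
```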


Next, we'll see that the NetworkX representation is just an extension of this: instead of a list of neighbours, each node maps to a dict whose keys are its neighbours and whose values are dicts of edge properties, such as weights.

```python
G = {
    0: {1: {"w": 0.5}, 2: {"w": 0.1}, 3: {"w": 0.1}},
    1: {0: {"w": 0.5}, 3: {"w": 0.3}},
    2: {0: {"w": 0.1}},
    3: {0: {"w": 0.1}, 1: {"w": 0.3}},
    4: {}
}
```
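A short plain-Python sketch of working with this nested-dict form: an edge's weight is two lookups away, and a node's weighted degree is a sum over its inner dict. The attribute name `"w"` is just the key used above; note that each undirected edge is stored twice, once from each end.

```python
# Nested-dict adjacency: node -> {neighbour: {edge properties}}
G = {
    0: {1: {"w": 0.5}, 2: {"w": 0.1}, 3: {"w": 0.1}},
    1: {0: {"w": 0.5}, 3: {"w": 0.3}},
    2: {0: {"w": 0.1}},
    3: {0: {"w": 0.1}, 1: {"w": 0.3}},
    4: {},
}

# Weight of the edge between 0 and 1 (stored under both endpoints)
w01 = G[0][1]["w"]
w10 = G[1][0]["w"]

# Neighbours of node 3, and its weighted degree (sum of incident edge weights)
nbrs3 = list(G[3])
wdeg3 = sum(props["w"] for props in G[3].values())

print(w01, w10, nbrs3, wdeg3)
```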

One way to make a graph in NetworkX is to build it up by adding nodes and then edges, as follows. (Later we'll see how to read a graph from a file.)

```python
import networkx as nx  # conventional import
G = nx.Graph()
G.add_nodes_from(range(5))
```

```python
G.nodes()
```

```
NodeView((0, 1, 2, 3, 4))
```

```python
G.edges()
```

```
EdgeView([])
```

```python
G.add_edge(0, 1, w=0.5)
G.add_edge(0, 2, w=0.1)
G.add_edge(0, 3, w=0.1)
G.add_edge(1, 3, w=0.3)
```

```python
G.edges()
```

```
EdgeView([(0, 1), (0, 2), (0, 3), (1, 3)])
```

```python
G.edges(data=True)
```

```
EdgeDataView([(0, 1, {'w': 0.5}), (0, 2, {'w': 0.1}), (0, 3, {'w': 0.1}), (1, 3, {'w': 0.3})])
```
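The four `add_edge` calls can also be written as one call to `add_weighted_edges_from`, which takes `(u, v, weight)` triples. Its `weight` parameter names the attribute (the default is `"weight"`), so this sketch passes `weight="w"` to match the attribute used above. Node 4 has no edges, so it still has to be added explicitly.

```python
import networkx as nx

H = nx.Graph()
H.add_nodes_from(range(5))  # node 4 is isolated, so add it explicitly
# Each triple is (u, v, weight); the weight is stored under the key "w"
H.add_weighted_edges_from(
    [(0, 1, 0.5), (0, 2, 0.1), (0, 3, 0.1), (1, 3, 0.3)], weight="w"
)
print(H.edges(data=True))
```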

Now notice that `G` itself behaves like a `dict`, mapping each node to its adjacency list **with edge properties**:

```python
for node in G.nodes():
    print(node, ":", G[node])
```

```
0 : {1: {'w': 0.5}, 2: {'w': 0.1}, 3: {'w': 0.1}}
1 : {0: {'w': 0.5}, 3: {'w': 0.3}}
2 : {0: {'w': 0.1}}
3 : {0: {'w': 0.1}, 1: {'w': 0.3}}
4 : {}
```
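Because of this correspondence, a nested dict like the hand-written one earlier can be passed straight to the `nx.Graph` constructor, and `nx.to_dict_of_dicts` converts a graph back into plain dicts. A small sketch; the dict is called `adj` here only to keep it separate from the graph object.

```python
import networkx as nx

# The hand-written nested dict from earlier: node -> {neighbour: {properties}}
adj = {
    0: {1: {"w": 0.5}, 2: {"w": 0.1}, 3: {"w": 0.1}},
    1: {0: {"w": 0.5}, 3: {"w": 0.3}},
    2: {0: {"w": 0.1}},
    3: {0: {"w": 0.1}, 1: {"w": 0.3}},
    4: {},
}

G2 = nx.Graph(adj)        # builds nodes (including isolated 4), edges, and properties
print(G2[1][3]["w"])      # edge properties are looked up like nested dicts
print(nx.to_dict_of_dicts(G2) == adj)  # round trip back to plain dicts
```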


The nodes don't have to be integers! They can be strings, floats, or arbitrary (hashable) Python objects. 

(Remember that mutable objects such as lists are not hashable, so they cannot be used as dictionary keys, and therefore cannot be nodes.)

The graph class determines whether edges are directed:

* `G = nx.Graph()  # default, undirected`
* `G = nx.DiGraph()  # directed graph`
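A small sketch combining both points: string nodes in a directed graph. The town names are only illustrative. In a `DiGraph`, `successors` follows edge directions and `predecessors` reverses them; a tuple can be a node, but a list cannot.

```python
import networkx as nx

D = nx.DiGraph()  # directed: an edge u -> v does not imply v -> u
D.add_edge("Galway", "Athlone", w=1.0)  # illustrative town names
D.add_edge("Athlone", "Dublin", w=2.0)
D.add_edge("Dublin", "Galway", w=3.5)

print(list(D.successors("Athlone")))    # out-neighbours: ['Dublin']
print(list(D.predecessors("Athlone")))  # in-neighbours: ['Galway']
print(D.has_edge("Athlone", "Galway"))  # False: only Galway -> Athlone exists
print(D.to_undirected().has_edge("Athlone", "Galway"))  # True once direction is dropped

D.add_node((53.3, -6.3))  # a tuple is hashable, so it can be a node
rejected = False
try:
    D.add_node([53.3, -6.3])  # a list is not hashable
except TypeError:
    rejected = True
print(rejected)
```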

### Algorithms

* Creating standard graphs and random graphs
* Breadth-first and depth-first traversal
* Connectivity
* Communities, cliques, clusters, etc.
* Isomorphism (are two graphs the same if we ignore node labels?)
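As a quick taste of the list above, here is a sketch running a few of these functions on tiny graphs built with NetworkX's generators (`nx.path_graph`, `nx.cycle_graph`, `nx.complete_graph`). The graphs are chosen only for illustration; see the NetworkX documentation for each function's full options.

```python
import networkx as nx

# Creating standard graphs: a path 0-1-2-3 and a triangle 4-5-6
P = nx.path_graph(4)
T = nx.cycle_graph([4, 5, 6])
G = nx.union(P, T)  # one graph with two separate components

# Breadth-first and depth-first traversal, starting from node 0
bfs_order = [0] + [v for u, v in nx.bfs_edges(G, 0)]
dfs_order = list(nx.dfs_preorder_nodes(G, 0))

# Connectivity: the connected components, as sorted lists of nodes
components = sorted(sorted(c) for c in nx.connected_components(G))

# Cliques and clustering: the triangle is a 3-clique; path nodes have no triangles
cliques = sorted(sorted(c) for c in nx.find_cliques(G))
clust = nx.clustering(G)

# Isomorphism: a 3-cycle has the same structure as the complete graph on 3 nodes
iso = nx.is_isomorphic(T, nx.complete_graph(3))

print(bfs_order, dfs_order, components, cliques, clust[5], iso)
```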

### More algorithms

* Centrality, e.g. PageRank and related algorithms
* Cycles
* Shortest paths
* Max-flow/min-cut
* Diameter
* Linear algebra methods on adjacency matrices
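Similarly, a sketch of a few functions from this second list on a small made-up weighted graph: a square 0-1-2-3 with a heavier diagonal 0-2. Degree centrality is used as the simplest centrality measure; PageRank is available as `nx.pagerank(G)`, which also returns a dict from node to score. For the linear-algebra methods, `nx.to_numpy_array(G)` gives the adjacency matrix as a NumPy array (NumPy required), so it is left out here.

```python
import networkx as nx

# A square 0-1-2-3 (weights 1.0) plus a heavier diagonal 0-2 (weight 3.0)
G = nx.Graph()
G.add_weighted_edges_from(
    [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0), (0, 2, 3.0)]
)

# Centrality: fraction of the other nodes that each node is adjacent to
centrality = nx.degree_centrality(G)

# Cycles: a cycle basis (every cycle is a combination of these)
basis = nx.cycle_basis(G)

# Shortest paths: going round the square (2.0) beats the diagonal (3.0)
path = nx.shortest_path(G, 0, 2, weight="weight")
length = nx.shortest_path_length(G, 0, 2, weight="weight")

# Diameter: the longest shortest path, counted in hops (weights ignored)
diam = nx.diameter(G)

# Max-flow: treat each weight as a capacity, from source 0 to sink 2
flow_value = nx.maximum_flow_value(G, 0, 2, capacity="weight")

print(centrality, basis, path, length, diam, flow_value)
```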


### Exercise
We'll see how to read a graph into NetworkX from a plain-text format and run some algorithms on it. The scenario: on a small island, we are just about to finish building a power plant at town 0. All the towns are already connected by a road network, and the Minister has decided to build the electricity network along the roads. We want to connect all towns at minimum cost. Our data is stored in `data/power_plant_edgelist.csv`.

1. As the first few lines below show, the file is just a weighted edge list. Why are we not worried about the possibility of isolated nodes, which an edge list cannot capture?

```python
import networkx as nx
fname = "data/power_plant_edgelist.csv"
# print the first 10 lines of the file
with open(fname) as f:
    for i in range(10):
        print(f.readline(), end="")
```

```
# This file contains the road structure
# on a small island. Towns are
# at nodes 0-13. Each road segment is
# notated as:
# town town distance
0 1 0.200000
0 2 0.223607
0 3 0.500000
0 4 0.282843
0 5 0.500000
```


### Exercises

2. Read in this file using `nx.read_edgelist`. It should be an undirected, weighted graph.

3. Confirm that the road network is fully connected. Use `nx.is_connected()`.
4. Use Kruskal's minimum spanning tree algorithm to find the lowest-cost electricity network.
5. The Minister is planning to take a drive in her Mercedes when she comes to cut the ribbon. She will travel from the power plant to the most distant village by the shortest route. What is that distance?

### Solutions

Use `nx.read_edgelist?` to get help. `create_using` tells `nx` which kind of graph to create (here `nx.Graph`, an undirected graph; `nx.DiGraph` would give a directed one), and `nodetype` tells `nx` to convert each node from `str` to `int`.

The tricky part is the syntax for the edge data, in this case the weights. It is not a single tuple `("weight", float)`; it is a sequence of tuples, one for each piece of edge data, each giving the attribute name and the function used to convert the text. There's only one piece of edge data here, so it is a sequence containing one tuple.

```python
G = nx.read_edgelist(fname, create_using=nx.Graph, nodetype=int,
                     data=[("weight", float)])
```
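To see what `data=[("weight", float)]` does without needing the file, here is a sketch using `nx.parse_edgelist`, which takes an iterable of lines rather than a path. The lines are one comment line and the first five edges from the preview above; lines starting with `#` are skipped because `comments="#"` is the default.

```python
import networkx as nx

lines = [
    "# This file contains the road structure",
    "0 1 0.200000",
    "0 2 0.223607",
    "0 3 0.500000",
    "0 4 0.282843",
    "0 5 0.500000",
]

# The third column becomes the "weight" attribute, converted with float()
H = nx.parse_edgelist(lines, create_using=nx.Graph, nodetype=int,
                      data=[("weight", float)])
print(H.edges(data=True))
print(H[0][2]["weight"])
```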

Some graph properties: `G.order()` is the number of nodes and `G.size()` is the number of edges.

```python
G.order()
```

```
14
```

```python
G.size()
```

```
56
```
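A sketch on a small made-up graph showing the more explicit equivalents of `order()` and `size()`, and that `size` can also sum an edge attribute when given `weight=...`.

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 0.25), (1, 2, 0.5), (2, 0, 1.0)])
G.add_node(3)  # an isolated node counts towards the order but not the size

n_nodes = G.order()              # same as len(G) and G.number_of_nodes()
n_edges = G.size()               # same as G.number_of_edges()
total = G.size(weight="weight")  # sum of the "weight" attribute over all edges
print(n_nodes, n_edges, total)
```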

Confirm that the graph is connected, i.e. there are no isolated villages:

```python
nx.is_connected(G)
```

```
True
```

*Kruskal's algorithm* finds a minimum spanning tree (MST): a tree containing all the nodes of the original graph and a subset of its edges, such that all nodes are connected and the sum of the edge weights in the tree is as small as possible. This solves the problem of building the electricity network because the cost of constructing an edge is proportional to the edge distance. `nx.minimum_spanning_tree` uses Kruskal's algorithm by default.

```python
mst = nx.minimum_spanning_tree(G)
```

```python
# how many edges? A tree on n nodes always has n - 1
mst.size()
```

```
13
```

```python
# What is the total cost of constructing the network?
sum(mst[e0][e1]["weight"] for e0, e1 in mst.edges())
```

```
4.423083999999999
```
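To show what `nx.minimum_spanning_tree` does, here is a from-scratch sketch of Kruskal's algorithm with a simple union-find structure: sort the edges by weight, and keep an edge only if it joins two different components. It runs on a small made-up graph and is checked against NetworkX; the total uses `size(weight="weight")` as an alternative to the explicit `sum` above. This is an illustrative sketch, not NetworkX's own implementation.

```python
import networkx as nx

def kruskal_mst_edges(G, weight="weight"):
    """Return the edges (u, v, w) of a minimum spanning tree of undirected G."""
    parent = {node: node for node in G}  # union-find: each node starts alone

    def find(x):
        # follow parent links to the component's root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # consider edges from lightest to heaviest
    for u, v, w in sorted(G.edges(data=weight), key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:         # u and v are in different components:
            parent[ru] = rv  # merge the components and keep the edge
            tree.append((u, v, w))
    return tree

G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 0.2), (0, 2, 0.5), (1, 2, 0.3), (1, 3, 0.9), (2, 3, 0.4), (3, 4, 0.1),
])

mine = kruskal_mst_edges(G)
total_mine = sum(w for _, _, w in mine)

mst = nx.minimum_spanning_tree(G, weight="weight")
total_nx = mst.size(weight="weight")
print(len(mine), total_mine, total_nx)
```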

*Dijkstra's algorithm* is a famous algorithm for finding the shortest paths from one source node to all other nodes. Here "shortest" takes account of edge weights, which must be non-negative; in our case, the edge weight is the road distance. We have to say which node we want to start from (0) and which edge attribute to use ("weight"). `nx.single_source_dijkstra` returns two dicts: the distance to each node and the path to each node.

```python
node_dists, paths = nx.single_source_dijkstra(G, 0,
                                              weight="weight")
```

Now we just find the most distant village:

```python
farthest = max(node_dists, key=node_dists.get)
```

```python
farthest
```

```
12
```

```python
node_dists[farthest]
```

```
1.0830950000000001
```
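For comparison, a from-scratch sketch of Dijkstra's algorithm using a binary heap (`heapq`), returning distances and paths in the same shape as `nx.single_source_dijkstra`. It assumes non-negative weights, which holds for road distances, and node labels that can be compared with each other (ties in the heap fall back to comparing nodes). The small graph is made up; the last lines repeat the "most distant village" step on it.

```python
import heapq
import networkx as nx

def dijkstra(G, source, weight="weight"):
    """Shortest-path distances and paths from source (non-negative weights)."""
    dist = {source: 0}
    paths = {source: [source]}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale entry: u was already settled at a shorter distance
        done.add(u)
        for v, props in G[u].items():
            nd = d + props[weight]
            if v not in dist or nd < dist[v]:
                dist[v] = nd                 # found a shorter route to v
                paths[v] = paths[u] + [v]
                heapq.heappush(heap, (nd, v))
    return dist, paths

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 2), (0, 2, 5), (1, 2, 2), (2, 3, 3), (1, 3, 7), (3, 4, 4)])

dist, paths = dijkstra(G, 0)
nx_dist, nx_paths = nx.single_source_dijkstra(G, 0, weight="weight")

farthest = max(dist, key=dist.get)
print(farthest, dist[farthest], paths[farthest])
```

On this graph it should print `4 11 [0, 1, 2, 3, 4]`: the route to node 2 goes via node 1 (cost 4) rather than directly (cost 5).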


By the way, the Stanford Large Network Dataset Collection (https://snap.stanford.edu/data/) has lots of interesting graphs for further investigation.