# P02 - 01: Shortest paths, diameter, and average path lengths

*April 30 2020*

In this first unit we explore the graph-theoretic foundations that were introduced in theory lecture L02. Specifically, we look at basic shortest path algorithms and calculate the diameter and average shortest path length of empirical networks. Our exploration is based on the latest version of `pathpy3`, which you can install as explained in the previous practice lecture. Please remember to update your (editable) installation by running `git pull` in your local copy of the git repository.

In [28]:
import numpy as np
import pathpy as pp
import scipy as sp

import sqlite3

In [29]:
n_undirected = pp.Network(directed=False)
n_undirected.add_edge('a', 'b')
n_undirected.add_edge('b', 'c')
n_undirected.add_edge('a', 'c')
n_undirected.add_edge('d', 'f')
n_undirected.add_edge('d', 'g')
n_undirected.add_edge('d', 'e')
n_undirected.add_edge('e', 'f')
n_undirected.add_edge('f', 'g')
n_undirected.plot()
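
As a library-independent illustration of shortest paths in this example, the following minimal sketch runs a breadth-first search on the same edge list. The helper names `build_adjacency` and `bfs_distances` are hypothetical and not part of `pathpy`; the sketch only assumes an unweighted, undirected network.

```python
from collections import deque

# Edge list copied from the undirected example network above.
EDGES = [('a', 'b'), ('b', 'c'), ('a', 'c'), ('d', 'f'),
         ('d', 'g'), ('d', 'e'), ('e', 'f'), ('f', 'g')]


def build_adjacency(edges):
    """Return an undirected adjacency dict {node: set of neighbours}."""
    adj = {}
    for v, w in edges:
        adj.setdefault(v, set()).add(w)
        adj.setdefault(w, set()).add(v)
    return adj


def bfs_distances(adj, source):
    """Return shortest path lengths (in hops) from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:  # first visit is along a shortest path
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


adj = build_adjacency(EDGES)
dist_from_a = bfs_distances(adj, 'a')
dist_from_e = bfs_distances(adj, 'e')
print(dist_from_a)
print(dist_from_e)
```

Nodes that do not appear in the returned dictionary are unreachable from the source. Here `d` is missing from `dist_from_a`, which reflects that the example network consists of two separate connected components.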

## Strongly connected components in directed networks

In directed networks, we distinguish between strong and weak connectivity. The following network is weakly but not strongly connected: from the nodes `a` and `b` we can reach `c` and `d`, but there is no directed path from `c` or `d` back to `a` or `b`.

In [30]:
n_directed = pp.Network(directed=True)
n_directed.add_edge('a', 'b')
n_directed.add_edge('b', 'a')
n_directed.add_edge('a', 'c')
n_directed.add_edge('b', 'c')
n_directed.add_edge('c', 'd')
n_directed.add_edge('d', 'c')
n_directed.plot()
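
The claim about the directed example can be checked by hand with a small reachability test. This is a sketch in plain Python, not the `pathpy` API; the helpers `successors` and `reachable` are hypothetical names introduced for illustration.

```python
# Edge list copied from the directed example network above.
EDGES = [('a', 'b'), ('b', 'a'), ('a', 'c'),
         ('b', 'c'), ('c', 'd'), ('d', 'c')]


def successors(edges):
    """Return {node: set of successors}; every node appears as a key."""
    succ = {}
    for v, w in edges:
        succ.setdefault(v, set()).add(w)
        succ.setdefault(w, set())
    return succ


def reachable(succ, source):
    """Return the set of nodes reachable from source via directed paths."""
    seen = {source}
    stack = [source]
    while stack:
        v = stack.pop()
        for w in succ[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


succ = successors(EDGES)
nodes = set(succ)

# Strongly connected: every node reaches every other node.
strongly_connected = all(reachable(succ, v) == nodes for v in nodes)

# Weakly connected: connected once edge directions are ignored.
both_directions = EDGES + [(w, v) for v, w in EDGES]
weakly_connected = reachable(successors(both_directions), 'a') == nodes

print(strongly_connected, weakly_connected)
```

From `c` only `c` and `d` are reachable, so the strong connectivity test fails, while the test that ignores directions succeeds.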

## Computing connected components in `pathpy`

The `find_connected_components` function in `pathpy` returns a dictionary of connected components:

(Open question: is the implementation based on Tarjan's algorithm?)

In [31]:
pp.algorithms.components.find_connected_components(n_undirected)

{0: {'a', 'b', 'c'}, 1: {'d', 'e', 'f', 'g'}}

In [32]:
pp.algorithms.components.find_connected_components(n_directed)

{0: {'c', 'd'}, 1: {'a', 'b'}}
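
For reference, here is a minimal sketch of Tarjan's algorithm for strongly connected components, applied to the directed example. Whether `find_connected_components` is actually implemented this way is not confirmed here; the function `tarjan_scc` is a hypothetical stand-alone helper, written recursively and therefore only suitable for small networks.

```python
def tarjan_scc(succ):
    """Return the strongly connected components of a directed graph.

    succ maps every node to the set of its successors.
    """
    index = {}        # discovery order of each node
    lowlink = {}      # smallest index reachable from the node's DFS subtree
    stack = []        # nodes of the components currently being explored
    on_stack = set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop the component off the stack
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in sorted(succ):
        if v not in index:
            visit(v)
    return components


# Successor sets of the directed example network above.
succ = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'d'}, 'd': {'c'}}
components = tarjan_scc(succ)
print(components)
```

The sketch finds the same two components as the output above, namely `{'a', 'b'}` and `{'c', 'd'}`.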

We can use the function `largest_connected_component` to extract the largest connected component and return it as a new network object:

In [33]:
lcc = pp.algorithms.components.largest_connected_component(n_undirected)
lcc.plot()

To compute the size of the largest connected component in a network, we can use the function `largest_component_size`:

In [34]:
pp.algorithms.components.largest_component_size(n_undirected)

4

In [35]:
# Isn't this ambiguous? The components {a, b} and {c, d} have the same size.
lcc = pp.algorithms.components.largest_connected_component(n_directed)
lcc.plot()
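
How `pathpy` resolves such a tie is not stated here. The following sketch only illustrates the issue in plain Python: `max` keeps the first maximal element in iteration order, so the result depends on how the components happen to be numbered. The tie-breaking rule at the end is an assumption chosen for illustration.

```python
# Component dictionary as returned for the directed example above.
components = {0: {'c', 'd'}, 1: {'a', 'b'}}

# max() returns the first component of maximal size in iteration order.
first_largest = max(components.values(), key=len)

# Make ties explicit by collecting all components of maximal size.
max_size = max(len(c) for c in components.values())
all_largest = [c for c in components.values() if len(c) == max_size]

# Hypothetical deterministic rule: prefer the component that contains
# the alphabetically smallest node.
canonical = min(all_largest, key=lambda c: min(c))

print(first_largest, len(all_largest), canonical)
```

Reporting `all_largest` alongside the chosen component makes it visible whenever the "largest" component is not unique.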

## Connected component size in empirical networks

We finally apply connected component analysis to empirical networks. We use the same data sets as in the previous unit and study (i) whether they are connected, (ii) how large the largest connected component is, and (iii) what the shortest path lengths and the diameter of the largest connected component are:

In [36]:
n_highschool = pp.io.sql.read_network('networks.db', sql='SELECT source, target FROM "highschool"', directed=False)
n_physicians = pp.io.sql.read_network('networks.db', sql='SELECT source, target FROM "physicians"', directed=False)
n_gentoo = pp.io.sql.read_network('networks.db', sql='SELECT source, target FROM "gentoo"', directed=True)
n_lotr = pp.io.sql.read_network('networks.db', sql='SELECT source, target FROM "lotr"', directed=False)

In [38]:
print(n_lotr.summary())

Uid:			0x1a913f751c0
Type:			Network
Directed:		False
Multi-Edges:		False
Number of nodes:	139
Number of edges:	634


In [39]:
n_gentoo._properties

defaultdict(None,
            {'edges': set(),
             'successors': defaultdict(set, {}),
             'predecessors': defaultdict(set, {}),
             'outgoing': defaultdict(set, {}),
             'incoming': defaultdict(set, {}),
             'neighbors': defaultdict(set, {}),
             'incident_edges': defaultdict(set, {}),
             'indegrees': defaultdict(float, {}),
             'outdegrees': defaultdict(float, {}),
             'degrees': defaultdict(float, {})})

In [40]:
pp.algorithms.components.largest_connected_component(n_gentoo).number_of_nodes()

1

In [11]:
n_highschool.number_of_edges()

348

In [12]:
print(pp.algorithms.components.largest_component_size(n_highschool)/n_highschool.number_of_nodes())

0.008333333333333333


In [13]:
print(pp.algorithms.components.largest_component_size(n_physicians)/n_physicians.number_of_nodes())

0.004149377593360996


In [14]:
pp.algorithms.components.largest_component_size(n_physicians)

1
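
The ratios printed above equal 1/120 and 1/241, i.e. a largest component of size 1 in both networks. For a network with 348 edges this can only happen if every edge is a self-loop, so the values may point to a problem in the installed library version rather than to a property of the data. A library-independent cross-check is sketched below: it reads an edge table with `sqlite3` and computes the largest (weakly) connected component with a union-find structure. The in-memory table `toy` is a hypothetical stand-in for the tables in `networks.db`; only the column names `source` and `target` are taken from the queries above.

```python
import sqlite3


def largest_component_size(edges):
    """Size of the largest connected component, ignoring edge directions."""
    parent = {}
    size = {}

    def find(v):
        # Register unseen nodes as their own root, then follow parent links.
        parent.setdefault(v, v)
        size.setdefault(v, 1)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for v, w in edges:
        root_v, root_w = find(v), find(w)
        if root_v != root_w:
            if size[root_v] < size[root_w]:
                root_v, root_w = root_w, root_v
            parent[root_w] = root_v        # attach smaller tree to larger
            size[root_v] += size[root_w]
    return max((size[find(v)] for v in parent), default=0)


# Hypothetical toy table holding the undirected example from the beginning.
EDGES = [('a', 'b'), ('b', 'c'), ('a', 'c'), ('d', 'f'),
         ('d', 'g'), ('d', 'e'), ('e', 'f'), ('f', 'g')]
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE toy (source TEXT, target TEXT)')
con.executemany('INSERT INTO toy VALUES (?, ?)', EDGES)
rows = con.execute('SELECT source, target FROM "toy"').fetchall()
con.close()

lcc_size = largest_component_size(rows)
print(lcc_size)
```

To apply the check to the real data, one would connect to `networks.db` instead and select from the respective table. Nodes without any edge do not appear in the edge table and are therefore not counted by this sketch.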

In [15]:
# n_physicians

In [22]:
pp.algorithms.components.find_connected_components(n_gentoo)

{0: {'aries.huijzer'},
 1: {'betelgeuse'},
 2: {'x86'},
 3: {'flameeyes'},
 4: {'8an'},
 5: {'alpeterson'},
 6: {'stepp'},
 7: {'qeldroma'},
 8: {'gazman'},
 9: {'desktop-wm'},
 10: {'tcort'},
 11: {'kerberos'},
 12: {'accessibility'},
 13: {'gad.kadosh'},
 14: {'andre'},
 15: {'strowi'},
 16: {'dragonheart'},
 17: {'smallone'},
 18: {'gcc-porting'},
 19: {'squinky86'},
 20: {'ed'},
 21: {'jforman'},
 22: {'rodney.brown'},
 23: {'phasma'},
 24: {'bugs.gentoo.org'},
 25: {'paapaa125'},
 26: {'goric'},
 27: {'bugzilla-gentoo'},
 28: {'evan'},
 29: {'wsheets'},
 30: {'andrei.ivanov'},
 31: {'exg'},
 32: {'mrness'},
 33: {'carlo'},
 34: {'net-fs'},
 35: {'tetromino'},
 36: {'hardened'},
 37: {'znmeb'},
 38: {'welp'},
 39: {'hoadley'},
 40: {'ps.m'},
 41: {'maciej.blizinski'},
 42: {'zeksers'},
 43: {'crypto'},
 44: {'jaervosz'},
 45: {'lazy_bum'},
 46: {'vanquirius'},
 47: {'fonts'},
 48: {'yawgmoth7'},
 49: {'anarchy'},
 50: {'joshuabaergen'},
 51: {'joelinux'},
 52: {'vyzivus'},
 53: {'M

In [17]:
lcc_physicians = pp.algorithms.components.largest_connected_component(n_physicians)
lcc_gentoo = pp.algorithms.components.largest_connected_component(n_gentoo)
lcc_lotr = pp.algorithms.components.largest_connected_component(n_lotr)

In [18]:
pp.algorithms.components.largest_connected_component(n_lotr)

<pathpy.core.network.Network object at 0x000001A91174D5E0>

In [None]:
print("Diameter of physicians is ", pp.algorithms.shortest_paths.diameter(lcc_physicians))
print("Diameter of gentoo is ", pp.algorithms.shortest_paths.diameter(lcc_gentoo))
print("Diameter of lotr is ", pp.algorithms.shortest_paths.diameter(lcc_lotr))

print("Avg. path length of physicians is ", pp.algorithms.shortest_paths.avg_path_length(lcc_physicians))
print("Avg. path length of gentoo is ", pp.algorithms.shortest_paths.avg_path_length(lcc_gentoo))
print("Avg. path length of lotr is ", pp.algorithms.shortest_paths.avg_path_length(lcc_lotr))
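
To make the two quantities concrete, the following sketch computes them with breadth-first search for the component `{d, e, f, g}` of the undirected example. It assumes a connected, unweighted, undirected network and averages over all unordered pairs of distinct nodes; the convention used by `avg_path_length` in `pathpy` may differ. The helper `all_pairs_distances` is a hypothetical name.

```python
from collections import deque
from itertools import combinations

# Edges of the component {d, e, f, g} from the undirected example network.
EDGES = [('d', 'f'), ('d', 'g'), ('d', 'e'), ('e', 'f'), ('f', 'g')]


def all_pairs_distances(edges):
    """Return {source: {target: hop distance}} via one BFS per node."""
    adj = {}
    for v, w in edges:
        adj.setdefault(v, set()).add(w)
        adj.setdefault(w, set()).add(v)
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in d:
                    d[w] = d[v] + 1
                    queue.append(w)
        dist[source] = d
    return dist


dist = all_pairs_distances(EDGES)

# One shortest path length per unordered pair of distinct nodes.
pair_lengths = [dist[v][w] for v, w in combinations(sorted(dist), 2)]

diameter = max(pair_lengths)                           # longest shortest path
avg_path_length = sum(pair_lengths) / len(pair_lengths)

print(diameter, avg_path_length)
```

Only the pair `e`, `g` is two hops apart and the other five pairs are adjacent, so the diameter is 2 and the average shortest path length is 7/6. On a disconnected network the lookup `dist[v][w]` would fail, which is why both quantities are computed on the largest connected component.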