TODOs:
    - harmonize node names for visualizations
    - solve issue with plotting to pdf

# P01 - 12: Visualising networks with `pathpy`

**April 23 2020**  
*Ingo Scholtes*

A key feature of `pathpy` is its support for customizable interactive visualisations that can be embedded in jupyter notebooks or stored as stand-alone files. In the following, we show this functionality in some toy examples before moving to real data sets in the next unit. We first import `pathpy` as usual. 

In [1]:
import pathpy as pp
import sqlite3

## Interactive network visualisation in `jupyter`

We first create a simple toy example by adding two edges between three nodes.

In [2]:
n = pp.Network(directed=False)
n.add_edges(('a', 'b'), ('b', 'c'))
print(n)

Uid:			0x1b122f209a0
Type:			Network
Directed:		False
Multi-Edges:		False
Number of nodes:	3
Number of edges:	2


Calling the `print` function on a network instance generates a string representation that is printed on the console. The simplest way to graphically visualise the network in a jupyter notebook is to type the variable name of the network:

In [3]:
n

<pathpy.core.network.Network object at 0x000001B122F209A0>

This will create an interactive HTML visualisation of the network, where we can zoom, pan, and drag nodes. The same visualisation is generated if we explicitly call the function `plot` on the network instance. Try to zoom and pan the network (by holding the shift key and using the mouse or mouse wheel). See what happens if you drag a node and release the mouse button, and try to search for a node with a specific uid or name:

In [4]:
n.plot()

Using the two top-left buttons in the visualisation, you can export the current view of the network as an SVG vector graphics file or a PNG pixel graphics file. A default style is applied to the network unless you specify otherwise, but pathpy allows you to fully style the network based on (i) node and edge attributes that are automatically considered in the visualisation, or (ii) a custom style dictionary that can be passed to the plot function. If we want to change the color of nodes, we can simply assign a `color` attribute to the nodes:

In [5]:
n.nodes['a']['color'] = 'red'
n.nodes['b']['color'] = 'green'
n.nodes['c']['color'] = 'blue'
n.plot()

We can additionally change the size of the nodes as follows:

In [6]:
n.nodes['a']['size'] = 20
n.nodes['b']['size'] = 10
n.nodes['c']['size'] = 20
n.plot()

An alternative way to style the network is to pass node and edge properties as keyword arguments to the plot function. We can either assign the same property to all nodes, or we can assign different properties to different nodes based on a dictionary:

In [7]:
n.plot(node_color='red', node_size=10)

In [8]:
n.plot(node_color={'a': 'green', 'b': 'blue', 'c': 'red'}, node_size={'a': 20, 'b': 10, 'c': 20})
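The two calls above differ only in whether a style argument is a single value or a dictionary keyed by node uid. As a minimal sketch of how such an argument could be expanded into one value per node, consider the following plain-Python helper. The function `resolve_style` and its `default` parameter are hypothetical and only illustrate the idea; this is not how `pathpy` implements it internally.

```python
def resolve_style(value, node_uids, default=None):
    """Expand a style argument into one value per node.

    A scalar is applied to every node. A dictionary is looked up per node,
    falling back to `default` for nodes it does not mention.
    """
    if isinstance(value, dict):
        return {uid: value.get(uid, default) for uid in node_uids}
    return {uid: value for uid in node_uids}


nodes = ['a', 'b', 'c']

# Same colour for every node, as in n.plot(node_color='red')
uniform = resolve_style('red', nodes)

# Different sizes per node; 'b' is not listed and receives the default
per_node = resolve_style({'a': 20, 'c': 20}, nodes, default=10)

print(uniform)
print(per_node)
```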

A better way to manage plot styles is to store all styles in a dictionary, whose entries are then passed to the plot function as keyword arguments using the `**` unpacking operator. A major advantage of this is that we can store the plot style and apply the same style to multiple networks or to visualisations in multiple formats.

In [9]:
plot_style = {
    'node_color': {'a': 'orange', 'b': 'blue', 'c': 'red'}, 
    'node_size': {'a': 30, 'b': 10, 'c': 20}
}
n.plot(**plot_style)

In any case, plot parameters passed explicitly to the plot function take precedence over any node or edge properties. To conclude this section, let us plot a few networks that we read from our database file.

In [10]:
con = sqlite3.connect('networks.db')
n = pp.io.sql.read_network(con=con, sql='SELECT source, target FROM lotr', directed=True, multiedges=False)
# n.plot(node_size=5)

In [11]:
n = pp.io.sql.read_network(filename='networks.db', table='gentoo', directed=True)
# n.plot(node_size=5)

In [12]:
n = pp.io.sql.read_network(filename='networks.db', table='highschool', directed=False, multiedges=True)
# n.plot(node_size=5)
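The cells above read edge lists from tables in an SQLite database. Since `networks.db` itself is not reproduced here, the following sketch uses only the standard-library `sqlite3` module and an in-memory database to show the shape of data such a query returns. The table name `toy_edges` and its rows are made up for illustration, and the duplicate removal is only an assumption about what ignoring multi-edges amounts to.

```python
import sqlite3

# In-memory database with a hypothetical edge table; the real networks.db
# used above is not reproduced here.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE toy_edges (source TEXT, target TEXT)')
con.executemany(
    'INSERT INTO toy_edges VALUES (?, ?)',
    [('a', 'b'), ('b', 'c'), ('a', 'b')],  # ('a', 'b') appears twice
)

# Each row of the result is one (source, target) pair, i.e. one edge.
rows = con.execute('SELECT source, target FROM toy_edges').fetchall()
con.close()

# Collapsing repeated pairs leaves one edge per node pair (assumption:
# this is roughly what reading without multi-edges means).
unique_edges = sorted(set(rows))

print(len(rows), 'rows,', len(unique_edges), 'distinct edges')
```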

## Network Layouts

An important question in the drawing of networks is where nodes are placed. The positioning of nodes determines whether it is easy to spot patterns in the topology of the network. As a general rule, good network visualisations should have a small number of crossing edges, and nodes should be placed far enough from each other that we can easily distinguish them. If we make no other effort, pathpy will automatically use a simple, interactive force-directed layout algorithm. It is based on the idea that nodes connected by an edge are moved towards each other by an attractive force, while an additional repulsive force between all node pairs makes sure that nodes are not drawn on top of each other. The simulation of those forces, and the computation of a steady state of the resulting many-particle system, is what determines the node positions in the default pathpy visualisation. This is also the basis upon which nodes move if you disturb the visualisation by dragging nodes around.
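To make the idea of attractive and repulsive forces concrete, here is a minimal pure-Python sketch in the spirit of the Fruchterman-Reingold scheme. It is not the JavaScript code that `pathpy` runs in the browser. The function `force_layout`, the constant `k`, the starting temperature, and the cooling factor are all assumptions chosen for illustration.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, k=1.0, seed=1):
    """Toy force-directed layout, returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for v in nodes}
    temperature = 0.1  # maximum displacement per step, reduced over time

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsive force between every pair of nodes: k^2 / distance
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = k * k / dist
                disp[u][0] += dx / dist * push
                disp[u][1] += dy / dist * push
                disp[v][0] -= dx / dist * push
                disp[v][1] -= dy / dist * push

        # Attractive force along every edge: distance^2 / k
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = dist * dist / k
            disp[u][0] -= dx / dist * pull
            disp[u][1] -= dy / dist * pull
            disp[v][0] += dx / dist * pull
            disp[v][1] += dy / dist * pull

        # Move each node along its net force, capped by the temperature
        for v in nodes:
            length = math.hypot(disp[v][0], disp[v][1])
            if length > 0.0:
                step = min(length, temperature)
                pos[v][0] += disp[v][0] / length * step
                pos[v][1] += disp[v][1] / length * step

        temperature *= 0.99  # cool down so the system settles

    return {v: (p[0], p[1]) for v, p in pos.items()}


def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


nodes = ['a', 'b', 'c']
edges = [('a', 'b'), ('b', 'c')]
layout = force_layout(nodes, edges)

print('a-b:', round(distance(layout['a'], layout['b']), 2))
print('a-c:', round(distance(layout['a'], layout['c']), 2))
```

For the three-node path used earlier, the unconnected nodes `a` and `c` only repel each other, so they end up further apart than the connected pairs.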

While the default layout algorithm makes it simple to visualise networks, it has the disadvantage that the layout is calculated in JavaScript. This means that the positions of nodes are not stored in python, which makes it impossible to influence (or store) node positions from python. To solve this issue, `pathpy` provides a `layout` module that can be used to precompute node positions based on different layout algorithms. Moreover, it allows us to manually tune the parameters of those algorithms to generate a nice visualisation.

The styling of node positions via a layout style follows the same idea as the styling of plots via a plot style. We can store the parameters in a dictionary and pass them to the `layout` function using the `**` operator. To compute node positions based on a certain number of iterations of the Fruchterman-Reingold algorithm with a specific force value, we can write:

In [14]:
layout_style = {}
layout_style["node_size"] = 2
layout_style['layout'] = 'Fruchterman-Reingold'
layout_style['force'] = 0.2
layout_style['iterations'] = 500
layout = pp.layout(n, **layout_style)
print(layout)

{'642': array([ 1.40912366, -0.88249697]), '480': array([ 6.30143746, -1.45778582]), '275': array([-0.09974692,  2.74767893]), '211': array([ 5.88205966, -1.89989751]), '265': array([-0.73041314,  1.33835093]), '79': array([ 5.58247468, -2.27824374]), '454': array([ 1.31262833, -1.41217893]), '151': array([ 0.65419406, -1.04147613]), '1485': array([-0.33488405, -0.74002764]), '478': array([-1.54800491,  2.00013664]), '502': array([ 0.8980091 , -0.57768055]), '165': array([-0.89031065,  2.48456898]), '122': array([0.07961542, 2.39238955]), '65': array([-1.46438203,  1.93793517]), '400': array([-0.53128721,  1.80295607]), '601': array([ 5.73545825, -1.97600874]), '255': array([0.15639601, 2.88622071]), '845': array([ 6.04207019, -1.9116241 ]), '1423': array([-0.09256137, -0.47259458]), '89': array([ 6.16240163, -2.13235912]), '245': array([ 0.68078586, -0.49720725]), '388': array([ 5.44310657, -1.908381  ]), '945': array([-0.61185564,  1.78394653]), '1332': array([0.10409731, 0.18260456]

This computes a dictionary of node positions, where the node uids are the keys and two-dimensional coordinates are the values. We can now pass this specific layout to the plot function of the network. This will disable the interactive layout in JavaScript, fixing the node positions to the precalculated layout:

In [15]:
# pp.plot(n, **plot_style, layout=layout)
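Because a precomputed layout is an ordinary dictionary, it can be inspected and modified in python before plotting. As a small sketch under that assumption, the helper below rescales all positions into the unit square. The function `rescale_layout` and the toy positions are hypothetical, and positions are plain tuples here rather than the arrays shown in the output above.

```python
def rescale_layout(layout):
    """Linearly map all positions into the unit square [0, 1] x [0, 1]."""
    xs = [p[0] for p in layout.values()]
    ys = [p[1] for p in layout.values()]
    x_min, y_min = min(xs), min(ys)
    # Guard against zero width or height (e.g. all nodes on one line)
    width = (max(xs) - x_min) or 1.0
    height = (max(ys) - y_min) or 1.0
    return {
        uid: ((p[0] - x_min) / width, (p[1] - y_min) / height)
        for uid, p in layout.items()
    }


# Hypothetical positions in the same {uid: (x, y)} shape as a computed layout
toy_layout = {'a': (-2.0, 1.0), 'b': (0.0, 3.0), 'c': (2.0, 1.0)}
scaled = rescale_layout(toy_layout)
print(scaled)
```

Note that the two axes are scaled independently, so the aspect ratio of the original layout is not preserved.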

## Plotting spatially embedded networks

Layout algorithms are important to visualise networks where nodes are not naturally embedded in a space. However, for a number of complex networks we can naturally assign coordinates to nodes. Examples include infrastructure networks like road, train, or flight networks, where we have information on the geographic coordinates of road junctions, train stations, or airports. In such cases we do not need a layout algorithm to calculate node positions. Instead, we can use the node coordinates, which we assign as node attributes:

In [16]:
n = pp.Network() # why do I get the node properties of the older n?

n.add_node("a")
n.add_node("b")
n.add_node("c")

n.add_edge("a", "b")
n.add_edge("b", "c")

n.nodes['a']['coordinates'] = [0,0]
n.nodes['b']['coordinates'] = [0,1]
n.nodes['c']['coordinates'] = [1,0]
n.plot(**plot_style)

As with other node attributes, specifying a layout for the network will override the node properties, i.e. in this case the node coordinates will be ignored.
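The precedence rule described here, where explicitly passed arguments win over node attributes, can be modelled with a plain dictionary merge. This is only a conceptual sketch: the function `effective_node_style` and the example values are hypothetical and do not reflect `pathpy` internals.

```python
def effective_node_style(node_attributes, plot_arguments):
    """Merge per-node attributes with explicit plot arguments.

    In a dict merge, later entries win, so explicit arguments
    override attributes stored on the node.
    """
    return {**node_attributes, **plot_arguments}


# Attributes stored on a node (hypothetical values)
attributes = {'color': 'red', 'size': 20, 'coordinates': (0, 0)}

# Values passed explicitly, e.g. a plot style and a precomputed layout position
explicit = {'color': 'orange', 'coordinates': (1.4, -0.9)}

style = effective_node_style(attributes, explicit)
print(style)
```

The `size` attribute survives because nothing explicit replaces it, while `color` and `coordinates` are overridden.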

## Plotting networks to PDF

While it is convenient to interactively plot networks in a jupyter notebook, we often want to generate stand-alone visualisations that we can either share or embed in scientific publications. Apart from the possibility to directly export PNG and SVG visualisations from the interactive HTML widget, the plot function can be used to generate a stand-alone HTML visualisation of the network that can be opened in any browser and shared on the Web.

In [17]:
n.plot(filename='network.html', **plot_style)

If you want a vector graphics figure that can be embedded in a scientific paper, you can use the plot function to export network visualisations as PDF:

In [20]:
n.plot(**plot_style, filename='network.pdf')  # layout=layout,

[09-02 13:33:44: ERROR] No latexmk compiler found


AttributeError: 
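The log line above reports that no `latexmk` compiler was found. Assuming the PDF export relies on a `latexmk` executable being available on the system `PATH` (which the message suggests but which is not confirmed here), a quick check with the standard library before plotting could look like this:

```python
import shutil

# shutil.which returns the full path of an executable, or None if it
# cannot be found on the PATH.
latexmk_path = shutil.which('latexmk')

if latexmk_path is None:
    print('latexmk not found on PATH; PDF export via LaTeX is likely to fail')
else:
    print('latexmk found at', latexmk_path)
```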

An interesting feature of pathpy is that it builds on the package `tikz-network`, a powerful framework to draw graphs in LaTeX. If we want to generate a visualisation in which we retain the full power to style the network in LaTeX, we can export the network as a tex file that can be built with a LaTeX compiler:

In [None]:
n.plot(**plot_style, layout=layout, filename='network.tex')
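To give an impression of what a `tikz-network` source file looks like, the following sketch writes the `\Vertex` and `\Edge` commands for a small network by hand. It is not the exporter used by `pathpy`: the function `to_tikz_network` is hypothetical, and the `standalone` document class is an assumption made to keep the example short.

```python
def to_tikz_network(nodes, edges, positions, colors=None):
    """Build a small LaTeX document that draws a network with tikz-network."""
    colors = colors or {}
    lines = [
        r'\documentclass{standalone}',
        r'\usepackage{tikz-network}',
        r'\begin{document}',
        r'\begin{tikzpicture}',
    ]
    # One \Vertex command per node, with position, label, and optional colour
    for v in nodes:
        x, y = positions[v]
        options = f'x={x},y={y},label={v}'
        if v in colors:
            options += f',color={colors[v]}'
        lines.append(rf'\Vertex[{options}]{{{v}}}')
    # One \Edge command per edge, referring to the vertex names
    for u, v in edges:
        lines.append(rf'\Edge({u})({v})')
    lines += [r'\end{tikzpicture}', r'\end{document}']
    return '\n'.join(lines)


tex = to_tikz_network(
    nodes=['a', 'b', 'c'],
    edges=[('a', 'b'), ('b', 'c')],
    positions={'a': (0, 0), 'b': (0, 1), 'c': (1, 0)},
    colors={'a': 'orange'},
)
print(tex)
```

The resulting text can be saved to a `.tex` file and edited further in LaTeX, which is the main appeal of this export route.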