<h1> A social network of interactions</h1><br>
In this notebook we will show you some of the common commands you can use when you study and draw network graphs using Python. We will use a simple example with a group of kids who interacted with each other during a school day. Some of them got sick with a virus infection later the same day. We will study the network to see if it can tell us something about the transmission of the virus among the kids.

In [None]:
import networkx as nx
from datetime import datetime
import matplotlib.pyplot as plt
import numpy as np
import warnings
#from custom import load_data as cf
import csv
#from operator import itemgetter
#import community 
import pandas as pd
#from custom import ecdf
from statsmodels.distributions.empirical_distribution import ECDF

warnings.filterwarnings('ignore')

%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

## Loading data

- Load the two CSV files:
        kids_nodes.csv (contains names and information about whether the kid got sick or not)
        kids_edges.csv (contains information about who played with whom)
        (If you would like to see the content of the two files, use pd.read_csv("data/kids_nodes.csv") and pd.read_csv("data/kids_edges.csv"))
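
As a rough alternative to reading the files with the `csv` module, the same kind of graph can be built with pandas and `nx.from_pandas_edgelist`. The sketch below uses small in-memory tables instead of the real files; the column names (`name`, `sick`, `source`, `target`) and the graph name `G_demo` are assumptions made for illustration and may not match the actual CSV headers.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical stand-ins for the two CSV files; the real column names may differ.
nodes_csv = io.StringIO("name,sick\nanna,0\nmads,1\nlars,0\nline,0\n")
edges_csv = io.StringIO("source,target\nanna,mads\nmads,lars\n")

nodes_df = pd.read_csv(nodes_csv)
edges_df = pd.read_csv(edges_csv)

# Build the graph from the edge table, then add every listed node so that
# kids without any recorded contact (here "line") are not lost.
G_demo = nx.from_pandas_edgelist(edges_df, source="source", target="target")
G_demo.add_nodes_from(nodes_df["name"])

print(G_demo.number_of_nodes(), G_demo.number_of_edges())
```

Adding the nodes separately matters because an edge list alone only contains kids who appear in at least one edge.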

In [None]:
# Open the file with nodes
with open('./data/kids_nodes.csv', 'r') as nodecsv:                 
    nodereader = csv.reader(nodecsv) 
    nodes = [n for n in nodereader][1:]       
    
# Get a list of only the node names                                       
node_names = [n[0] for n in nodes]   

In [None]:
# Open the file with edges
with open('./data/kids_edges.csv', 'r') as edgecsv:
    edgereader = csv.reader(edgecsv) 
    edges = [tuple(e) for e in edgereader][1:] 
 

In [None]:
# create a new graph
G = nx.Graph()

# add nodes and edges to that graph
G.add_nodes_from(node_names)
G.add_edges_from(edges)

##  Nodes and edges
Let us first look at the nodes and edges in the network that we have just created.

We can start by finding out how many nodes the network contains. We do so by taking the length of the list of nodes in the network.
At the same time we would also like to see the names of the nodes. The list may be very long, though, so we can limit the number of nodes shown by using [0:5], which returns only the first five nodes.

In [None]:
# The number of nodes in the network
print(len(G.nodes()))

# The first five nodes in the network
print(list(G.nodes())[0:5])

<div class='alert alert-warning'>
<h4>Exercise 1.</h4>  Now, it turned out that this list was not very long. Edit the code to get the names of the rest of the nodes (excluding the first five).

In [None]:
# Ex1


In [None]:
# %load solutions/ex1_1.py

Now let us look at the same for the edges. First, note that the links in this network are represented as an edge list (a list of each connected pair).

The number of edges in the network is given by the length of the list of edges. So we can simply reuse the command for the length of the list of nodes and replace the word nodes with the word edges. The same can be done for the list commands.

In [None]:
G.edges()

<div class='alert alert-warning'>
<h4>Exercise 2.</h4> How many edges are there in G?

In [None]:
# Ex2


In [None]:
# %load solutions/ex1_2.py

The `print` function also lets us report the data in a more readable fashion, for instance:

In [None]:
print("number of nodes: ",  G.number_of_nodes())

<div class='alert alert-warning'>
<h4>Exercise 3a.</h4> Are there any students named Lea in G? Remember that the names in the graph are in lowercase.
  </div> 
  
*Tip: write `G.` and press Tab to pop up a window of available attributes and methods. Maybe what you are looking for is in there?*

In [None]:
# a)


In [None]:
# %load solutions/ex1_3a.py

<div class='alert alert-warning'>
<h4>Exercise 3b.</h4> Did Mette and Anna have any contact on the day of the study?

In [None]:
# b)


In [None]:
# %load solutions/ex1_3b.py

##  Drawing the network
Now we know a bit about the size and nature of the network. Let's try to draw the network, first without and then with names on the nodes.

We do so by using the command `nx.draw` from the networkx package.

In [None]:
# Draw network
nx.draw(G)

The graph/network is drawn without labels on the nodes by default. You can change that by using the command `nx.draw(G, with_labels=True)`.

In [None]:
# Draw network with labels
nx.draw(G, with_labels=True, node_color='#d2eaf7')

The two networks displayed probably look different because there is some randomness in the layout algorithm that positions the nodes. However, keep in mind that the edges are exactly the same, so the two drawings are **topologically** identical - an important concept.
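
If a reproducible picture is wanted, one option is to compute the node positions once with a fixed random seed and reuse them. The sketch below assumes the spring layout (the toy graph `G_demo` and the seed value are made up for illustration); it only computes positions, which could then be passed to `nx.draw` through its `pos` argument.

```python
import networkx as nx
import numpy as np

# Toy graph, only for illustration
G_demo = nx.Graph([("anna", "mads"), ("mads", "lars"), ("lars", "anna"), ("lars", "line")])

# The same seed gives the same node positions on every call
pos_a = nx.spring_layout(G_demo, seed=1)
pos_b = nx.spring_layout(G_demo, seed=1)

same_layout = all(np.allclose(pos_a[n], pos_b[n]) for n in G_demo)
print(same_layout)

# The positions can then be reused when drawing, e.g.
# nx.draw(G_demo, pos=pos_a, with_labels=True)
```

Whatever positions are used, the set of edges stays the same; only the placement of the nodes on the canvas changes.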

Below you can see some code that can help you extract a subgraph showing only the playmates of a specific kid. In the example below you will see who Anna played with that day (you do not need to understand the entire code).

In [None]:
# create function for creating a subgraph
def extract_subgraph(G, node):
    new_G = nx.Graph()                   # new graph
    for neighbor in G.neighbors(node):   # loop through neighbors of selected node
        new_G.add_edge(node, neighbor)   # add edges 
    return new_G

# make new subgraph and draw it with labels
newG = extract_subgraph(G, 'anna')
nx.draw(newG, with_labels=True)
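
The function above keeps only the edges between the chosen kid and each playmate. NetworkX also ships `nx.ego_graph`, which additionally keeps the edges among the playmates themselves. The sketch below compares the two on a small made-up graph (`G_demo` and its edges are assumptions for illustration only).

```python
import networkx as nx

# Toy graph, only for illustration
G_demo = nx.Graph([("anna", "mads"), ("anna", "lars"), ("mads", "lars"), ("lars", "line")])

def extract_subgraph(G, node):
    new_G = nx.Graph()
    for neighbor in G.neighbors(node):
        new_G.add_edge(node, neighbor)
    return new_G

star = extract_subgraph(G_demo, "anna")   # only anna's own edges
ego = nx.ego_graph(G_demo, "anna")        # anna, her neighbors, and edges among them

print(star.number_of_edges(), ego.number_of_edges())
```

Which one is preferable depends on the question: the star-shaped version answers "who did this kid play with?", while the ego graph also shows whether those playmates played with each other.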

<div class='alert alert-warning'>
<h4>Exercise 4a.</h4> Use the above function to display all the friends of Mads.

In [None]:
# Ex4 a)


In [None]:
#%load solutions/ex1_4a.py

<div class='alert alert-warning'>
<h4> Exercise 4b.</h4> How many kids did Mads play with that day?

In [None]:
# 4b)


In [None]:
# %load solutions/ex1_4b.py

## Extracting information about neighbors (centrality), degree of centrality and ranking based on centrality. 
We just saw how we could get subgraphs containing the playmates of a specific kid - in graph theory these will be the neighbors of a specific node. We can also make commands that will give us the information as numbers and lists. 

If we would like to know the neighbors (here playmates) of e.g. the first node (here Mads), we can use the G.neighbors(node) function.

In [None]:
# Neighbors of the node with the name "mads"
list(G.neighbors('mads'))

The number of neighbors - i.e. the number of other nodes that a node is connected to - is a measure of the node's centrality. We could count them by hand in the list above, or we could use the `len` function to find the length of that list. Let's use the `len` function.

In [None]:
# The number of neighbors of the node "mads"? (i.e. what is the centrality of "mads")
len(list(G.neighbors('mads')))
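
For a simple graph without self-loops, counting the neighbors gives the same number as the node's degree, which NetworkX exposes through `G.degree`. A minimal sketch on a made-up graph (`G_demo` is hypothetical):

```python
import networkx as nx

# Toy graph, only for illustration
G_demo = nx.Graph([("mads", "anna"), ("mads", "lars"), ("lars", "line")])

n_neighbors = len(list(G_demo.neighbors("mads")))  # count the neighbors by hand
degree = G_demo.degree("mads")                     # ask networkx for the degree

print(n_neighbors, degree)
```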

Now pick any kid, calculate that kid's centrality, and see who the kid played with.


In [None]:
# List of neighbors
list(G.neighbors('jesper'))

In [None]:
# Calculation of centrality
len(list(G.neighbors('jesper')))

We can now rank the nodes based on how many neighbors they have (the *degree*):

In [None]:
degs = G.degree
print(degs)

<div class='alert alert-warning'>
<h4>Exercise 5a. </h4> Present the nodes sorted by degree (descending order). We recommend using the built-in `sorted` function for this.
    
Problem: it sorts by default by the first element in the items (i.e. names alphabetically). Can you change that?

In [None]:
# this does not do what we want: it sorts by name, not by degree
sorted(degs)

To help you get started, inspect the documentation: 

In [None]:
sorted??

In [None]:
# 5a)


In [None]:
# %load solutions/ex1_5a.py

Note that `sorted` orders from lowest to highest by default; pass `reverse=True` to get the list from highest to lowest.
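
The ordering behaviour of `sorted` can be checked on a small list of (name, degree) pairs. The pairs below are made up for illustration and are not taken from the kids network.

```python
# Hypothetical (name, degree) pairs, only for illustration
pairs = [("anna", 2), ("mads", 4), ("lars", 1)]

by_name = sorted(pairs)                                        # compares the first element: names
ascending = sorted(pairs, key=lambda p: p[1])                  # lowest degree first
descending = sorted(pairs, key=lambda p: p[1], reverse=True)   # highest degree first

print(by_name)
print(ascending)
print(descending)
```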

<div class='alert alert-warning'>
<h4>Exercise 5b. </h4> Who has the second highest number of friends?
</div>

**Hint:** The degrees are stored in a custom data structure from networkx (a `DegreeView`), so it can help to turn them into a sorted list before picking an entry by position.

In [None]:
# 5b)

In [None]:
# %load solutions/ex1_5b.py

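A related function, `degree_centrality`, yields the degree normalized to the range 0-1: each degree is divided by the number of other nodes in the network, n - 1.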
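
To make the normalization concrete, the sketch below compares `nx.degree_centrality` with the degree divided by n - 1, computed by hand on a small made-up graph (`G_demo` is hypothetical). The comparison uses a tolerance because the two computations may differ by tiny floating point errors.

```python
import math

import networkx as nx

# Toy graph, only for illustration
G_demo = nx.Graph([("mads", "anna"), ("mads", "lars"), ("mads", "line"), ("anna", "lars")])

n = G_demo.number_of_nodes()
centrality = nx.degree_centrality(G_demo)

# Degree divided by the number of other nodes
by_hand = {node: deg / (n - 1) for node, deg in G_demo.degree()}

agree = all(math.isclose(centrality[node], by_hand[node]) for node in G_demo)
print(agree)
```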

In [None]:
degs = nx.algorithms.centrality.degree_centrality(G) #notice some floating point errors...

In [None]:
type(degs)

This returns a `dict`, from which we can extract only the numerical values:

In [None]:
vals = list(degs.values())
vals

And now we can plot them. Here are a few different ways to represent them.

In [None]:
plt.boxplot(vals)
plt.title('Degree Centralities')
plt.show()

In [None]:
plt.hist(vals)
plt.title('Degree Centralities')
plt.show()

In [None]:
# let's make an empirical cumulative distribution function (ECDF)
ecdf = ECDF(vals)
x,y = ecdf.x, ecdf.y

In [None]:
plt.scatter(x,y)
plt.title('Degree Centralities')
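
For intuition about what the ECDF represents, it can also be computed by hand with NumPy: sort the values, and let the height at each value be the fraction of observations less than or equal to it. This is a minimal sketch on made-up numbers (`vals_demo` is hypothetical), not a replacement for the `ECDF` class used above.

```python
import numpy as np

# Hypothetical centrality values, only for illustration
vals_demo = [0.2, 0.5, 0.2, 1.0, 0.5]

x_demo = np.sort(vals_demo)                          # values in increasing order
y_demo = np.arange(1, len(x_demo) + 1) / len(x_demo) # cumulative fraction at each value

# Fraction of observations that are <= 0.5
frac_at_half = np.searchsorted(x_demo, 0.5, side="right") / len(x_demo)
print(frac_at_half)
```

Plotting `x_demo` against `y_demo` with `plt.scatter` would give the same kind of staircase as the plot above.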

<div class='alert alert-warning'>
<h4>Exercise 6.</h4> How many of the kids had only a single playmate on the day of the study?

In [None]:
# Ex6


In [None]:
# %load solutions/ex1_6.py

## Sick or not sick?
Now, in the CSV file we also have information about whether or not the kids got sick. Let us use this information and see if it can tell us something about how the infection spread.

In [None]:
df = pd.read_csv('data/kids_nodes.csv')
df.head()

<div class='alert alert-warning'>
<h4>Exercise 7. </h4> Display only those who were sick

In [None]:
# Ex7


In [None]:
# %load solutions/ex1_7.py

Let us highlight the sick kids and redraw the network. Do you see any pattern?

In [None]:
# don't worry about the code (`sick` is assumed to be the table of sick kids from Exercise 7)
color_map = []
for node in G:
    if node in sick.source.values:
        color_map.append('magenta')
    else: 
        color_map.append('green')      
nx.draw(G, node_color=color_map, with_labels=True)
plt.show()

By visual inspection we see that Mads is the common denominator for the sick kids, pointing to him as a source of contamination. Below we plot only the subgraph consisting of the sick kids.

In [None]:
sub_G = G.subgraph(sick.source)
nx.draw(sub_G, with_labels=True, node_color='magenta')
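
`G.subgraph` keeps the selected nodes and only the edges that run between them. In recent NetworkX versions the result is a read-only view of the original graph, and `.copy()` gives an independent graph that can be edited. The sketch below illustrates this on a made-up graph; `G_demo` and `sick_demo` are hypothetical names.

```python
import networkx as nx

# Toy graph and a hypothetical list of sick kids, only for illustration
G_demo = nx.Graph([("mads", "anna"), ("mads", "lars"), ("anna", "line"), ("lars", "line")])
sick_demo = ["mads", "anna", "lars"]

view = G_demo.subgraph(sick_demo)   # edges to "line" are dropped

# Trying to edit the view raises an error because it is frozen
try:
    view.add_edge("anna", "lars")
    frozen = False
except nx.NetworkXError:
    frozen = True

editable = view.copy()              # independent copy that can be changed
editable.add_edge("anna", "lars")

print(view.number_of_edges(), editable.number_of_edges(), frozen)
```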

We should also warn the children who were in contact with the contaminated kids. We use the `set` data structure: an unordered collection without duplicates.

In [None]:
kids_to_warn = set()

In [None]:
for sick_kid in sick.source:
    print(sick_kid)
    nbs = list(G.neighbors(sick_kid)) #list of kids the contaminated ones were in contact with
    for nb in nbs:
        kids_to_warn.add(nb)
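
The inner loop can also be written with `set.update`, which adds all neighbors of a kid in one call. A small sketch on a made-up graph (`G_demo`, `sick_demo` and `contacts` are hypothetical names):

```python
import networkx as nx

# Toy graph and a hypothetical list of sick kids, only for illustration
G_demo = nx.Graph([("mads", "anna"), ("mads", "lars"), ("anna", "line"), ("lars", "hanne")])
sick_demo = ["mads", "anna"]

contacts = set()
for kid in sick_demo:
    contacts.update(G_demo.neighbors(kid))  # add all neighbors of this kid at once

print(sorted(contacts))
```

Because sick kids can be neighbors of each other, some of them end up in the set as well.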


<div class='alert alert-warning'>
<h4>Exercise 8. </h4>  Finally drop those we already know were sick from the set.
Tip: write `kids_to_warn.` and press Tab to see available methods

In [None]:
# Ex8


In [None]:
# %load solutions/ex1_8.py

In [None]:
print("The kids to warn:")
print(kids_to_warn)

## Adding nodes and edges
During the data collection some of the kids forgot that they also played with Hans that day! Here we will show you how to add Hans to the network. You may try to add even more kids to the network.

In [None]:
G.add_node('hans')

# Add multiple edges
G.add_edges_from([
    ('hans', 'lars'),
    ('hans', 'line'),
])

nx.draw(G, with_labels=True)
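
Strictly speaking, the call to `add_node` is optional here: adding an edge creates any missing endpoint automatically, and adding an edge that already exists does not create a duplicate in an `nx.Graph`. A minimal sketch on a made-up graph (`G_demo` is hypothetical):

```python
import networkx as nx

# Toy graph, only for illustration
G_demo = nx.Graph([("lars", "line")])

G_demo.add_edge("hans", "lars")                                # "hans" is created automatically
G_demo.add_edges_from([("hans", "line"), ("hans", "lars")])    # the repeated edge is ignored

print(G_demo.number_of_nodes(), G_demo.number_of_edges())
```

Calling `add_node` explicitly is still useful for kids who had no contact at all, since they would otherwise not appear in the graph.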