# Lecture 10 - Network Centrality

In this notebook we will learn how to calculate network centralities using the `networkx` package.  For a complete list of network centralities that can be calculated you can look here: https://networkx.org/documentation/stable/reference/algorithms/centrality.html

In addition, we will create interactive visualizations of the networks and make the centrality information visible in the node size and color.  

Our example network will be a Twitter follower network.

Below is the overview of this notebook.


<ol type = 1>
    <li> Data processing</li>
        <ol type = a>
            <li> Load follower network</li>
        </ol>
    <li> Calculate Network Centralities</li>
        <ol type = a>
            <li> Out-/in-degree centrality</li>
            <li> Closeness centrality</li>
            <li> Betweenness centrality</li>
            <li> Eigenvector centrality</li>
        </ol>

   <li>Plot centralities</li>

   <li> Network Visualization </li>
        <ol type = a>
            <li> Draw static visualization with nodes sized by centrality</li>
            <li> Interactive visualization  with nodes sized by centrality</li>
       </ol>
    <li> Retweet Network Example </li>
</ol>

This notebook can be opened in Colab 
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zlisto/social_media_analytics/blob/main/Lecture10_NetworkCentrality.ipynb)

Before starting, select "Runtime->Factory reset runtime" to start with your directories and environment in the base state.

If you want to save changes to the notebook, select "File->Save a copy in Drive" from the top menu in Colab.  This will save the notebook in your Google Drive.




# Clones, installs, and imports


## Clone GitHub Repository
This will clone the repository to your machine.  This includes the code and data files.  Then change into the directory of the repository.

In [None]:
!git clone https://github.com/zlisto/social_media_analytics

import os
os.chdir("social_media_analytics")

## Install Requirements 


In [None]:
!pip install -r requirements.txt


## Import Packages

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import networkx as nx

import holoviews as hv
from holoviews import opts
from bokeh.plotting  import show
from bokeh.models import HoverTool
hv.extension('bokeh')

defaults = dict(width=800, height=800)
hv.opts.defaults(opts.EdgePaths(**defaults), opts.Graph(**defaults), opts.Nodes(**defaults))

# Load Follower Network

The follower network is saved in a pickle file as a networkx object.  We load it using the `read_gpickle` function.  We can check how many nodes and edges are in the network using the `number_of_nodes` and `number_of_edges` functions, respectively.
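
Note that `read_gpickle` was removed in NetworkX 3.0, so the call in the next cell only works with an older release. If it fails in your environment, the standard library `pickle` module is a possible fallback. The sketch below assumes the file is an ordinary pickle of a networkx graph; the toy graph and temporary file are made up for illustration and are not part of the course data. A pickle written by a much older networkx release may still fail to load in a newer one.

```python
import os
import pickle
import tempfile

import networkx as nx

# Toy directed graph standing in for the follower network (illustrative only)
G_toy = nx.DiGraph([("alice", "bob"), ("bob", "carol")])

# Write the graph to a temporary pickle file
path = os.path.join(tempfile.mkdtemp(), "toy_network.pickle")
with open(path, "wb") as f:
    pickle.dump(G_toy, f)

# Load it back with the standard library instead of nx.read_gpickle
with open(path, "rb") as f:
    G_loaded = pickle.load(f)

nv_toy = G_loaded.number_of_nodes()
ne_toy = G_loaded.number_of_edges()
print(f"Network has {nv_toy} nodes and {ne_toy} edges")
```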



In [None]:
#filename of follower network
fname_following = "data/friends_network_JoeBiden.pickle"

G = nx.read_gpickle(fname_following)


nv = G.number_of_nodes()
ne = G.number_of_edges()
print(f"Network has {nv} nodes and {ne} edges")

# Calculate Centralities

There are a variety of network centrality functions we can use in networkx.  We will use several here.  We need to be careful with closeness and eigenvector centrality, as their definition of an edge is the reverse of our convention for social networks.

1. `Din` = In-degree centrality (networkx divides the in-degree by `nv`-1 so the maximum value =1)

2. `Dout` = Out-degree centrality (networkx divides the out-degree by `nv`-1 so the maximum value =1)

3. `CC` = Closeness centrality (we need to reverse edges of `G` to match networkx convention using the `reverse()` function)

4. `BC` = Betweenness centrality (we calculate this on the undirected version of `G` using the `to_undirected()` function, which works better here)

5. `EC` = Eigenvector centrality (we need to reverse edges of `G` to match networkx convention  using the `reverse()` function)
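
To see what these choices do, here is a small sketch on a toy three-node chain `a -> b -> c` (the graph and node names are made up for illustration). It checks the `nv`-1 normalization of degree centrality, shows how `reverse()` changes which nodes get a high closeness, and compares betweenness on the directed and undirected versions of the graph.

```python
import networkx as nx

# Toy directed chain a -> b -> c (illustrative, not the course data)
G_toy = nx.DiGraph([("a", "b"), ("b", "c")])
nv_toy = G_toy.number_of_nodes()

# Degree centrality is the degree divided by (number of nodes - 1)
Din_toy = nx.in_degree_centrality(G_toy)
Dout_toy = nx.out_degree_centrality(G_toy)
print("in-degree of c / (nv-1):", G_toy.in_degree("c") / (nv_toy - 1), "vs", Din_toy["c"])

# For directed graphs networkx measures closeness using distances *into* a node,
# so reversing the edges switches to distances *out of* a node
CC_in = nx.closeness_centrality(G_toy)
CC_out = nx.closeness_centrality(G_toy.reverse())
print("closeness without reverse:", CC_in)
print("closeness with reverse:   ", CC_out)

# Betweenness on the directed graph versus its undirected version
BC_directed = nx.betweenness_centrality(G_toy)
BC_undirected = nx.betweenness_centrality(G_toy.to_undirected())
print("betweenness of b:", BC_directed["b"], "(directed),", BC_undirected["b"], "(undirected)")
```

Without `reverse()` the end of the chain (`c`) has the highest closeness; with `reverse()` the start of the chain (`a`) does.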




In [None]:
Din = nx.in_degree_centrality(G)
Dout = nx.out_degree_centrality(G)
CC = nx.closeness_centrality(G.reverse())  #reverse edges to match networkx convention
BC = nx.betweenness_centrality(G.to_undirected())  #calculated on the undirected network
EC = nx.eigenvector_centrality(G.reverse())  #reverse edges to match networkx convention


## Convert centralities into a dataframe

The centralities calculated by networkx are returned as a *dictionary*.  A dictionary stores data as (key,value) pairs.  To access a value for a key **key** from a dictionary **Dic** you write:  `value = Dic[key]`.

We will combine the dictionaries into a single dataframe.  We do this using a `for` loop that goes through every key (which is a screen name) in the dictionaries, gets the corresponding centrality values, and creates a dictionary of the centralities for this screen name.  We append this dictionary to a list called `dictionary_list`, and then convert this list of dictionaries into a dataframe with the `DataFrame` command.  The resulting centralities dataframe is called `df_centrality`.
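
As an aside, pandas can also build such a table directly from the dictionaries, because a dictionary of dictionaries is read as column name -> (row label -> value).  The sketch below uses made-up centrality values and is only an alternative to the loop, not the approach used in this notebook.

```python
import pandas as pd

# Made-up centrality dictionaries keyed by screen name (illustrative values)
Din_toy = {"alice": 0.5, "bob": 1.0, "carol": 0.0}
Dout_toy = {"alice": 1.0, "bob": 0.0, "carol": 0.5}

# Outer keys become columns, inner keys become the row index
df_toy = pd.DataFrame({"in_degree_centrality": Din_toy,
                       "out_degree_centrality": Dout_toy})

# Move the screen names from the index into a regular column
df_toy = df_toy.rename_axis("screen_name").reset_index()
print(df_toy)
```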



In [None]:
#For plotting, we combine all the centrality dictionaries into a dataframe
dictionary_list = []
for screen_name in Din.keys():
    row = {'screen_name':screen_name,
          'in_degree_centrality':Din[screen_name],
          'out_degree_centrality':Dout[screen_name],
          'closeness_centrality':CC[screen_name],
          'betweenness_centrality':BC[screen_name],
          'eigenvector_centrality':EC[screen_name]}
    dictionary_list.append(row)
df_centrality = pd.DataFrame(dictionary_list)

df_centrality.sort_values(by = ['out_degree_centrality'],ascending = False).head()


## Plot top centralities

We will sort each centrality in descending order and plot the top values.  To make one big figure we use the `subplot` function.

To make our code more efficient, we create a list `Centrality_names` that contains the name of each column of `df_centrality` that corresponds to a centrality.  Then we run a `for` loop over the centralities in this list, and make a bar plot of the top `kmax` values.  We make subplots for each centrality in one big figure.  There are many plot commands used here to make the figure look nice.  Have a look at them to see what they do.
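
The sort-then-slice step inside the loop can also be written with the pandas `nlargest` method, which returns the rows with the largest values in a column.  A small sketch with made-up values (the dataframe and `kmax_toy` are illustrative only):

```python
import pandas as pd

# Made-up centrality values for four users
df_toy = pd.DataFrame({
    "screen_name": ["alice", "bob", "carol", "dave"],
    "betweenness_centrality": [0.10, 0.40, 0.05, 0.25],
})
kmax_toy = 2

# Sort in descending order and keep the first kmax_toy rows
top_sorted = df_toy.sort_values(by=["betweenness_centrality"], ascending=False)[0:kmax_toy]

# nlargest selects the same rows here in a single call
top_nlargest = df_toy.nlargest(kmax_toy, "betweenness_centrality")
print(top_nlargest)
```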

In [None]:
Centrality_names = df_centrality.columns.tolist()[1:]
kmax = 10  #show top kmax users


fig = plt.figure(figsize = (20,14))

for count,centrality_name in enumerate(Centrality_names):    
    df_plot = df_centrality.sort_values(by=[centrality_name],ascending=False)  #sort dataframe by centrality value
    plt.subplot(3,2,count+1) #make a 3 x 2 grid of subplots, plot in box count+1
    
    ax = sns.barplot(data=df_plot[0:kmax], x='screen_name', y=centrality_name)
    ax.set_xticklabels(ax.get_xticklabels(),rotation = 90)
    plt.xticks(fontsize = 14)
    plt.yticks(fontsize = 14)
    plt.ylabel(f"{centrality_name}",fontsize = 18)
    plt.xlabel('Screen name',fontsize = 18)    
    plt.grid()

plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=0.75)

plt.show()

# Network Visualization




## Network Layout
First we calculate the node layout using the `kamada_kawai_layout` method (we will learn about this later).  We also define some properties of the network drawing.


In [None]:
pos = nx.kamada_kawai_layout(G)  #position of each node in the network


node_color = 'red'
edge_color = 'purple'
background_color = 'black'
edge_width = 1

## Node Size Proportional to Centrality

Then we make the node size proportional to different centrality values.  The `Centrality` variable is a list with the values of the chosen centrality from `df_centrality`.  The list `node_size_centrality` has the node sizes.
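
The linear interpolation maps the smallest centrality value to `size_min` and the largest to `size_max`.  The sketch below wraps it in a hypothetical helper `scale_sizes` (not part of this notebook or of networkx) and adds a guard for the case where all values are equal, which would otherwise divide by zero.

```python
def scale_sizes(values, size_min, size_max):
    """Linearly map values onto the range [size_min, size_max]."""
    dmin, dmax = min(values), max(values)
    if dmax == dmin:
        # All values equal: the slope is undefined, so use the midpoint size
        return [(size_min + size_max) / 2 for _ in values]
    slope = (size_max - size_min) / (dmax - dmin)
    intercept = size_min - slope * dmin
    return [v * slope + intercept for v in values]


# Made-up centrality values for illustration
sizes = scale_sizes([0.0, 0.5, 1.0], size_min=1, size_max=1000)
print(sizes)
```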



In [None]:


size_min = 1
size_max = 1000

#define parameters for linear interpolation of node size from the chosen centrality
Centrality = df_centrality['betweenness_centrality'].tolist()


dmax = max(Centrality)
dmin = min(Centrality)
slope = (size_max-size_min)/(dmax-dmin)
intercept = size_min-slope*dmin 

#Go through each node and calculate its size
node_size_centrality = [c*slope+intercept  for c in Centrality ]    
    


## Draw Network

In the `draw` function, set the parameter `with_labels=True` to add the node labels to the plot.  Use the parameter `font_color` to choose the font color.

In [None]:
#Draw the network, with labels    
fig = plt.figure(figsize=(8,8))
nx.draw(G, pos, node_color = node_color, width= edge_width,
        edge_color=edge_color,node_size=node_size_centrality,
       with_labels=True,font_color = 'white')
fig.set_facecolor(background_color)
plt.show()

# Interactive Network Visualization

We will visualize the network using the *holoviews* package, which creates an interactive plot of the network.  

## Format Network Data
First we need to add some node attributes to the networkx object `G` so we can display these values in the plot.  We need to save the values as strings to make them appear in the plot (it's weird, I know).
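
An alternative to setting attributes one node at a time is `nx.set_node_attributes`, which takes a dictionary keyed by node and an attribute name.  The sketch below uses a made-up toy graph and saves the values as strings, as described above.

```python
import networkx as nx

# Toy graph (illustrative only)
G_toy = nx.DiGraph([("alice", "bob"), ("carol", "bob")])
Din_toy = nx.in_degree_centrality(G_toy)

# Format each value as a string, then attach all of them in one call
Din_str = {v: f"{c:.2f}" for v, c in Din_toy.items()}
nx.set_node_attributes(G_toy, Din_str, "in_degree_centrality")

print(G_toy.nodes["bob"])
```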

In [None]:
node_size = 30
pos = nx.kamada_kawai_layout(G)  #position of each node in the network

for v in G.nodes():
    #look up the centralities of node v in the dataframe
    dout = df_centrality.out_degree_centrality[df_centrality.screen_name==v].values[0]
    din = df_centrality.in_degree_centrality[df_centrality.screen_name==v].values[0]
    cc = df_centrality.closeness_centrality[df_centrality.screen_name==v].values[0]
    bc = df_centrality.betweenness_centrality[df_centrality.screen_name==v].values[0]
    ec = df_centrality.eigenvector_centrality[df_centrality.screen_name==v].values[0]

    #save values as strings so they appear in the plot
    #(.2f = two decimal places, .2 = two significant digits)
    G.nodes[v]['out_degree_centrality'] = f"{dout:.2f}"
    G.nodes[v]['in_degree_centrality'] =f"{din:.2f}"
    G.nodes[v]['closeness_centrality'] = f"{cc:.2}"
    G.nodes[v]['betweenness_centrality'] = f"{bc:.2}"
    G.nodes[v]['eigenvector_centrality'] = f"{ec:.2}"
    G.nodes[v]["node_size"] = node_size  #same size for every node
    G.nodes[v]['screen_name'] = v






## Create Holoviews interactive visualization

We first define `tooltips` as a list of values and their names we want to display when 
we hover over a node.  Then we define `hover` as a HoverTool that displays
the node information in `tooltips`.  Finally, we define `graph` as the network visualization
object.  There are many self-explanatory fields here we can play with.



In [None]:
tooltips = [
    ('Screen name','@screen_name'),
    ("Node size","@node_size"),
    ("Out-degree","@out_degree_centrality"),
    ("In-degree","@in_degree_centrality"),
    ("Closeness","@closeness_centrality"),
    ("Betweenness","@betweenness_centrality"),
    ("Eigenvector","@eigenvector_centrality"),

    
]
hover = HoverTool(tooltips=tooltips)

graph = hv.Graph.from_networkx(G, pos).opts(height=800, width=800, tools=[hover],node_size = hv.dim("node_size"), 
                                    node_line_color='black', edge_line_color = 'purple',bgcolor='black',
                                    node_hover_fill_color = 'red').relabel('Following network')




## Display Network Visualization

To create the visualization, just type `graph`.


In [None]:
graph

## Visualize network with nodes sized by a centrality

We will do a linear interpolation of the node size based on a centrality measure.  We choose the 
`centrality_name` from our dataframe columns, and the `size_min` and `size_max` of the nodes.

In [None]:
#choose the centrality to use for sizing, and get the min and max values
centrality_name = 'closeness_centrality'

#choose the minimum and maximum size for the nodes
size_min = 10
size_max  = 50

#Find the minimum and maximum values for the centrality
dmax = df_centrality[centrality_name].max()
dmin = df_centrality[centrality_name].min()

#calculate slope and intercept for linear interpolation of node size
slope = (size_max-size_min)/(dmax-dmin)
intercept = size_min-slope*dmin 

In [None]:
pos = nx.kamada_kawai_layout(G)  #position of each node in the network

for v in G.nodes():
    centrality =df_centrality[centrality_name][df_centrality.screen_name==v].values[0]
    
    node_size = centrality*slope+intercept  
    G.nodes[v]["node_size"] = node_size
    G.nodes[v]['screen_name'] = v






In [None]:
title_str = f'Following network, node size = {centrality_name}'
tooltips = [
    ('Screen name','@screen_name'),
    ("Node size","@node_size")  
]
hover = HoverTool(tooltips=tooltips)

graph = hv.Graph.from_networkx(G, pos).opts(height=800, width=800, tools=[hover],
                                            node_size = hv.dim("node_size"), 
                                    node_line_color='black', 
                                            edge_line_color = 'purple',
                                            bgcolor='black',
                                    node_hover_fill_color = 'red').relabel(title_str)



In [None]:
graph

# Retweet Network

Now we will calculate centralities for a retweet network.

## Load Network


In [None]:
#filename of retweet network
fname_rt = "data/nbaallstar_interaction_network.pickle"

Grt = nx.read_gpickle(fname_rt)


nv = Grt.number_of_nodes()
ne = Grt.number_of_edges()
print(f"Retweet network has {nv} nodes and {ne} edges")

## Calculate Centralities

In [None]:
%%time

Din = nx.in_degree_centrality(Grt)
Dout = nx.out_degree_centrality(Grt)
CC = nx.closeness_centrality(Grt.reverse())  #reverse edges to match networkx convention
BC = nx.betweenness_centrality(Grt.to_undirected())  #calculated on the undirected network
EC = nx.eigenvector_centrality(Grt.reverse())  #reverse edges to match networkx convention


## Convert Centralities Into Dataframe

In [None]:
#For plotting, we combine all the centrality dictionaries into a dataframe
dictionary_list = []
for user_id in Din.keys():
    screen_name = Grt.nodes[user_id]['username']
    row = {'user_id':user_id,
           'screen_name':screen_name,           
          'in_degree_centrality':Din[user_id],
          'out_degree_centrality':Dout[user_id],
          'closeness_centrality':CC[user_id],
          'betweenness_centrality':BC[user_id],
          'eigenvector_centrality':EC[user_id]}
    dictionary_list.append(row)
df_centrality = pd.DataFrame(dictionary_list)

df_centrality.sort_values(by = ['out_degree_centrality'],ascending = False).head()


## Plot Top Centralities

In [None]:
Centrality_names = df_centrality.columns.tolist()[2:]
kmax = 10  #show top kmax users


fig = plt.figure(figsize = (20,14))

for count,centrality_name in enumerate(Centrality_names):    
    df_plot = df_centrality.sort_values(by=[centrality_name],ascending=False)  #sort dataframe by centrality value
    plt.subplot(3,2,count+1) #make a 3 x 2 grid of subplots, plot in box count+1
    
    ax = sns.barplot(data=df_plot[0:kmax], x='screen_name', y=centrality_name)
    ax.set_xticklabels(ax.get_xticklabels(),rotation = 90)
    plt.xticks(fontsize = 14)
    plt.yticks(fontsize = 14)
    plt.ylabel(f"{centrality_name}",fontsize = 18)
    plt.xlabel('Screen name',fontsize = 18)    
    plt.grid()

plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=0.75)

plt.show()

## Draw Network

The retweet network is rather large, so to draw it we will only keep nodes with an out-degree of at least one.  Then we create a subgraph `G1` that is the original graph restricted to the nodes we want to keep.  We do this using the `subgraph` function.
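
A small sketch of this filtering step on a toy graph (the node names are made up).  It keeps the nodes whose out-degree is at least one.  Note that `subgraph` returns a view of the original graph; calling `.copy()` on it gives an independent graph.

```python
import networkx as nx

# Toy directed graph: dave has no outgoing edges (illustrative only)
G_toy = nx.DiGraph([("alice", "bob"), ("bob", "carol"), ("carol", "dave")])

# Keep only the nodes with out-degree of at least one
nodes_keep = [v for v, d in G_toy.out_degree() if d >= 1]

# subgraph() returns a view; copy() makes an independent graph
G_sub = G_toy.subgraph(nodes_keep).copy()
nv_sub = G_sub.number_of_nodes()
ne_sub = G_sub.number_of_edges()
print(f"Sub-network has {nv_sub} nodes and {ne_sub} edges")
```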

In [None]:
nv = Grt.number_of_nodes()
#keep nodes with out-degree >= 1 (testing centrality > 0 avoids round-off from multiplying by nv-1)
nodes_draw = df_centrality.user_id[df_centrality.out_degree_centrality>0].tolist()
print(f"Smaller network has {len(nodes_draw)} nodes")

G1 = Grt.subgraph(nodes_draw)
nv1 = G1.number_of_nodes()
ne1 = G1.number_of_edges()
print(f"Retweet sub-network has {nv1} nodes and {ne1} edges")


## Draw Network, Size Nodes by Centrality

We can draw the retweet sub-network with nodes sized by a centrality, as we did before.  

In [None]:
pos = nx.kamada_kawai_layout(G1)  #position of each node in the network




In [None]:
#choose the centrality to use for sizing, and get the min and max values
centrality_name = 'out_degree_centrality'

#choose the minimum and maximum size for the nodes
size_min = 10
size_max  = 50

#Find the minimum and maximum values for the centrality
dmax = df_centrality[centrality_name].max()
dmin = df_centrality[centrality_name].min()

#calculate slope and intercept for linear interpolation of node size
slope = (size_max-size_min)/(dmax-dmin)
intercept = size_min-slope*dmin 

In [None]:
for cnt,v in enumerate(G1.nodes()):
    dout = df_centrality[centrality_name][df_centrality.user_id==v].values[0]
    node_size = dout*slope+intercept  
    G1.nodes[v][centrality_name] = f"{dout:.4f}"
    G1.nodes[v]["node_size"] = node_size
    G1.nodes[v]['screen_name'] = df_centrality.screen_name[df_centrality.user_id==v].values[0]
    


In [None]:
title_str = f"Retweet Network, Node Size = {centrality_name}"
tooltips = [
    ('Screen name','@screen_name'),
    (centrality_name,f"@{centrality_name}")
]
hover = HoverTool(tooltips=tooltips)

graph = hv.Graph.from_networkx(G1, pos).opts(height=800, width=800, 
                                             tools=[hover],
                                             node_size = hv.dim("node_size"), 
                                    node_line_color='black', 
                                             edge_line_color = 'purple',
                                             bgcolor='black',
                                    node_hover_fill_color = 'red').relabel(title_str)


#to show the network, just type graph
graph