# Objective: explore measures of centrality on two networks, a friendship network in Part 1, and a blog network in Part 2

## Part 1

`G1` is a network of friendships in a university department. Each node corresponds to a person, and an edge indicates friendship.

*The network is loaded below as the NetworkX graph object `G1`.*

In [1]:
import networkx as nx

G1 = nx.read_gml('friendships.gml')

In [2]:
#cat friendships.gml

### Step 1

Find the degree centrality, closeness centrality, and normalized betweenness centrality (excluding endpoints) of node 100.

*This function returns a tuple of floats `(degree_centrality, closeness_centrality, betweenness_centrality)`.*

In [3]:
def step_one():  
    degree_centrality = nx.degree_centrality(G1)[100]
    clsnss_centrality = nx.closeness_centrality(G1)[100]
    btwn_centrality = nx.betweenness_centrality(G1, endpoints = False,
                                               normalized = True)[100]
    answer = (degree_centrality, clsnss_centrality, btwn_centrality)
    
    return answer

In [4]:
step_one()

(0.0026501766784452294, 0.2654784240150094, 7.142902633244772e-05)
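For reference, the three measures can be written out as follows. This is a summary of the standard definitions for a connected, undirected graph with $n$ nodes, which is my understanding of what the NetworkX defaults compute. It is not part of the original exercise.

$$
C_{\mathrm{deg}}(v) = \frac{\deg(v)}{n-1}, \qquad
C_{\mathrm{close}}(v) = \frac{n-1}{\sum_{u \neq v} d(v,u)}, \qquad
C_{\mathrm{btw}}(v) = \frac{2}{(n-1)(n-2)} \sum_{\{s,t\},\; s,t \neq v} \frac{\sigma_{st}(v)}{\sigma_{st}}
$$

Here $d(v,u)$ is the shortest-path distance between $v$ and $u$, $\sigma_{st}$ is the number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$. The betweenness sum runs over unordered pairs, and excluding endpoints means $v$ is never counted on paths that start or end at $v$.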

<br>
#### For Steps 2, 3, and 4, assume that you know nothing about the structure of the network except the centrality values of the nodes. That is, use centrality measures to rank the nodes when finding the most appropriate candidate.
<br>

### Step 2

Suppose you are employed by an online shopping website and are tasked with selecting one user in network G1 to send an online shopping voucher to.

The user who receives the voucher will send it to their friends in the network.

The voucher should reach as many nodes as possible. 

The voucher can be forwarded to multiple users at the same time, but the travel distance is limited to one step. That is, if the voucher travels more than one step in this network, it is no longer valid.

I use network centrality to select the best candidate for the voucher. 

*This function returns an integer, the name of the node.*

In [5]:
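def step_two():
    # With a one-step limit, reach equals the number of direct friends,
    # so the best candidate has the highest degree centrality.
    degree = nx.degree_centrality(G1)
    answer = max(degree, key=degree.get)
    return answer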

In [6]:
step_two()

105
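To make the reasoning concrete, here is a minimal hand-rolled sketch of degree centrality on a small made-up graph. The adjacency dict `toy_graph` and the helper `degree_centrality` are hypothetical illustrations written in plain Python. They are not the NetworkX implementation and not the data in `G1`.

```python
# Hypothetical toy friendship graph: node -> set of direct friends.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}


def degree_centrality(graph):
    """Fraction of the other n - 1 nodes that each node is adjacent to."""
    n = len(graph)
    return {node: len(friends) / (n - 1) for node, friends in graph.items()}


scores = degree_centrality(toy_graph)
# The node with the most direct friends reaches the most people in one step.
best = max(scores, key=scores.get)
print(best, scores[best])  # a 0.75
```

Node `a` has three of the four possible friends, so a one-step voucher sent to `a` reaches the largest share of this toy network.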

### Step 3

The limit of the voucher’s travel distance has been removed. 

Because the network is connected, regardless of who is picked, every node in the network will eventually receive the voucher. 

However, assume we want the voucher to reach the nodes in the lowest average number of hops.

I determine the best candidate in the network under these conditions.

*This function returns an integer, the name of the node.*

In [7]:
def step_three():
    # Closeness centrality is highest for the node whose average shortest-path
    # distance to all other nodes is smallest.
    closeness = nx.closeness_centrality(G1)
    answer = max(closeness, key=closeness.get)
    return answer

In [8]:
step_three()

23
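The link between closeness centrality and "lowest average number of hops" can be sketched with a breadth-first search. The path graph `toy_path` and the helpers below are hypothetical and assume a connected, unweighted graph. This is an illustration of the definition, not the NetworkX code.

```python
from collections import deque

# Hypothetical path graph a - b - c - d - e.
toy_path = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}


def bfs_distances(graph, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closeness_centrality(graph):
    """(n - 1) divided by the total distance to all other nodes."""
    n = len(graph)
    return {
        node: (n - 1) / sum(bfs_distances(graph, node).values())
        for node in graph
    }


scores = closeness_centrality(toy_path)
best = max(scores, key=scores.get)
print(best)  # c
```

The middle node `c` is at most two hops from everyone, so its average distance is smallest and its closeness is largest.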

### Step 4

Assume the restriction on the voucher’s travel distance is still removed.

A competitor, though, has developed a strategy to remove a person from the network in order to disrupt the distribution of your company’s voucher.

Your competitor is specifically targeting people who are often bridges of information flow between other pairs of people. 

I identify the single riskiest person to be removed under a competitor’s strategy.

*This function returns an integer, the name of the node.*

In [9]:
def step_four():
    # Betweenness centrality measures how often a node lies on shortest paths
    # between other pairs of nodes.
    betweenness = nx.betweenness_centrality(G1)
    answer = max(betweenness, key=betweenness.get)
    return answer

In [10]:
step_four()

333
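A bridge node can have few friends and still carry most of the traffic. The sketch below uses Brandes' algorithm, written by hand for an unweighted, undirected graph, on a made-up network of two triangles joined through a single node `m`. The graph `toy_bridge` and the function are hypothetical illustrations and assume at least three nodes.

```python
from collections import deque

# Hypothetical graph: triangles {a, b, c} and {d, e, f} joined via m.
toy_bridge = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "m"},
    "m": {"c", "d"},
    "d": {"m", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}


def betweenness_centrality(graph):
    """Normalized betweenness, endpoints excluded (Brandes' algorithm)."""
    between = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s: count shortest paths and record predecessors.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                between[w] += delta[w]
    n = len(graph)
    # Every unordered pair was visited from both ends, so dividing by
    # (n - 1)(n - 2) both halves the count and normalizes it.
    scale = 1 / ((n - 1) * (n - 2))
    return {v: b * scale for v, b in between.items()}


scores = betweenness_centrality(toy_bridge)
best = max(scores, key=scores.get)
print(best)  # m
```

Node `m` has only two neighbors, yet all nine shortest paths between the two triangles pass through it, which makes it the node whose removal is most disruptive here.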

## Part 2

`G2` is a directed network of political blogs, where each node corresponds to a blog and each edge corresponds to a link between blogs. PageRank and HITS are used in Steps 5-9.

In [11]:
G2 = nx.read_gml('blogs.gml')

### Step 5

I apply the scaled PageRank algorithm to this network.

I find the PageRank of node 'realclearpolitics.com' with damping value 0.85.

*This function returns a float.*

In [12]:
def step_five():
    pr = nx.pagerank(G2, alpha = 0.85)
    answer = pr['realclearpolitics.com']
    return answer

In [13]:
step_five()

0.004636694781649094
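To show what the damping value does, here is a minimal power-iteration sketch of scaled PageRank in plain Python. The link dict `toy_web`, the function `pagerank`, and its `tol` and `max_iter` parameters are hypothetical. Spreading the rank of pages without out-links evenly over all pages is one common convention that I assume here. The NetworkX routine may differ in detail.

```python
def pagerank(links, alpha=0.85, tol=1e-10, max_iter=200):
    """Scaled PageRank by power iteration on {node: [out-links]}."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(max_iter):
        # Rank sitting on pages with no out-links is shared by everyone.
        dangling = sum(rank[v] for v in nodes if not links[v])
        # Every page gets the random-jump share (1 - alpha) / n.
        new_rank = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        # Each page passes alpha times its rank equally along its out-links.
        for v in nodes:
            for w in links[v]:
                new_rank[w] += alpha * rank[v] / len(links[v])
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web; every page appears as a key.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

rank = pagerank(toy_web)
print(max(rank, key=rank.get))  # c
```

Page `d` has no incoming links, so its score is exactly the random-jump share `(1 - alpha) / n`, while `c` collects rank from three pages and ends up on top.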

### Step 6

I apply the scaled PageRank algorithm to this network with damping value 0.85.

I find the 5 nodes with the highest PageRank.

*This function returns a list of the top 5 blogs in descending order of PageRank.*

In [14]:
def step_six():   
    pr = nx.pagerank(G2, alpha = 0.85)
    answer = sorted(pr.keys(), key = lambda x:pr[x], reverse = True)
    answer = answer[0:5]
    return answer

In [15]:
step_six()

['dailykos.com',
 'atrios.blogspot.com',
 'instapundit.com',
 'blogsforbush.com',
 'talkingpointsmemo.com']
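As an aside, sorting every score and then slicing is one way to get a top-k list. A possible alternative, sketched here on hypothetical scores rather than the real PageRank values, is `heapq.nlargest` from the standard library, which returns the same items without sorting the whole dictionary.

```python
import heapq

# Hypothetical scores; the names and values are made up for illustration.
scores = {
    "blog-a": 0.12,
    "blog-b": 0.31,
    "blog-c": 0.07,
    "blog-d": 0.25,
    "blog-e": 0.18,
}

# Keys of the three largest values, in descending order of score.
top3 = heapq.nlargest(3, scores, key=scores.get)
print(top3)  # ['blog-b', 'blog-d', 'blog-e']
```

For a network of this size the difference is negligible, so the choice is mostly a matter of readability.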

### Step 7

I apply the HITS Algorithm to the network to find the hub and authority scores of node 'realclearpolitics.com'. 

*This function returns a tuple of floats `(hub_score, authority_score)`.*

In [16]:
def step_seven():
    # nx.hits returns a pair of dicts: hub scores first, authority scores second
    hubs, authorities = nx.hits(G2)
    node = 'realclearpolitics.com'
    answer = (hubs[node], authorities[node])
    return answer

In [17]:
step_seven()

(0.0003243556140916672, 0.003918957645699851)
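The hub and authority scores reinforce each other: a good hub links to good authorities, and a good authority is linked to by good hubs. The sketch below iterates those two updates by hand on a made-up link graph. The dict `toy_links`, the function `hits`, and the choice to normalize each score vector to sum to 1 are assumptions for illustration, not the NetworkX implementation.

```python
def hits(links, max_iter=100, tol=1e-10):
    """Iterate HITS updates on {node: [out-links]}; returns (hubs, authorities)."""
    nodes = list(links)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}
    for _ in range(max_iter):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {v: 0.0 for v in nodes}
        for v in nodes:
            for w in links[v]:
                auth[w] += hub[v]
        # Hub score: sum of authority scores of the pages linked to.
        new_hub = {v: sum(auth[w] for w in links[v]) for v in nodes}
        # Normalize so each score vector sums to 1 (needs at least one link).
        auth_total = sum(auth.values())
        hub_total = sum(new_hub.values())
        auth = {v: a / auth_total for v, a in auth.items()}
        new_hub = {v: h / hub_total for v, h in new_hub.items()}
        change = sum(abs(new_hub[v] - hub[v]) for v in nodes)
        hub = new_hub
        if change < tol:
            break
    return hub, auth


# Hypothetical graph: three link pages pointing at two content pages.
toy_links = {
    "p1": ["news", "wiki"],
    "p2": ["news", "wiki"],
    "p3": ["news"],
    "news": [],
    "wiki": [],
}

hubs, authorities = hits(toy_links)
print(max(authorities, key=authorities.get))  # news
```

Here `news` is the top authority because all three link pages point to it, while `p1` and `p2` are better hubs than `p3` because they point to both authorities.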

### Step 8 

I apply the HITS Algorithm to this network to find the 5 nodes with highest hub scores.

*This function returns a list of the top 5 blogs in descending order of hub scores.*

In [18]:
def step_eight():
    hits = nx.hits(G2)
    hubs = hits[0]
    answer = sorted(hubs.keys(), key = lambda x:hubs[x], reverse = True)
    answer = answer[0:5]
    return answer

In [19]:
step_eight()

['politicalstrategy.org',
 'madkane.com/notable.html',
 'liberaloasis.com',
 'stagefour.typepad.com/commonprejudice',
 'bodyandsoul.typepad.com']

### Step 9 

I apply the HITS Algorithm to this network to find the 5 nodes with highest authority scores.

*This function returns a list of the top 5 blogs in descending order of authority scores.*

In [20]:
def step_nine():  
    hits = nx.hits(G2)
    authorities = hits[1]
    answer = sorted(authorities.keys(), key = lambda x:authorities[x], reverse = True)
    answer = answer[0:5]
    return answer

In [21]:
step_nine()

['dailykos.com',
 'talkingpointsmemo.com',
 'atrios.blogspot.com',
 'washingtonmonthly.com',
 'talkleft.com']