---

_You are currently looking at **version 1.2** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-social-network-analysis/resources/yPcBs) course resource._

---

# Assignment 3

In this assignment you will explore measures of centrality on two networks, a friendship network in Part 1, and a blog network in Part 2.

## Part 1

Answer questions 1-4 using the network `G1`, a network of friendships at a university department. Each node corresponds to a person, and an edge indicates friendship. 

*The network has been loaded as a NetworkX graph object `G1`.*

In [None]:
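######################
#      Influence Measures and Network Centralization assignment      #
# Q1. Compute degree centrality, closeness centrality, and normalized betweenness centrality
# Q2. Set attributes on the nodes
# Q3. Build the weighted projected graph for one side (L) of the nodes  <- it can be built without specifying every node of the L or R side
# Q4. Compute the correlation coefficient to check whether people with a high relationship score (degree of friendship) really do like the same types of movies.
######################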


#####!!REVIEW!!##
# ◆ Degree and Closeness Centrality ◆
# Assumption behind degree centrality : the more connections a node has to other nodes, the more important it is
# Assumption behind closeness centrality : the shorter a node's average distance to other nodes, the more important it is
# Assumption behind betweenness centrality : the more often other nodes' shortest paths pass through a node, the more important it is
# 
# Degree centrality (of a node) : degree ÷ (number of nodes - 1)
#                                    G = nx.convert_node_labels_to_integers(G, first_label=1)
#                                    degCent = nx.degree_centrality(G)
#                                    print(degCent[34])  => e.g. 0.515 (17 / 33) (note: 34 nodes in total)
# In-degree centrality : in-degree of a directed graph   formula = in-degree ÷ (number of nodes - 1)
#                                    indegCent = nx.in_degree_centrality(G)
# Out-degree centrality : out-degree of a directed graph   formula = out-degree ÷ (number of nodes - 1)
#                                    outdegCent = nx.out_degree_centrality(G)
#
# Closeness centrality (of a node) : (number of nodes - 1) ÷ (sum of distances)   note: sum of distances ==> sum(nx.shortest_path_length(G, 32).values())
#                                    closeCent = nx.closeness_centrality(G)
#                                    print(closeCent[32])  => e.g. 0.541, obtained as (len(G.nodes()) - 1) / sum(nx.shortest_path_length(G, 32).values())
#                                    How should disconnected nodes (nodes that cannot reach every other node) be handled?
#                                    → Option 1. Treat only the nodes that the node can reach as its network.
#                                        closeCent = nx.closeness_centrality(G, normalized = False)
#                                    → Option 2. Treat only the reachable nodes as the network, then multiply by (number reachable / (N-1)) to normalize so the value is not distorted across the whole graph.
#                                        closeCent = nx.closeness_centrality(G, normalized = True)
#                                    (newer NetworkX releases name this parameter wf_improved instead of normalized)
#
# ◆ Betweenness Centrality ◆
# Betweenness centrality (of a node), before normalization : sum over node pairs of (number of the pair's shortest paths that pass through the node / number of the pair's shortest paths)
#                                                                                         (a pair can have several shortest paths, hence the ratio)
#                                            How should disconnected nodes (nodes that cannot reach every other node) be handled?
#                                            → Option 1. Leave those node pairs out of the sum.
# Betweenness centrality and normalization : values inevitably grow as the number of nodes grows, so for fairness divide by the number of node pairs
#                                             Undirected graph => 1/2 * (number of nodes - 1) * (number of nodes - 2)
#                                             Directed graph => (number of nodes - 1) * (number of nodes - 2)
#                                    btwnCent = nx.betweenness_centrality(G,
#                                                               normalized=True, endpoints=False) # set endpoints=True to count the endpoint nodes of each path as well
#                                    import operator
#                                    sorted(btwnCent.items(), key=operator.itemgetter(1), reverse=True)[:5] # gives the top 5 nodes
# Edge betweenness centrality without normalization : 
# Relation between betweenness centrality and closeness centrality : 
# Betweenness centrality example : Node 5 has the highest centrality because all shortest paths from {1, 2, 3, 4} to {6, 7, 8, 9} have to go through node 5. In other words, node 5 is a bridge. Hence node 5 lies on the most shortest paths in the network.
#                                    
#  <Summary of the above>
# Normalization : Divide by number of pairs of nodes
# Approximation : Computing betweenness centrality can be computationally expensive. (2,200 nodes already give about 5 million pairs)
#                              We can approximate computation by taking a subset of nodes.
#                                    btwnCent_approx = nx.betweenness_centrality(G,
#                                                        normalized=True, endpoints=False, k=10) 
# Subsets : We can define subsets of source and target nodes to compute betweenness centrality. (this also keeps the computation cost down)
# Edge betweenness centrality : We can apply the same framework to find important edges instead of nodes.
#
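# ◆ Scaled Page Rank ◆
# Relation between in-degree centrality and PageRank : 
# PageRank decreases as out-links increase
# PageRank increases as in-links increase
# Relation between α (damping parameter) and PageRank
# Basic PageRank at step k = 1 : 
# The sum of basic PageRank at each step : 
# PageRank of a node with no in-links at all : 
# What happens to basic PageRank when step k goes up by one? 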
#
# ◆ Hubs and Authorities ◆
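# Authority : how many links a node receives from others, which is what gives it authority. Hub : how many links a node holds, which lets it act as a hub
# Normalized authority and hub scores (of a node) : 
# HITS algorithm : 
# How do the authority and hub scores of a node with no in-links change when step k is incremented : 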
# Nodes that have outgoing edges to good hubs are good authorities, and nodes that have incoming edges from good authorities are good hubs?
# The authority and hub score of each node is obtained by computing multiple iterations of HITS algorithm and both scores of most networks are convergent?
#
# ◆ Comparing Centrality Measures: Characteristics ◆
# Closeness ..
# Betweenness ..
# PageRank ..
# Auth ..
# Hub ..
# 
#
#######
import networkx as nx

G1 = nx.read_gml('friendships.gml')

### Question 1

Find the degree centrality, closeness centrality, and normalized betweenness centrality (excluding endpoints) of node 100.

*This function should return a tuple of floats `(degree_centrality, closeness_centrality, betweenness_centrality)`.*
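
As a rough illustration of what the three measures compute, the sketch below derives them by hand for one node of a small connected, undirected graph stored as an adjacency dict. The graph `path`, the helpers `bfs` and `centralities`, and the plain-Python approach are assumptions made for this example; they are not part of the assignment and `G1` is not used.

```python
from collections import deque


def bfs(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:           # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def centralities(adj, node):
    """Degree, closeness and normalized betweenness (endpoints excluded) of node.

    Assumes adj describes a connected, undirected graph.
    """
    n = len(adj)
    dist, sigma = bfs(adj, node)
    degree = len(adj[node]) / (n - 1)
    closeness = (n - 1) / sum(dist.values())

    others = [v for v in adj if v != node]
    through = 0.0
    for i, s in enumerate(others):
        dist_s, sigma_s = bfs(adj, s)
        for t in others[i + 1:]:
            # node lies on a shortest s-t path only if the distances add up
            if dist_s[node] + dist[t] == dist_s[t]:
                through += sigma_s[node] * sigma[t] / sigma_s[t]
    betweenness = through / ((n - 1) * (n - 2) / 2)  # number of other pairs
    return degree, closeness, betweenness


# Hypothetical example graph: the path a - b - c - d - e
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
deg, close, btwn = centralities(path, "c")
print(deg, close, btwn)
```

With NetworkX, the same three values would normally come from `nx.degree_centrality(G1)`, `nx.closeness_centrality(G1)` and `nx.betweenness_centrality(G1, normalized=True, endpoints=False)`, each indexed by the node of interest.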

In [None]:
def answer_one():
        
    # Your Code Here
    
    return # Your Answer Here

<br>
#### For Questions 2, 3, and 4, assume that you know nothing about the structure of the network except the centrality values of its nodes. That is, use one of the centrality measures covered in the course to rank the nodes and find the most appropriate candidate.
<br>

### Question 2

Suppose you are employed by an online shopping website and are tasked with selecting one user in network G1 to send an online shopping voucher to. We expect that the user who receives the voucher will send it to their friends in the network. You want the voucher to reach as many nodes as possible. The voucher can be forwarded to multiple users at the same time, but its travel distance is limited to one step: if the voucher travels more than one step in this network, it is no longer valid. Apply your knowledge of network centrality to select the best candidate for the voucher.

*This function should return an integer, the name of the node.*
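
With a one-step limit, the voucher reaches exactly the direct neighbours of the chosen user, so ranking by degree centrality is the same as ranking by reach. The sketch below shows this on a made-up edge list; `build_adjacency`, `best_one_step_seed` and the graph `friends` are hypothetical names used only for illustration.

```python
def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def best_one_step_seed(adj):
    """Return the node with the highest degree centrality, degree / (N - 1)."""
    n = len(adj)
    degree_centrality = {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}
    return max(degree_centrality, key=degree_centrality.get)


# Hypothetical friendship graph with 7 people
friends = build_adjacency([(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (5, 6), (6, 7)])
seed = best_one_step_seed(friends)
print(seed, "reaches", len(friends[seed]), "people in one step")
```

On a NetworkX graph the equivalent ranking would be roughly `max(d, key=d.get)` with `d = nx.degree_centrality(G1)`.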

In [None]:
def answer_two():
        
    # Your Code Here
    
    return # Your Answer Here

### Question 3

Now the limit of the voucher’s travel distance has been removed. Because the network is connected, regardless of who you pick, every node in the network will eventually receive the voucher. However, we now want to ensure that the voucher reaches the nodes in the lowest average number of hops.

How would you change your selection strategy? Write a function to tell us who is the best candidate in the network under this condition.

*This function should return an integer, the name of the node.*
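
Closeness centrality, (N - 1) divided by the sum of distances, is the reciprocal of a node's average hop count to everyone else, so the node with the highest closeness is the one with the lowest average number of hops. The sketch below checks this on a made-up graph in which the best-connected node is not the closest one; the graph and the helper names are assumptions for illustration, and the graph is assumed to be connected.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_hops(adj, source):
    """Mean distance from source to the other N - 1 nodes (connected graph)."""
    return sum(hop_distances(adj, source).values()) / (len(adj) - 1)


# Hypothetical graph: hub A with four leaves, hub D with three leaves,
# joined by the chain A - B - C - D.
edges = [("A", "a1"), ("A", "a2"), ("A", "a3"), ("A", "a4"), ("A", "B"),
         ("B", "C"), ("C", "D"), ("D", "d1"), ("D", "d2"), ("D", "d3")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

avg = {v: average_hops(adj, v) for v in adj}
closeness = {v: 1 / avg[v] for v in adj}           # equals (N - 1) / sum of distances
best = min(avg, key=avg.get)                       # lowest average hop count
most_connected = max(adj, key=lambda v: len(adj[v]))
print(best, avg[best], most_connected)
```

Here the highest-degree node and the closest node differ, which is why the selection strategy changes once the one-step limit is lifted.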

In [None]:
def answer_three():
        
    # Your Code Here
    
    return # Your Answer Here

### Question 4

Assume the restriction on the voucher’s travel distance is still removed, but now a competitor has developed a strategy to remove a person from the network in order to disrupt the distribution of your company’s voucher. Your competitor is specifically targeting people who are often bridges of information flow between other pairs of people. Identify the single riskiest person to be removed under your competitor’s strategy.

*This function should return an integer, the name of the node.*
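
Being a bridge between other pairs of people is what betweenness centrality measures: the share of each pair's shortest paths that pass through a node. The sketch below applies that definition literally by enumerating shortest paths on a small made-up graph, two groups {1, 2, 3, 4} and {6, 7, 8, 9} joined only through node 5. The exact edges are an assumption, and brute-force enumeration like this is only practical for tiny graphs.

```python
from collections import deque
from itertools import combinations


def shortest_paths(adj, s, t):
    """List every shortest path from s to t (BFS, then walk predecessors back)."""
    dist, preds = {s: 0}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                preds[v].append(u)
    if t not in dist:
        return []  # unreachable pairs contribute nothing

    def walk(v):
        if v == s:
            return [[s]]
        return [p + [v] for prev in preds[v] for p in walk(prev)]

    return walk(t)


def betweenness(adj):
    """Normalized betweenness of every node in an undirected graph, endpoints excluded."""
    score = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        paths = shortest_paths(adj, s, t)
        for v in adj:
            if paths and v not in (s, t):
                score[v] += sum(v in p for p in paths) / len(paths)
    n = len(adj)
    pairs = (n - 1) * (n - 2) / 2
    return {v: b / pairs for v, b in score.items()}


# Hypothetical graph: two 4-cliques connected only via 4 - 5 - 6
edges = list(combinations([1, 2, 3, 4], 2)) + list(combinations([6, 7, 8, 9], 2))
edges += [(4, 5), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

bc = betweenness(adj)
riskiest = max(bc, key=bc.get)
print(riskiest, bc[riskiest])
```

For a graph the size of `G1`, `nx.betweenness_centrality(G1, normalized=True, endpoints=False)` would be the practical choice.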

In [None]:
def answer_four():
        
    # Your Code Here
    
    return # Your Answer Here

## Part 2

`G2` is a directed network of political blogs, where each node corresponds to a blog and each edge to a link between blogs. Use your knowledge of PageRank and HITS to answer Questions 5-9.

In [None]:
G2 = nx.read_gml('blogs.gml')

### Question 5

Apply the Scaled Page Rank Algorithm to this network. Find the Page Rank of node 'realclearpolitics.com' with damping value 0.85.

*This function should return a float.*
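
To make the damping value concrete, here is a minimal power-iteration sketch of scaled PageRank on a tiny made-up directed graph. The function `scaled_pagerank`, the graph `web`, and the choice to spread the rank of nodes without out-links evenly over all nodes are assumptions for illustration; this is not NetworkX's implementation.

```python
def scaled_pagerank(out_links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterate PR(v) = (1 - alpha) / N + alpha * sum(PR(u) / outdeg(u) for u -> v)."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: rank held by nodes with no out-links is shared by everyone
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                new[v] += alpha * rank[u] / len(out_links[u])
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical link graph: key -> list of pages it links to
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pr = scaled_pagerank(web, alpha=0.85)
for page, score in sorted(pr.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 4))
```

Node `D` has no in-links, so it keeps only the (1 - alpha) / N share. On the real data the usual call would be `nx.pagerank(G2, alpha=0.85)`, indexed by the blog name.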

In [None]:
def answer_five():
        
    # Your Code Here
    
    return # Your Answer Here

### Question 6

Apply the Scaled Page Rank Algorithm to this network with damping value 0.85. Find the 5 nodes with highest Page Rank. 

*This function should return a list of the top 5 blogs in descending order of Page Rank.*
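
Centrality functions such as `nx.pagerank` return a dict mapping node to score, so a top-5 list is a matter of sorting the keys by value. A small sketch with made-up scores (the blog names and numbers are hypothetical, not results from `G2`):

```python
def top_k(scores, k=5):
    """Return the k keys with the largest values, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical scores, for illustration only
scores = {
    "blog_a": 0.05, "blog_b": 0.30, "blog_c": 0.10, "blog_d": 0.20,
    "blog_e": 0.02, "blog_f": 0.25, "blog_g": 0.08,
}
print(top_k(scores))
```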

In [None]:
def answer_six():
        
    # Your Code Here
    
    return # Your Answer Here

### Question 7

Apply the HITS Algorithm to the network to find the hub and authority scores of node 'realclearpolitics.com'. 

*This function should return a tuple of floats `(hub_score, authority_score)`.*
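
The sketch below illustrates the HITS update rule on a tiny made-up directed graph: a node's authority score is the sum of the hub scores of the nodes linking to it, its hub score is the sum of the authority scores of the nodes it links to, and both are rescaled to sum to 1 after each step. The function `hits_sketch`, the graph `web`, and the fixed iteration count are assumptions for illustration; library implementations differ in details such as the update order and the stopping criterion.

```python
def hits_sketch(out_links, iterations=50):
    """Return (hub, authority) dicts after a fixed number of HITS iterations.

    Assumes the graph has at least one edge, so the totals are never zero.
    """
    nodes = list(out_links)
    hub = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority: add up the hub scores of the nodes pointing at v
        auth = {v: 0.0 for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                auth[v] += hub[u]
        # Hub: add up the authority scores of the nodes u points at
        hub = {u: sum(auth[v] for v in out_links[u]) for u in nodes}
        # Normalize so that each set of scores sums to 1
        a_total, h_total = sum(auth.values()), sum(hub.values())
        auth = {v: a / a_total for v, a in auth.items()}
        hub = {v: h / h_total for v, h in hub.items()}
    return hub, auth


# Hypothetical link graph: key -> list of pages it links to
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
hub, auth = hits_sketch(web)
print(max(hub, key=hub.get), max(auth, key=auth.get))
```

Node `D` has no in-links, so its authority score stays at 0 in every iteration. With NetworkX, `nx.hits(G2)` returns a `(hubs, authorities)` pair of dicts in the same order.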

In [None]:
def answer_seven():
        
    # Your Code Here
    
    return # Your Answer Here

### Question 8 

Apply the HITS Algorithm to this network to find the 5 nodes with highest hub scores.

*This function should return a list of the top 5 blogs in descending order of hub scores.*

In [None]:
def answer_eight():
        
    # Your Code Here
    
    return # Your Answer Here

### Question 9 

Apply the HITS Algorithm to this network to find the 5 nodes with highest authority scores.

*This function should return a list of the top 5 blogs in descending order of authority scores.*

In [None]:
def answer_nine():
        
    # Your Code Here
    
    return # Your Answer Here