# The `PageRank` Algorithm

The original `PageRank` formula is:
$$\mathrm{PR_i = \alpha\sum_{j\in N_i}\frac{PR_j}{k_{j,out}}+\beta}$$

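$PR_i$ denotes the `PageRank` centrality of node `i`, with $\alpha=0.85$ and $\beta=\frac{1-\alpha}{N}$, where $N$ is the number of nodes. Note that the edges of a web-link network are reference relations, which point in exactly the opposite direction to information sending. In the diffusion model, the pieces of information that a node receives compete with one another, and sending information has no cost, so $k_{j,out}$ in the formula should be replaced by $k_{j,in}$: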
$$\mathrm{PR_i = \alpha\sum_{j\in N_i}\frac{PR_j}{k_{j,in}}+\beta}$$
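To make the modified formula concrete, here is a minimal sketch of a single update for one node. It assumes, as the implementation later in this notebook does, that $N_i$ is the set of successors of `i` and that an in-degree of 0 is treated as 1. The toy graph and the helper name `modified_update` are hypothetical and exist only for illustration.

```python
import networkx as nx

# Hypothetical 4-node information-sending graph (an edge u -> v means u sends to v).
g = nx.DiGraph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)])

alpha = 0.85
n = g.number_of_nodes()
beta = (1 - alpha) / n
pr = {v: 1 / n for v in g}  # uniform starting scores


def modified_update(g, pr, i, alpha, beta):
    """Apply the modified formula once to node i.

    Assumption: N_i is the set of successors of i, and each neighbour's
    score is divided by its in-degree (at least 1 to avoid division by zero).
    """
    total = sum(pr[j] / max(1, g.in_degree(j)) for j in g.successors(i))
    return alpha * total + beta


# Node 0 sends to nodes 1 (in-degree 1) and 2 (in-degree 2).
new_pr_0 = modified_update(g, pr, 0, alpha, beta)
print(new_pr_0)
```

Node 2 receives from two senders, so its score is split in half before it contributes to node 0, which is the competition effect the modified formula is meant to capture.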

## An iterative algorithm for `PageRank`

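- Input: a directed graph with $n$ nodes, the transition matrix $M$, the damping factor $d$, and an initial vector $R_0$
- Output: the PageRank vector $R$ of the directed graph

- Procedure:
    - Set $t=0$

    - Compute, with $\mathbf{1}$ the all-ones vector of length $n$,
    $$
    R_{t+1}=dMR_t+\frac{1-d}{n}\mathbf{1}
    $$
    - If $R_{t+1}$ is sufficiently close to $R_t$, set $R=R_{t+1}$ and stop iterating

    - Otherwise, set $t=t+1$ and return to the computation step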
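The procedure above can be sketched directly in matrix form. This is a minimal illustration, not the implementation used in this notebook: it assumes $M$ is column-stochastic with `M[i, j] = 1 / outdeg(j)` when `j` links to `i` (the text does not define $M$), measures closeness with the L1 norm, and uses a hypothetical function name, tolerance, and toy matrix.

```python
import numpy as np


def pagerank_power_iteration(M, d=0.85, tol=1e-8, max_iter=200):
    """Iterate R_{t+1} = d * M @ R_t + (1 - d) / n until the L1 change is below tol.

    Assumption: M is an n x n column-stochastic transition matrix.
    """
    n = M.shape[0]
    R = np.full(n, 1.0 / n)  # R_0: uniform initial vector
    for _ in range(max_iter):
        R_next = d * M @ R + (1 - d) / n  # the scalar broadcasts like (1-d)/n * ones(n)
        if np.abs(R_next - R).sum() < tol:
            return R_next
        R = R_next
    return R


# Toy graph with edges 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
# Column j holds the out-link distribution of node j.
M = np.array([
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
])
R = pagerank_power_iteration(M)
print(R, R.sum())
```

Because every column of `M` sums to 1 and the initial vector sums to 1, each iterate also sums to 1, so no renormalization is needed in this sketch.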

In [2]:
import networkx as nx
import numpy as np

In [3]:
def page_rank_centrality(g, alpha=0.85, max_iter=100, epsilon=1e-6):
    '''
    Description:
    An iterative algorithm that computes PageRank scores for the nodes of a directed network.
    ----------
    Parameters:
    g: a directed network (an undirected network is converted to a directed one)
    alpha: damping coefficient
    max_iter: the maximum number of iterations
    epsilon: convergence threshold on the total absolute change in scores per iteration
    ----------
    Returns:
    a list of (node, PageRank score) tuples
    ----------

    '''
    if len(g) == 0:
        return []

    if not nx.is_directed(g):
        g = g.to_directed()

    # 1. Initialize the PageRank scores uniformly
    page_rank = dict.fromkeys(g.nodes(), 1 / g.number_of_nodes())
    beta = (1 - alpha) / g.number_of_nodes()  # keeps every node's score above 0
    in_degree_dict = {i: max(1, v) for i, v in g.in_degree()}  # treat an in-degree of 0 as 1 to avoid division by zero
    # 2. Iterate
    flag = 1
    while flag <= max_iter:
        change = 0
        for i in g:  # update every node's score in place, so later nodes in a round see updated values
            pr = alpha * np.sum([page_rank[j] / in_degree_dict[j] for j in g.successors(i)]) + beta
            change += abs(page_rank[i] - pr)
            page_rank[i] = pr

        if change <= epsilon:
            print(f'Iteration reached the threshold after {flag} rounds; stopping')
            break

        flag += 1
    else:
        print(f'Iteration stopped after {max_iter} rounds')

    return list(page_rank.items())

In [6]:
G = nx.barabasi_albert_graph(100, 6)

In [7]:
DG = G.to_directed()

In [13]:
pr_scores = page_rank_centrality(DG)

Iteration reached the threshold after 37 rounds; stopping

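The node-by-node loop in `page_rank_centrality` can also be written as a matrix iteration, which ties the modified formula back to $R_{t+1}=dMR_t+\frac{1-d}{n}\mathbf{1}$. The following is a hedged sketch rather than a drop-in replacement: the name `page_rank_vectorized` is hypothetical, it returns a dict instead of a list of tuples, and it updates all nodes synchronously, so the number of rounds it needs can differ from the in-place loop.

```python
import networkx as nx
import numpy as np


def page_rank_vectorized(g, alpha=0.85, max_iter=100, epsilon=1e-6):
    """Synchronous matrix version of the modified PageRank update (illustrative only)."""
    nodes = list(g)
    n = len(nodes)
    if n == 0:
        return {}
    # A[i, j] = 1 if there is an edge i -> j, so row i marks the successors of i.
    A = nx.to_numpy_array(g, nodelist=nodes, weight=None)
    in_deg = np.maximum(A.sum(axis=0), 1)  # column sums are in-degrees; 0 is treated as 1
    pr = np.full(n, 1 / n)
    beta = (1 - alpha) / n
    for _ in range(max_iter):
        # For each i: alpha * sum over successors j of pr[j] / in_deg[j], plus beta.
        pr_next = alpha * A @ (pr / in_deg) + beta
        converged = np.abs(pr_next - pr).sum() <= epsilon
        pr = pr_next
        if converged:
            break
    return dict(zip(nodes, pr))


# Hypothetical toy graph in which every node has at least one incoming edge.
toy = nx.DiGraph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)])
scores = page_rank_vectorized(toy)
print(scores)
```

In this form the transition matrix is `A` with each column `j` divided by the in-degree of `j`. When every node has at least one incoming edge, those columns sum to 1 and the scores keep summing to 1 across iterations.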

In [15]:
sorted(pr_scores, key=lambda x: x[1], reverse=True)

[(6, 0.03129824798503532),
 (7, 0.02788032057209772),
 (10, 0.02524468259255166),
 (4, 0.022616135298548883),
 (8, 0.02109611561550137),
 (9, 0.020850269875821937),
 (0, 0.020675282554670608),
 (14, 0.020657772039556654),
 (20, 0.019817239273971297),
 (15, 0.019028505644852287),
 (21, 0.016734948816716652),
 (12, 0.01651065443784313),
 (30, 0.016434635508112333),
 (19, 0.0158511746874704),
 (16, 0.01567394595733223),
 (11, 0.015069359329807328),
 (18, 0.014279931604200437),
 (13, 0.014155709711722886),
 (1, 0.014018844958946638),
 (25, 0.013827642446116986),
 (26, 0.013727245951813002),
 (23, 0.01301649964313976),
 (5, 0.012880708386383692),
 (49, 0.011621887076815237),
 (35, 0.011554323885061989),
 (50, 0.011496490476486966),
 (46, 0.011423345764184162),
 (31, 0.011264744439386982),
 (53, 0.010869728031766576),
 (39, 0.010588758854093489),
 (27, 0.009905131851345825),
 (28, 0.009706347876294234),
 (44, 0.009686716029241984),
 (54, 0.00923555092672701),
 (57, 0.00918456671520868),
 (34