# Hubs and Authorities

Given a query to a search engine:
- **Root:** the set of highly relevant web pages (nodes); these are potential `authorities`.
- Find every page (node) that links to a page in the root; these are potential `hubs`.
- **Base:** the root nodes plus any node that links to a node in the root.
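
The construction of the base set can be sketched in a few lines. This is a minimal illustration, not how a search engine actually does it: the function name `build_base_set`, the `links` dictionary, and the toy pages are all assumptions invented for the example.

```python
def build_base_set(links, root):
    """Return the base set: the root pages plus every page that links to one.

    `links` maps each page to the set of pages it links to (a hypothetical
    in-memory link index); `root` is the set of highly relevant pages.
    """
    root = set(root)
    # A page joins the base set if at least one of its out-links lands in root.
    linking_in = {page for page, targets in links.items() if root & set(targets)}
    return root | linking_in


# Toy link index, invented for illustration only.
links = {
    "a": {"b", "c"},
    "b": {"c"},
    "c": set(),
    "d": {"a"},
    "e": {"f"},
    "f": set(),
}
root = {"b", "c"}
base = build_base_set(links, root)
print(sorted(base))
```

Here `d` stays out of the base set because it links only to `a`, which is not itself a root page.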

<br>

## HITS Algorithm

Run *k* iterations of the HITS algorithm to assign an authority score and a hub score to each node:  
**1.** Assign each node an authority score and a hub score of 1.  
**2.** Apply the **Authority Update Rule:** each node's `authority` score is the sum of the hub scores of the nodes that point to it.  
**3.** Apply the **Hub Update Rule:** each node's `hub` score is the sum of the authority scores of the nodes that it points to.  
**4.** **Normalize** the authority and hub scores so that each set sums to 1 over all nodes *N*, then repeat steps 2 to 4 until *k* iterations are done:  
$$
\mathrm{auth}(j) = \frac{\mathrm{auth}(j)}{\sum_{i \in N}{\mathrm{auth}(i)}}
$$
$$
\mathrm{hub}(j) = \frac{\mathrm{hub}(j)}{\sum_{i \in N}{\mathrm{hub}(i)}}
$$
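
The four steps can be sketched in plain Python as follows. The function name `hits`, the edge-list input, and the toy graph are assumptions made for this example. The sketch also assumes that both update rules read the scores from the previous iteration; a common variant feeds the freshly updated authority scores into the hub update instead. The graph needs at least one edge, otherwise the normalization would divide by zero.

```python
def hits(edges, k):
    """Run k HITS iterations on a directed graph given as (source, target) pairs.

    Returns (hub, auth), two dicts keyed by node.
    """
    nodes = sorted({n for edge in edges for n in edge})
    # Step 1: every node starts with authority score 1 and hub score 1.
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}

    for _ in range(k):
        new_auth = {n: 0.0 for n in nodes}
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            # Step 2: authority of dst collects the hub score of each in-neighbor.
            new_auth[dst] += hub[src]
            # Step 3: hub of src collects the authority score of each out-neighbor.
            new_hub[src] += auth[dst]
        # Step 4: normalize so that each set of scores sums to 1.
        auth_total = sum(new_auth.values())
        hub_total = sum(new_hub.values())
        auth = {n: s / auth_total for n, s in new_auth.items()}
        hub = {n: s / hub_total for n, s in new_hub.items()}
    return hub, auth


# Toy directed graph, invented for illustration: a -> c, b -> c, b -> d.
edges = [("a", "c"), ("b", "c"), ("b", "d")]
hub, auth = hits(edges, k=2)
print("hubs:", hub)
print("authorities:", auth)
```

In this toy graph `c` ends up as the strongest authority because two pages point to it, and `b` as the strongest hub because it points to both authorities.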

For most networks, as *k* gets larger, each node's authority and hub scores converge to fixed values.
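
One way to observe this convergence is to keep iterating until the scores stop changing by more than a small tolerance. The sketch below is illustrative only: `hits_step`, `hits_until_stable`, the tolerance `tol`, and the cap `max_iter` are hypothetical names and defaults, and both update rules are assumed to read the previous iteration's scores.

```python
def hits_step(edges, hub, auth):
    """Apply one authority update, one hub update, and normalize both."""
    new_auth = dict.fromkeys(auth, 0.0)
    new_hub = dict.fromkeys(hub, 0.0)
    for src, dst in edges:
        new_auth[dst] += hub[src]
        new_hub[src] += auth[dst]
    auth_total = sum(new_auth.values())
    hub_total = sum(new_hub.values())
    return (
        {n: s / hub_total for n, s in new_hub.items()},
        {n: s / auth_total for n, s in new_auth.items()},
    )


def hits_until_stable(edges, tol=1e-10, max_iter=1000):
    """Iterate HITS until no score moves by more than `tol`.

    Returns (hub, auth, iterations_used).
    """
    nodes = {n for edge in edges for n in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for iteration in range(1, max_iter + 1):
        new_hub, new_auth = hits_step(edges, hub, auth)
        # Largest change of any single score in this iteration.
        change = max(
            max(abs(new_hub[n] - hub[n]) for n in nodes),
            max(abs(new_auth[n] - auth[n]) for n in nodes),
        )
        hub, auth = new_hub, new_auth
        if change < tol:
            return hub, auth, iteration
    raise RuntimeError("scores did not settle within max_iter iterations")


# Toy directed graph, invented for illustration: a -> c, b -> c, b -> d.
edges = [("a", "c"), ("b", "c"), ("b", "d")]
hub, auth, iterations = hits_until_stable(edges)
print(iterations, auth)
```

On this toy graph the authority score of `c` passes through 2/3, 3/5, 5/8 on successive iterations and settles near 0.618, so a few dozen iterations are enough.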

    import networkx as nx

    # Zachary's karate club: an undirected network with 34 nodes
    G = nx.karate_club_graph()


<br>

**Hubs**

    # nx.hits returns a (hubs, authorities) pair; index 0 holds the hub scores
    nx.hits(G)[0]

{0: 0.06687778780175725,
 1: 0.06460820139870788,
 2: 0.07720593702807278,
 3: 0.04251538956587158,
 4: 0.011920567930085257,
 5: 0.014437084548291415,
 6: 0.01422728524063945,
 7: 0.03820430110403422,
 8: 0.05287480008426348,
 9: 0.010749022088966232,
 10: 0.00981338956991206,
 11: 0.009251077981447942,
 12: 0.008964766141133599,
 13: 0.05149077757366964,
 14: 0.017029873773128715,
 15: 0.0242189787478375,
 16: 0.003965088094607881,
 17: 0.00914642878231234,
 18: 0.01046936124084876,
 19: 0.015720024731013776,
 20: 0.013435321285774323,
 21: 0.012125472243659386,
 22: 0.017344169994343128,
 23: 0.04668552502066942,
 24: 0.010930126255860845,
 25: 0.026246198040701767,
 26: 0.012553159895365179,
 27: 0.03162054846552677,
 28: 0.018444663444097797,
 29: 0.029083323651041326,
 30: 0.033896584340598744,
 31: 0.044846896017269156,
 32: 0.07114077395376944,
 33: 0.07795709396472078}

<br>

**Authorities**

    # index 1 of the (hubs, authorities) pair holds the authority scores
    nx.hits(G)[1]

{0: 0.0668777878017573,
 1: 0.06460820139870795,
 2: 0.07720593702807285,
 3: 0.042515389565871635,
 4: 0.011920567930085285,
 5: 0.014437084548291445,
 6: 0.014227285240639492,
 7: 0.03820430110403425,
 8: 0.05287480008426346,
 9: 0.010749022088966224,
 10: 0.00981338956991207,
 11: 0.009251077981447947,
 12: 0.008964766141133609,
 13: 0.05149077757366969,
 14: 0.017029873773128704,
 15: 0.024218978747837485,
 16: 0.003965088094607887,
 17: 0.00914642878231237,
 18: 0.010469361240848735,
 19: 0.01572002473101379,
 20: 0.01343532128577431,
 21: 0.012125472243659407,
 22: 0.01734416999434312,
 23: 0.04668552502066941,
 24: 0.010930126255860827,
 25: 0.026246198040701767,
 26: 0.012553159895365158,
 27: 0.03162054846552678,
 28: 0.01844466344409779,
 29: 0.029083323651041323,
 30: 0.03389658434059875,
 31: 0.04484689601726914,
 32: 0.0711407739537694,
 33: 0.07795709396472077}
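
The hub and authority dictionaries above agree to roughly 15 decimal places. That is expected here, because `nx.karate_club_graph()` is undirected. As a sketch of why, write the two update rules in matrix form, where $A$ is the adjacency matrix ($A_{ij} = 1$ if node $i$ links to node $j$); this notation is introduced only for this remark. Up to normalization, the rules read:

$$
\mathrm{auth} \propto A^{\top}\,\mathrm{hub}, \qquad \mathrm{hub} \propto A\,\mathrm{auth}
$$

Substituting one rule into the other shows what the converged scores must satisfy:

$$
\mathrm{auth} \propto A^{\top}A\,\mathrm{auth}, \qquad \mathrm{hub} \propto A A^{\top}\,\mathrm{hub}
$$

In an undirected graph every edge counts in both directions, so $A = A^{\top}$ and both products equal $A^{2}$. The two score vectors then satisfy the same equation and, when the scores converge to a single solution, they coincide. In a directed graph $A^{\top}A$ and $AA^{\top}$ generally differ, and so do the hub and authority rankings.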

<br>

## Summary

- The HITS algorithm starts by constructing a *root set* of relevant web pages and expanding it to a base set.
- HITS then assigns an authority score and a hub score to each node in the network.
- Nodes that have incoming edges from good hubs are good authorities, and nodes that have outgoing edges to good authorities are good hubs.
- Authority and hub scores converge for most networks.
