# Analyzing Witness Networks

[David J. Thomas](mailto:dave.a.base@gmail.com), [thePort.us](http://thePort.us)

---

## This workbook will...

* Use the `networkx` module to analyze witnesses in the charters
* Calculate network statistics like degree, betweenness centrality, community detection, and more
* Export the network with statistics for use elsewhere in programs like Gephi
* Visualize the network as a whole and in parts
* Attempt to identify major figures at different levels of the network

**THIS WORKBOOK IS FUNCTIONAL BUT LACKS MUCH COMMENTARY: A PROEMIUM, MORE FUNCTIONALITY, AND MORE EXPLANATORY MATERIAL TO BE ADDED LATER**


---

## 1) Import Module Dependencies

The cell below loads all other Python packages needed. You **must** run this before any other cells.

In [None]:
from IPython.display import display, HTML
import networkx as nx
import community
import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup

print('Dependencies loaded! PROCEED')

## 2) Crunch Network Statistics and Export Graph

This step will take advantage of networkx's statistical capabilities, most importantly its ability to measure various kinds of [network centralities](https://en.wikipedia.org/wiki/Centrality).

**THIS STEP WILL TAKE A WHILE, BE PATIENT**
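
Betweenness is usually the slowest statistic in this step, because the exact calculation follows shortest paths from every node. If the wait becomes impractical, networkx can estimate it from a sample of source nodes through the `k` argument of `nx.betweenness_centrality`. Below is a minimal sketch on a random stand-in graph; the graph, the sample size of 50, and the seeds are assumptions for illustration, not values used by this workbook.

```python
import networkx as nx

# stand-in graph; the workbook itself loads ../data/witness_network.gexf instead
demo_network = nx.gnp_random_graph(200, 0.05, seed=1)

# exact betweenness: shortest paths from every node
exact = nx.betweenness_centrality(demo_network)

# estimated betweenness: shortest paths from only k sampled source nodes
estimate = nx.betweenness_centrality(demo_network, k=50, seed=1)

# compare the top five nodes of each ranking
top_exact = sorted(exact, key=exact.get, reverse=True)[:5]
top_estimate = sorted(estimate, key=estimate.get, reverse=True)[:5]
print('Exact top 5:    ', top_exact)
print('Estimated top 5:', top_estimate)
```

The estimate trades accuracy for speed, so it is worth checking on a small sample that the top of the ranking stays roughly stable before relying on it.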

In [None]:
print('Reading network file...')
witness_network = nx.read_gexf('../data/witness_network.gexf')
print('Done!')

def crunch_network(network):
    """Calculates statistics for the given network and stores them as node attributes"""
    print('Crunching various network statistics (takes a while)...\n')
    # note: the 'betweeness_centrality' key keeps its original spelling
    # because later cells look the attribute up by that name
    results = {}
    # degrees and community; every call uses the network passed in (not the
    # global witness_network) so that sub-networks get their own statistics
    print('Degrees...', end='')
    results['degree'] = dict(nx.degree(network))
    print(' Done!\nCommunity detection with best partition...', end='')
    results['community'] = community.community_louvain.best_partition(network)
    # centralities
    print(' Done!\nDegree centrality...', end='')
    results['degree_centrality'] = nx.degree_centrality(network)
    print(' Done!\nBetweenness (shortest-path) centrality...', end='')
    results['betweeness_centrality'] = nx.betweenness_centrality(network)
    print(' Done!\nEigenvector centrality...', end='')
    results['eigenvector_centrality'] = dict(nx.eigenvector_centrality(network))
    print(' Done!\nCloseness centrality...', end='')
    results['closeness_centrality'] = nx.closeness_centrality(network)
    print(' Done!\nHarmonic centrality...', end='')
    results['harmonic_centrality'] = nx.harmonic_centrality(network)
    print(' Done!\n\nFinished calculating network statistics.')
    # now take result dicts and feed them back into nodes as attributes
    for attribute_name, attribute_results in results.items():
        for node_id in network.nodes():
            network.nodes[node_id][attribute_name] = attribute_results[node_id]
    return network


witness_network = crunch_network(witness_network)
print('Exporting network with calculated statistics to ../data/witness_network.gexf')
nx.write_gexf(witness_network, '../data/witness_network.gexf')
print('Exported network successfully.')
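
As a rough illustration of what the export step relies on, the sketch below attaches one statistic to every node with `nx.set_node_attributes` (a one-call alternative to the nested loop above), writes the graph to GEXF, and reads it back to confirm the attribute survived. The node names and the temporary file path are hypothetical stand-ins, not part of the charter data.

```python
import os
import tempfile

import networkx as nx

# tiny stand-in network with hypothetical node names
demo_network = nx.Graph()
demo_network.add_edges_from([('Person A', 'Person B'), ('Person B', 'Person C')])

# attach one statistic to every node in a single call
nx.set_node_attributes(demo_network, nx.degree_centrality(demo_network), 'degree_centrality')

# write to a temporary file, then read it back to confirm the attribute survived
export_path = os.path.join(tempfile.mkdtemp(), 'demo_network.gexf')
nx.write_gexf(demo_network, export_path)
reloaded = nx.read_gexf(export_path)
print(reloaded.nodes['Person B'])
```

A round trip like this is a cheap way to check that a file will carry its statistics into Gephi before handing it off.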

## 5) Visualize Entire Witness Network

It's time to finally see some sweet, sweet, payoff! We can use networkx itself to visualize our network and see what's interesting.

In [None]:
def draw_network(network, with_labels=False):
    node_size_map = [network.nodes[node]['betweeness_centrality'] * 3000 for node in network.nodes()]
    node_color_map = [network.nodes[node]['community'] for node in network.nodes()]
    options = {
        'node_color': node_color_map,
        'node_size': node_size_map,
        'width': 0.1,
        'alpha': 0.5,
        'with_labels': with_labels
    }
    print('Generating network preview...')
    nx.draw(network, pos=nx.spring_layout(network, k=0.05, scale=1), **options)
    plt.draw()
    return True


print('Entire Witness Network')
draw_network(witness_network)
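
Two details of the drawing step may be worth adjusting. Nodes whose betweenness is exactly zero get a size of zero and vanish from the picture, and the spring layout starts from random positions, so the picture changes on every run. The sketch below shows one possible way to handle both, assuming a hypothetical minimum size of 20 and an arbitrary seed of 42 on a small stand-in graph.

```python
import networkx as nx

# stand-in network: a path, so the two end nodes have betweenness 0
demo_network = nx.path_graph(5)
betweenness = nx.betweenness_centrality(demo_network)

# scale sizes as draw_network does, but never drop below a hypothetical minimum
MIN_NODE_SIZE = 20
node_sizes = [max(betweenness[node] * 3000, MIN_NODE_SIZE) for node in demo_network.nodes()]

# a fixed seed makes the spring layout come out the same on every run
positions = nx.spring_layout(demo_network, k=0.05, scale=1, seed=42)
print(node_sizes)
```

The resulting `positions` and `node_sizes` can be passed to `nx.draw` as `pos=` and `node_size=`.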

## 6) Finding Key Figures

Now we've seen the entire network. It may look like a big mess (what some network analysts call "the spaghetti monster"), but there are some important details that we can tell right away. For starters, *almost* everything is interconnected! There are but a few near-loners separate from the network. This isn't a given. Many networks are split into highly independent and barely connected subcomponents. Some networks have components that are totally detached from any other part of the network. This network has none of those features.

So, right away we can tell that the world of charters and elite witnesses was highly interconnected and reciprocal. At the same time, we can also tell that the network is not an even mass of jumbled connections. There are subformations within the larger network. Using community detection (which we will loosely call "neighborhood detection"), we have colored this network to represent the mathematical "guess" at where the boundaries of major subcomponents of the larger network lie. These different neighborhoods often (but not necessarily) loosely relate to meaningful real world distinctions such as political borders. While it isn't an objective or certain determinant of whether one individual is or is not a member of some real group, it can be a quick way of visualizing major groupings within the whole network.
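
To make the idea of a mathematical "guess" more concrete, here is a minimal sketch on a toy graph. Note that it uses networkx's built-in `greedy_modularity_communities` rather than the Louvain `best_partition` call used in this workbook, so the exact groupings on real data may differ. The node names are hypothetical. The final dict maps each node to a community number, which is the same shape of result that `best_partition` returns.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# two tightly knit groups of five, joined by a single edge
demo_network = nx.Graph()
group_a = ['A1', 'A2', 'A3', 'A4', 'A5']
group_b = ['B1', 'B2', 'B3', 'B4', 'B5']
for group in (group_a, group_b):
    demo_network.add_edges_from(
        (first, second) for i, first in enumerate(group) for second in group[i + 1:]
    )
demo_network.add_edge('A1', 'B1')

# detect communities, then flatten them into a node -> community number dict
communities = greedy_modularity_communities(demo_network)
partition = {node: number for number, members in enumerate(communities) for node in members}
print(partition)
```

On a graph this clean the two groups separate exactly; on a real network the boundaries are fuzzier, which is why the result is best treated as a guess.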

To go further, though, we can look at individuals and chop up the network into subcomponents.

### 6a) Listing Top People by Attributes

Now let's dive into some specific network statistics to give us some "top" people in the network and see who we find.

We are going to use several measures to study "centrality," the two most important being degree and betweenness. Degree centrality is simply a measure of which nodes (people) have the most connections. In this case, that means which people have the greatest number of connections to others through acts of witnessing. This can be important, but is hardly the only measure.

Betweenness centrality, put simply, measures how often a person lies on the shortest path between two other people in the network. Someone with high betweenness acts as a bridge: connections between otherwise distant parts of the network tend to run through them. Depending on your needs, this is often a more meaningful measure of centrality, since having the *most* friends isn't the same as being the friend who links everyone else together.

There are other forms of centrality that we won't get into here. For a more detailed guide, check out [this FAQ](https://cambridge-intelligence.com/keylines-faqs-social-network-analysis/).
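
The difference between the two measures is easiest to see on a toy example. The sketch below builds two tight groups of four that are linked only through a single go-between; all node names are hypothetical and have nothing to do with the charter data.

```python
import networkx as nx

# two tight groups of four witnesses, linked only through one go-between
demo_network = nx.Graph()
group_a = ['A1', 'A2', 'A3', 'A4']
group_b = ['B1', 'B2', 'B3', 'B4']
for group in (group_a, group_b):
    demo_network.add_edges_from(
        (first, second) for i, first in enumerate(group) for second in group[i + 1:]
    )
demo_network.add_edges_from([('A1', 'Bridge'), ('Bridge', 'B1')])

degree = dict(demo_network.degree())
betweenness = nx.betweenness_centrality(demo_network)

for node in sorted(demo_network.nodes()):
    print('{:<7} degree={} betweenness={:.3f}'.format(node, degree[node], betweenness[node]))
```

Here `Bridge` has the fewest connections (two) yet the highest betweenness, because every shortest path between the two groups runs through it.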

In [None]:
def order_nodes_by_attribute(network, attribute_name):
    totals = {}
    # build a key/val dict from id/val of node
    for node in network.nodes():
        totals[node] = network.nodes[node][attribute_name]
    # sort into ordered list of tuples with id/val and return
    return [(key, totals[key]) for key in sorted(totals, key=totals.get, reverse=True)]


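def print_top_nodes_by_attribute(network, attribute_name, print_name, limit=4):
    print('----- {} -----'.format(print_name))
    # honor the limit argument rather than a hard-coded slice (default shows the top 4)
    ranked_nodes = order_nodes_by_attribute(network, attribute_name)[:limit]
    for counter, (node_id, val) in enumerate(ranked_nodes, start=1):
        node_link = network.nodes[node_id]['link']
        # print the value too, so the ranking can be checked by eye
        print('{}.) {}: {} ({})'.format(counter, node_id, node_link, val))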


attributes_to_print = [
    ('degree', 'Degree'),
    ('betweeness_centrality', 'Betweeness Centrality'),
    ('degree_centrality', 'Degree Centrality'),
    ('closeness_centrality', 'Closeness Centrality'),
    ('eigenvector_centrality', 'Eigenvector Centrality'),
    ('harmonic_centrality', 'Harmonic Centrality')
]

for attribute_name, print_name in attributes_to_print:
    print_top_nodes_by_attribute(witness_network, attribute_name, print_name)
    print('')
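
For larger networks, sorting every node just to read off the first few is more work than needed. As a hedged alternative, the sketch below uses `heapq.nlargest` from the standard library; the helper name `top_nodes`, the attribute name `score`, and the node names and values are all hypothetical.

```python
import heapq

import networkx as nx

# stand-in network with a made-up 'score' attribute on each node
demo_network = nx.Graph()
demo_network.add_nodes_from(['Person A', 'Person B', 'Person C', 'Person D'])
nx.set_node_attributes(
    demo_network,
    {'Person A': 0.9, 'Person B': 0.4, 'Person C': 0.7, 'Person D': 0.1},
    'score',
)


def top_nodes(network, attribute_name, limit=3):
    """Return the `limit` highest (node, value) pairs for one node attribute"""
    # network.nodes(data=attribute_name) yields (node, value) pairs directly
    return heapq.nlargest(limit, network.nodes(data=attribute_name), key=lambda pair: pair[1])


print(top_nodes(demo_network, 'score', limit=2))
# prints [('Person A', 0.9), ('Person C', 0.7)]
```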

If you got the same results as I did, Wulfred 6 appears to be the individual with the most direct connections to others, followed by Eadwulf 7, Ceolnoth 3, and Alfred 8. The numbers are just the ID numbers tied to these individuals.

For betweenness centrality, we have Alfred 8, then Æthelbald 4, Cenwealh 2, and Offa 7.

As we will see in a second, betweenness centrality seems to have been a better judge of who was most important in the network.

### 6b) Looking into Individuals

Now let's look up who those people were that we ran into above. The following bit of code will grab the HTML from the PASE database containing factoids about their lives to give us some biographical info.
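
The text-extraction half of that process can be tried offline first. The sketch below runs BeautifulSoup over a small hand-written HTML string; the helper name `page_text` and the sample markup are hypothetical, and real PASE pages will be structured differently.

```python
from bs4 import BeautifulSoup


def page_text(page_html):
    """Return the readable text of an HTML page as a single string"""
    soup = BeautifulSoup(page_html, 'html.parser')
    # strip=True trims each text fragment; separator=' ' keeps neighbors from running together
    return soup.get_text(separator=' ', strip=True)


# hypothetical stand-in for a downloaded page; real PASE markup will differ
sample_html = '<html><body><h1>Alfred 8</h1><p>King of <b>Wessex</b></p></body></html>'
print(page_text(sample_html))
# prints: Alfred 8 King of Wessex
```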

In [None]:
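def lookup_person(network, node_id):
    """Fetches a person's PASE page, displays its text, and returns the parsed page"""
    link = network.nodes[node_id]['link']
    # a timeout keeps one unresponsive page from hanging the whole loop
    page_html = requests.get(link, timeout=30).text
    pase_content = BeautifulSoup(page_html, 'html.parser')
    display(HTML(pase_content.get_text()))
    return pase_content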

notable_persons = ['Alfred 8', 'Wulfred 6', 'Eadwulf 7', 'Ceolnoth 3', 'Æthelbald 4', 'Cenwealh 2', 'Offa 7']

for notable_person in notable_persons:
    print('---', notable_person)
    lookup_person(witness_network, notable_person)

print('Finished displaying persons')

So, degree centrality gives the top rankings to...

1. Wulfred, Archbishop of Canterbury
2. Eadwulf, Bishop of Lindsey
3. Ceolnoth, Archbishop of Canterbury
4. Alfred the Great, Most Famous King of Wessex

Which isn't bad. Certainly these are all important individuals. But Alfred isn't ranked first, and many other critical individuals are missing entirely. When we use betweenness centrality, however, the results we get are...

1. Alfred the Great, Most Famous King of Wessex
2. Aethelbald, King of Mercia
3. Cenwealh, King of Wessex
4. Offa, King of Mercia

Now, it is *critical* to understand that this measure isn't any more *correct* than degree centrality. They merely each highlight different aspects. But it does seem that this one has locked on pretty hard to individuals we know to be the most important figures of their day.

While this means that we haven't "learned" anything new yet, it is an excellent confirmation that this technique works and has amazing potential for exploring historical networks!

## 7) Looking into Sub-Communities

Where we can learn more, and make things more interesting, is by zooming in and chopping up the network.

In the following steps, we are going to separate two "neighborhoods" of the network and examine each on its own as an independent network. This difference in scale will allow us to see how some individuals operate differently at "local" vs. "global" levels.

The following two cells take the communities of Alfred the Great and Offa of Mercia, separate them, run network analysis statistics on them again, and tell us who the most important nodes are in the *new* network.
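
The separation step on its own can be sketched on a toy graph. This version uses networkx's `subgraph` method as one possible approach; the helper name `community_subgraph`, the node names, and the community numbers are hypothetical.

```python
import networkx as nx

# stand-in network with a hypothetical 'community' attribute on every node
demo_network = nx.Graph()
demo_network.add_edges_from([('A1', 'A2'), ('A2', 'A3'), ('A3', 'B1'), ('B1', 'B2')])
nx.set_node_attributes(
    demo_network, {'A1': 0, 'A2': 0, 'A3': 0, 'B1': 1, 'B2': 1}, 'community'
)


def community_subgraph(network, community_num):
    """Return an independent copy holding only one community's nodes and edges"""
    members = [node for node, number in network.nodes(data='community') if number == community_num]
    # subgraph() returns a view tied to the original; copy() makes an independent graph
    return network.subgraph(members).copy()


community_zero = community_subgraph(demo_network, 0)
print(sorted(community_zero.nodes()), sorted(community_zero.edges()))
```

The edge from `A3` to `B1` is dropped because it crosses the community boundary, which is exactly why statistics need to be recalculated on the smaller network.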

In [None]:
def get_community(network, community_num):
    """Returns a new network with only nodes/edges from a specified community"""
    # perform full copy of network so the original is left untouched
    subnetwork = network.copy()
    # identify the nodes that fall outside the community, then remove them
    non_community_nodes = [
        node_id for node_id in subnetwork.nodes()
        if subnetwork.nodes[node_id]['community'] != community_num
    ]
    subnetwork.remove_nodes_from(non_community_nodes)
    # crunch new stats on what remains
    return crunch_network(subnetwork)


print('Alfred the Great')
alfred_community = get_community(witness_network, witness_network.nodes['Alfred 8']['community'])
draw_network(alfred_community)

for attribute_name, print_name in attributes_to_print:
    print_top_nodes_by_attribute(alfred_community, attribute_name, print_name)
    print('')

In [None]:
print('Offa of Mercia')
offa_community = get_community(witness_network, witness_network.nodes['Offa 7']['community'])
draw_network(offa_community, with_labels=True)

for attribute_name, print_name in attributes_to_print:
    print_top_nodes_by_attribute(offa_community, attribute_name, print_name)
    print('')

We can see that in his own neighborhood, Alfred still is not the most central in terms of degree, yet his betweenness centrality is the highest. Meanwhile, Offa leads his neighborhood in both degree and betweenness centrality. Offa's characteristics suggest he has a much more directly local influence. Alfred's connections, meanwhile, appear to be more exogenous (going outside the neighborhood).
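
One way to check a claim like this more directly is to count, for a given person, how many neighbors sit inside their own community and how many sit outside it. The sketch below does so on a toy graph; the helper name `count_internal_external`, the node names, and the community numbers are hypothetical, and on the real data the counts would need to be taken on the full `witness_network` rather than on a separated community.

```python
import networkx as nx

# stand-in network: 'Local' connects only within community 0, 'Broker' reaches into community 1
demo_network = nx.Graph()
demo_network.add_edges_from([
    ('Local', 'A1'), ('Local', 'A2'), ('Local', 'A3'),
    ('Broker', 'A1'), ('Broker', 'B1'), ('Broker', 'B2'),
    ('B1', 'B2'),
])
nx.set_node_attributes(
    demo_network,
    {'Local': 0, 'Broker': 0, 'A1': 0, 'A2': 0, 'A3': 0, 'B1': 1, 'B2': 1},
    'community',
)


def count_internal_external(network, node_id):
    """Count a node's neighbors inside and outside its own community"""
    home = network.nodes[node_id]['community']
    internal = sum(
        1 for neighbor in network.neighbors(node_id)
        if network.nodes[neighbor]['community'] == home
    )
    external = network.degree(node_id) - internal
    return internal, external


for person in ('Local', 'Broker'):
    print(person, count_internal_external(demo_network, person))
```

In this toy case `Local` has three internal and no external connections, while `Broker` has one internal and two external, which is the kind of contrast described above.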

This is just the start. Chopping up and analyzing the network is where real analysis starts to happen. I hope I've shown you the power of network analysis for exploring historical networks!