# SNA on Humanities Data

## Introduction

In this section, we will apply the SNA skills we have learned to real humanities data. Again, we will be working with the Volume 7 dataset from the South African TRC. Unlike the last chapter on topic modeling, we are not interested in how the descriptions of violence cluster together. Instead, we will explore how victims are connected to the specific organizations named in their descriptions. It is important to note that this approach does not tell us the nature of the relationship between a victim and an organization: an edge only records that the organization appears in the victim's description. The victim may have been a member of the organization, or may have been harmed by it.

## Examining the Data

In [50]:
import pandas as pd
import random

In [75]:
# Load the TRC Volume 7 data
df = pd.read_csv("../data/trc.csv")
# df = df.dropna()
# Work with the first 1,000 rows and keep only the columns we need
df = df[:1000]
df = df[["Last", "First", "Description", "ORG", "Place"]]

In [76]:
df

Unnamed: 0,Last,First,Description,ORG,Place
0,AARON,Thabo Simon,An ANCYL member who was shot and severely inju...,ANC|ANCYL|Police|SAP,Bethulie
1,ABBOTT,Montaigne,A member of the SADF who was severely injured ...,SADF,Messina
2,ABRAHAM,Nzaliseko Christopher,A COSAS supporter who was kicked and beaten wi...,COSAS|Police,Mdantsane
3,ABRAHAMS,Achmat Fardiel,Was shot and blinded in one eye by members of ...,SAP,Athlone
4,ABRAHAMS,Annalene Mildred,Was shot and injured by members of the SAP in ...,Police|SAP,Robertson
...,...,...,...,...,...
995,CELE,Nompumelelo Iris ‘Magwaza’,An ANC district organiser for southern Natal w...,ANC,Ndwedwe
996,CELE,Nomvula Eunice,Her home was burnt down by IFP supporters in U...,ANC,Umbumbulu
997,CELE,Nonhlanhla Evelina,An IFP supporter who was killed by ANC support...,ANC,Umzimkulu
998,CELE,Nozimpahla,Her home was burnt down by IFP supporters on 2...,,Sonkombo
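
The `ORG` column packs one or more organizations into a single string separated by `|`, and it is empty for some rows (row 998, for example). Before building the network, it can help to see how often each organization is mentioned. The following is a minimal sketch that uses only the five rows visible at the top of the table, plus one missing value, rather than the full file; the `sample` and `org_counts` names are introduced here for illustration.

```python
import pandas as pd

# ORG values copied from the first five rows of the table above, plus one
# missing value to mirror rows such as 998
sample = pd.DataFrame({
    "Last": ["AARON", "ABBOTT", "ABRAHAM", "ABRAHAMS", "ABRAHAMS", "CELE"],
    "ORG": ["ANC|ANCYL|Police|SAP", "SADF", "COSAS|Police", "SAP", "Police|SAP", None],
})

org_counts = (
    sample["ORG"]
    .dropna()          # skip rows with no organization
    .str.split("|")    # one list of organizations per row
    .explode()         # one row per organization mention
    .value_counts()    # number of mentions per organization
)
print(org_counts)
```

Each mention counted here corresponds to one edge in the network built in the next cell.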


In [77]:
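nodes = []
edge_list = []
found_orgs = []
for idx, row in df.iterrows():
    # Prefix the name with the row index so that people who share a name stay distinct
    node_id = f"{idx}_{row.First} {row.Last}"

    place = row.Place
    # Every person becomes a green node
    nodes.append({"name": node_id, "color": "green", "place": place})
    if pd.notnull(row.ORG):
        # ORG holds one or more organizations separated by "|"
        orgs = row.ORG.split("|")
        for org in orgs:
            # NOTE: nothing is ever appended to found_orgs, so this check always
            # passes and an organization node (with a new random color) is added
            # for every mention. The node count printed below includes those repeats.
            if org not in found_orgs:
                color = "#" + "".join(random.choice("0123456789ABCDEF") for _ in range(6))
                nodes.append({"name": org, "color": color})
            # One edge per person-organization mention
            edge_list.append({"source": org, "target": node_id, "place": place})
print(nodes[:1])
print(edge_list[:1])
print(len(nodes))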

[{'name': '0_Thabo Simon AARON', 'color': 'green', 'place': 'Bethulie'}]
[{'source': 'ANC', 'target': '0_Thabo Simon AARON', 'place': 'Bethulie'}]
2172
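
The count of 2172 follows from the loop as written. Because `found_orgs` is never updated, every organization mention adds another organization node, so the total is the 1,000 person nodes plus 1,172 organization nodes, one per mention. If a single node per organization is wanted, one option is to record each organization the first time it is seen. The sketch below illustrates this on the five rows shown at the top of the table. The `seen_orgs` set and the fixed random seed are assumptions introduced here, not part of the original notebook.

```python
import random

# Five sample rows taken from the table above: (first name, last name, ORG, place)
rows = [
    ("Thabo Simon", "AARON", "ANC|ANCYL|Police|SAP", "Bethulie"),
    ("Montaigne", "ABBOTT", "SADF", "Messina"),
    ("Nzaliseko Christopher", "ABRAHAM", "COSAS|Police", "Mdantsane"),
    ("Achmat Fardiel", "ABRAHAMS", "SAP", "Athlone"),
    ("Annalene Mildred", "ABRAHAMS", "Police|SAP", "Robertson"),
]

rng = random.Random(0)  # assumption: a fixed seed so the colors are reproducible
nodes = []
edge_list = []
seen_orgs = set()  # hypothetical name; organizations that already have a node

for idx, (first, last, org_field, place) in enumerate(rows):
    node_id = f"{idx}_{first} {last}"
    nodes.append({"name": node_id, "color": "green", "place": place})
    for org in org_field.split("|"):
        if org not in seen_orgs:
            # First mention: remember the organization and create its node once
            seen_orgs.add(org)
            color = "#" + "".join(rng.choice("0123456789ABCDEF") for _ in range(6))
            nodes.append({"name": org, "color": color})
        edge_list.append({"source": org, "target": node_id, "place": place})

print(len(nodes))      # 5 people + 6 distinct organizations
print(len(edge_list))  # one edge per person-organization mention
```

With deduplication, each organization keeps a single color, and the node count grows with the number of distinct organizations rather than the number of mentions.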


In [78]:
# Convert the node list to a DataFrame and save it as CSV
node_df = pd.DataFrame(nodes)
node_df.to_csv("../data/nodes.csv", index=False)
node_df.head(1)

Unnamed: 0,name,color,place
0,0_Thabo Simon AARON,green,Bethulie


In [79]:
# Convert the edge list to a DataFrame and save it as CSV
edge_df = pd.DataFrame(edge_list)
edge_df.to_csv("../data/edges.csv", index=False)
edge_df.head(1)

Unnamed: 0,source,target,place
0,ANC,0_Thabo Simon AARON,Bethulie
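
With the node and edge tables saved, the edge list can be loaded into a graph library. The sketch below assumes NetworkX is installed and uses a few edges copied from the sample rows instead of reading `../data/edges.csv`. It is one possible next step, not necessarily the one this chapter takes.

```python
import networkx as nx
import pandas as pd

# A few edges in the same shape as edges.csv (source = organization, target = person)
edge_df = pd.DataFrame([
    {"source": "ANC", "target": "0_Thabo Simon AARON", "place": "Bethulie"},
    {"source": "SAP", "target": "0_Thabo Simon AARON", "place": "Bethulie"},
    {"source": "SAP", "target": "3_Achmat Fardiel ABRAHAMS", "place": "Athlone"},
    {"source": "SAP", "target": "4_Annalene Mildred ABRAHAMS", "place": "Robertson"},
])

# Build an undirected graph; the "place" column is kept as an edge attribute
G = nx.from_pandas_edgelist(edge_df, source="source", target="target", edge_attr="place")

print(G.number_of_nodes(), G.number_of_edges())
print(G.degree("SAP"))  # number of people linked to the SAP in this sample
```

An organization's degree here only counts how many people mention it. As noted in the introduction, it does not say whether those people were members or victims.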
