# Install libraries

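Install the required libraries through pip. The code below imports the `enums` and `types` modules, which were removed in version 2.0 of `google-cloud-language`, so we pin that library to an earlier release.

In [None]:
!pip install "google-cloud-language<2" spacy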
!pip install --upgrade networkx

Download the [required model](https://spacy.io/usage/models).

In [None]:
!python -m spacy download en_core_web_sm

# Import libraries

Libraries for accessing [Google NLP](https://cloud.google.com/natural-language/). We use this API for its strong entity extraction.

In [None]:
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

Library for regular expressions, used in basic data loading.

In [None]:
import re

We use [spaCy](https://spacy.io/) for linguistic analysis of the statements, specifically to extract sentence structure.

In [None]:
import spacy

We use [Pandas](https://pandas.pydata.org/) for dealing with tabular data.

In [None]:
import pandas as pd

Standard Python libraries we use to deal with:
  - processes that are IO heavy;
  - combining iterators;
  - exporting data.

In [None]:
import concurrent.futures

In [None]:
import itertools

In [None]:
import json

# Include the sole sample document

Only _one_ sample document was given. We include it here for convenience's sake.

In [None]:
document = """WITNESS B
		Date of birth:  December 17, 1991
		Date statement taken: September 14, 2006			

A.	Circumstances of the enlistment
1.	My name is “Marcus BRODY.”  I am 15 years old.  
2.	In late 2002, I can’t remember exactly when, there was a lot of fighting in Goma, where I was living with my cousins.  My parents and sisters were already killed by the Rebels.  They had attacked our village one night with guns and grenades.
3.	I went to a rally in Goma where several Government commanders spoke and said that the President and the DCP were defending Congo.  I remember one of the commanders who spoke was Chief KOBONO, because everyone cheered when he got up to speak.  There were many soldiers at the rally, about the same as the number of children in my school .  Each family had to contribute, a cow or a goat or a child.  I was around ten years old then.  My aunt said I should go, since my family was dead because of the Rebels.  I followed the DCP soldiers.
B.	Training at Kalemie
4.	The DCP soldiers took me to a training camp called Kalemie, which was a few miles outside of my village.  A number of us, men and children, were transported there in a green “stout” (open back truck).  The camp was very big, with many soldiers and lots of guns.  I think that some of the trainers were from another country because sometimes they spoke in a language I did not really understand.  The training was hard at first, but I was good at running and shooting, so the commanders knew I would make a good soldier.  They did not give us guns to keep at first, but we learnt how to clean and use them properly.  
5.	There were lots of new recruits all the time, children as well as men.  There were girls as well.  Most of the children were around my age, but some were younger, as young as seven years old.  Many of the girls were wives of the commanders.  The commanders called them their wives, but the girls did not talk about it much.  A good wife spends the entire night with her man.  The commanders laughed and said that if we boys learned to be good fighters, we would have many wives too.  I did not really take them seriously as I was just a boy.  
6.	The commanders also offered us marijuana to smoke.  They said it would help us to relax.  I did not smoke, but many of the other boys and girls smoked with the commanders.  When the boys smoked, they seemed to go calm.
7.	The commander of the camp was BAGOR.  We all knew him by name.
8.	I remember he or maybe another commander spoke to the soldiers and recruits one time after the evening meal.  He said that we should kill all Rebels, that they were the enemy.  He said that the purpose of all our hard training was to prepare us for fighting the Rebels, who were taking our land and trying to kill us.  He said the enemy had already killed many of our family members, and we were entitled to revenge.  He kept saying that they were the “enemy.”  We should kill all of them, men, women and children, and destroy their villages.  It was our duty, what we were meant to do.
9.	The President, Ule MATOBO GOBO, visited the camp one time.  He arrived in a green jeep with several other commanders.  We received special instructions in preparation for his visit.  Other members of the militia taught us what to do.  If the President came to the camp, you had to lift your gun, holding the base in your hand and putting the barrel on your shoulder, march in front of him with your legs good and straight.  I had practiced this salute many times before the President arrived, and my commander told me afterwards that my salute was one of the best in my group.
10.	The President spoke to the regular soldiers, who were in uniform, as well as the children and new adult recruits.  We were all assembled in a big hut in the middle of the camp.  There were many of us who were brought in to hear the President speak, the whole camp.  There were many other children in the crowd, boys and girls, most of whom were my age.  
11.	The President spent all morning talking to the soldiers and recruits.  He told us that we were here to become a trained army that would bring peace to Congo.  He said our enemies were all those who were opposed to peace.  He said that after the fighting was done we would be able to go back to school, get other training.  This fighting, he said, was for the good of Congo in the end.  The soldiers in the crowd cheered loudly at the end of the speech.  I too was moved by the President’s speech and wanted to fight to protect my people.
12.	The President said that when we were done with our training, we would each receive a gun.  At that time, I had not yet received my own gun.  
13.	We talked about the President’s visit afterwards with the soldiers and commanders.  I was excited to get my personal gun and uniform, to be like the soldiers.
C.	Participation in Attacks
14.	Finally the fighting came.  In February 2003, we were told there was a big attack coming, on Bankana.  I knew it was February because it was the beginning of the rainy season, and it rained heavily every day for a long time.  I had my gun, and because the commanders knew I was a good soldier, I was sent to the front line of the fighting.  There were many other boys my age on the front line, and some girls too, though just a few.
15.	The commanders reminded us that we were brave and that we must use our training to destroy the enemy.  He offered us marijuana to help us relax.  I did not take any, but several of the boys smoked and it seemed to help them relax.  Commander BAGOR, told us that the fighting was for the good of Congo in the end.
16.	The attack lasted the whole day.  We were met with fierce resistance from the armed soldiers of the village.  I shot many rounds, and killed several people.  I saw the soldiers from my platoon take a mother and her daughter from a house.  The soldiers pulled the mother away from her daughter.  I saw the soldiers kill the child in front of her mother with a machete.  
17.	We also captured some prisoners, both men and women.  During the attack, the commander told us to burn the houses and destroy the crops.  We set many fires to the buildings.  There was a lot of confusion.   
18.	The day after the attack, the DCP troops under the command of BAGOR were inspected by Commander AL-ZARIAN and Commander IKE DUBAKU at the militia headquarters in Mongbwalu.  That was the first time I saw Commander AL-ZARIAN.  He was a very high-ranking commander, a very important man.  He said his name to us, but I had heard it before, since he was such an important commander.  IKE DUBAKU ordered our platoons to re-attack Bankana and told us to follow our commanders to the frontline.  
19.	Several days later, we attacked Bankana again with another platoon that was bigger than ours.  I do not know how many men and children were in the platoon, maybe as many pupils as there were in my school.  Again, my commander sent me to the frontline, along with several other boys my age.  The men threw hand grenades and launched missiles to start the attack.  I used my gun and shot and killed many enemies.  We came under attack from armed men in the village, but I managed to make it out without being injured.  Two of my friends, however, boys my age, both died in the attack, along with a few others from the platoon that accompanied us.  One of the girls in the other platoon was shot in the foot.

D.        Role as bodyguard
20.	Afterward, I was posted to guard Commander TCHAZA’s house in a neighbourhood of Kinshasa.  I wore a “tâches-tâche” uniform and carried a gun, a fusil.  It was an honor to be told to guard the commander.  I patrolled his compound and searched all those who visited him.  Sometimes I would also accompany him to rallies in the Government villages and meetings with his commanders.  I would have shot anyone who tried to harm him.

E.	Demobilization
21.	After the President left, IKE called me on the phone and told me to go home. IKE’s phone number is 08984948494. I kept my gun but went back to my village to try to find my cousins.  Some had been killed while I was away.  I stayed there but I was not able to find any work and I did not want to go to school.  
22.	After wandering a little from village to village, where I did some field-work in exchange for food and lodging, I ended up at the center of Gbadolite for child soldiers in Kinshasa.
23.	At the center, I was interviewed several times by a white woman named SANARA who asked me about my experience in the militia.  She would take notes of our conversations.  There were many other children in the center, enough to fill my entire school.  In the room where I slept, there were about 7 girls/boys and we were all between 14 and 17 years of age.  I made friends with many of them and we used to talk about our life as soldiers.
24.	I had been staying at the center until I was transferred to a safe-house to await the trial.
"""

# Extract _events_ and _statements_

The document is structured into _events_ and _statements_. An analyst transforms a recording of an interview between the witness and an investigator into a _witness report_. This report is grouped into events, and each event is supported by numbered statements. We assume all witness reports follow this form.

In [None]:
def extract_events(document: str) -> pd.DataFrame:
  # Locate event headers (a letter, a period, then the title) and their positions
  extracted_events = [
    (event.start(), event.end(), event.group('index'), event.group('event'))
    for event in re.finditer(r'\n(?P<index>[A-Z]+)\.\s+(?P<event>.*)', document)
  ]

  # An event's content ends where the next header starts; the last event runs to
  # the end of the document (zipping against extracted_events[1:] alone would drop it)
  content_ends = [event[0] for event in extracted_events[1:]] + [len(document)]

  # Convert into Pandas dataframe
  df_events = pd.DataFrame([
      (event[2], event[3], document[event[1]:content_end].strip())
      for event, content_end in zip(extracted_events, content_ends)
    ], columns= [ 'letter','header', 'content']
  )
  
  # Extract the numbered statements of each event into a Pandas dataframe
  df_statements \
    = df_events\
      .content.str.extractall(r'(?:^|\n)(?P<nr_statement>\d+)\.\s*(?P<statement>.*)')\
      .assign(nr_statement = lambda df: df.nr_statement.astype(int))\
      .reset_index()\
      .merge(df_events[['letter','header']], left_on='level_0', right_index=True)\
      .drop(['match','level_0'], axis=1)\
      .set_index('nr_statement')

  return df_statements[['letter', 'header', 'statement']]

Run the extraction of _events_ and _statements_ on the sample document.

In [None]:
df_statements =  extract_events(document)
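
To make the splitting logic concrete, here is a minimal sketch of the two-step idea that uses only the standard library: first locate the lettered event headers, then pull the numbered statements out of each event's body. The toy report and the helper name `split_report` are illustrative assumptions and not part of the original analysis.

```python
import re

# Hypothetical toy report following the assumed "lettered header / numbered statement" layout
toy_report = """WITNESS X
A.\tFirst event
1.\tSomething happened.
2.\tSomething else happened.
B.\tSecond event
3.\tA final thing happened.
"""

def split_report(text):
    """Return a list of (letter, header, statement_number, statement) tuples."""
    headers = list(re.finditer(r'\n(?P<index>[A-Z]+)\.\s+(?P<event>.*)', text))
    # The content of an event ends where the next header starts (or at the end of the text)
    ends = [header.start() for header in headers[1:]] + [len(text)]
    rows = []
    for header, end in zip(headers, ends):
        body = text[header.end():end]
        for match in re.finditer(r'(?:^|\n)(?P<nr>\d+)\.\s*(?P<statement>.*)', body):
            rows.append((header.group('index'), header.group('event'),
                         int(match.group('nr')), match.group('statement')))
    return rows

rows = split_report(toy_report)
for row in rows:
    print(row)
```

Every statement carries the letter and header of the event it belongs to, which is the same information the dataframe holds per row.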

Output the data to JSON, to be used in the visualisation.

In [None]:
print(json.dumps(df_statements.to_dict(orient='index')))
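
One detail matters when consuming this output: the dictionary is keyed by the statement number, and `json.dumps` turns integer keys into strings. The sketch below shows the round trip with a hand-written dictionary; the two entries are made up and merely stand in for the result of `to_dict(orient='index')`.

```python
import json

# Hypothetical stand-in for df_statements.to_dict(orient='index')
statements = {
    1: {'letter': 'A', 'header': 'First event', 'statement': 'Something happened.'},
    2: {'letter': 'A', 'header': 'First event', 'statement': 'Something else happened.'},
}

encoded = json.dumps(statements)
decoded = json.loads(encoded)

# JSON object keys are always strings, so the integer keys come back as '1' and '2'
print(sorted(decoded.keys()))

# Convert the keys back to integers when the numeric order matters
restored = {int(key): value for key, value in decoded.items()}
print(restored == statements)
```

A visualisation that sorts statements by key should therefore convert the keys to numbers first, otherwise `'10'` sorts before `'2'`.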

# Analyze the entities per statement

Use Google NLP to analyze the entities per statement. With more data available, training a custom spaCy model would also be an option.

In [None]:
def analyze_document(client:object, doc:str):
  # Convert into Google NLP Doc
  document = types.Document(
    content= doc,
    type= enums.Document.Type.PLAIN_TEXT
  )
  
  # Analyze entities
  entities = client.analyze_entities(document).entities

  return [{
    'name' : entity.name, 
    'salience': entity.salience, 
    'mentioned_as' : mention.text.content, 
    'mentioned_type': mention.type
  }\
    for entity in entities
    for mention in entity.mentions
  ]
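
Note that the nested comprehension yields one row per *mention*, not one per entity, so an entity mentioned twice produces two rows that share a name and salience. The sketch below imitates that flattening with `SimpleNamespace` stand-ins for the API response. The objects and their values are invented for illustration, and no call to Google NLP is made.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the entities returned by the API (all values are made up)
fake_entities = [
    SimpleNamespace(
        name='Alice', salience=0.7,
        mentions=[
            SimpleNamespace(text=SimpleNamespace(content='Alice'), type=1),
            SimpleNamespace(text=SimpleNamespace(content='the witness'), type=2),
        ],
    ),
    SimpleNamespace(
        name='camp', salience=0.3,
        mentions=[SimpleNamespace(text=SimpleNamespace(content='camp'), type=2)],
    ),
]

# Same shape as the list built by analyze_document: one dict per mention
rows = [
    {
        'name': entity.name,
        'salience': entity.salience,
        'mentioned_as': mention.text.content,
        'mentioned_type': mention.type,
    }
    for entity in fake_entities
    for mention in entity.mentions
]

for row in rows:
    print(row)
```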

Function to query Google NLP in parallel, which reduces the total waiting time.

In [None]:
def analyze_entities(docs:pd.Series)->pd.DataFrame:
  # Create NLP client
  client = language.LanguageServiceClient()
  
  def entity_iterator():
    # Work with a threadpool
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
      # Query for entities
      task_dict = {
        executor.submit(analyze_document, client, statement) : index\
          for index, statement in docs.items()
      }

      for future in concurrent.futures.as_completed(task_dict):
          nr_statement = task_dict[future]

          try:
              data = future.result()
          except Exception as exc:
              print('Statement {:} generated an exception: {:}.'.format(nr_statement, exc))
          else:
              for entity in data:
                entity['nr_statement']= nr_statement

              yield data
              
  return pd.DataFrame(list(itertools.chain(*entity_iterator())))
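
The pattern used here, submitting one task per statement and collecting results as they finish, can be tried without API credentials. In the sketch below `slow_lookup` and `lookup_all` are hypothetical names, and `slow_lookup` is only a stand-in for the network call; the rest relies on `concurrent.futures` from the standard library.

```python
import concurrent.futures
import time

def slow_lookup(text):
    """Hypothetical stand-in for an IO-bound API call."""
    if not text:
        raise ValueError('empty statement')
    time.sleep(0.05)  # simulate network latency
    return [{'name': word} for word in text.split() if word.istitle()]

def lookup_all(statements):
    """Run slow_lookup for every statement in parallel; return rows and failed keys."""
    rows, failed = [], []
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        # Map each future back to the statement number it belongs to
        futures = {executor.submit(slow_lookup, text): nr for nr, text in statements.items()}
        for future in concurrent.futures.as_completed(futures):
            nr = futures[future]
            try:
                data = future.result()
            except Exception as exc:
                failed.append(nr)
                print('Statement {} generated an exception: {}.'.format(nr, exc))
            else:
                rows.extend(dict(entity, nr_statement=nr) for entity in data)
    return rows, failed

rows, failed = lookup_all({1: 'I met Alice in Goma', 2: '', 3: 'Bob left'})
# Results arrive in completion order, so sort before relying on any ordering
rows.sort(key=lambda row: (row['nr_statement'], row['name']))
print(rows, failed)
```

Collecting the failed statement numbers, rather than only printing them, makes it possible to retry those statements afterwards.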

Get an overview of all entities.

In [None]:
df_entities = analyze_entities(df_statements.statement)

## Analyze the occurrence of entities per statement

The main entities are those with at least one mention of type `PROPER` (a proper noun, encoded as `1`); see [the documentation](https://cloud.google.com/natural-language/docs/reference/rest/v1/Entity).

In [None]:
main_entities = df_entities.query('mentioned_type == 1').name.unique()
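
The literal `1` in the query is easy to misread. As a sketch, a small local `IntEnum` can name the mention types; the class `MentionType` is defined here for illustration only, and its values are assumed to mirror the documented mention types (`TYPE_UNKNOWN = 0`, `PROPER = 1`, `COMMON = 2`), which is worth checking against the API reference.

```python
from enum import IntEnum

class MentionType(IntEnum):
    """Local mirror of the mention types (assumed values, check the API reference)."""
    TYPE_UNKNOWN = 0
    PROPER = 1
    COMMON = 2

# Made-up rows in the same shape as df_entities
mentions = [
    {'name': 'Alice', 'mentioned_type': 1},
    {'name': 'camp', 'mentioned_type': 2},
    {'name': 'Alice', 'mentioned_type': 2},
    {'name': 'Goma', 'mentioned_type': 1},
]

# Keep the unique names that have at least one proper-noun mention, in first-seen order
proper_names = list(dict.fromkeys(
    row['name'] for row in mentions if row['mentioned_type'] == MentionType.PROPER
))
print(proper_names)
```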

In [None]:
df_entities_per_statement \
  = df_entities[lambda df: df.name.isin(main_entities)]\
  .groupby(by=['name','nr_statement'])\
  .size()\
  .unstack('nr_statement')\
  .fillna(0)
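
The resulting table has one row per entity and one column per statement number, with each cell holding the number of mentions. A minimal sketch on a made-up dataframe (the names and statement numbers are invented) shows the shape of the result.

```python
import pandas as pd

# Made-up mentions: 'Alice' twice in statement 1 and once in statement 2, 'Goma' once in statement 1
toy = pd.DataFrame({
    'name': ['Alice', 'Alice', 'Goma', 'Alice'],
    'nr_statement': [1, 1, 1, 2],
})

counts = (
    toy.groupby(by=['name', 'nr_statement'])
       .size()                      # number of mentions per (entity, statement) pair
       .unstack('nr_statement')     # statements become columns
       .fillna(0)                   # pairs that never occur get a count of 0
)
print(counts)
```

Because missing pairs pass through `NaN` before being filled, the counts end up as floats; passing `fill_value=0` to `unstack` is an alternative that should keep them as integers.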

## Analyze the mentions of entities in text

Load the necessary model.

In [None]:
spacy_en = spacy.load('en_core_web_sm')

Analyze each statement in the dataframe, extracting every noun chunk whose syntactic head is a verb.

In [None]:
def extract_relations_in_statements(df:pd.DataFrame)->pd.DataFrame:
  def relation_iterator():
    # Iterate over the dataframe passed in; `items` replaces `iteritems`, which pandas 2.0 removed
    for statement_nr, statement in df.statement.items():
      # Parse the statement
      parsed_statement = spacy_en(statement)

      for chunk in parsed_statement.noun_chunks:
        # Only consider verb relationships
        if chunk.root.head.pos_ == 'VERB':
          yield { 
            'from_raw' : chunk.text,
            'from' : chunk.lemma_,
            'verb' : chunk.root.head.lemma_,
            'verb_raw': chunk.root.head.text,
            'statement': statement_nr
          }
  
  return pd.DataFrame(list(relation_iterator()))

In [None]:
df_statement_relations = extract_relations_in_statements(df_statements)

## Analyze relations between events and entities based on statements

Main entities can be found in any noun phrase by matching against a regex built from their names. Indirect references, such as pronouns, are missed by this approach.

In [None]:
# Escape each name so that characters such as '.' or '(' are matched literally
main_entity_regex = re.compile('(?P<entity>{:})'.format('|'.join(map(re.escape, main_entities))), re.IGNORECASE)
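
Entity names are arbitrary strings, so building an alternation from them is safest when each name is passed through `re.escape`. The sketch below uses made-up names, one of which contains regex metacharacters; `find_entity` is a hypothetical helper introduced only for this illustration.

```python
import re

# Hypothetical entity names; the last one contains regex metacharacters
names = ['AL-ZARIAN', 'Goma', 'D.C.P. (militia)']

pattern = re.compile(
    '(?P<entity>{})'.format('|'.join(re.escape(name) for name in names)),
    re.IGNORECASE,
)

def find_entity(noun_phrase):
    """Return the first known entity found in the phrase, or None."""
    match = pattern.search(noun_phrase)
    return match.group('entity') if match else None

print(find_entity('Commander al-zarian'))
print(find_entity('the D.C.P. (militia) soldiers'))
print(find_entity('a green jeep'))
```

The match returns the text as it appears in the noun phrase, so with `re.IGNORECASE` the casing may differ from the entity name; normalising the case before grouping by entity avoids treating these as different entities.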

Store a dict with all relationships between verbs and entities. 

In [None]:
entity_statement_relations = df_statement_relations\
  .assign(entity = lambda df: df.from_raw.str.extract(main_entity_regex, expand= False))\
  [lambda df: ~df.entity.isnull()]\
  .to_dict(orient='index')

Store a dict with all relationships between verbs and general noun phrases.

In [None]:
all_relations = df_statement_relations.to_dict(orient='index')

Output the data to the command line.

In [None]:
print(json.dumps(list(entity_statement_relations.values()), indent=2, sort_keys=True))

In [None]:
print(json.dumps(list(all_relations.values()), indent=2, sort_keys=True))

## Analyze verbs per statement

Create an overview of key verbs per statement.

In [None]:
df_verbs_per_statement = df_statement_relations.groupby(by=['statement','verb']).size().rename('occurrence').reset_index()
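
The grouping counts how often each verb lemma occurs within each statement. As a sketch of the same idea without pandas, `collections.Counter` produces equivalent records; the relations below are made up and only mimic the columns used for the grouping.

```python
from collections import Counter

# Made-up relations with the two columns used for the grouping
relations = [
    {'statement': 1, 'verb': 'go'},
    {'statement': 1, 'verb': 'go'},
    {'statement': 1, 'verb': 'see'},
    {'statement': 2, 'verb': 'go'},
]

# Count occurrences of every (statement, verb) pair
counts = Counter((row['statement'], row['verb']) for row in relations)

verbs_per_statement = [
    {'statement': statement, 'verb': verb, 'occurrence': n}
    for (statement, verb), n in sorted(counts.items())
]
print(verbs_per_statement)
```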

In [None]:
print(json.dumps(list(df_verbs_per_statement.to_dict(orient='index').values()), indent=2, sort_keys=True))