# Applied Data Analysis - Impact of conflicts on African civilian population

### Abstract

The age of information makes it seem as though conflicts and wars are, and always will be, an ever-growing part of our lives. We often feel overwhelmed by the amount and scope of information that is accessible and directed towards us, which in turn renders us indifferent to the consequences and casualties of war.

Civilians are the greatest casualty of any war, and casualties are not always measured in body count. Civil liberties and political freedoms are and should be enjoyed by the people from all around the world, and conflicts always bring changes to freedom of expression, for better or worse.

In this project we focus on the continent of Africa, which we feel is underreported in the context of ongoing conflicts and casualties. We use the UCDP dataset documenting individual events of organized violence, complemented by the Freedom House 'Freedom in the World' yearly surveys and the Human Development Index as measured by the UN.

Our goal is to produce a report with an overarching story about the impact of conflicts on the African continent in the observed period from 1990 to 2015, with a focus on civilian populations: what are the long-term consequences of conflicts for the development of the civilian population?

### Libraries

In [493]:
from bkcharts import Line, show
from bokeh.embed import file_html
from bokeh.resources import CDN

import branca
import copy
import folium
import json
import matplotlib.pyplot as plt
import numpy as np
import os.path
import pandas as pd
import seaborn as sns
import zipfile

%matplotlib inline

### Configuration

In [494]:
from IPython.display import IFrame, HTML
from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"
pd.options.mode.chained_assignment = None

### Specification of regions and country names

Throughout the project, we observe the African continent through its regions as defined by the [United Nations' geoscheme for Africa](https://en.wikipedia.org/wiki/United_Nations_geoscheme_for_Africa). This geoscheme is maintained by the United Nations Statistics Division (UNSD) for statistical purposes and defines five regions: Northern Africa, Eastern Africa, Southern Africa, Western Africa and Central Africa. Although other definitions of African regions exist, depending on the context, we consider this definition relevant and appropriate for our statistical analysis.

Our definition of the African regions diverges from the official geoscheme in the following ways:
* In our analysis we have excluded overseas territories:
    * [Mayotte](https://en.wikipedia.org/wiki/Mayotte) (French overseas territory)
    * [Reunion](https://en.wikipedia.org/wiki/Réunion) (French overseas territory)
    * [Saint Helena, Ascension and Tristan da Cunha](https://en.wikipedia.org/wiki/Saint_Helena,_Ascension_and_Tristan_da_Cunha) (British overseas territory)
    
    
* In our analysis we have excluded disputed territories:
    * [Western Sahara](https://en.wikipedia.org/wiki/Western_Sahara)
    
Because we combine datasets from several different sources, some countries are referenced by different variations of their name in different datasets. This made it necessary to align names across datasets. We use the country names specified by the United Nations' geoscheme for Africa as the reference across these datasets and adjust any differing names where needed.

In [495]:
na_countries = ['Algeria', 'Egypt', 'Libya', 'Morocco', 'Sudan',
                'Tunisia']

ea_countries = ['Burundi', 'Comoros', 'Djibouti', 'Eritrea', 'Ethiopia', 
                'Kenya', 'Madagascar', 'Malawi', 'Mauritius', 'Mozambique',
                'Rwanda', 'Seychelles', 'Somalia', 'South Sudan', 'Tanzania',
                'Uganda', 'Zambia', 'Zimbabwe']

sa_countries = ['Botswana', 'Lesotho', 'Namibia', 'South Africa', 'Swaziland']

wa_countries = ['Benin', 'Burkina Faso', 'Cabo Verde', 'Cote d\'Ivoire', 'Gambia', 
                'Ghana', 'Guinea', 'Guinea-Bissau', 'Liberia', 'Mali',
                'Mauritania', 'Niger', 'Nigeria', 'Senegal', 'Sierra Leone', 
                'Togo']

ca_countries = ['Angola', 'Cameroon', 'Central African Republic', 'Chad', 'Democratic Republic of the Congo',
                'Equatorial Guinea', 'Gabon', 'Republic of the Congo', 'Sao Tome and Principe']

### UCDP dataset

Uppsala Conflict Data Program's georeferenced event dataset, [Global Version 17.1 (2016)](http://www.ucdp.uu.se/downloads/ged/ged171-xlsx.zip), is the central dataset used in our project. The dataset covers individual events of organized violence, that is, phenomena of lethal violence occurring at a given time and place.

>These events are sufficiently fine-grained to be geo-coded down to the level of individual villages, with temporal durations disaggregated to single, individual days.

The dataset contains 135,181 events covering the entire globe (excluding Syria) and spanning from 01/01/1989 to 31/12/2016. Events are further defined as follows:

>An incident where armed force was used by an organised actor against another organized actor, or against civilians, resulting in at least 1 direct death at a specific location and a specific date.

#### Features

The UCDP dataset contains many features that thoroughly document the recorded events, not all of which are used in this project. We focus on the following features:
* `year` - The year of the event.
* `type_of_violence` - Type of UCDP conflict:
    * `1` - state-based conflict
    * `2` - non-state conflict
    * `3` - one-sided violence
    
    
* `conflict_name` - Name of the UCDP conflict to which the event belongs.
* `side_a` - The name of side A in the dyad. In state-based conflicts this is always a government. In one-sided violence this is always the perpetrating party.
* `side_b` - The name of side B in the dyad. In state-based conflicts this is always the rebel movement or rivalling government. In one-sided violence this is always “civilians”.
* `country` - Name of the country in which the event took place.
* `region` - Region where the event took place.
* `best` - The best (most likely) estimate of total fatalities resulting from an event.
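
The numeric `type_of_violence` codes are easier to read in tables and legends once they are mapped to labels. The snippet below is a minimal sketch of such a mapping, assuming the codes 1, 2 and 3 that the plotting functions in this notebook filter on. The names `VIOLENCE_LABELS`, `label_violence` and the sample codes are hypothetical and not part of the dataset.

```python
from collections import Counter

# Assumed UCDP GED codes for `type_of_violence`.
VIOLENCE_LABELS = {
    1: 'state-based conflict',
    2: 'non-state conflict',
    3: 'one-sided violence',
}


def label_violence(code):
    """Return a readable label for a `type_of_violence` code."""
    return VIOLENCE_LABELS.get(code, 'unknown')


# Hypothetical event codes, for illustration only.
sample_codes = [1, 3, 3, 2, 1, 3]
label_counts = Counter(label_violence(code) for code in sample_codes)
```

With a DataFrame, the same dictionary could be passed to `Series.map` to add a label column.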

This project uses the events that occurred in Africa during the 26 years from 1990 to 2015, roughly 35,437 events. Events documented in the dataset are aggregated by region, then analyzed and visualized on a per-region basis together with the Human Development Index (HDI), the political rights score and the civil liberties score.
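
As an illustration of the period filter and the per-region aggregation described above, here is a minimal sketch on a made-up table. The rows, the `country_region` lookup and the variable names are assumptions for the example; the notebook itself keeps one list of countries per region rather than a lookup dictionary.

```python
import pandas as pd

# Hypothetical miniature of the UCDP event table (not real data).
events = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'year': [1989, 1990, 1990, 2015, 2016],
    'country': ['Algeria', 'Algeria', 'Kenya', 'Kenya', 'Egypt'],
})

# Assumed country-to-region lookup, for illustration only.
country_region = {'Algeria': 'na', 'Egypt': 'na', 'Kenya': 'ea'}

# Keep only events inside the observed period (both bounds inclusive).
in_period = events[events['year'].between(1990, 2015)]

# Count events per region and year.
events_per_region = (
    in_period
    .assign(region=in_period['country'].map(country_region))
    .groupby(['region', 'year'])
    .size()
)
```

The result is a Series indexed by `(region, year)`, which can be unstacked into one column per region if a wide table is preferred.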

#### Adjustment of country names

The following names have been altered in the original UCDP dataset to follow the country naming convention established earlier for the whole project:

| Old country name      | New country name                 |
|:----------------------|:---------------------------------|
| Madagascar (Malagasy) | Madagascar                       |
| Zimbabwe (Rhodesia)   | Zimbabwe                         |
| Ivory Coast           | Cote d'Ivoire                    |
| DR Congo (Zaire)      | Democratic Republic of the Congo |
| Congo                 | Republic of the Congo            |
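
The renaming table can also be expressed as a dictionary lookup, which keeps the mapping in one place and leaves unlisted names untouched. This is a sketch of an alternative to an `if`/`elif` chain, not the notebook's own implementation; `UCDP_COUNTRY_ALIASES` and `normalize_country_name` are hypothetical names.

```python
# Aliases taken from the table above; names not listed are returned unchanged.
UCDP_COUNTRY_ALIASES = {
    'Madagascar (Malagasy)': 'Madagascar',
    'Zimbabwe (Rhodesia)': 'Zimbabwe',
    'Ivory Coast': "Cote d'Ivoire",
    'DR Congo (Zaire)': 'Democratic Republic of the Congo',
    'Congo': 'Republic of the Congo',
}


def normalize_country_name(name, aliases=UCDP_COUNTRY_ALIASES):
    """Map a dataset-specific country name onto the project-wide name."""
    return aliases.get(name, name)


normalized = [normalize_country_name(n) for n in ['Ivory Coast', 'Congo', 'Kenya']]
```

Because the function takes the alias dictionary as a parameter, the same helper could serve the other datasets by passing a different dictionary.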

#### Complete documentation

Full documentation of the used dataset can be obtained [here](http://www.ucdp.uu.se/downloads/ged/ged171.pdf) for further reference.

In [496]:
def read_data():
    '''
    The repo contains the data in a zipped file to save some space,
    this function will automatically extract the csv from that zip,
    if it has not already been extracted, and return the data in a DataFrame.
    '''
    name = './data/ged171'

    # Unzip csv if not unzipped
    if not os.path.isfile(name+'.csv'):
        with zipfile.ZipFile(name+'.zip', 'r') as zip_:
            zip_.extractall('./data/')

    return pd.read_csv(name+'.csv')

def adjust_country_name(name):
    result = name
    if name == 'Madagascar (Malagasy)':
        result = 'Madagascar'
    elif name == 'Zimbabwe (Rhodesia)':
        result = 'Zimbabwe'
    elif name == 'Ivory Coast':
        result = 'Cote d\'Ivoire'
    elif name == 'DR Congo (Zaire)':
        result = 'Democratic Republic of the Congo'
    elif name == 'Congo':
        result = 'Republic of the Congo'
    return result

df = read_data()
df.country = df.country.apply(adjust_country_name)

df_na = df.loc[df.country.isin(na_countries)]
df_ea = df.loc[df.country.isin(ea_countries)]
df_sa = df.loc[df.country.isin(sa_countries)]
df_wa = df.loc[df.country.isin(wa_countries)]
df_ca = df.loc[df.country.isin(ca_countries)]

### Freedom House dataset

Freedom House is a U.S.-based, U.S. government-funded non-governmental organization (NGO) that conducts research and advocacy on democracy, political freedom, and human rights. It was founded in October 1941. The organization describes its goal as:

> We analyze the challenges to freedom, advocate for greater political rights and civil liberties, and support frontline activists to defend human rights and promote democratic change. Founded in 1941, Freedom House was the first American organization to champion the advancement of freedom globally.

The organization's annual *Freedom in the World* report, which assesses each country's degree of political freedoms and civil liberties, is frequently cited by political scientists, journalists, and policymakers.

The organization's reports on the state of each country's political freedoms and civil liberties form our second dataset and can be obtained [here](https://freedomhouse.org/sites/default/files/FIW2017_Data.zip). We focus on the file `FH_Country_and_Territory_Ratings_and_Statuses_1972-2016.xls`, which contains political rights and civil liberties scores for individual countries for the period 1972-2016. Of these, we are only interested in the scores of African countries for the period 1990-2015.

#### Features

Political rights and civil liberties are each measured on a one-to-seven scale, with one representing the highest degree of freedom and seven the lowest. Based on these two scores, Freedom House then assigns each country one of the labels 'Free', 'Partly Free' or 'Not Free' as an overarching indicator of the state of freedom in that country.
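
To make the relation between the two scores and the status label concrete, the sketch below averages the two ratings and applies fixed cut-offs. The cut-offs (up to 2.5 'Free', up to 5.0 'Partly Free', otherwise 'Not Free') are an assumption recalled from Freedom House's published methodology for this period and should be checked against the methodology notes shipped with the data. The function name `freedom_status` is hypothetical.

```python
def freedom_status(political_rights, civil_liberties):
    """Derive a status label from the two 1-7 ratings (1 = most free)."""
    for rating in (political_rights, civil_liberties):
        if not 1 <= rating <= 7:
            raise ValueError('ratings must lie between 1 and 7')

    # Assumed rule: average the two ratings, then apply fixed cut-offs.
    combined = (political_rights + civil_liberties) / 2
    if combined <= 2.5:
        return 'Free'
    if combined <= 5.0:
        return 'Partly Free'
    return 'Not Free'
```

In this project the status is read directly from the dataset, so a function like this would only serve as a consistency check.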

#### Evaluation

While there is some debate over the neutrality of Freedom House and the methodology used for the Freedom in the World report, none of the criticisms have demonstrated a systematic bias in all the ratings. For this reason, and because the Freedom in the World report is so widely cited, we have chosen this dataset as the source of indicators on the state of political rights and civil liberties in the observed countries. Further discussion of the evaluation and its criticism can be found [here](https://en.wikipedia.org/wiki/Freedom_in_the_World#Evaluation).

#### Missing rankings

Throughout the observed period, several countries have become independent, split into two or more countries, or merged with a neighboring state. Scores for these countries are given only for the period of their existence as independent states.

#### Adjustment of country names

The following names have been altered in the original Freedom in the World dataset to follow the country naming convention established earlier for the whole project:

| Old country name    | New country name                 |
|:--------------------|:---------------------------------|
| Sao Tome & Principe | Sao Tome and Principe            |
| Congo (Brazzaville) | Republic of the Congo            |
| Congo (Kinshasa)    | Democratic Republic of the Congo |
| Gambia, The         | Gambia                           |
| Cape Verde          | Cabo Verde                       |

#### Separating relevant information

The following information is relevant for this project:
* Political rights score
* Civil liberties score
* Freedom House status

In the source file it is organized by the year of the report. We split it into three separate dataframes for easier use later on.
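
The split can be sketched on a toy table, assuming the ratings file interleaves three columns per year (political rights, civil liberties, status) after the country column, which is what the index arithmetic in the next cell implies. The rows, the column labels, the status abbreviations and the helper `every_third` are hypothetical.

```python
import pandas as pd

# Hypothetical two-year extract: one PR, CL and Status column per year.
raw = pd.DataFrame(
    [['Kenya', 6, 6, 'NF', 6, 6, 'NF'],
     ['Botswana', 1, 2, 'F', 1, 2, 'F']],
    columns=['Country', 'PR 1990', 'CL 1990', 'Status 1990',
             'PR 1991', 'CL 1991', 'Status 1991'],
)
years = [1990, 1991]


def every_third(frame, offset, years):
    """Select every third column starting at `offset`, labelled by year."""
    block = frame.iloc[:, offset::3].copy()
    block.columns = years
    block.insert(0, 'Country', frame['Country'].values)
    return block


political_rights_demo = every_third(raw, 1, years)
civil_liberties_demo = every_third(raw, 2, years)
fh_status_demo = every_third(raw, 3, years)
```

Slicing with a step of three avoids maintaining three separate running indices, at the cost of relying on the column order being strictly regular.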

In [497]:
fh_path = os.path.join('data', 'Country ratings and statuses.xlsx')
fh_data = pd.read_excel(fh_path, na_values='-')

def fh_adjust_country_name(name):
    result = name
    if name == 'Sao Tome & Principe':
        result = 'Sao Tome and Principe'
    elif name == 'Congo (Brazzaville)':
        result = 'Republic of the Congo'
    elif name == 'Congo (Kinshasa)':
        result = 'Democratic Republic of the Congo'
    elif name == 'Gambia, The':
        result = 'Gambia'
    elif name == 'Cape Verde':
        result = 'Cabo Verde'
    return result

fh_data = fh_data.drop(fh_data.columns[1:52], axis=1)
fh_data = fh_data.iloc[2:]
fh_data[fh_data.columns[0]] = fh_data[fh_data.columns[0]].apply(fh_adjust_country_name)

political_rights = pd.DataFrame()
political_rights['Country'] = fh_data[fh_data.columns[0]]
pr_index = 1 + 0

civil_liberties = pd.DataFrame()
civil_liberties['Country'] = fh_data[fh_data.columns[0]]
cl_index = 1 + 1

fh_status = pd.DataFrame()
fh_status['Country'] = fh_data[fh_data.columns[0]]
fhs_index = 1 + 2

year = 1990

for i in range(26):
    political_rights[year] = fh_data[fh_data.columns[pr_index]]
    civil_liberties[year] = fh_data[fh_data.columns[cl_index]]
    fh_status[year] = fh_data[fh_data.columns[fhs_index]]
    
    pr_index += 3
    cl_index += 3
    fhs_index += 3
    year += 1
    
# Divide by region on political freedoms
political_rights_na = political_rights[political_rights['Country'].isin(na_countries)]
political_rights_ea = political_rights[political_rights['Country'].isin(ea_countries)]
political_rights_sa = political_rights[political_rights['Country'].isin(sa_countries)]
political_rights_wa = political_rights[political_rights['Country'].isin(wa_countries)]
political_rights_ca = political_rights[political_rights['Country'].isin(ca_countries)]

# Divide by region on civil liberties
civil_liberties_na =  civil_liberties[civil_liberties['Country'].isin(na_countries)]
civil_liberties_ea =  civil_liberties[civil_liberties['Country'].isin(ea_countries)]
civil_liberties_sa =  civil_liberties[civil_liberties['Country'].isin(sa_countries)]
civil_liberties_wa =  civil_liberties[civil_liberties['Country'].isin(wa_countries)]
civil_liberties_ca =  civil_liberties[civil_liberties['Country'].isin(ca_countries)]

# Divide by region on Freedom House's score
fh_status_na =  fh_status[fh_status['Country'].isin(na_countries)]
fh_status_ea =  fh_status[fh_status['Country'].isin(ea_countries)]
fh_status_sa =  fh_status[fh_status['Country'].isin(sa_countries)]
fh_status_wa =  fh_status[fh_status['Country'].isin(wa_countries)]
fh_status_ca =  fh_status[fh_status['Country'].isin(ca_countries)]

### Human Development Index dataset

The Human Development Index (HDI) is a composite statistic (composite index) of life expectancy, education, and per capita income indicators, which are used to rank countries into four tiers of human development. A country scores a higher HDI when its life expectancy, education level, and GDP per capita are higher.

>The origins of the HDI are found in the annual Human Development Reports produced by the Human Development Reports Office of the United Nations Development Programme (UNDP). These were devised and launched by Pakistani economist Mahbub ul Haq in 1990, and had the explicit purpose "to shift the focus of development economics from national income accounting to people-centered policies".

Our final dataset is the Human Development report by the UN, which includes the calculated Human Development Index for the years 1990 through 2015. Naturally, we are interested in the values associated with countries on the African continent.

The Human Development Index takes values in the range [0, 1], with 1 being the best possible value.
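
As a worked illustration of how a composite index stays within [0, 1], the sketch below combines three dimension indices with a geometric mean. This assumes the aggregation UNDP has used since its 2010 revision (earlier reports used a different aggregation) and is not taken from the dataset itself. The function name and the input values are hypothetical.

```python
def hdi_from_dimensions(health_index, education_index, income_index):
    """Geometric mean of three dimension indices, each expected in [0, 1]."""
    indices = (health_index, education_index, income_index)
    if any(not 0 <= value <= 1 for value in indices):
        raise ValueError('dimension indices must lie in [0, 1]')
    return (health_index * education_index * income_index) ** (1 / 3)


# Hypothetical dimension indices, for illustration only.
example_hdi = hdi_from_dimensions(0.6, 0.5, 0.4)
```

With a geometric mean, a very low value in one dimension pulls the overall index down more strongly than it would with an arithmetic mean.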

#### Missing rankings

Throughout the observed period, several countries have become independent, split into two or more countries, or merged with a neighboring state. Scores for these countries are given only for the period of their existence as independent states.

#### Adjustment of country names

The following names have been altered in the original Human Development Index dataset to follow the country naming convention established earlier for the whole project:

| Old country name                   | New country name                 |
|:-----------------------------------|:---------------------------------|
| Congo (Democratic Republic of the) | Democratic Republic of the Congo |
| Congo                              | Republic of the Congo            |
| Tanzania (United Republic of)      | Tanzania                         |

In [498]:
hdi_path = os.path.join('data','Human Development Index (HDI).csv')
hdi_data = pd.read_csv(hdi_path, skiprows=1)
hdi_data = hdi_data.drop(['HDI Rank (2015)'], axis=1)

def hdi_adjust_country_name(name):
    result = name
    if name == 'Congo (Democratic Republic of the)':
        result = 'Democratic Republic of the Congo'
    elif name == 'Congo':
        result = 'Republic of the Congo'
    elif name == 'Tanzania (United Republic of)':
        result = 'Tanzania'
    return result

hdi_data.Country = hdi_data.Country.apply(lambda x: hdi_adjust_country_name(x.strip()))

hdi_na = hdi_data[hdi_data['Country'].isin(na_countries)]
hdi_ea = hdi_data[hdi_data['Country'].isin(ea_countries)]
hdi_sa = hdi_data[hdi_data['Country'].isin(sa_countries)]
hdi_wa = hdi_data[hdi_data['Country'].isin(wa_countries)]
hdi_ca = hdi_data[hdi_data['Country'].isin(ca_countries)]

### Map data

Map visualizations throughout the project use the `africa.json` file, a TopoJSON map of African countries made available under the MIT license by [David Eldersveld](https://github.com/deldersveld/topojson/blob/master/continents/africa.json). We modified the original file so that country names follow the naming convention used throughout this project:

| Old country name  | New country name      |
|:------------------|:----------------------|
| Cape Verde        | Cabo Verde            |
| Ivory Coast       | Cote d'Ivoire         |
| Guinea Bissau     | Guinea-Bissau         |
| Republic of Congo | Republic of the Congo |

Furthermore, we have not been able to properly display Mauritius and Seychelles with our working version of the TopoJSON file. For this reason, these two countries are omitted from the map visualizations and will be added at a later date.
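
The renaming could also be applied in code instead of editing the file by hand. The following sketch assumes the layout that `extract_countries` in the next cell relies on (an `objects` entry holding `geometries`, each with a `geounit` property) and uses the project-wide spelling `Cote d'Ivoire`. `GEOUNIT_RENAMES`, `rename_geounits` and the toy topology are hypothetical.

```python
import copy

# Renames from the table above.
GEOUNIT_RENAMES = {
    'Cape Verde': 'Cabo Verde',
    'Ivory Coast': "Cote d'Ivoire",
    'Guinea Bissau': 'Guinea-Bissau',
    'Republic of Congo': 'Republic of the Congo',
}


def rename_geounits(topology, object_name, renames=GEOUNIT_RENAMES):
    """Return a copy of `topology` with renamed `geounit` properties."""
    renamed = copy.deepcopy(topology)
    for geometry in renamed['objects'][object_name]['geometries']:
        old_name = geometry['properties']['geounit']
        geometry['properties']['geounit'] = renames.get(old_name, old_name)
    return renamed


# Hypothetical two-country stand-in for the real file's structure.
toy_topology = {
    'objects': {
        'continent_Africa_subunits': {
            'geometries': [
                {'properties': {'geounit': 'Ivory Coast'}},
                {'properties': {'geounit': 'Ghana'}},
            ]
        }
    }
}
renamed_topology = rename_geounits(toy_topology, 'continent_Africa_subunits')
```

Working on a deep copy leaves the loaded file contents unchanged, so the original names remain available for comparison.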

In [544]:
africa_geo_path = os.path.join('topojson', 'africa.json')
# Read the TopoJSON file and close it once loaded
with open(africa_geo_path) as africa_geo_file:
    africa_geo_data = json.load(africa_geo_file)

def extract_countries(countries):
    new_countries = []
    
    for country_json in africa_geo_data['objects']['continent_Africa_subunits']['geometries']:
        if country_json['properties']['geounit'] in countries:
            new_country_json = copy.deepcopy(africa_geo_data)
            new_country_json['objects']['continent_Africa_subunits']['geometries'] = [country_json]
            new_countries.append(new_country_json)
            
    return new_countries

### Utility dictionaries

In [546]:
region_countries_dict = {
    'na': na_countries,
    'ea': ea_countries,
    'sa': sa_countries,
    'wa': wa_countries,
    'ca': ca_countries
}

region_center_dict = {
    'na': [22.989358, 11.861105],
    'ea': [-3.967289, 43.487203],
    'sa': [-25.734728, 24.919672],
    'wa': [16.155142, -3.864663],
    'ca': [4.322669, 19.261304]
}

conflict_region_dict = { 
    'na': df_na,
    'ea': df_ea,
    'sa': df_sa,
    'wa': df_wa,
    'ca': df_ca
}

hdi_region_dict = {
    'na': hdi_na,
    'ea': hdi_ea,
    'sa': hdi_sa,
    'wa': hdi_wa,
    'ca': hdi_ca
}

pr_region_dict = {
    'na': political_rights_na,
    'ea': political_rights_ea,
    'sa': political_rights_sa,
    'wa': political_rights_wa,
    'ca': political_rights_ca
}

cl_region_dict = {
    'na': civil_liberties_na,
    'ea': civil_liberties_ea,
    'sa': civil_liberties_sa,
    'wa': civil_liberties_wa,
    'ca': civil_liberties_ca
}

### Utility functions

#### Function for country information filtering and grouping

In [501]:
def country_conflict_distribution(country, region={'na', 'ea', 'sa', 'wa', 'ca'}):
    cc_data = conflict_region_dict[region]
        
    cc_data = cc_data[cc_data.country == country] \
        .groupby(['year', 'type_of_violence']) \
        .agg('count') \
        .reset_index() \
        .set_index('year')
    
    cc_data = cc_data[(cc_data.index >= 1990) & (cc_data.index <= 2015)].fillna(0)
    
    cc_data = cc_data[['type_of_violence','id']]
    cc_data.columns = ['Type of violence', 'Count']
    return cc_data

def country_indicator(country, region_data, indicator_name):
    df = region_data[region_data.Country == country]
    df = df.transpose()
    df = df.drop(['Country'], axis=0)
    df.index = df.index.astype(int)
    df.columns = [indicator_name]
    return df

def country_hdi(country, region={'na','ea','sa','wa','ca'}):
    r_hdi = hdi_region_dict[region]
    return country_indicator(country, r_hdi, 'HDI')

def country_pr(country, region={'na','ea','sa','wa','ca'}):
    r_pr = pr_region_dict[region]
    return country_indicator(country, r_pr, 'Political rights score')

def country_cl(country, region={'na','ea','sa','wa','ca'}):
    r_cl = cl_region_dict[region]
    return country_indicator(country, r_cl, 'Civil liberties score')

def country_indicators(country, region={'na','ea','sa','wa','ca'}):
    c_hdi = country_hdi(country, region)
    c_pr = country_pr(country, region)
    c_cl = country_cl(country, region)
    
    return c_hdi.join([c_pr, c_cl], how='inner').fillna(0)

#### Functions for region information filtering and grouping

In [502]:
def region_conflict_distribution(region={'na', 'ea', 'sa', 'wa', 'ca'}):
    cc_data = conflict_region_dict[region]
        
    cc_data = cc_data.groupby(['year', 'type_of_violence']) \
        .agg('count') \
        .reset_index() \
        .set_index('year')
    
    cc_data = cc_data[(cc_data.index >= 1990) & (cc_data.index <= 2015)].fillna(0)
    
    cc_data = cc_data[['type_of_violence','id']]
    cc_data.columns = ['Type of violence', 'Count']
    return cc_data

def region_indicator(data, indicator_name):
    df = copy.deepcopy(data)
    df = df.transpose()
    df = df.drop(['Country'], axis=0)
    df.index = df.index.astype(int)
    df[indicator_name] = df.mean(axis=1)
    df = df.drop(df.columns[0:len(df.columns)-1], axis=1)
    return df

def region_hdi(region={'na','ea','sa','wa','ca'}):
    return region_indicator(hdi_region_dict[region], 'HDI')

def region_pr(region={'na','ea','sa','wa','ca'}):
    return region_indicator(pr_region_dict[region], 'Political rights score')

def region_cl(region={'na','ea','sa','wa','ca'}):
    return region_indicator(cl_region_dict[region], 'Civil liberties score')

def region_indicators(region={'na','ea','sa','wa','ca'}):
    r_hdi = region_hdi(region)
    r_pr = region_pr(region)
    r_cl = region_cl(region)
    
    return r_hdi.join([r_pr, r_cl], how='inner').fillna(0)

#### Functions for plotting

In [503]:
def plot_conflict_distribution(conflict_distribution):
    f, ax = plt.subplots(figsize=(20,6))
    
    sns.pointplot(x=conflict_distribution.index, y='Count', hue='Type of violence', data=conflict_distribution, ax=ax)
    ax.set(title='Conflict distribution', xlabel='Year', ylabel='Count')    
    
def plot_indicators(indicators):
    f, (ax, ax3) = plt.subplots(2, figsize=(20,12))

    ax2 = ax.twinx()
    sns.pointplot(x=indicators.index, y='Political rights score', data=indicators, color='red', ax=ax)
    sns.pointplot(x=indicators.index, y='Civil liberties score', data=indicators, color='blue', ax=ax2)
    sns.pointplot(x=indicators.index, y='HDI', data=indicators, color='green', ax=ax3)

    ax.set_title('Political rights and civil liberties')
    ax3.set_title('Human development index')
    
    ax.legend(['Political rights score'], loc='upper right', bbox_to_anchor=(1, 1))
    ax2.legend(['Civil liberties score'], loc='upper right', bbox_to_anchor=(1, 0.9))
    ax3.legend(['HDI'], loc='upper right', bbox_to_anchor=(1, 1))
    
def plot_correlation(data):
    corr = data.corr(method='spearman')
    mask = np.zeros_like(corr)
    mask[np.triu_indices_from(mask)] = True
    
    ax = plt.axes()
    sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="RdBu_r", ax=ax)
    ax.set_title('Correlation heatmap')
    
def plot_region_correlation(region={'na','ea','sa','wa','ca'}):
    r_cd = region_conflict_distribution(region)
    r_ind = region_indicators(region)
    combined = r_cd.join([r_ind], how='outer').fillna(0)    
    plot_correlation(combined)
    
def plot_country_correlation(country, region={'na','ea','sa','wa','ca'}):
    c_cd = country_conflict_distribution(country, region)
    c_ind = country_indicators(country, region)
    combined = c_cd.join([c_ind], how='outer').fillna(0)
    plot_correlation(combined)

#### Functions for plotting popups

In [538]:
from bokeh.palettes import Spectral11
from bokeh.plotting import figure, show, output_file
from bokeh.layouts import column

def plot_indicators_popup(indicators, country_name):
    f1 = figure(width=500, height=300)
    f1.line(indicators.index, indicators['Political rights score'], legend='Political rights score', color='red')
    f1.line(indicators.index, indicators['Civil liberties score'], legend='Civil liberties score', color='blue')
    f1.toolbar.disabled = True
    f1.toolbar.logo = None
    f1.toolbar_location = None
    f1.title.text = 'Political rights and civil liberties score for ' + country_name
    f1.xaxis.axis_label = 'Year'
    f1.yaxis.axis_label = 'Score'
    f1.axis.axis_label_text_font_size = '10pt'
    f1.title.align = 'center'
    f1.title.text_font_size = '12pt'
    f1.xaxis.major_label_text_font_size = '10pt'
    f1.height = 380
    f1.width = 500
    f1.legend.location = 'bottom_right'
    f1.legend.border_line_width = 0
    f1.legend.background_fill_alpha = 0.0
    
    f2 = figure(width=500, height=300)
    f2.line(indicators.index, indicators['HDI'], color='green')
    f2.toolbar.disabled = True
    f2.toolbar.logo = None
    f2.toolbar_location = None
    f2.title.text = 'Human development index of ' + country_name
    f2.xaxis.axis_label = 'Year'
    f2.yaxis.axis_label = 'HDI'
    f2.axis.axis_label_text_font_size = '10pt'
    f2.title.align = 'center'
    f2.title.text_font_size = '12pt'
    f2.xaxis.major_label_text_font_size = '10pt'
    f2.height = 380
    f2.width = 500
    
    return column(f1, f2)

def indicators_popup_html(indicators, country_name):
    p = plot_indicators_popup(indicators, country_name)
    return file_html(p, CDN, country_name + '_indicators_popup')

def plot_conflict_distribution_popup(conflict_distribution, country_name):
    f = figure(width=500, height=300)
    
    tov1 = conflict_distribution[conflict_distribution['Type of violence'] == 1]
    tov2 = conflict_distribution[conflict_distribution['Type of violence'] == 2]
    tov3 = conflict_distribution[conflict_distribution['Type of violence'] == 3]
    
    f.line(tov1.index, tov1['Count'], legend='Type of violence 1', color='red')
    f.line(tov2.index, tov2['Count'], legend='Type of violence 2', color='blue')
    f.line(tov3.index, tov3['Count'], legend='Type of violence 3', color='green')
    
    f.toolbar.disabled = True
    f.toolbar.logo = None
    f.toolbar_location = None
    f.title.text = 'Conflict distribution by type for ' + country_name
    f.xaxis.axis_label = 'Year'
    f.yaxis.axis_label = 'Count'
    f.axis.axis_label_text_font_size = '10pt'
    f.title.align = 'center'
    f.title.text_font_size = '12pt'
    f.xaxis.major_label_text_font_size = '10pt'
    f.legend.location = 'top_right'
    f.legend.border_line_width = 0
    f.legend.background_fill_alpha = 0.0
        
    return f

def conflict_distribution_popup_html(conflict_distribution, country_name):
    p = plot_conflict_distribution_popup(conflict_distribution, country_name)
    return file_html(p, CDN, country_name + '_conflict_distribution_popup')

### North Africa

In [547]:
def map_conflicts_per_country(region={'na','ea','sa','wa','ca'}):
    center = region_center_dict[region]
    countries = region_countries_dict[region]
    africa = folium.Map(center, tiles='cartodbpositron', zoom_start=4)
    countries_topojsons = extract_countries(countries)

    for country in countries_topojsons:
        name = country['objects']['continent_Africa_subunits']['geometries'][0]['properties']['geounit']
        
        tj = folium.TopoJson(
            country,
            'objects.continent_Africa_subunits', 
            style_function = lambda feature: {
                'fillColor' : '#ff6e3a',
                'fillOpacity' : 0.8,
                'color' : '#ff5439',
                'weight' : 1
            })
        
        combined_html = conflict_distribution_popup_html(country_conflict_distribution(name, region), name)
        combined = branca.element.IFrame(html=combined_html, width=540, height=300)
        popup = folium.Popup(combined, max_width=540)

        tj.add_child(popup)
        tj.add_to(africa)
        
    return africa

africa = map_conflicts_per_country('ea')
africa.save('map-full.html')

In [548]:
IFrame(src="map-full.html",width=900, height=500)

In [481]:
def country_conflict_plot(country, region={'na','ea','sa','wa','ca'}):
    distribution = country_conflict_distribution(country, region)
    
    tov_1_line = line_plot(distribution, 'Year', 'Type of violence 1', 'Type of violence 1', 'Type of violence 1')
    tov_1_html = file_html(tov_1_line, CDN, country + 'tov_1_line')
                           
    tov_2_line = line_plot(distribution, 'Year', 'Type of violence 2', 'Type of violence 2', 'Type of violence 2')
    tov_2_html = file_html(tov_2_line, CDN, country + 'tov_2_line')
                           
    tov_3_line = line_plot(distribution, 'Year', 'Type of violence 3', 'Type of violence 3', 'Type of violence 3')
    tov_3_html = file_html(tov_3_line, CDN, country + 'tov_3_line')
    
    combined_html = '<p style="font-family: Verdana; text-align: center;"> Statistics for '+country+'</p>'\
    +'<figure>'+tov_1_html+'</figure>'\
    +'<figure>'+tov_2_html+'</figure>'\
    +'<figure>'+tov_3_html+'</figure>'
    
    return combined_html
    
#smt = country_conflict_plot('Zimbabwe', 'ea')
#display(HTML(smt))

In [482]:
def transpose_data(data, region, flag):
    n = data.count().Country
    data_t = data.transpose()
    data_t['index'] = range(0, data_t.shape[0])
    data_t['average'] = data.sum()
    data_t = data_t.set_index('index')
    
    if flag==0:
        data_t = data_t.drop([0])
    else:
        data_t = data_t.drop([0,1])
    
    data_t['year'] = range(1990, 2016)
    data_t['average'] = data_t['average']/n
    data_t['region'] = region
    return(data_t)


# Plots correlation heatmap for one African region
def plot_region_correlation(data, data_hdi, data_pr, data_cl, region_name):
    best = data[['year', 'best', 'type_of_violence']] \
        .groupby(['year', 'type_of_violence']) \
        .sum() \
        .reset_index()
    best = best[(best.year>1989) & (best.year<2016)]
    
    region = pd.DataFrame()
    region = best[best.type_of_violence==1].set_index('year').drop(['type_of_violence'], axis=1)
    region.columns = ['Type of violence 1']
    
    best2 = best[best.type_of_violence==2].set_index('year').drop(['type_of_violence'], axis=1)
    best2.columns = ['Type of violence 2']
    
    best3 = best[best.type_of_violence==3].set_index('year').drop(['type_of_violence'], axis=1)
    best3.columns = ['Type of violence 3']
    
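    # Regional yearly averages. The region label passed to transpose_data is region_name
    # (not the `region` dataframe), and .values avoids index misalignment, because
    # transpose_data drops a different number of header rows for HDI than for the other datasets
    region_scores = pd.DataFrame(index=range(1990, 2016))
    region_scores['HDI'] = transpose_data(data_hdi, region_name, 1).average.astype(float).values
    region_scores['Political rights score'] = transpose_data(data_pr, region_name, 0).average.astype(float).values
    region_scores['Civil liberties'] = transpose_data(data_cl, region_name, 0).average.astype(float).values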

    region = region.join([best2, best3, region_scores], how='outer')
    region = region.fillna(0)
    corr = region.corr(method='spearman')
    mask = np.zeros_like(corr)
    mask[np.triu_indices_from(mask)] = True
    
    ax = plt.axes()
    sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="RdBu_r", ax=ax)
    ax.set_title('Correlation heatmap for region: ' + region_name)

def plot_indicators(indicators):
    f, (ax, ax3) = plt.subplots(2, figsize=(20,12))

    ax2 = ax.twinx() # Second y-axis that shares the x-axis with ax
    sns.pointplot(x=indicators.index, y='Political rights score', data=indicators, color='red', ax=ax)
    sns.pointplot(x=indicators.index, y='Civil liberties score', data=indicators, color='blue', ax=ax2)
    sns.pointplot(x=indicators.index, y='HDI', data=indicators, color='green', ax=ax3)

    ax.set_title('Political rights and civil liberties')
    ax3.set_title('Human development index')
    
    ax.legend(['Political rights score'], loc='upper right', bbox_to_anchor=(1, 1))
    ax2.legend(['Civil liberties score'], loc='upper right', bbox_to_anchor=(1, 0.9))
    ax3.legend(['HDI'], loc='upper right', bbox_to_anchor=(1, 1))
    
#ri_na = region_indicators('na')
#plot_indicators(ri_na)



### Zimbabwe

In [483]:
# Arguments are passed positionally because country_hdi() does not accept a `name` keyword;
# the (country, region) order is assumed to match country_hdi_plot below
zimbabwe_hdi = country_hdi('Zimbabwe', 'ea')
zimbabwe_pr = country_pr('Zimbabwe', 'ea')
zimbabwe_cl = country_cl('Zimbabwe', 'ea')
zimbabwe = zimbabwe_hdi.join([zimbabwe_pr, zimbabwe_cl], how='inner')

zimbabwe.corr(method='spearman')
zimbabwe.corr(method='pearson')
#zimbabwe[zimbabwe.columns[1:]].plot(kind='line')

# Builds one HTML snippet with the HDI, political rights and civil liberties plots of a country
# region is a single region code: 'na', 'ea', 'sa', 'wa' or 'ca'
def html_for_country(country, region):
    hdi_line = country_hdi_plot(country, region)
    hdi_html = file_html(hdi_line, CDN, country+'_hdi_plot')
    
    pr_line = country_pr_plot(country, region)
    pr_html = file_html(pr_line, CDN, country+'_pr_plot')

    cl_line = country_cl_plot(country, region)
    cl_html = file_html(cl_line, CDN, country+'_cl_plot')
    
    combined_html = '<p style="font-family: Verdana; text-align: center;"> Statistics for '+country+'</p>'\
    +'<figure>'+hdi_html+'</figure>'\
    +'<figure>'+pr_html+'</figure>'\
    +'<figure>'+cl_html+'</figure>'
    
    return combined_html

#el = branca.element.IFrame(html=html_line, width=100, height=100)
#display(el)

zimbabwe_html = html_for_country('Zimbabwe','ea')

display(HTML(zimbabwe_html))

#p = figure(plot_width=400, plot_height=400)
#p.line(zimbabwe_pr.index, zimbabwe_pr['Political rights score'], line_width=2)
#show(p)

## Disclaimer

The remainder of this notebook presents a *specification* of the pipeline that we plan to apply to every region as part of our project. Although it lacks polish and more expressive commentary, the pipeline in its current state demonstrates the soundness, feasibility and end goal of our project.

The graphs presented are not yet polished, and map visualizations will be included in the final report for each region.

## Analysing data using plots

We are interested in how conflicts affect the development of a country and the quality of life of its inhabitants. To investigate this, we analyze our main dataframe using plots: for each region we look at several features (e.g. the number of deaths, the number of conflicts) and try to find a correlation between these data, the Human Development Index and the indicators contained in the 'Freedom House' dataframe. We decided not to plot a point when the value of the variable being analysed is equal to 0.
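The rule of not plotting zero values can be sketched as follows. This is a minimal illustration, not the notebook's own implementation: `mask_zeros` is a hypothetical helper, and the toy dataframe only mimics the `year` and `best` columns of the main dataset. It relies on the fact that matplotlib does not draw points whose value is NaN.

```python
import numpy as np
import pandas as pd


def mask_zeros(frame, value_col):
    """Return a copy of `frame` in which zeros in `value_col` are replaced by NaN.

    NaN values are not drawn by matplotlib, so the masked points
    disappear from the plot instead of sitting on the x-axis.
    """
    masked = frame.copy()
    masked[value_col] = masked[value_col].astype(float).replace(0, np.nan)
    return masked


# Toy data standing in for yearly death counts (illustrative values only)
toy = pd.DataFrame({'year': [1990, 1991, 1992], 'best': [12, 0, 7]})
toy_masked = mask_zeros(toy, 'best')
```

The original dataframe is left untouched, so the zeros remain available for computations such as correlations.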

In [None]:
def plot_death_region(data):
    deaths_by_year = data[['year', 'best', 'type_of_violence']] \
        .groupby(['year', 'type_of_violence']) \
        .sum() \
        .reset_index()
    
    g = sns.factorplot(
        x='year', y='best', 
        hue='type_of_violence',
        data=deaths_by_year, 
        kind='point',
        size=3, aspect=4
    )

plot_death_region(df_na)
plot_death_region(df_wa)
plot_death_region(df_ca)
plot_death_region(df_ea)
plot_death_region(df_sa)

After analysing the main dataset, we define a function that reshapes the other two datasets so that they can be plotted with Seaborn as before. For each region we compute, year by year, the average of each indicator under consideration; with these new dataframes we can manipulate, compare and plot the data we are interested in more easily.
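As an illustration of the reshaping described above, the same yearly regional average can also be expressed with `melt` and `groupby`. This is only a sketch under assumptions: `regional_average` is a hypothetical helper, and the input is assumed to be a wide table with a `Country` column and one column per year. Unlike a sum divided by the number of countries, `mean` skips missing values.

```python
import pandas as pd


def regional_average(wide, region, id_cols=('Country',)):
    """Turn a wide table (one row per country, one column per year)
    into one row per year holding the average over all countries."""
    long = wide.melt(id_vars=list(id_cols), var_name='year', value_name='score')
    long['year'] = long['year'].astype(int)
    # Non-numeric entries become NaN and are ignored by the mean
    long['score'] = pd.to_numeric(long['score'], errors='coerce')
    averages = (long.groupby('year', as_index=False)['score']
                    .mean()
                    .rename(columns={'score': 'average'}))
    averages['region'] = region
    return averages


# Toy table with two countries and two years (illustrative values only)
toy_wide = pd.DataFrame({
    'Country': ['A', 'B'],
    '1990': [2.0, 4.0],
    '1991': [3.0, 5.0],
})
toy_long = regional_average(toy_wide, 'Toy region')
```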

In [None]:
def plot_df(data):
    g = sns.factorplot(
        x='year', y='average', 
        hue='region',
        data=data, 
        kind='point',
        size=3, aspect=4
    )


# Reshapes a wide indicator table (one row per country, one column per year)
# into one row per year with the regional average
# flag selects how many non-year header rows are dropped: 0 drops one, any other value drops two
def transpose_data(data, region, flag):
    n = data.count().Country
    data_t = data.transpose()
    data_t['index'] = range(0, data_t.shape[0])
    data_t['average'] = data.sum()
    data_t = data_t.set_index('index')
    if flag == 0:
        data_t = data_t.drop(0)
    else:
        data_t = data_t.drop([0, 1])
    data_t['year'] = range(1990, 2016)
    data_t['average'] = data_t['average'] / n
    data_t['region'] = region
    return data_t


# Stacks the per-region tables into one long dataframe;
# columns that are not complete for every region are dropped
def merge(na, wa, ea, ca, sa, flag):
    data_merged = pd.merge(transpose_data(na, 'North Africa', flag), transpose_data(wa, 'West Africa', flag), how='outer')
    data_merged = pd.merge(data_merged, transpose_data(ea, 'East Africa', flag), how='outer')
    data_merged = pd.merge(data_merged, transpose_data(ca, 'Central Africa', flag), how='outer')
    data_merged = pd.merge(data_merged, transpose_data(sa, 'South Africa', flag), how='outer')
    data_merged = data_merged.dropna(axis=1)
    return data_merged

political_rights_plot = merge(political_rights_na, political_rights_wa, political_rights_ea, political_rights_ca, political_rights_sa, 0)
civil_liberties_plot = merge(civil_liberties_na, civil_liberties_wa, civil_liberties_ea, civil_liberties_ca, civil_liberties_sa, 0)
hdi_plot = merge(hdi_na, hdi_wa, hdi_ea, hdi_ca, hdi_sa, 1)

plot_df(political_rights_plot)
plot_df(civil_liberties_plot)
plot_df(hdi_plot)

Comparing the plots above, the political rights and civil liberties scores of each region appear to be strongly correlated, while the Human Development Index follows a different trend.
Next, we define a pipeline that plots the data from the main dataset and the trend of the indicators on the same graph, so that they can be compared visually.
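The visual impression of a correlation between two indicators could be checked numerically, for example as in the sketch below. `indicator_correlation` is a hypothetical helper; it assumes two long-format dataframes with `year`, `region` and `average` columns, such as those produced by `merge`. The toy values are illustrative only.

```python
import pandas as pd


def indicator_correlation(first, second, method='spearman'):
    """Correlate two long-format indicator tables region by region."""
    cols = ['year', 'region', 'average']
    joined = first[cols].merge(second[cols], on=['year', 'region'],
                               suffixes=('_first', '_second'))
    result = {}
    for region, group in joined.groupby('region'):
        corr = group[['average_first', 'average_second']].corr(method=method)
        result[region] = corr.loc['average_first', 'average_second']
    return pd.Series(result, name=method)


# Toy data: the two scores move together in region A and in opposite directions in region B
years = [1990, 1991, 1992, 1993]
toy_pr = pd.DataFrame({'year': years * 2,
                       'region': ['A'] * 4 + ['B'] * 4,
                       'average': [6.0, 5.0, 4.0, 3.0, 1.0, 2.0, 3.0, 4.0]})
toy_cl = pd.DataFrame({'year': years * 2,
                       'region': ['A'] * 4 + ['B'] * 4,
                       'average': [5.5, 5.0, 4.0, 3.5, 4.0, 3.0, 2.0, 1.0]})
toy_corr = indicator_correlation(toy_pr, toy_cl)
```

With the notebook's data, a call such as `indicator_correlation(political_rights_plot, civil_liberties_plot)` would give one coefficient per region.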

In [None]:
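# Plots yearly deaths (bars) and one indicator average (points) for a region on a shared x-axis
def plot_region(conflicts, indicators, region):
    # Sum the best estimate of deaths per year and type of violence
    deaths = conflicts[['year', 'best', 'type_of_violence']] \
        .groupby(['year', 'type_of_violence']) \
        .sum() \
        .reset_index()
    scores = indicators.loc[indicators.region == region]

    fig, ax = plt.subplots(figsize=(20, 5))
    # Axes-level plots are used here because factorplot is figure-level and creates its own figure
    sns.barplot(x='year', y='best', data=deaths, ci=None, ax=ax)
    # Second y-axis for the indicator, sharing the x-axis with the bars
    ax2 = ax.twinx()
    sns.pointplot(x='year', y='average', data=scores, ax=ax2)

plot_region(df_na, hdi_plot, 'North Africa')
plot_region(df_na, political_rights_plot, 'North Africa')
plot_region(df_na, civil_liberties_plot, 'North Africa')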

Using these plots, we aim to find a correlation between the indicators and the magnitude of conflicts (e.g. in terms of the number of deaths or the number of conflicts). We perform this analysis region by region to see whether the regions differ or show a common correlation, which could be either negative or positive.
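One way to quantify this relationship for a single region is sketched below. `deaths_indicator_correlation` is a hypothetical helper; it assumes an event table with `year` and `best` columns and a long-format indicator table with `year`, `region` and `average` columns. Years without recorded events are counted as zero deaths, which is itself an assumption.

```python
import pandas as pd


def deaths_indicator_correlation(events, indicators, region, method='spearman'):
    """Correlate yearly deaths with the yearly indicator average of one region."""
    deaths = events.groupby('year')['best'].sum().rename('deaths')
    scores = (indicators.loc[indicators['region'] == region]
                        .set_index('year')['average'])
    # Keep every year covered by the indicator; missing years mean no recorded deaths
    table = scores.to_frame().join(deaths, how='left')
    table['deaths'] = table['deaths'].fillna(0)
    return table.corr(method=method).loc['deaths', 'average']


# Toy data (illustrative values only): the indicator rises as deaths fall
toy_events = pd.DataFrame({'year': [1990, 1990, 1991, 1993],
                           'best': [10, 5, 3, 1]})
toy_scores = pd.DataFrame({'year': [1990, 1991, 1992, 1993],
                           'region': ['North Africa'] * 4,
                           'average': [0.40, 0.45, 0.50, 0.48]})
toy_r = deaths_indicator_correlation(toy_events, toy_scores, 'North Africa')
```

A rank-based coefficient such as Spearman's is used because death counts are heavily skewed; `method='pearson'` can be passed for comparison.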

### Adaptations

When investigating a possible correlation between the number or impact of conflicts and freedom of expression, we will also consider only those conflicts in which a government is one of the warring parties, to see whether the correlation strengthens. We base this on the assumption that governments have the largest capacity to affect the different forms of freedom of expression.
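Restricting the events to those involving a government could look like the sketch below. It assumes that the event table has `side_a` and `side_b` columns naming the two parties and that state actors are labelled "Government of ...", as in the UCDP event data; both assumptions should be verified against the dataset's codebook. `government_conflicts` and the actor names in the toy data are hypothetical.

```python
import pandas as pd


def government_conflicts(events, prefix='Government of'):
    """Keep only events in which at least one side is a government."""
    side_a = events['side_a'].fillna('').str.startswith(prefix)
    side_b = events['side_b'].fillna('').str.startswith(prefix)
    return events.loc[side_a | side_b]


# Toy events (illustrative only): the second row has no government involved
toy_events = pd.DataFrame({
    'side_a': ['Government of Zimbabwe', 'Rebel group X', 'Militia Y'],
    'side_b': ['Rebel group X', 'Civilians', 'Government of Zimbabwe'],
    'best': [4, 2, 9],
})
toy_gov = government_conflicts(toy_events)
```

The filtered table can then be passed through the same regional pipeline as the full dataset, so that the two sets of correlations are directly comparable.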

### Visualization

The regional approach to our research will be enriched with layered map visualizations.