# Mass Mobilization Effectiveness 

## Problem Statement:

What are the short-term and long-term effects of mass mobilizations/protests in countries with varying levels of freedom?

An NGO (e.g., Amnesty International) wants to better understand the short- and long-term repercussions of various types of protests in order to determine when it should express support for protests worldwide, how it should amplify protests using its platform, and how it can best support protesters.

### Short-term repercussions are categorized as the state response to the protest.

### Long-term metrics will include:
* Freedom Score changes over time (measured in years)
* Worldwide Governance Indicators (measured in years)
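
One way to turn a yearly indicator into a long-term outcome is to compare the score in a given year with the score a fixed number of years later. The sketch below assumes a hypothetical long-format table with `country`, `year`, and `score` columns and a hypothetical horizon of 2 years; the numbers are made up for illustration and are not taken from any of the sources.

```python
import pandas as pd

# Hypothetical yearly scores for illustration only (not real data)
scores = pd.DataFrame({
    'country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'year':    [2006, 2007, 2008, 2009, 2006, 2007, 2008, 2009],
    'score':   [50, 52, 55, 54, 80, 79, 77, 78],
})

HORIZON = 2  # assumed number of years to look ahead

scores = scores.sort_values(['country', 'year'])
# Score HORIZON rows later within each country; this equals HORIZON years
# later only when every country has one row per consecutive year
scores['score_future'] = scores.groupby('country')['score'].shift(-HORIZON)
scores['score_change'] = scores['score_future'] - scores['score']
print(scores)
```

Because `shift` counts rows rather than years, a series with gaps (such as an indicator reported every other year) would need a join on `year + HORIZON` instead.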



## The Data

I intend to work with data from the following sources:
    
1. Mass Mobilization Project - data regarding protests worldwide since 1990
    * https://massmobilization.github.io/
    


2. WorldBank: DataBank
    * https://databank.worldbank.org/source/world-development-indicators
    * https://databank.worldbank.org/source/health-nutrition-and-population-statistics-by-wealth-quintile
    * The DataBank has a large amount of data going back at least two decades, and I am still working out which metrics will be most useful for my purposes. So far I have looked at the following:
        * Worldwide Governance Indicators - beginning in 1996 (every other year until 2002, then every year)
        * Poverty and Equity


3. Freedom House
    * https://freedomhouse.org/countries/freedom-world/scores
    * World Freedom Score - beginning in 2006


4. Our World in Data
    * https://ourworldindata.org/
    * Potential source of other metrics.
    
5. Kaggle - News Headlines
    * https://www.kaggle.com/therohk/million-headlines?select=abcnews-date-text.csv

I am also interested in metrics related to specific types of protests. For example, there is data related to agriculture, and I want to see whether I can track the outcomes of agriculture-related protests.


In [1]:
import pandas as pd
import numpy as np
import mitosheet
import plotly.express as px

In [2]:
#Mass Mobilization data
mm = pd.read_csv('./data/dataverse_files/mmALL_073120_csv.csv')

#WorldBank Worldwide Governance Indicators
wg = pd.read_csv('data/Data_Extract_From_Worldwide_Governance_Indicators/3f2d17f2-cc70-48f7-8fb9-ab3d88c6b0bd_Data.csv')

In [3]:
mm.shape

(17145, 31)

In [4]:
mm.isna().sum()

id                           0
country                      0
ccode                        0
year                         0
region                       0
protest                      0
protestnumber                0
startday                  1906
startmonth                1906
startyear                 1906
endday                    1906
endmonth                  1906
endyear                   1906
protesterviolence         1387
location                  1927
participants_category     7258
participants              1399
protesteridentity         2461
protesterdemand1          1907
protesterdemand2         14168
protesterdemand3         16762
protesterdemand4         16314
stateresponse1            1937
stateresponse2           14257
stateresponse3           16215
stateresponse4           16901
stateresponse5           16296
stateresponse6           17129
stateresponse7           16225
sources                   1910
notes                     1952
dtype: int64

In [5]:
# mm['startday'].dropna(inplace=True)

In [6]:
mm[mm['startday'].isna()]

Unnamed: 0,id,country,ccode,year,region,protest,protestnumber,startday,startmonth,startyear,...,protesterdemand4,stateresponse1,stateresponse2,stateresponse3,stateresponse4,stateresponse5,stateresponse6,stateresponse7,sources,notes
18,201998000,Canada,20,1998,North America,0,0,,,,...,,,,,,,,,,
19,201999000,Canada,20,1999,North America,0,0,,,,...,,,,,,,,,,
24,202001000,Canada,20,2001,North America,0,0,,,,...,,,,,,,,,,
25,202002000,Canada,20,2002,North America,0,0,,,,...,,,,,,,,,,
27,202004000,Canada,20,2004,North America,0,0,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17127,9102008000,Papua New Guinea,910,2008,Oceania,0,0,,,,...,,,,,,,,,,
17137,9102015000,Papua New Guinea,910,2015,Oceania,0,0,,,,...,,,,,,,,.,,
17142,9102018000,Papua New Guinea,910,2018,Oceania,0,0,,,,...,,,,,,,,,,
17143,9102019000,Papua New Guinea,910,2019,Oceania,0,0,,,,...,.,,,,,.,,,,
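
The rows shown above all have `protest == 0` and `protestnumber == 0`, which suggests they are country-year placeholders with no recorded protest rather than events with missing dates. A minimal sketch of separating the two cases, using a tiny hypothetical frame (`mm_toy`) that borrows a few column names from `mm`:

```python
import pandas as pd
import numpy as np

# Toy frame mimicking a few mm columns (values are illustrative)
mm_toy = pd.DataFrame({
    'country':  ['Canada', 'Canada', 'Canada'],
    'year':     [1990, 1998, 1999],
    'protest':  [1, 0, 0],
    'startday': [15.0, np.nan, np.nan],
})

# protest == 1 marks an actual event; protest == 0 marks a country-year with none
events = mm_toy[mm_toy['protest'] == 1]
no_protest_years = mm_toy[mm_toy['protest'] == 0]

# Count start dates that are still missing among actual events
print(events['startday'].isna().sum())
```

Running the same check on the full `mm` frame would show whether any missing dates belong to real events and need separate handling.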


In [7]:
mm['country'].unique()

array(['Canada', 'Cuba', 'Haiti', 'Dominican Republic', 'Jamaica',
       'Mexico', 'Guatemala', 'Honduras', 'El Salvador', 'Nicaragua',
       'Costa Rica', 'Panama', 'Colombia', 'Venezuela', 'Guyana',
       'Suriname', 'Ecuador', 'Peru', 'Brazil', 'Bolivia', 'Paraguay',
       'Chile', 'Argentina', 'Uruguay', 'United Kingdom', 'Ireland',
       'Netherlands', 'Belgium', 'Luxembourg', 'France', 'Switzerland',
       'Spain', 'Portugal', 'Germany', 'Germany West', 'Germany East',
       'Poland', 'Austria', 'Hungary', 'Czechoslovakia', 'Czech Republic',
       'Slovak Republic', 'Italy', 'Albania', 'Kosovo', 'Serbia',
       'Macedonia', 'Croatia', 'Yugoslavia', 'Bosnia',
       'Serbia and Montenegro', 'Montenegro', 'Slovenia', 'Greece',
       'Cyprus', 'Bulgaria', 'Moldova', 'Romania', 'USSR', 'Russia',
       'Estonia', 'Latvia', 'Lithuania', 'Ukraine', 'Belarus', 'Armenia',
       'Georgia', 'Azerbaijan', 'Finland', 'Sweden', 'Norway', 'Denmark',
       'Cape Verde', 'Guinea-Biss

In [8]:
mm['protesterdemand1'].value_counts()

political behavior, process    9680
labor wage dispute             1710
price increases, tax policy    1087
removal of politician          1011
police brutality                825
land farm issue                 467
social restrictions             458
Name: protesterdemand1, dtype: int64

In [9]:
mm['protesterdemand2'].value_counts()

political behavior, process    1004
removal of politician           768
labor wage dispute              438
police brutality                241
price increases, tax policy     214
social restrictions             212
land farm issue                 100
Name: protesterdemand2, dtype: int64

In [10]:
mm['protesterdemand3'].value_counts()

price increases, tax policy    111
removal of politician           94
political behavior, process     63
labor wage dispute              61
police brutality                28
social restrictions             14
land farm issue                 12
Name: protesterdemand3, dtype: int64
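
Since each protest can list up to four demands, counting one column at a time splits the totals across several outputs. The sketch below stacks the demand columns into a single long column before counting. It runs on a small hypothetical frame (`mm_toy`) with the same column names as `mm`; the rows are illustrative.

```python
import pandas as pd
import numpy as np

# Toy frame with the same demand columns as mm (values are illustrative)
mm_toy = pd.DataFrame({
    'id': [1, 2, 3],
    'protesterdemand1': ['labor wage dispute', 'police brutality', 'land farm issue'],
    'protesterdemand2': ['removal of politician', np.nan, 'labor wage dispute'],
    'protesterdemand3': [np.nan, np.nan, np.nan],
    'protesterdemand4': [np.nan, np.nan, np.nan],
})

demand_cols = [f'protesterdemand{i}' for i in range(1, 5)]

# Stack the four demand columns into one row per (protest, demand)
demands_long = (
    mm_toy.melt(id_vars='id', value_vars=demand_cols, value_name='demand')
          .dropna(subset=['demand'])
)

# Count each demand regardless of which column it appeared in
demand_counts = demands_long['demand'].value_counts()
print(demand_counts)
```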

In [11]:
wg.head()

Unnamed: 0,Country Name,Country Code,Series Name,Series Code,1996 [YR1996],1998 [YR1998],2000 [YR2000],2002 [YR2002],2003 [YR2003],2004 [YR2004],...,2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019],2020 [YR2020]
0,Afghanistan,AFG,Control of Corruption: Estimate,CC.EST,-1.291705,-1.180848,-1.29538,-1.263366,-1.351042,-1.345281,...,-1.579174,-1.419741,-1.43651,-1.354829,-1.342216,-1.526172,-1.515626,-1.487624,-1.400733,-1.475405
1,Afghanistan,AFG,Control of Corruption: Number of Sources,CC.NO.SRC,2.0,2.0,2.0,2.0,3.0,5.0,...,9.0,10.0,11.0,11.0,11.0,10.0,10.0,10.0,10.0,9.0
2,Afghanistan,AFG,Control of Corruption: Percentile Rank,CC.PER.RNK,4.301075,9.793815,5.076142,5.050505,5.050505,5.853659,...,0.9478673,2.369668,1.895735,5.288462,6.25,3.365385,3.846154,4.807693,6.730769,5.288462
3,Afghanistan,AFG,"Control of Corruption: Percentile Rank, Lower ...",CC.PER.RNK.LOWER,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.4807692,1.442308,0.0,0.0,0.0,1.923077,0.0
4,Afghanistan,AFG,"Control of Corruption: Percentile Rank, Upper ...",CC.PER.RNK.UPPER,27.41936,31.4433,29.44162,31.81818,18.18182,14.14634,...,5.687204,11.84834,9.952606,12.5,12.5,9.615385,9.615385,10.09615,11.53846,11.05769


In [12]:
wg.shape

(7709, 26)

In [13]:
wg.isna().sum()

Country Name     3
Country Code     5
Series Name      5
Series Code      5
1996 [YR1996]    5
1998 [YR1998]    5
2000 [YR2000]    5
2002 [YR2002]    5
2003 [YR2003]    5
2004 [YR2004]    5
2005 [YR2005]    5
2006 [YR2006]    5
2007 [YR2007]    5
2008 [YR2008]    5
2009 [YR2009]    5
2010 [YR2010]    5
2011 [YR2011]    5
2012 [YR2012]    5
2013 [YR2013]    5
2014 [YR2014]    5
2015 [YR2015]    5
2016 [YR2016]    5
2017 [YR2017]    5
2018 [YR2018]    5
2019 [YR2019]    5
2020 [YR2020]    5
dtype: int64

In [14]:
wg['Series Name'].value_counts()

Political Stability and Absence of Violence/Terrorism: Percentile Rank                                            214
Voice and Accountability: Percentile Rank, Lower Bound of 90% Confidence Interval                                 214
Regulatory Quality: Percentile Rank                                                                               214
Voice and Accountability: Number of Sources                                                                       214
Regulatory Quality: Estimate                                                                                      214
Control of Corruption: Percentile Rank                                                                            214
Rule of Law: Number of Sources                                                                                    214
Voice and Accountability: Standard Error                                                                          214
Government Effectiveness: Number of Sources             

In [15]:
pol_stab = wg[wg['Series Name'] == 'Political Stability and Absence of Violence/Terrorism: Estimate']

In [16]:
pol_stab.columns

Index(['Country Name', 'Country Code', 'Series Name', 'Series Code',
       '1996 [YR1996]', '1998 [YR1998]', '2000 [YR2000]', '2002 [YR2002]',
       '2003 [YR2003]', '2004 [YR2004]', '2005 [YR2005]', '2006 [YR2006]',
       '2007 [YR2007]', '2008 [YR2008]', '2009 [YR2009]', '2010 [YR2010]',
       '2011 [YR2011]', '2012 [YR2012]', '2013 [YR2013]', '2014 [YR2014]',
       '2015 [YR2015]', '2016 [YR2016]', '2017 [YR2017]', '2018 [YR2018]',
       '2019 [YR2019]', '2020 [YR2020]'],
      dtype='object')

In [17]:
ps_column_dict = {}

for column in pol_stab.columns:
    if column[-1] == ']':
        ps_column_dict[column] = 'political_stability_' + column[:4]
    elif column == 'Country Name':
        ps_column_dict[column] = 'country'
    else:
        ps_column_dict[column] = 'political_stability_' + column.lower().replace(' ', '_')


In [18]:
ps_column_dict

{'Country Name': 'country',
 'Country Code': 'political_stability_country_code',
 'Series Name': 'political_stability_series_name',
 'Series Code': 'political_stability_series_code',
 '1996 [YR1996]': 'political_stability_1996',
 '1998 [YR1998]': 'political_stability_1998',
 '2000 [YR2000]': 'political_stability_2000',
 '2002 [YR2002]': 'political_stability_2002',
 '2003 [YR2003]': 'political_stability_2003',
 '2004 [YR2004]': 'political_stability_2004',
 '2005 [YR2005]': 'political_stability_2005',
 '2006 [YR2006]': 'political_stability_2006',
 '2007 [YR2007]': 'political_stability_2007',
 '2008 [YR2008]': 'political_stability_2008',
 '2009 [YR2009]': 'political_stability_2009',
 '2010 [YR2010]': 'political_stability_2010',
 '2011 [YR2011]': 'political_stability_2011',
 '2012 [YR2012]': 'political_stability_2012',
 '2013 [YR2013]': 'political_stability_2013',
 '2014 [YR2014]': 'political_stability_2014',
 '2015 [YR2015]': 'political_stability_2015',
 '2016 [YR2016]': 'political_stabil

In [19]:
pol_stab = pol_stab.rename(mapper=ps_column_dict, axis=1)

In [20]:
pol_stab.columns

Index(['country', 'political_stability_country_code',
       'political_stability_series_name', 'political_stability_series_code',
       'political_stability_1996', 'political_stability_1998',
       'political_stability_2000', 'political_stability_2002',
       'political_stability_2003', 'political_stability_2004',
       'political_stability_2005', 'political_stability_2006',
       'political_stability_2007', 'political_stability_2008',
       'political_stability_2009', 'political_stability_2010',
       'political_stability_2011', 'political_stability_2012',
       'political_stability_2013', 'political_stability_2014',
       'political_stability_2015', 'political_stability_2016',
       'political_stability_2017', 'political_stability_2018',
       'political_stability_2019', 'political_stability_2020'],
      dtype='object')

In [21]:
mm.head()

Unnamed: 0,id,country,ccode,year,region,protest,protestnumber,startday,startmonth,startyear,...,protesterdemand4,stateresponse1,stateresponse2,stateresponse3,stateresponse4,stateresponse5,stateresponse6,stateresponse7,sources,notes
0,201990001,Canada,20,1990,North America,1,1,15.0,1.0,1990.0,...,,ignore,,,,,,,1. great canadian train journeys into history;...,canada s railway passenger system was finally ...
1,201990002,Canada,20,1990,North America,1,2,25.0,6.0,1990.0,...,,ignore,,,,,,,1. autonomy s cry revived in quebec the new yo...,protestors were only identified as young peopl...
2,201990003,Canada,20,1990,North America,1,3,1.0,7.0,1990.0,...,,ignore,,,,,,,1. quebec protest after queen calls for unity ...,"the queen, after calling on canadians to remai..."
3,201990004,Canada,20,1990,North America,1,4,12.0,7.0,1990.0,...,,accomodation,,,,,,,1. indians gather as siege intensifies; armed ...,canada s federal government has agreed to acqu...
4,201990005,Canada,20,1990,North America,1,5,14.0,8.0,1990.0,...,,crowd dispersal,arrests,accomodation,,,,,1. dozens hurt in mohawk blockade protest the ...,protests were directed against the state due t...
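
Because the short-term outcome is the state response, it may help to have one boolean column per response type instead of seven positional columns. The following is a sketch on a hypothetical frame (`mm_toy`) that reuses three of the `stateresponse` columns and the response labels visible in the output above; it is one possible encoding, not the only one.

```python
import pandas as pd
import numpy as np

# Toy frame using response labels seen in mm.head() (spelling as in the dataset)
mm_toy = pd.DataFrame({
    'id': [201990001, 201990004, 201990005],
    'stateresponse1': ['ignore', 'accomodation', 'crowd dispersal'],
    'stateresponse2': [np.nan, np.nan, 'arrests'],
    'stateresponse3': [np.nan, np.nan, 'accomodation'],
})
response_cols = ['stateresponse1', 'stateresponse2', 'stateresponse3']

# One row per (protest, response), dropping the empty slots
responses_long = (
    mm_toy.melt(id_vars='id', value_vars=response_cols, value_name='response')
          .dropna(subset=['response'])
)

# One row per protest, one boolean column per response type
response_flags = pd.crosstab(responses_long['id'], responses_long['response']).astype(bool)
print(response_flags)
```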


In [22]:
df = pd.merge(mm, pol_stab, on="country")

In [23]:
df.columns

Index(['id', 'country', 'ccode', 'year', 'region', 'protest', 'protestnumber',
       'startday', 'startmonth', 'startyear', 'endday', 'endmonth', 'endyear',
       'protesterviolence', 'location', 'participants_category',
       'participants', 'protesteridentity', 'protesterdemand1',
       'protesterdemand2', 'protesterdemand3', 'protesterdemand4',
       'stateresponse1', 'stateresponse2', 'stateresponse3', 'stateresponse4',
       'stateresponse5', 'stateresponse6', 'stateresponse7', 'sources',
       'notes', 'political_stability_country_code',
       'political_stability_series_name', 'political_stability_series_code',
       'political_stability_1996', 'political_stability_1998',
       'political_stability_2000', 'political_stability_2002',
       'political_stability_2003', 'political_stability_2004',
       'political_stability_2005', 'political_stability_2006',
       'political_stability_2007', 'political_stability_2008',
       'political_stability_2009', 'political_stabi

In [24]:
df

Unnamed: 0,id,country,ccode,year,region,protest,protestnumber,startday,startmonth,startyear,...,political_stability_2011,political_stability_2012,political_stability_2013,political_stability_2014,political_stability_2015,political_stability_2016,political_stability_2017,political_stability_2018,political_stability_2019,political_stability_2020
0,201990001,Canada,20,1990,North America,1,1,15.0,1.0,1990.0,...,1.077176,1.113016,1.061422,1.175504,1.274698,1.256059,1.102063,0.9791259,1.016768,1.109246
1,201990002,Canada,20,1990,North America,1,2,25.0,6.0,1990.0,...,1.077176,1.113016,1.061422,1.175504,1.274698,1.256059,1.102063,0.9791259,1.016768,1.109246
2,201990003,Canada,20,1990,North America,1,3,1.0,7.0,1990.0,...,1.077176,1.113016,1.061422,1.175504,1.274698,1.256059,1.102063,0.9791259,1.016768,1.109246
3,201990004,Canada,20,1990,North America,1,4,12.0,7.0,1990.0,...,1.077176,1.113016,1.061422,1.175504,1.274698,1.256059,1.102063,0.9791259,1.016768,1.109246
4,201990005,Canada,20,1990,North America,1,5,14.0,8.0,1990.0,...,1.077176,1.113016,1.061422,1.175504,1.274698,1.256059,1.102063,0.9791259,1.016768,1.109246
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14551,9102017002,Papua New Guinea,910,2017,Oceania,1,2,15.0,7.0,2017.0,...,-0.7546774,-0.5904424,-0.5202469,-0.3393671,-0.4221483,-0.5047721,-0.6978666,-0.6772177,-0.6973711,-0.7393168
14552,9102017003,Papua New Guinea,910,2017,Oceania,1,3,31.0,10.0,2017.0,...,-0.7546774,-0.5904424,-0.5202469,-0.3393671,-0.4221483,-0.5047721,-0.6978666,-0.6772177,-0.6973711,-0.7393168
14553,9102018000,Papua New Guinea,910,2018,Oceania,0,0,,,,...,-0.7546774,-0.5904424,-0.5202469,-0.3393671,-0.4221483,-0.5047721,-0.6978666,-0.6772177,-0.6973711,-0.7393168
14554,9102019000,Papua New Guinea,910,2019,Oceania,0,0,,,,...,-0.7546774,-0.5904424,-0.5202469,-0.3393671,-0.4221483,-0.5047721,-0.6978666,-0.6772177,-0.6973711,-0.7393168
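
In the merged frame above, every protest row carries the full set of `political_stability_*` columns for its country, regardless of the protest year. An alternative, sketched below on toy data, is to reshape the indicator to one row per country and year and merge on both keys. The frames and values are illustrative; only the column naming follows the cells above.

```python
import pandas as pd

# Toy wide-format indicator, named like pol_stab after renaming (illustrative values)
pol_stab_toy = pd.DataFrame({
    'country': ['Canada', 'Papua New Guinea'],
    'political_stability_2017': [1.10, -0.70],
    'political_stability_2018': [0.98, -0.68],
})

# Toy protest rows with the same key columns as mm
mm_toy = pd.DataFrame({
    'id': [1, 2, 3],
    'country': ['Canada', 'Papua New Guinea', 'Papua New Guinea'],
    'year': [2017, 2017, 2018],
})

# Wide -> long: one row per (country, year)
ps_long = pol_stab_toy.melt(
    id_vars='country', var_name='year', value_name='political_stability'
)
# The original column names end in the four-digit year
ps_long['year'] = ps_long['year'].str[-4:].astype(int)

# Left merge keeps every protest row; validate guards against duplicate country-years
merged = mm_toy.merge(ps_long, on=['country', 'year'], how='left', validate='many_to_one')
print(merged)
```

With this layout each protest row holds only the indicator value for its own year, and lagged or future values can be added by merging on a shifted year.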


In [25]:
# Convert the Freedom in the World sheet to CSV in the folder the next cell reads from
read_file = pd.read_excel('data/Aggregate_Category_and_Subcategory_Scores_FIW_2003-2021.xlsx', sheet_name='FIW06-21')
read_file.to_csv('data/fiw_data/fiw_agg.csv', index=False, header=True)

In [26]:
fiw = pd.read_csv('data/fiw_data/fiw_agg.csv')

In [27]:
fiw['Edition'].unique()

array([2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011,
       2010, 2009, 2008, 2007, 2006])

In [28]:
fiw.columns

Index(['Country/Territory', 'Region', 'C/T?', 'Edition', 'Status', 'PR Rating',
       'CL Rating', 'A', 'B', 'C', 'Add Q', 'Add A', 'PR', 'D', 'E', 'F', 'G',
       'CL', 'Total', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21',
       'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25',
       'Unnamed: 26', 'Unnamed: 27', 'Unnamed: 28', 'Unnamed: 29',
       'Unnamed: 30', 'Unnamed: 31', 'Unnamed: 32', 'Unnamed: 33',
       'Unnamed: 34', 'x'],
      dtype='object')

In [29]:
# Drop the empty "Unnamed: N" columns carried over from the Excel sheet
to_drop = [column for column in fiw.columns if column.startswith('Unnamed')]
fiw.drop(columns=to_drop, inplace=True)

In [None]:
fiw.columns

In [None]:
fiw.drop(columns=['x'], inplace=True)

In [None]:
fiw_column_dict = {}

for column in fiw.columns:
    if column == 'Country/Territory':
        fiw_column_dict[column] = 'country'
    elif column == 'C/T?':
        fiw_column_dict[column] = 'country_territory'
    else:
        fiw_column_dict[column] = 'fiw_' + column.lower().replace(' ', '_')
print(fiw_column_dict)

In [None]:
fiw.rename(mapper=fiw_column_dict, axis=1, inplace=True)

In [None]:
# Map Freedom House country names onto the names used in the Mass Mobilization data
country_map = {
    'Slovakia': 'Slovak Republic',
    'Serbia and Montenegro': 'Serbia'
}

# Replace matching values in the country column only, leaving other columns untouched
fiw['country'] = fiw['country'].replace(country_map)

    

In [None]:
fiw_countries = fiw['country'].unique()
df_countries = df['country'].unique()

In [None]:
for countries in df_countries:
    if countries not in fiw_countries:
        print(countries)
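
The loop above reports names that appear in `df` but not in `fiw`. To see mismatches in both directions at once, an outer merge with `indicator=True` can be used. The sketch below runs on two small hypothetical key tables (`df_keys`, `fiw_keys`); the country names are chosen only to illustrate a naming mismatch.

```python
import pandas as pd

# Toy key tables (illustrative country names)
df_keys = pd.DataFrame({'country': ['Canada', 'Slovak Republic', 'Kosovo']})
fiw_keys = pd.DataFrame({'country': ['Canada', 'Slovakia', 'Kosovo']})

# Outer merge with indicator=True labels each name as left_only, right_only, or both
check = df_keys.drop_duplicates().merge(
    fiw_keys.drop_duplicates(), on='country', how='outer', indicator=True
)

only_in_df = sorted(check.loc[check['_merge'] == 'left_only', 'country'])
only_in_fiw = sorted(check.loc[check['_merge'] == 'right_only', 'country'])
print(only_in_df, only_in_fiw)
```

Names that show up on both lists under slightly different spellings are candidates for a renaming map.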

In [None]:
fiw.rename({'fiw_edition': 'year'}, axis=1, inplace=True)

In [None]:
fiw.isna().sum()

In [None]:
fiw.drop(columns=['fiw_add_a', 'fiw_add_q', 'country_territory'], inplace=True)

In [None]:
fiw.isna().sum()

In [None]:
fiw['year'].value_counts()

In [None]:
df

In [None]:
# Write one CSV per Freedom House edition year
for year in fiw['year'].unique():
    fiw_year = fiw[fiw['year'] == year]
    fiw_year.to_csv(f'data/fiw_data/fiw_{year}.csv', index=False, header=True)


In [None]:
fiw_2006 = pd.read_csv('data/fiw_data/fiw_2006.csv')

In [None]:
df

In [None]:
pd.MultiIndex.from_frame(df)


In [None]:
# Build a (country, year) MultiIndex from the two key columns
pd.MultiIndex.from_frame(df[['country', 'year']], names=['country', 'year'])

In [None]:
# Join on the shared key columns; DataFrame.join would need fiw_2006 indexed by them
df.merge(fiw_2006, how='inner', on=['country', 'year'])

In [None]:
pd.merge(df, fiw_2006, on="country").columns

In [None]:
fiw_edition