# Exploring country data: GDP, military spending, etc.

We have found two data sets: one from Kaggle covering military spending per country over roughly the last 60 years, and a general set of indicators for each country (population, surface area, etc.). Let's focus on what we can get out of the military spending data, using the more general data to generate some more interesting features.

Ideas for questions to answer:
1. Look at the usual comparisons, relating GDP and GNI to military spending.
2. Is there anything significant about when countries start military spending?
3. Can we find any dramatic changes in the military spending of any country and relate them to a real-world event?
4. Compare foreign direct investment to military spending.
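
As a rough illustration of how the questions about start years and dramatic changes might be approached, here is a minimal sketch on a made-up table with the same wide layout as the spending data (one row per country, one column per year). The frame `toy`, its values, and the variable names are assumptions for illustration only, not results from the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the spending table (values are invented).
toy = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "1960": [np.nan, 10.0, np.nan],
        "1961": [5.0, 11.0, np.nan],
        "1962": [6.0, 33.0, np.nan],
        "1963": [3.0, 30.0, np.nan],
    }
).set_index("Name")

year_cols = [c for c in toy.columns if c.isdigit()]
spend = toy[year_cols]

# First year with a reported value per country (missing if never reported).
first_year = spend.apply(lambda row: row.first_valid_index(), axis=1)

# Relative change from one year to the next, computed along the year axis.
change = spend / spend.shift(1, axis=1) - 1

# Country and year with the largest absolute relative change.
top_country = change.abs().max(axis=1).idxmax()
top_year = change.loc[top_country].abs().idxmax()
print(first_year)
print(top_country, top_year)
```

A relative change is only one possible definition of "dramatic"; an absolute change in USD or a share of GDP may be more meaningful for small spenders.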

In [37]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import pycountry_convert as pc
import pycountry
%matplotlib inline

In [106]:
ms_df = pd.read_csv('Military Expenditure.csv')
wb_df = pd.read_csv('popular_indicators.csv')

Let's have a look at the data.

In [107]:
ms_df.head()

Unnamed: 0,Name,Code,Type,Indicator Name,1960,1961,1962,1963,1964,1965,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,Aruba,ABW,Country,Military expenditure (current USD),,,,,,,...,,,,,,,,,,
1,Afghanistan,AFG,Country,Military expenditure (current USD),,,,,,,...,251869500.0,298146900.0,325807000.0,238583400.0,217194100.0,268227100.0,199518600.0,185878300.0,191407100.0,198086300.0
2,Angola,AGO,Country,Military expenditure (current USD),,,,,,,...,3311193000.0,3500795000.0,3639496000.0,4144635000.0,6090752000.0,6841864000.0,3608299000.0,2764055000.0,3062873000.0,1983614000.0
3,Albania,ALB,Country,Military expenditure (current USD),,,,,,,...,182736900.0,185893200.0,197006800.0,183204700.0,180015500.0,178120400.0,132350700.0,130853200.0,144382700.0,180488700.0
4,Andorra,AND,Country,Military expenditure (current USD),,,,,,,...,,,,,,,,,,


In [108]:
ms_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 264 entries, 0 to 263
Data columns (total 63 columns):
Name              264 non-null object
Code              264 non-null object
Type              264 non-null object
Indicator Name    264 non-null object
1960              79 non-null float64
1961              84 non-null float64
1962              93 non-null float64
1963              98 non-null float64
1964              98 non-null float64
1965              104 non-null float64
1966              104 non-null float64
1967              105 non-null float64
1968              113 non-null float64
1969              113 non-null float64
1970              121 non-null float64
1971              122 non-null float64
1972              123 non-null float64
1973              130 non-null float64
1974              128 non-null float64
1975              128 non-null float64
1976              132 non-null float64
1977              137 non-null float64
1978              136 non-null float64
1979   

In [109]:
ms_df['Type'].unique()

array(['Country', 'Regions Clubbed Geographically',
       'Semi Autonomous Region', 'Regions Clubbed Economically'],
      dtype=object)

In [110]:
ms_df.loc[ms_df['Type'].isin(['Regions Clubbed Geographically', 'Regions Clubbed Economically',
                             'Semi Autonomous Region'])]

Unnamed: 0,Name,Code,Type,Indicator Name,1960,1961,1962,1963,1964,1965,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
5,Arab World,ARB,Regions Clubbed Geographically,Military expenditure (current USD),,,8.332295e+08,9.164461e+08,1.225339e+09,1.416443e+09,...,9.150404e+10,9.797637e+10,1.080000e+11,1.220000e+11,1.430000e+11,1.550000e+11,1.380000e+11,1.120000e+11,1.190000e+11,1.130000e+11
9,American Samoa,ASM,Semi Autonomous Region,Military expenditure (current USD),,,,,,,...,,,,,,,,,,
34,Central Europe and the Baltics,CEB,Regions Clubbed Geographically,Military expenditure (current USD),,,,,,,...,1.976783e+10,1.946245e+10,2.047837e+10,1.898036e+10,1.960601e+10,2.073030e+10,1.972430e+10,1.950156e+10,2.202378e+10,2.668120e+10
36,Channel Islands,CHI,Semi Autonomous Region,Military expenditure (current USD),,,,,,,...,,,,,,,,,,
47,Caribbean small states,CSS,Regions Clubbed Geographically,Military expenditure (current USD),,,,,,,...,3.016552e+08,3.050468e+08,3.275523e+08,3.551597e+08,3.925497e+08,3.693935e+08,3.921265e+08,4.272007e+08,4.294942e+08,4.599977e+08
50,Cayman Islands,CYM,Semi Autonomous Region,Military expenditure (current USD),,,,,,,...,,,,,,,,,,
59,East Asia & Pacific (excluding high income),EAP,Regions Clubbed Economically,Military expenditure (current USD),,,,,,,...,1.230000e+11,1.350000e+11,1.600000e+11,1.840000e+11,2.090000e+11,2.290000e+11,2.430000e+11,2.460000e+11,2.580000e+11,2.800000e+11
60,Early-demographic dividend,EAR,Regions Clubbed Economically,Military expenditure (current USD),2.716315e+09,2.557648e+09,3.651600e+09,4.584969e+09,5.243449e+09,5.933334e+09,...,1.730000e+11,1.940000e+11,2.110000e+11,2.280000e+11,2.430000e+11,2.550000e+11,2.550000e+11,2.350000e+11,2.540000e+11,2.530000e+11
61,East Asia & Pacific,EAS,Regions Clubbed Economically,Military expenditure (current USD),,,,,,,...,2.360000e+11,2.600000e+11,3.000000e+11,3.240000e+11,3.390000e+11,3.610000e+11,3.670000e+11,3.780000e+11,3.930000e+11,4.200000e+11
62,Europe & Central Asia (excluding high income),ECA,Regions Clubbed Economically,Military expenditure (current USD),,,,,,,...,8.071044e+10,8.963528e+10,1.030000e+11,1.150000e+11,1.240000e+11,1.190000e+11,9.707952e+10,9.930362e+10,9.824528e+10,9.749780e+10


We have rows for a few different collections of countries. These are not especially useful at the moment, as it isn't clear which countries make up each region. We will drop them, but may add some of that detail back later (e.g. continents) as a feature column.

Continents are added back in further down. Some codes are difficult to map to a continent, so those rows have been removed as well. None of the affected entries are significant for now.

In [124]:
ms_df = ms_df[ms_df['Type']=='Country']
ms_df = ms_df[~ms_df['Code'].isin(['PSS','TLS'])]

In [118]:
wb_df.head()

Unnamed: 0,Series Name,Series Code,Country Name,Country Code,1970 [YR1970],1971 [YR1971],1972 [YR1972],1973 [YR1973],1974 [YR1974],1975 [YR1975],...,2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019]
0,"Population, total",SP.POP.TOTL,Afghanistan,AFG,11173642,11475445,11791215,12108963,12412950,12689160,...,29185507,30117413,31161376,32269589,33370794,34413603,35383128,36296400,37172386,..
1,"Population, total",SP.POP.TOTL,Albania,ALB,2135479,2187853,2243126,2296752,2350124,2404831,...,2913021,2905195,2900401,2895092,2889104,2880703,2876101,2873457,2866376,..
2,"Population, total",SP.POP.TOTL,Algeria,DZA,14464985,14872250,15285990,15709825,16149025,16607707,...,35977455,36661444,37383887,38140132,38923687,39728025,40551404,41389198,42228429,..
3,"Population, total",SP.POP.TOTL,American Samoa,ASM,27363,27984,28567,29100,29596,30052,...,56079,55759,55667,55713,55791,55812,55741,55620,55465,..
4,"Population, total",SP.POP.TOTL,Andorra,AND,24276,25559,26892,28232,29520,30705,...,84449,83747,82427,80774,79213,78011,77297,77001,77006,..


In [119]:
wb_df['Series Name'].unique()

array(['Population, total', 'Population growth (annual %)',
       'Surface area (sq. km)',
       'Poverty headcount ratio at national poverty lines (% of population)',
       'GNI, Atlas method (current US$)',
       'GNI per capita, Atlas method (current US$)',
       'GNI, PPP (current international $)',
       'GNI per capita, PPP (current international $)',
       'Income share held by lowest 20%',
       'Life expectancy at birth, total (years)',
       'Fertility rate, total (births per woman)',
       'Adolescent fertility rate (births per 1,000 women ages 15-19)',
       'Contraceptive prevalence, any methods (% of women ages 15-49)',
       'Births attended by skilled health staff (% of total)',
       'Mortality rate, under-5 (per 1,000 live births)',
       'Prevalence of underweight, weight for age (% of children under 5)',
       'Immunization, measles (% of children ages 12-23 months)',
       'Primary completion rate, total (% of relevant age group)',
       'School en

In [120]:
wb_df.loc[wb_df['Series Name']=='Foreign direct investment, net inflows (BoP, current US$)']

Unnamed: 0,Series Name,Series Code,Country Name,Country Code,1970 [YR1970],1971 [YR1971],1972 [YR1972],1973 [YR1973],1974 [YR1974],1975 [YR1975],...,2010 [YR2010],2011 [YR2011],2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019]
9982,"Foreign direct investment, net inflows (BoP, c...",BX.KLT.DINV.CD.WD,Afghanistan,AFG,230000,450000,150000,270000,..,..,...,190774431.98,52173421,56823660,48311346,42975262.5,169146608,93591315.3,51533896.765,139200000,..
9983,"Foreign direct investment, net inflows (BoP, c...",BX.KLT.DINV.CD.WD,Albania,ALB,..,..,..,..,..,..,...,1090111754.87848,1048087029.34916,918313370.724312,1254274472.0354,1149927985.77737,989578334.828609,1044389554.85795,1022757857.07377,1207045718.88407,..
9984,"Foreign direct investment, net inflows (BoP, c...",BX.KLT.DINV.CD.WD,Algeria,DZA,80120000,600000,41490000,51000000,358000000,119000000,...,2300369124.15828,2571237024.68517,1500402452.8635,1691886707.50796,1502206170.55838,-537792920.921856,1638263953.77737,1200965279.93224,1506316885.7744,..
9985,"Foreign direct investment, net inflows (BoP, c...",BX.KLT.DINV.CD.WD,American Samoa,ASM,..,..,..,..,..,..,...,..,..,..,..,..,..,..,..,..,..
9986,"Foreign direct investment, net inflows (BoP, c...",BX.KLT.DINV.CD.WD,Andorra,AND,..,..,..,..,..,..,...,..,..,..,..,..,..,..,..,..,..
9987,"Foreign direct investment, net inflows (BoP, c...",BX.KLT.DINV.CD.WD,Angola,AGO,2400000,1860000,2160000,7540000,6690000,50000,...,-3227211182.45,-3023770965.83688,-1464627990.88284,-7120017424.4614,3657514667.49327,10028215162.6394,-179517618.92,-7397295409.18991,-5732491335.28017,..
9988,"Foreign direct investment, net inflows (BoP, c...",BX.KLT.DINV.CD.WD,Antigua and Barbuda,ATG,..,..,..,..,..,..,...,96679207.7777778,65160596.2962963,129367138.888889,134288999.62963,46061586.8270648,107460503.54774,80641321.8328381,112936813.233046,116493703.7037,..
9989,"Foreign direct investment, net inflows (BoP, c...",BX.KLT.DINV.CD.WD,Argentina,ARG,89770000,125670000,71720000,100250000,17920000,55590000,...,11332718626.4345,10839930944.6815,15323933916.8241,9821661858.15874,5065335541.96486,11758994011.286,3260164341.77393,11516861462.2845,11872856662.7649,..
9990,"Foreign direct investment, net inflows (BoP, c...",BX.KLT.DINV.CD.WD,Armenia,ARM,..,..,..,..,..,..,...,529321391.641959,653219756.47225,496636701.129687,346092394.393256,406578406.693112,184127986.314895,333733314.104601,250935110.239139,254146163.557324,..
9991,"Foreign direct investment, net inflows (BoP, c...",BX.KLT.DINV.CD.WD,Aruba,ABW,..,..,..,..,..,..,...,186759776.536313,488156424.581006,-314692737.430168,226371388.826816,250618094.972067,-28775856.424581,20982174.301676,162458100.5587,135642458.1006,..
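
The World Bank table above uses `..` for missing values and year labels such as `2017 [YR2017]`, so the year columns need cleaning before any arithmetic. The following is a minimal sketch of one way to do that on a made-up frame with the same layout; `toy_wb` and `clean` are hypothetical names, and passing `na_values='..'` to `pd.read_csv` would be another option.

```python
import pandas as pd

# Toy stand-in mirroring the layout shown above (values are illustrative).
toy_wb = pd.DataFrame(
    {
        "Series Name": ["Population, total", "Population, total"],
        "Country Code": ["AFG", "ASM"],
        "2017 [YR2017]": ["36296400", ".."],
        "2018 [YR2018]": ["37172386", "55465"],
    }
)

# Year columns are the ones carrying the '[YR....]' suffix.
year_cols = [c for c in toy_wb.columns if "[YR" in c]
new_year_cols = [c.split(" ")[0] for c in year_cols]

# Rename '2017 [YR2017]' to '2017' so labels match the spending table.
clean = toy_wb.rename(columns=dict(zip(year_cols, new_year_cols)))

# '..' is not numeric, so errors='coerce' turns it into NaN.
clean[new_year_cols] = clean[new_year_cols].apply(pd.to_numeric, errors="coerce")
print(clean)
```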


## Data wrangling and initial exploration

Get the continent for each country

In [142]:
def continent_from_country_code(row):
    """Return the continent a country belongs to,
    based on its 2- or 3-letter country code.
    
    row - DataFrame row with 'Code' and 'Name' entries"""
    
    cc = row['Code']
    try:
        if len(cc) == 2:
            continent_code = pc.country_alpha2_to_continent_code(cc)
        elif len(cc) == 3:
            # pycountry_convert works with alpha-2 codes, so convert the alpha-3 code first
            cc2 = pycountry.countries.get(alpha_3=cc).alpha_2
            continent_code = pc.country_alpha2_to_continent_code(cc2)
        else:
            # fail with a clear message instead of leaving continent_code undefined
            raise ValueError(f"Unexpected country code: {cc!r}")
    except AttributeError:
        # some of the codes in the database can't be found, so we do a fuzzy search on the name as a backup
        fuz_search = pycountry.countries.search_fuzzy(row['Name'])[0].alpha_2
        continent_code = pc.country_alpha2_to_continent_code(fuz_search)
    
    continent_dict = {'NA': 'North America', 'EU': 'Europe', 'AS': 'Asia',
                      'AF': 'Africa', 'SA': 'South America', 'OC': 'Oceania'}
    return continent_dict[continent_code]

In [143]:
ms_df['continent'] = ms_df.apply(continent_from_country_code, axis=1)
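
To use the general indicators as features, the two wide tables would eventually need to be joined on country code and year. Here is a minimal sketch of one way this might look, using made-up frames; `toy_ms`, `toy_pop`, and the `mil_spend_per_capita` feature are hypothetical and only illustrate the reshaping, assuming the World Bank year columns have already been renamed to plain years and converted to numbers.

```python
import pandas as pd

# Toy wide tables (invented values); key columns follow the frames above.
toy_ms = pd.DataFrame(
    {"Code": ["AFG", "ALB"], "2017": [1.9e8, 1.4e8], "2018": [2.0e8, 1.8e8]}
)
toy_pop = pd.DataFrame(
    {
        "Country Code": ["AFG", "ALB"],
        "2017": [36296400, 2873457],
        "2018": [37172386, 2866376],
    }
)

# Reshape from one column per year to one row per (country, year).
ms_long = toy_ms.melt(id_vars="Code", var_name="year", value_name="mil_spend")
pop_long = toy_pop.melt(
    id_vars="Country Code", var_name="year", value_name="population"
).rename(columns={"Country Code": "Code"})

# Left join keeps every spending row even if the indicator is missing.
merged = ms_long.merge(pop_long, on=["Code", "year"], how="left")
merged["mil_spend_per_capita"] = merged["mil_spend"] / merged["population"]
print(merged)
```

The long format also tends to be the shape seaborn expects for plotting, with `year` on one axis and the continent available as a grouping column.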

In [102]:
dir(pycountry.countries)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_is_loaded',
 '_load',
 '_load_lock',
 'data_class',
 'data_class_base',
 'data_class_name',
 'filename',
 'get',
 'index_names',
 'indices',
 'lookup',
 'no_index',
 'objects',
 'root_key',
 'search_fuzzy']