# A History of Dual Citizenships

To what extent have countries allowed dual citizenship, and how has that changed over time? What distinguishing characteristics can we glean from the countries that allow or disallow dual citizenship?  
  
The MACIMIDE Global Expatriate Dual Citizenship Dataset compiles the dual citizenship rules that existed in nearly all sovereign states of the world over the past half century. It contains country names and 3-letter ISO codes, and describes 3 policy frameworks:
* No dual citizenship: acquiring another citizenship leads to automatic renunciation of the original citizenship.
* Not automatically renounced: one keeps the original citizenship, but can voluntarily renounce it.
* Dual citizenship: one keeps the original citizenship and cannot renounce it.

In [None]:
import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import seaborn as sns
from ipywidgets import interact, interactive
plt.style.use('fivethirtyeight')
%matplotlib inline

## Pre-processing

In [None]:
df_raw = pd.read_csv("multiple_citizenship_data/dual_citizenship.csv")
print(df_raw.shape)
df_raw.head()

In [None]:
# work on a copy so the raw data stays untouched
df = df_raw.copy()
# map the numeric region codes to region names
region_names = {1.0: 'Africa', 2.0: 'Asia', 3.0: 'Europe',
                4.0: 'LTAM', 5.0: 'North America', 6.0: 'Oceania'}
df['region'] = df['world_region'].replace(region_names)

In [None]:
def category(code):
    # based on the data codebook
    if code in [110, 111, 112]:
        return 'No dual citizenship'
    elif code in [210, 211, 212, 220]:
        return 'Not automatically renounced'
    elif code in [310, 320, 330]:
        return 'Dual citizenship'
    else:
        return 'Other'
    
df['category']=df_raw['Dualcit_cat'].map(category)

In [None]:
#number of unique countries by continents
print(df.groupby('region')['country'].nunique())

#example countries
unique_cntry = df.groupby('region')['country'].unique()
unique_cntry

## Data cleaning

In [None]:
#remove countries with no info
df=df.query('category!="Other"')

Countries with missing dual citizenship data are removed: they add no information, and the only way to impute them would be to gather the rules manually, which is beyond the scope of this analysis.
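
Before dropping rows, it can help to see how much data the filter removes. The sketch below uses a small hypothetical panel (the `toy` frame and its values are made up for illustration, not taken from the dataset) to count the "Other" rows and to list countries that would disappear entirely.

```python
import pandas as pd

# Hypothetical toy panel; the notebook's real frame is `df` with a 'category' column.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "Year": [2017, 2018, 2017, 2018, 2017, 2018],
    "category": ["Dual citizenship", "Dual citizenship",
                 "Other", "Other",
                 "No dual citizenship", "Other"],
})

is_other = toy["category"] == "Other"
rows_dropped = int(is_other.sum())

# Countries left with no usable row at all once "Other" rows are removed
countries_before = set(toy["country"])
countries_after = set(toy.loc[~is_other, "country"])
countries_lost = sorted(countries_before - countries_after)

print(rows_dropped, countries_lost)
```

Country C keeps one usable year here, so only country B is lost completely.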

In [None]:
#check for missing value
df.isnull().sum()

In [None]:
# see which countries are missing region
cnty_wo_region=df.loc[pd.isna(df.region),'country'].unique()
cnty_wo_region

These countries no longer exist today. Let's filter them out.

In [None]:
# copy so that later column assignments do not trigger SettingWithCopyWarning
df = df[~df['country'].isin(cnty_wo_region)].copy()

# number of countries remaining
df['country'].nunique()

In [None]:
#years covered
np.min(df.Year),np.max(df.Year)

After removing the 5 countries that lack a region definition because they no longer exist, we are left with a fairly comprehensive list of 195 countries spanning 1960 to 2018, each with information on whether it allows dual citizenship.
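
One way to sanity-check this kind of coverage claim is to look at each country's first year, last year and number of distinct years. This is a minimal sketch on made-up data; `toy` and the `has_gaps` column are illustrative names, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy panel with one country missing a year
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "Year": [1960, 1961, 1962, 1960, 1962],
})

coverage = toy.groupby("country")["Year"].agg(["min", "max", "nunique"])
# A country has gaps if it has fewer distinct years than its own span implies
coverage["has_gaps"] = coverage["nunique"] < (coverage["max"] - coverage["min"] + 1)

n_countries = toy["country"].nunique()
year_span = (int(toy["Year"].min()), int(toy["Year"].max()))
print(n_countries, year_span)
print(coverage)
```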

Some rows are missing the ISO2 code or the dependency code describing their subdivision; neither matters for this analysis.

Also worth noting: some publications, such as Quartz.com, report 'not automatically renounced' under 'dual citizenship allowed'. Strictly speaking, it shouldn't be. This is a broad, grey category best treated on its own. For example, Canada and Singapore are both in this category, but Canadians can hold multiple passports, while in Singapore this is only possible up to the age of 18 and the constitution does not allow dual citizenship. For accuracy, we keep the policy in 3 categories rather than 2, which would otherwise produce misleadingly high numbers.  
There are also cases where a country only allows dual citizenship with certain other countries, which this dataset does not capture.  
In addition, the dataset was collected manually from many different legislative documents, so data quality cannot be assessed solely by looking at distributions.
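
To illustrate why folding 'Not automatically renounced' into 'Dual citizenship' inflates the headline number, here is a small sketch with made-up counts (10 hypothetical countries; the numbers are not from the dataset).

```python
import pandas as pd

# Hypothetical counts: 4 dual, 3 not automatically renounced, 3 no dual citizenship
toy = pd.Series(
    ["Dual citizenship"] * 4
    + ["Not automatically renounced"] * 3
    + ["No dual citizenship"] * 3,
    name="category",
)

# Share with the 3 categories kept separate
strict_share = toy.value_counts(normalize=True)["Dual citizenship"]

# Share after collapsing the grey category into 'Dual citizenship'
collapsed = toy.replace({"Not automatically renounced": "Dual citizenship"})
loose_share = collapsed.value_counts(normalize=True)["Dual citizenship"]

print(strict_share, loose_share)
```

In this toy case the share of 'dual citizenship' countries rises from 40% to 70% purely because of the regrouping.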

## Latest state of the policy

In [None]:
# Number of countries in each category by region, 2018
df["value"] = 1
df2018 = df[df['Year'] == 2018]
df_smry = df2018.pivot_table(index='region', columns='category', values="value",
                             aggfunc='count', fill_value=0, margins=True)
df_smry

In [None]:
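import geopandas as gpd
# Note: gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0;
# on newer versions, read a Natural Earth countries file from a local path instead.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# select only the relevant columns
cols = ['country', 'ISO3', 'longitude', 'latitude', 'category', 'Year']
df_s = df2018[cols]

# join the world geometries to the 2018 data on the ISO3 code
merged18 = world.merge(df_s, left_on='iso_a3', right_on='ISO3', how='inner')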

## The history of the policy

### Percentage of countries by category globally

In [None]:
df_smry = df.pivot_table(index=['Year'], columns='category',
                         values="value", aggfunc='count', fill_value=0, margins=True)
# divide each row by its total to get the share of countries per category
df_smry_pct = df_smry.div(df_smry["All"], axis='index')
# drop the "All" margin row and column before plotting
df_smry_pct.iloc[:-1, :-1].plot()
plt.legend(loc='lower left')
plt.title('Share of countries in each dual citizenship category by year')

### Percentage of countries strictly forbidding dual citizenship by region

In [None]:
df_smry_yr = df.pivot_table(index=['Year', 'region'], columns='category',
                            values="value", aggfunc='count', fill_value=0, margins=True)
df_smry_yr_pct = df_smry_yr.div(df_smry_yr["All"], axis='index')
# share of countries with strictly no dual citizenship, by region;
# drop the trailing "All" margin row and select columns by name
df_region = df_smry_yr_pct.reset_index().iloc[:-1][['Year', 'region', 'No dual citizenship']]
# small multiples
g = sns.relplot(x='Year', y='No dual citizenship', hue="region", col="region",
                kind="line", col_wrap=3, data=df_region, legend='brief')
g.fig.suptitle('Percentage of countries disallowing dual citizenship', x=0.4, y=1.05, size=16)

### Number of countries by category in each region

In [None]:
regional_agg = (df[['region', 'Year', 'category']]
                .groupby(['region', 'Year', 'category'])
                .size()
                .to_frame('count')
                .reset_index())

In [None]:
g=sns.relplot(x='Year', y= 'count', hue="category", col="region", kind="line", col_wrap=3,\
            data=regional_agg, legend='brief')
g.fig.suptitle('Number of countries by category', x=0.4, y=1.05,size=16)

The percentage of countries forbidding dual citizenship is declining across all regions, though much more gradually in Asia.

Unexpectedly, Asia has the lowest percentage of countries allowing dual citizenship, even lower than Africa. The 2 countries in North America both allow it, and more than 80% of countries in Latin America and Oceania allow it too.
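
Regional percentages like these can be read off a row-normalised cross-tabulation. The sketch below shows the idea on a hypothetical 6-country frame; `toy` and its values are invented for illustration and do not reproduce the figures quoted above.

```python
import pandas as pd

# Hypothetical single-year snapshot: one row per country
toy = pd.DataFrame({
    "region": ["Asia", "Asia", "Asia", "Asia", "Europe", "Europe"],
    "category": ["No dual citizenship", "No dual citizenship",
                 "Dual citizenship", "Not automatically renounced",
                 "Dual citizenship", "Dual citizenship"],
})

# normalize="index" makes each region's row sum to 1; multiply by 100 for percentages
pct = pd.crosstab(toy["region"], toy["category"], normalize="index") * 100
print(pct.round(1))
```

On the notebook's own data, the same call could be applied to the `region` and `category` columns of the 2018 slice.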

In [None]:
from matplotlib.colors import ListedColormap
cmap = ListedColormap(["#30a2da", \
    "#fc4f30", \
    "#e5ae38"], name='prime')

In [None]:
# plot the 1960 and 2018 maps side by side
fig, [ax1, ax2] = plt.subplots(nrows=1, ncols=2, figsize=(16, 4))
lgnd_kwds = {'loc': 'upper left', 'ncol': 3}

# 1960
ax1.axis('off')
merged = world.merge(df[df['Year'] == 1960][cols], left_on='iso_a3', right_on='ISO3', how='inner')
merged.plot(column='category', legend=True, ax=ax1, cmap=cmap, legend_kwds=lgnd_kwds)
ax1.set_title('1960')

# 2018 (merged18 was built in the earlier cell)
ax2.axis('off')
merged18.plot(column='category', legend=True, ax=ax2, cmap=cmap, legend_kwds=lgnd_kwds)
ax2.set_title('2018')

## Interactive map

In [None]:
def plotmap(year):
    # create the figure and axes once and pass the axes to GeoPandas,
    # otherwise GeoDataFrame.plot() draws on a new figure and this one stays empty
    fig, ax = plt.subplots(figsize=(20, 10))
    df_select = df[df['Year'] == year]
    merged = world.merge(df_select[cols], left_on='iso_a3', right_on='ISO3', how='inner')
    merged.plot(column='category', legend=True, cmap=cmap, legend_kwds=lgnd_kwds, ax=ax)
    plt.show()

interactive_plot = interactive(plotmap, year=(1960, 2018))
output = interactive_plot.children[-1]
interactive_plot

We can see that South America has a long history of allowing dual citizenship. To this day, most Asia-Pacific countries forbid it.

The data doesn't say which country pairs co-occur in dual citizenship schemes; with that information we could get a glimpse of bilateral ties, diasporas and historical liaisons.

Nevertheless, we can use socioeconomic data to make some observations about what might distinguish countries that allow dual citizenship from those that do not.

Next, we merge the original dataset with another dataset from the World Government Summit containing various demographic, socioeconomic and governance indicators.

In [None]:
indicators = pd.read_csv("multiple_citizenship_data/gov_metrics.csv")
print(indicators.shape)
indicators.head()

In [None]:
#join with dual citizenship data
merged_indicator=df2018.merge(indicators, left_on='ISO3',right_on='ISO Country code',how='inner')
merged_indicator.head()

In [None]:
merged_indicator.groupby('category')['country'].nunique()

In [None]:
#remove columns with more than half of their values missing
merged_indicator=merged_indicator.loc[:,merged_indicator.isnull().mean()<0.5]

In [None]:
#clean col names
merged_indicator.columns = [c.replace(' ', '_') for c in merged_indicator.columns]
#convert all col to numeric
idx_start=list(merged_indicator.columns).index('GINI_index')
idx_end=len(merged_indicator.columns)
col_numeric = merged_indicator.columns[idx_start:idx_end]
merged_indicator[col_numeric] = merged_indicator[col_numeric].apply(pd.to_numeric, errors='coerce')
#combine the numeric col with citizenship category
df_concat = pd.concat([merged_indicator['category'],merged_indicator[col_numeric]], axis=1)
df_long = pd.melt(df_concat, id_vars='category', value_vars=col_numeric).dropna()
df_concat.head()

In [None]:
select=['overall_economic_freedom_score','GINI_index','world_happiness_report_score']
dfs=df_long.loc[df_long.variable.isin(select)]

In [None]:
g=sns.catplot(x='category', y='value', col='variable', kind='box', data=dfs, sharey=False)
g.fig.suptitle('Socioeconomic indicators of countries allowing/disallowing dual citizenship',y=1.05, size=16)

Based on these limited statistics, countries allowing dual citizenship tend to have higher happiness scores, higher overall economic freedom and a higher Gini index (a measure of inequality). More evidence is needed to understand why this might be the case.
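
A box plot comparison like this can be backed up with the group medians themselves. The following is a minimal sketch on invented values laid out like the long-format frame used for the plot (`category`, `variable`, `value`); the numbers are hypothetical and say nothing about the real indicators.

```python
import pandas as pd

# Hypothetical long-format data in the same layout as df_long
toy_long = pd.DataFrame({
    "category": ["Dual citizenship"] * 3 + ["No dual citizenship"] * 3,
    "variable": ["GINI_index"] * 6,
    "value": [40.0, 42.0, 44.0, 30.0, 35.0, 37.0],
})

# Median per indicator and category, one column per category
medians = (toy_long.groupby(["variable", "category"])["value"]
           .median()
           .unstack("category"))
diff = medians["Dual citizenship"] - medians["No dual citizenship"]
print(medians)
print(diff)
```

A difference in medians is only descriptive; it does not by itself show that the groups differ systematically.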

## Sources

Data source (extracted on Jan 10 2019): https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/TTMZ08/NZM6Y4&version=3.0

Code: Hannah Yan (hy151)