# Country name dictionary

We will be using a clickable map to select countries (in addition to giving users the ability to click on a country name from a list).

The map library we're using, `mapael`, identifies countries by `ISO 3166-1 alpha-2` (two-letter) codes, while our dataset uses `ISO 3166-1 alpha-3` (three-letter) codes, so we'll need a way to convert between the two and to translate both into readable names.

Huge thank you to `tadast` for saving us from having to figure out the mapping:

https://gist.github.com/tadast/8827699

I downloaded and renamed it: `countries_codes_and_coordinates.csv` -> `iso2_iso3.csv`

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('wyd_88_05_for_release.csv')

In [3]:
wyd = df['contcod'].unique()

In [4]:
wyd[:20]

array(['AGO', 'ALB', 'ANT', 'ARE', 'ARG', 'ARM', 'AUS', 'AUT', 'AZE',
       'BDI', 'BEL', 'BEN', 'BFA', 'BGD', 'BGD-R', 'BGD-U', 'BGR', 'BIH',
       'BLR', 'BOL'], dtype=object)

In [5]:
ISO = pd.read_csv('./countries/iso2_iso3.csv')

In [6]:
ISO.head()

Unnamed: 0,Country,Alpha-2 code,Alpha-3 code,Numeric code,Latitude (average),Longitude (average)
0,Afghanistan,"""AF""","""AFG""","""4""","""33""","""65"""
1,Albania,"""AL""","""ALB""","""8""","""41""","""20"""
2,Algeria,"""DZ""","""DZA""","""12""","""28""","""3"""
3,American Samoa,"""AS""","""ASM""","""16""","""-14.3333""","""-170"""
4,Andorra,"""AD""","""AND""","""20""","""42.5""","""1.6"""


In [7]:
ISO.columns = ['country', 'iso2', 'iso3', 'num', 'u1', 'u2']

In [8]:
ISO.head()

Unnamed: 0,country,iso2,iso3,num,u1,u2
0,Afghanistan,"""AF""","""AFG""","""4""","""33""","""65"""
1,Albania,"""AL""","""ALB""","""8""","""41""","""20"""
2,Algeria,"""DZ""","""DZA""","""12""","""28""","""3"""
3,American Samoa,"""AS""","""ASM""","""16""","""-14.3333""","""-170"""
4,Andorra,"""AD""","""AND""","""20""","""42.5""","""1.6"""


In [9]:
countries = ISO[['country', 'iso2', 'iso3', 'num']]

In [10]:
countries.head()

Unnamed: 0,country,iso2,iso3,num
0,Afghanistan,"""AF""","""AFG""","""4"""
1,Albania,"""AL""","""ALB""","""8"""
2,Algeria,"""DZ""","""DZA""","""12"""
3,American Samoa,"""AS""","""ASM""","""16"""
4,Andorra,"""AD""","""AND""","""20"""


I would like a simple dictionary that takes an `iso3` value and returns a `country` name.

When using the map on the webpage, users will click on a country and the map library will give us its `iso2` code.

So we'll need another dictionary that takes an `iso2` and returns an `iso3`.
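
A minimal sketch of how the two lookups chain together once a user clicks the map. The three entries are hand-picked sample values, and `name_for_click` is a hypothetical helper used only for illustration; it is not part of `mapael`.

```python
# Toy lookup tables; the real ones are built from iso2_iso3.csv.
sample_iso2_to_iso3 = {"AF": "AFG", "DE": "DEU", "AL": "ALB"}
sample_iso3_to_name = {"AFG": "Afghanistan", "DEU": "Germany", "ALB": "Albania"}


def name_for_click(iso2):
    """Translate a two-letter map code into a readable country name."""
    iso3 = sample_iso2_to_iso3.get(iso2)    # map code -> dataset code
    if iso3 is None:
        return None                         # the map code is unknown
    return sample_iso3_to_name.get(iso3)    # dataset code -> display name


print(name_for_click("DE"))
```

Returning `None` for an unknown code keeps the caller in charge of deciding what to show when a clicked region has no data.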

Remove all the `"` quotes as well as extra spaces!

In [11]:
# Take an explicit copy first; assigning into a slice of ISO triggers SettingWithCopyWarning
countries = countries.copy()
for col in countries.columns[1:]:
    # Strip the literal double quotes, then any spaces, from every code column
    countries[col] = countries[col].str.replace('"', '', regex=False)
    countries[col] = countries[col].str.replace(' ', '', regex=False)


In [12]:
countries.head()

Unnamed: 0,country,iso2,iso3,num
0,Afghanistan,AF,AFG,4
1,Albania,AL,ALB,8
2,Algeria,DZ,DZA,12
3,American Samoa,AS,ASM,16
4,Andorra,AD,AND,20
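
The cleanup could possibly be avoided at load time. The sketch below assumes the raw file puts a space after each comma before the opening quote (which would explain why the quotes survive parsing); the two sample rows are made up to match that layout and are not read from the real file.

```python
import io

import pandas as pd

# Two sample rows in the assumed layout: comma, space, then a quoted value.
raw = (
    "Country,Alpha-2 code,Alpha-3 code,Numeric code\n"
    'Afghanistan, "AF", "AFG", "4"\n'
    'Albania, "AL", "ALB", "8"\n'
)

clean = pd.read_csv(
    io.StringIO(raw),
    skipinitialspace=True,  # skip the space after each comma so the quotes act as quoting
    dtype=str,              # keep every column as text, including the numeric code
    header=0,               # replace the original header row with the short names below
    names=["country", "iso2", "iso3", "num"],
)

print(clean)
```

If the real file has a different layout, the explicit replace loop remains the safer route.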


In [13]:
iso2_to_iso3 = pd.Series(countries['iso3'].values, index=countries['iso2']).to_dict()

In [14]:
iso2_to_iso3.get('AF')

'AFG'

In [15]:
iso3_to_name = pd.Series(countries['country'].values, index=countries['iso3']).to_dict()

In [16]:
iso3_to_name.get('DEU')

'Germany'

In [24]:
iso3_to_iso2 = pd.Series(countries['iso2'].values, index=countries['iso3']).to_dict()

In [25]:
iso3_to_iso2.get('DEU')

'DE'
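
The three dictionaries follow the same pattern, so they could also be built with one small helper. This is only a sketch: `column_map` is a hypothetical name and the two-row frame stands in for `countries`.

```python
import pandas as pd

# Stand-in for the cleaned `countries` frame.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Germany"],
    "iso2": ["AF", "DE"],
    "iso3": ["AFG", "DEU"],
})


def column_map(frame, key_col, value_col):
    """Build a plain dict that maps one column's values to another's."""
    return dict(zip(frame[key_col], frame[value_col]))


sample_iso2_to_iso3 = column_map(sample, "iso2", "iso3")
sample_iso3_to_name = column_map(sample, "iso3", "country")
sample_iso3_to_iso2 = column_map(sample, "iso3", "iso2")
```

With `dict(zip(...))`, a key that appears in more than one row keeps the value from the last such row.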

Let's list all the countries not found in our dictionary:

In [17]:
for country in wyd:
  if not iso3_to_name.get(country):
    print('! NOT FOUND ! -- ', country)

! NOT FOUND ! --  BGD-R
! NOT FOUND ! --  BGD-U
! NOT FOUND ! --  CHN-R
! NOT FOUND ! --  CHN-U
! NOT FOUND ! --  DDR
! NOT FOUND ! --  ECU-U
! NOT FOUND ! --  EGY-R
! NOT FOUND ! --  EGY-U
! NOT FOUND ! --  IDN-R
! NOT FOUND ! --  IDN-U
! NOT FOUND ! --  IND-R
! NOT FOUND ! --  IND-U
! NOT FOUND ! --  MNG-R
! NOT FOUND ! --  MNG-U
! NOT FOUND ! --  NER-R
! NOT FOUND ! --  NER-U
! NOT FOUND ! --  PER-U
! NOT FOUND ! --  ROM
! NOT FOUND ! --  SLV-U
! NOT FOUND ! --  TMP
! NOT FOUND ! --  YUG
! NOT FOUND ! --  ZAR


Most of these are regular countries whose data has been split into `U` (urban) and `R` (rural) series.
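
One way to separate the urban/rural entries from the codes that are genuinely unmatched is sketched below. It assumes the suffix always follows a single hyphen, as in the list above; `split_area` and the sample values are hypothetical.

```python
def split_area(code):
    """Split 'BGD-R' into ('BGD', 'R'); plain codes get an area of None."""
    base, sep, area = code.partition("-")
    return base, (area if sep else None)


# Small hand-picked samples standing in for iso3_to_name and the unmatched codes.
sample_known = {"BGD": "Bangladesh", "CHN": "China", "ROU": "Romania"}
sample_missing = ["BGD-R", "BGD-U", "CHN-R", "DDR", "ROM"]

# A suffixed code whose base is a known country is just an urban/rural split.
splits = [
    code for code in sample_missing
    if split_area(code)[1] is not None and split_area(code)[0] in sample_known
]
# Everything else needs a closer look.
others = [code for code in sample_missing if code not in splits]

print(splits)
print(others)
```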

The remaining exceptions are `DDR`, `ROM`, `TMP`, `YUG`, and `ZAR`.

According to this: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3

*Deleted*:
- `DDR` - German Democratic Republic

*Transitional reservations*:
- `TMP` - East Timor - From May 2002
- `ROM` - Romania - From February 2002; code changed to `ROU`
- `YUG` - Yugoslavia - From July 2003
- `ZAR` - Zaire - From July 1997

In [18]:
iso3_to_name.get('ROU')

'Romania'

We'll later manually change `ROM` to `ROU` in our `wyd` dataset and just ignore the other four for convenience.
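
The rename could look roughly like this. The four-row frame is a made-up stand-in for the real dataset; only the `contcod` column name comes from the notebook.

```python
import pandas as pd

# Stand-in for the dataset loaded from wyd_88_05_for_release.csv.
sample_df = pd.DataFrame({"contcod": ["ROM", "DEU", "ROM", "ZAR"]})

# Swap the retired code for the current one; every other value is left untouched.
sample_df["contcod"] = sample_df["contcod"].replace({"ROM": "ROU"})

print(sample_df["contcod"].tolist())
```

Passing a dict to `replace` matches whole values, so a code such as `ZAR` stays as it is.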

In [19]:
import json

In [20]:
output = json.dumps(iso2_to_iso3)

In [21]:
with open("./countries/iso2_to_iso3.js", "w") as text_file:
    text_file.write("var iso2_to_iso3 = " + output)

In [22]:
output = json.dumps(iso3_to_name)

In [23]:
with open("./countries/iso3_to_name.js", "w") as text_file:
    text_file.write("var iso3_to_name = " + output)

In [27]:
output = json.dumps(iso3_to_iso2)

In [28]:
with open("./countries/iso3_to_iso2.js", "w") as text_file:
    text_file.write("var iso3_to_iso2 = " + output)
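
The three export steps repeat the same pattern, so they could be folded into one helper. This is a sketch only: `write_js_var` is a hypothetical name, and the example writes to a temporary directory rather than `./countries/`.

```python
import json
import os
import tempfile


def write_js_var(path, var_name, mapping):
    """Write `mapping` to `path` as a JavaScript variable assignment."""
    with open(path, "w") as js_file:
        js_file.write("var " + var_name + " = " + json.dumps(mapping))


# Demonstrate with a one-entry dictionary in a throwaway directory.
out_dir = tempfile.mkdtemp()
js_path = os.path.join(out_dir, "iso2_to_iso3.js")
write_js_var(js_path, "iso2_to_iso3", {"AF": "AFG"})

with open(js_path) as js_file:
    written = js_file.read()

print(written)
```

The resulting file can be loaded with a plain `<script>` tag, which exposes the dictionary as a global variable on the page.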