# Checkpoint 2: UNESCO Atlas of the World's Languages in Danger
<em>Vikram Ramavarapu</em>

For this checkpoint, we will investigate the United Nations Educational, Scientific and Cultural Organization's (UNESCO) Atlas of the World's Languages in Danger, a dataset of languages whose speaker populations are declining severely. One example is the Ainu language of Hokkaido, Japan, which has only two reported speakers remaining. The question we intend to answer is: which part of the world holds the most of these languages, and why?

## Hypothesis

There is a correlation between location and the number of endangered languages. Because of colonialism and major cultural shifts, I expect to see more endangered languages in countries in Africa or South/Southeast Asia.

## Loading the Dataset

To avoid confounds from mixing different degrees of endangerment, we will select only the languages marked <em>Critically endangered</em>.

In [1]:
import pandas as pd
import cartopy.crs as ccrs
import cartopy.io.shapereader as shpreader

data = pd.read_excel('sources/unesco_atlas_languages_limited_dataset.xls')
data = data.loc[data['Degree of endangerment'] == 'Critically endangered']
print(data)

*** No CODEPAGE record, no encoding_override: will use 'iso-8859-1'
ERROR *** XF[17] unknown format key (2, 0x0002)
ERROR *** XF[18] unknown format key (1, 0x0001)
        ID Name in English Name in French Name in Spanish  \
1       78         'Ongota         ongota          birale   
6     2655           Abaga          abaga           abaga   
16     829        Achumawi       achumawi        achumawi   
26     144           Ahtna          ahtna            atna   
28     511           Aimol          aimol           aimol   
...    ...             ...            ...             ...   
2706  2180        Zenatiya       zenatiya        zenatiya   
2708  2177         Zidgali        zidgali         zidgali   
2712    45          Zumaya         zumaya          zumaya   
2716  1406          |Xaise         |xaise          |xaise   
2723  1401           ÇHoa          Çhoa      |hua-owani   

                     Countries Country codes alpha 3 ISO639-3 codes  \
1                     Ethiopia  

## Formatting the Data

Now we want a country-level heatmap of the number of critically endangered languages worldwide. First, we must group the languages by country code and count them.

In [2]:
data = data.groupby(['Country codes alpha 3']).size()

# Some entries in the xls list languages that belong to multiple countries.
# For those entries the country code field is formatted like this: ABC, XYZ
#
# We fix this by splitting each such entry into its individual country codes
# and adding its count to every country, so that a heatmap can be generated.

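for key in list(data.index):
    if ',' in key:
        # Split the key and credit its count to each country it names.
        # Stripping whitespace also handles codes written without a space after the comma.
        for country in (code.strip() for code in key.split(',')):
            if country not in data.index:
                data[country] = data[key]
            else:
                data[country] += data[key]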

# Drop each entry with comma-separated countries
drops = [key for key in data.index if ',' in key]
data = data.drop(drops)

print(data)

Country codes alpha 3
AFG     2
ARG     2
AUS    42
BEN     1
BGR     1
       ..
VUT    22
ZAF     3
ZAI     2
IRQ     1
NOR     1
Length: 81, dtype: int64
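
The same per-country tally can also be computed without an explicit loop. The following is only a minimal sketch, not the method used in this notebook: it assumes the column name `Country codes alpha 3` from the dataset above, while the helper `count_by_country` and the toy rows (`Lang A` to `Lang D`) are made up for illustration. It splits each country field on commas, gives every language one row per country with `explode`, and then counts the rows.

```python
import pandas as pd

# Toy stand-in for the filtered atlas rows (hypothetical languages, illustrative codes).
toy = pd.DataFrame({
    'Name in English': ['Lang A', 'Lang B', 'Lang C', 'Lang D'],
    'Country codes alpha 3': ['AUS', 'IRQ, NOR', 'AUS, NOR', 'VUT'],
})

def count_by_country(frame, column='Country codes alpha 3'):
    """Count languages per country, crediting a shared language to each of its countries."""
    codes = (
        frame[column]
        .dropna()            # skip rows with no country code
        .str.split(',')      # 'IRQ, NOR' -> ['IRQ', ' NOR']
        .explode()           # one row per (language, country) pair
        .str.strip()         # remove whitespace left around each code
    )
    return codes.value_counts().sort_index()

counts = count_by_country(toy)
print(counts)
```

As in the loop above, a language spoken in several countries is counted once for each of them, so the totals across countries can exceed the number of languages. Unlike the loop, this version returns the countries in alphabetical order.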


## Generating the Heatmap

Now we will use Cartopy to take this formatted data and draw a heatmap.