# [Project Learning] World Happiness Report

### Building a country-to-region lookup table
The source data for the country-to-region table is taken from the following site.
https://statisticstimes.com/geography/countries-by-continents.php

Reference:
https://stackoverflow.com/questions/78382398/how-to-add-a-column-on-a-pandas-dataframe-that-is-based-on-the-continent-a-*count*

For reference, the United Nations (UN) assigns countries and areas to the following 23 regions.
```
['Southern Asia', 'Northern Europe', 'Southern Europe',
       'Northern Africa', 'Polynesia', 'Middle Africa', 'Caribbean',
       'Antarctica', 'South America', 'Western Asia',
       'Australia and New Zealand', 'Western Europe', 'Eastern Europe',
       'Central America', 'Western Africa', 'Northern America',
       'Southern Africa', 'Eastern Africa', 'South-eastern Asia',
       'Eastern Asia', 'Melanesia', 'Micronesia', 'Central Asia']
```



In [None]:
import pandas as pd

# Build a lookup table from country to region
def make_country_region_table():
    # read_html returns every table found on the page; index 2 is the one used here
    cr_df = pd.read_html('https://statisticstimes.com/geography/countries-by-continents.php')[2]
    # cr_df.head()

    # Map each country name to its region
    country_region_table = dict(zip(cr_df['Country or Area'], cr_df['Region 1']))

    print(country_region_table)

    return country_region_table
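
The hard-coded index `[2]` depends on the page layout, which may change. A minimal sketch of a more defensive selection, assuming the list returned by `pd.read_html` is already in hand: `pick_table` is a hypothetical helper (not part of pandas), and the `tables` list below is a stand-in so that the example runs without network access.

```python
import pandas as pd

def pick_table(tables, required_columns):
    """Return the first DataFrame that contains every required column."""
    for table in tables:
        if set(required_columns).issubset(table.columns):
            return table
    raise ValueError(f"no table has columns {required_columns}")

# Stand-ins for the list that pd.read_html() returns (hypothetical first table).
tables = [
    pd.DataFrame({"other": ["x"]}),
    pd.DataFrame({
        "Country or Area": ["Afghanistan", "Albania"],
        "Region 1": ["Southern Asia", "Southern Europe"],
    }),
]

cr_df = pick_table(tables, ["Country or Area", "Region 1"])
country_region_table = dict(zip(cr_df["Country or Area"], cr_df["Region 1"]))
print(country_region_table)
```

With the real page, `tables` would be the result of `pd.read_html(url)` and the rest stays the same.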

In [None]:
# Build the table that maps each country name to its region
country_region_table = make_country_region_table()

{'Afghanistan': 'Southern Asia', 'Åland Islands': 'Northern Europe', 'Albania': 'Southern Europe', 'Algeria': 'Northern Africa', 'American Samoa': 'Polynesia', 'Andorra': 'Southern Europe', 'Angola': 'Middle Africa', 'Anguilla': 'Caribbean', 'Antarctica': 'Antarctica', 'Antigua and Barbuda': 'Caribbean', 'Argentina': 'South America', 'Armenia': 'Western Asia', 'Aruba': 'Caribbean', 'Australia': 'Australia and New Zealand', 'Austria': 'Western Europe', 'Azerbaijan': 'Western Asia', 'Bahamas': 'Caribbean', 'Bahrain': 'Western Asia', 'Bangladesh': 'Southern Asia', 'Barbados': 'Caribbean', 'Belarus': 'Eastern Europe', 'Belgium': 'Western Europe', 'Belize': 'Central America', 'Benin': 'Western Africa', 'Bermuda': 'Northern America', 'Bhutan': 'Southern Asia', 'Bolivia (Plurinational State of)': 'South America', 'Bonaire, Sint Eustatius and Saba': 'Caribbean', 'Bosnia and Herzegovina': 'Southern Europe', 'Botswana': 'Southern Africa', 'Bouvet Island': 'South America', 'Brazil': 'South Americ

In [None]:
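'''
The UN-based country-to-region table, country_region_table, and the
World Happiness Report spell some country names differently.
For example, the United Kingdom appears as:
    (UN) 'United Kingdom of Great Britain and Northern Ireland'
    (World Happiness Report) 'United Kingdom'
So we build a conversion table that maps the report's names to the UN spelling.

Taiwan and Kosovo are not UN member states, so they have data in the
World Happiness Report but no entry in country_region_table.
These two are therefore excluded from the analysis.

Finally, the region each country belongs to is added to the DataFrame.

The function below performs these preprocessing steps.
'''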
def preprocessing(df, country_region_table):
    # Country names in country_region_table differ from those in the World Happiness Report,
    # so map the report's names to the spelling used in country_region_table.
    #
    # Left: World Happiness Report spelling, right: UN spelling
    # (Note) Taiwan has no entry of its own in the UN table → exclude it from the analysis
    # (Note) Kosovo is not a UN member state
    WHRtoUN = {
    'United Kingdom':'United Kingdom of Great Britain and Northern Ireland',
    'Czech Republic':'Czechia',
    'Trinidad & Tobago':'Trinidad and Tobago',
    'South Korea':'Republic of Korea',
    'Bolivia':'Bolivia (Plurinational State of)',
    'United States':'United States of America',
    'Northern Cyprus':'Cyprus',
    'Russia':'Russian Federation',
    'Moldova':'Republic of Moldova',
    'Hong Kong':'China, Hong Kong Special Administrative Region',
    'Vietnam':'Viet Nam',
    'Ivory Coast':"Côte d’Ivoire",
    'Congo (Brazzaville)':'Congo',
    'Laos':"Lao People's Democratic Republic",
    'Venezuela':'Venezuela (Bolivarian Republic of)',
    'Palestinian Territories':'State of Palestine',
    'Iran':'Iran (Islamic Republic of)',
    'Congo (Kinshasa)':'Democratic Republic of the Congo',
    'Swaziland':'Eswatini',
    'Syria':'Syrian Arab Republic',
    'Tanzania':'United Republic of Tanzania'}

    # Not UN member states, so excluded from the analysis
    skip_countries = ["Taiwan", "Kosovo"]

    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # Replace the report's names with the UN spelling.
    # Series.replace with a dict matches whole values only, so names such as
    # 'Congo (Brazzaville)' are not treated as substring or regex patterns
    # (which can happen with str.replace).
    df['Country or region'] = df['Country or region'].replace(WHRtoUN)

    # Drop Taiwan and Kosovo (exact match on the name)
    df = df[~df['Country or region'].isin(skip_countries)].copy()

    # Add the region column with one dictionary lookup per row
    df['region'] = df['Country or region'].map(country_region_table)

    # Fail loudly if a name is still missing from the table
    missing = df.loc[df['region'].isna(), 'Country or region'].tolist()
    if missing:
        raise KeyError(f"no region found for: {missing}")

    return df
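
A conversion table like `WHRtoUN` has to be discovered somehow. One way, sketched here under the assumption that the lookup table is a plain dict, is to list the report names that are not keys of the table. The three table entries are copied from the output shown earlier; the small DataFrame is a stand-in for the real report.

```python
import pandas as pd

# Entries copied from the lookup table printed earlier in this notebook.
country_region_table = {
    "Afghanistan": "Southern Asia",
    "Albania": "Southern Europe",
    "Bolivia (Plurinational State of)": "South America",
}

# Report-style names; "Bolivia" and "Taiwan" are not keys of the table.
df = pd.DataFrame({"Country or region": ["Afghanistan", "Bolivia", "Taiwan"]})

# True where the report name already matches a key of the table
known = df["Country or region"].isin(list(country_region_table))
unmatched = sorted(df.loc[~known, "Country or region"].unique())
print(unmatched)
```

Each name in `unmatched` then needs either a mapping to the UN spelling or an entry in the list of countries to skip.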

In [None]:
# Load the World Happiness Report data
df = pd.read_csv('2019.csv')
df.head()

Unnamed: 0,Overall rank,Country or region,Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
0,1,Finland,7.769,1.34,1.587,0.986,0.596,0.153,0.393
1,2,Denmark,7.6,1.383,1.573,0.996,0.592,0.252,0.41
2,3,Norway,7.554,1.488,1.582,1.028,0.603,0.271,0.341
3,4,Iceland,7.494,1.38,1.624,1.026,0.591,0.354,0.118
4,5,Netherlands,7.488,1.396,1.522,0.999,0.557,0.322,0.298


In [None]:
df = preprocessing(df, country_region_table)

In [None]:
df

Unnamed: 0,Overall rank,Country or region,Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,region
0,1,Finland,7.769,1.340,1.587,0.986,0.596,0.153,0.393,Northern Europe
1,2,Denmark,7.600,1.383,1.573,0.996,0.592,0.252,0.410,Northern Europe
2,3,Norway,7.554,1.488,1.582,1.028,0.603,0.271,0.341,Northern Europe
3,4,Iceland,7.494,1.380,1.624,1.026,0.591,0.354,0.118,Northern Europe
4,5,Netherlands,7.488,1.396,1.522,0.999,0.557,0.322,0.298,Western Europe
...,...,...,...,...,...,...,...,...,...,...
151,152,Rwanda,3.334,0.359,0.711,0.614,0.555,0.217,0.411,Eastern Africa
152,153,United Republic of Tanzania,3.231,0.476,0.885,0.499,0.417,0.276,0.147,Eastern Africa
153,154,Afghanistan,3.203,0.350,0.517,0.361,0.000,0.158,0.025,Southern Asia
154,155,Central African Republic,3.083,0.026,0.000,0.105,0.225,0.235,0.035,Middle Africa


### The data is now ready → the analysis starts here

In [None]:
# Show the countries with the highest and lowest happiness scores as rankings
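
One possible way to produce the rankings is `DataFrame.nlargest` and `DataFrame.nsmallest`. The sketch below uses only the nine rows displayed above, so it runs without `2019.csv`; with the real data, `df` would be the preprocessed DataFrame.

```python
import pandas as pd

# The nine rows shown in the output above.
df = pd.DataFrame({
    "Country or region": [
        "Finland", "Denmark", "Norway", "Iceland", "Netherlands",
        "Rwanda", "United Republic of Tanzania", "Afghanistan",
        "Central African Republic",
    ],
    "Score": [7.769, 7.600, 7.554, 7.494, 7.488, 3.334, 3.231, 3.203, 3.083],
})

top3 = df.nlargest(3, "Score")      # highest scores first
bottom3 = df.nsmallest(3, "Score")  # lowest scores first

print(top3[["Country or region", "Score"]])
print(bottom3[["Country or region", "Score"]])
```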


In [None]:
# Correlation between happiness score and GDP per capita
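
A minimal sketch of the correlation, assuming the Pearson coefficient is what is wanted (`Series.corr` uses it by default). The nine rows are the ones displayed above, which come from the top and bottom of the ranking, so the value printed here is not representative of the full data set.

```python
import pandas as pd

# Score and GDP per capita for the nine rows shown in the output above.
df = pd.DataFrame({
    "Score": [7.769, 7.600, 7.554, 7.494, 7.488, 3.334, 3.231, 3.203, 3.083],
    "GDP per capita": [1.340, 1.383, 1.488, 1.380, 1.396, 0.359, 0.476, 0.350, 0.026],
})

# Pearson correlation coefficient between the two columns
r = df["Score"].corr(df["GDP per capita"])
print(f"Pearson r = {r:.3f}")
```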


In [None]:
# Correlation between happiness score and healthy life expectancy


In [None]:
# Correlation between happiness score and social support
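
The correlations with GDP per capita, healthy life expectancy, and social support could also be computed in one step with `DataFrame.corrwith`. This is a sketch on the nine displayed rows only; the list name `factors` is introduced here for illustration.

```python
import pandas as pd

# Values for the nine rows shown in the output above.
df = pd.DataFrame({
    "Score": [7.769, 7.600, 7.554, 7.494, 7.488, 3.334, 3.231, 3.203, 3.083],
    "GDP per capita": [1.340, 1.383, 1.488, 1.380, 1.396, 0.359, 0.476, 0.350, 0.026],
    "Social support": [1.587, 1.573, 1.582, 1.624, 1.522, 0.711, 0.885, 0.517, 0.000],
    "Healthy life expectancy": [0.986, 0.996, 1.028, 1.026, 0.999, 0.614, 0.499, 0.361, 0.105],
})

factors = ["GDP per capita", "Social support", "Healthy life expectancy"]

# Pearson correlation of each factor column with the Score column
corr = df[factors].corrwith(df["Score"])
print(corr.sort_values(ascending=False))
```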


### Grouping by region

In [None]:
# Group by region and compute the mean, median, and quartiles of the happiness score per region
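
A sketch of the per-region summary using `groupby` followed by `describe`, which reports the mean together with the 25%, 50% (median), and 75% quantiles. Only the nine displayed rows are used, so several regions contain a single country here.

```python
import pandas as pd

# Score and region for the nine rows shown in the output above.
df = pd.DataFrame({
    "Score": [7.769, 7.600, 7.554, 7.494, 7.488, 3.334, 3.231, 3.203, 3.083],
    "region": [
        "Northern Europe", "Northern Europe", "Northern Europe", "Northern Europe",
        "Western Europe", "Eastern Africa", "Eastern Africa",
        "Southern Asia", "Middle Africa",
    ],
})

# describe() returns count, mean, std, min, 25%, 50%, 75%, max per group
stats = df.groupby("region")["Score"].describe()
summary = stats[["mean", "25%", "50%", "75%"]]
print(summary)
```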


In [None]:
# Draw a box plot of the happiness score for each region
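
One way to draw the box plot, sketched with matplotlib (an assumption; the notebook does not name a plotting library). Each region's scores are passed to `Axes.boxplot` as a separate array, and the tick labels are set by hand. With the nine displayed rows most boxes collapse to a single line; the full data set gives proper boxes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Score and region for the nine rows shown in the output above.
df = pd.DataFrame({
    "Score": [7.769, 7.600, 7.554, 7.494, 7.488, 3.334, 3.231, 3.203, 3.083],
    "region": [
        "Northern Europe", "Northern Europe", "Northern Europe", "Northern Europe",
        "Western Europe", "Eastern Africa", "Eastern Africa",
        "Southern Asia", "Middle Africa",
    ],
})

# One array of scores per region, in alphabetical order of region name
groups = df.groupby("region")["Score"]
labels = [name for name, _ in groups]
data = [scores.to_numpy() for _, scores in groups]

fig, ax = plt.subplots(figsize=(8, 4))
ax.boxplot(data)  # boxes are drawn at x positions 1..len(data)
ax.set_xticks(range(1, len(labels) + 1))
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set_ylabel("Score")
ax.set_title("Happiness score by region")
fig.tight_layout()
```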
