# Countries statistics and energy use statistics

This dataset contains country statistics and energy-use data for countries around the world, together with each country's region according to multiple definitions. This notebook outlines the entire creation of the dataset: first the Entities dataset, then the info dataset (the data dictionary), and finally the final dataset.

## Entities Dataset

### Introduction

This dataset contains a comprehensive list of countries and other entities, with each entity's ISO code and group classification. It serves as a conversion table between the country codes used in [Our World in Data (OWID)](https://ourworldindata.org/)'s datasets, the country codes used in the [World Bank (WB)'s World Development Indicators (WDI)](https://data.worldbank.org/), and the country codes used in the [International Monetary Fund (IMF)'s World Economic Outlook (WEO)](https://www.imf.org/en/Publications/WEO/weo-database/2022/April).

The dataset is available in the [entities.csv](/entities.csv) file. We used the [process.py](/process.py) script to collect and process the data.

#### Available entities

The entities listed in this dataset consist of the entities available in OWID's datasets, WB's WDI, and IMF's WEO. For OWID datasets, we select countries from OWID's [Standard entity names](https://github.com/owid/energy-data/tree/master/scripts/input/shared), [World map region definitions](https://ourworldindata.org/world-region-map-definitions), [Energy Dataset](https://github.com/owid/energy-data), and [CO2 Dataset](https://github.com/owid/co2-data).

#### Data collection

##### Our World in Data

- For OWID's [Standard entity names](https://github.com/owid/energy-data/tree/master/scripts/input/shared), we sourced the data from https://raw.githubusercontent.com/owid/energy-data/master/scripts/input/shared/iso_codes.csv.

- For OWID's [World map region definitions](https://ourworldindata.org/world-region-map-definitions), we went to the [website](https://ourworldindata.org/world-region-map-definitions) and manually downloaded the data available on it.

- For OWID's [Energy Data](https://github.com/owid/energy-data), we sourced the data from https://raw.githubusercontent.com/owid/energy-data/master/owid-energy-data.csv.

- For OWID's [CO2 Data](https://github.com/owid/co2-data), we sourced the data from https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv.

##### World Bank's World Development Indicators

We used the `get_countries` method from the `world_bank_data` [Python package](https://github.com/mwouts/world_bank_data) to get the list of countries available in WDI.

```python
import world_bank_data as wb

df_wb = wb.get_countries()
```

##### IMF's World Economic Outlook

We used the `weo` [Python client](https://github.com/epogrebnyak/weo-reader) to get the list of countries available in WEO.

```python
import weo

path, url = weo.download(2022, 1)

df_weo = weo.WEO(path).countries()
```

#### Variables

The variable codes, definitions, and sources are available in our [variables.csv](/variables.csv) file. The variables (i.e., attributes) are described below:

| Column          | Description                                                                                                 | Source                              |
| --------------- | ----------------------------------------------------------------------------------------------------------- | ----------------------------------- |
| `Code`            | Country Code of all entities in the dataset. We prioritize OWID codes, then WB codes, and finally WEO codes | OWID, WB, WEO                       |
| `Entity`          | The entity name. We prioritize the shortest name written in the English alphabet                            | OWID, WB, WEO                       |
| `OWID`            | Entity's OWID ISO code                                                                                      | OWID                                |
| `OWID_Name`       | Entity's Name according to OWID. The names come from different datasets of OWID                             | OWID                                |
| `OWID_Continent`  | Entity's Continent. Sourced from Our World in Data                                                          | OWID's [World map region definitions](https://ourworldindata.org/world-region-map-definitions) |
| `OWID_WHO_Region` | Entity's WHO region. Sourced from Our World in Data                                                         | OWID's [World map region definitions](https://ourworldindata.org/world-region-map-definitions) |
| `OWID_WB_Region`  | Entity's region according to the World Bank.  Sourced from Our World in Data                                | OWID's [World map region definitions](https://ourworldindata.org/world-region-map-definitions) |
| `OWID_UN_Region`  | Entity's region according to the United Nations. Sourced from Our World in Data.                            | OWID's [World map region definitions](https://ourworldindata.org/world-region-map-definitions) |
| `WB`              | Entity's World Bank ISO3 code                                                                               | WB                                  |
| `WB_ISO2`         | Entity's World Bank ISO2 code                                                                               | WB                                  |
| `WB_Name`         | Entity's name according to the World Bank                                                                   | WB                                  |
| `WB_region`       | Entity's World Bank region                                                                                  | WB                                  |
| `WB_adminregion`  | Entity's World Bank administrative region                                                                   | WB                                  |
| `WB_incomeLevel`  | Entity's income level according to the World Bank                                                           | WB                                  |
| `WB_lendingType`  | Entity's World Bank lending type                                                                            | WB                                  |
| `WB_capitalCity`  | Entity's capital city according to the World Bank                                                           | WB                                  |
| `WB_longitude`    | Longitude of the entity's capital city according to the World Bank                                          | WB                                  |
| `WB_latitude`     | Latitude of the entity's capital city according to the World Bank                                           | WB                                  |
| `WEO`             | Entity's WEO ISO code                                                                                       | WEO                                 |
| `WEO_Country`     | Entity's name according to the World Economic Outlook                                                       | WEO                                 |
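
The priority rule for `Code` (OWID first, then WB, then WEO) can be pictured as a row-wise coalesce. In this project the step was carried out by hand in Excel, so the following is only a minimal sketch of an equivalent in pandas. The rows are illustrative: `AFG` and `AFE` appear in the tables in this document, while `XXX` is a made-up code.

```python
import pandas as pd

# Illustrative rows only; "XXX" is a hypothetical WEO-only code.
df = pd.DataFrame({
    "OWID": ["AFG", None, None],
    "WB":   ["AFG", "AFE", None],
    "WEO":  ["AFG", None, "XXX"],
})

# Take the first non-missing code per row, scanning OWID -> WB -> WEO.
df["Code"] = df["OWID"].combine_first(df["WB"]).combine_first(df["WEO"])
```

`combine_first` keeps the value from the left-hand Series and only falls back to the right-hand one where the left is missing, which matches the stated priority order.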


### Chapter 1: Our World in Data

First, we create a dataset of all countries, their ISO codes, and their regions from Our World in Data. Our main sources are OWID's [World map region definitions](https://ourworldindata.org/world-region-map-definitions) page and OWID's [standard entity names](https://github.com/owid/energy-data/tree/master/scripts/input/shared) ([source](https://raw.githubusercontent.com/owid/energy-data/master/scripts/input/shared/iso_codes.csv)).

However, these sources do not list all of OWID's entities, so we also supplement our dataset with the entities listed in OWID's [Energy Data](https://github.com/owid/energy-data) ([source](https://raw.githubusercontent.com/owid/energy-data/master/owid-energy-data.csv)) and [CO2 Data](https://github.com/owid/co2-data) ([source](https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv)).

#### World map region definitions

First, we manually downloaded the datasets from https://ourworldindata.org/world-region-map-definitions and stored them in the "owid" directory.

Then, we read the downloaded CSV files, store each table in a DataFrame, and merge the DataFrames on the `Entity` and `Code` attributes.

In [1]:
# import the needed packages
import pandas as pd
import os

In [2]:
# read the files

entities_directory = "entities/"

datafiles = []

for dirpath, _, filenames in os.walk(entities_directory + "owid"):
    for f in filenames:
        datafiles.append(os.path.join(dirpath, f))

datafiles

['entities/owid\\continents-according-to-our-world-in-data.csv',
 'entities/owid\\who-regions.csv',
 'entities/owid\\world-regions-according-to-the-world-bank.csv',
 'entities/owid\\world-regions-sdg-united-nations.csv']
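
The paths above mix `/` and `\\` because the notebook was run on Windows, and `os.walk` does not guarantee any particular file order. A minimal sketch of a portable alternative using `pathlib` follows; the helper name `list_csv_files` is ours and is not part of the original notebook.

```python
from pathlib import Path


def list_csv_files(directory):
    """Return every CSV file under `directory`, sorted for a stable order."""
    return sorted(Path(directory).rglob("*.csv"))
```

With this helper, `datafiles = list_csv_files("entities/owid")` would yield `Path` objects, which `pd.read_csv` accepts directly.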

In [3]:
# load each file into a DataFrame

dfs = []

for f in datafiles:
    # drop the Year column since we don't need it
    df = pd.read_csv(f).drop('Year', axis=1)
    dfs.append(df)

dfs

[                    Entity      Code Continent
 0                 Abkhazia  OWID_ABK      Asia
 1              Afghanistan       AFG      Asia
 2    Akrotiri and Dhekelia  OWID_AKD      Asia
 3                  Albania       ALB    Europe
 4                  Algeria       DZA    Africa
 ..                     ...       ...       ...
 280             Yugoslavia  OWID_YGS    Europe
 281                 Zambia       ZMB    Africa
 282               Zanzibar  OWID_ZAN    Africa
 283               Zimbabwe       ZWE    Africa
 284          Åland Islands       ALA    Europe
 
 [285 rows x 3 columns],
           Entity Code             WHO region
 0    Afghanistan  AFG  Eastern Mediterranean
 1        Albania  ALB                 Europe
 2        Algeria  DZA                 Africa
 3        Andorra  AND                 Europe
 4         Angola  AGO                 Africa
 ..           ...  ...                    ...
 189    Venezuela  VEN               Americas
 190      Vietnam  VNM       

In [4]:
# merge all dataframes based on Entity and Code

df_merged = dfs[0]

for df in dfs[1:]:
    df_merged = pd.merge(df_merged, df, on=["Entity", "Code"], how="outer")

df_merged

Unnamed: 0,Entity,Code,Continent,WHO region,World Region according to the World Bank,world-regions-according-to-the-united-nations
0,Abkhazia,OWID_ABK,Asia,,,
1,Afghanistan,AFG,Asia,Eastern Mediterranean,South Asia,Central and Southern Asia
2,Akrotiri and Dhekelia,OWID_AKD,Asia,,,
3,Albania,ALB,Europe,Europe,Europe and Central Asia,Europe and Northern America
4,Algeria,DZA,Africa,Africa,Middle East and North Africa,Northern Africa and Western Asia
...,...,...,...,...,...,...
283,Zimbabwe,ZWE,Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa
284,Åland Islands,ALA,Europe,,,
285,Micronesia,,,,East Asia and Pacific,Oceania
286,Aland Islands,,,,,Europe and Northern America
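
The last rows of the output above show a side effect of merging on both `Entity` and `Code`: "Åland Islands" (code `ALA`) and "Aland Islands" (no code) end up in separate rows because their keys differ. One way to list such rows before cleaning is the `indicator` option of `pd.merge`. The sketch below uses two tiny tables modelled on the output above; it is not part of the original notebook, and the column name `UN region` is shortened for illustration.

```python
import pandas as pd

continents = pd.DataFrame({
    "Entity": ["Albania", "Åland Islands"],
    "Code": ["ALB", "ALA"],
    "Continent": ["Europe", "Europe"],
})
un_regions = pd.DataFrame({
    "Entity": ["Albania", "Aland Islands"],
    "Code": ["ALB", None],
    "UN region": ["Europe and Northern America", "Europe and Northern America"],
})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = pd.merge(continents, un_regions, on=["Entity", "Code"],
                  how="outer", indicator=True)

# Rows found in only one source are candidates for manual review.
unmatched = merged[merged["_merge"] != "both"]
```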


#### OWID Energy Data

In [5]:
# read the energy data file
df = pd.read_csv("https://raw.githubusercontent.com/owid/energy-data/master/owid-energy-data.csv")
df

# retrieve all iso codes and countries available in the datafile
df = df[["iso_code", "country"]].drop_duplicates()
df
# set index as iso code and country
df_owid_energy_data = df.set_index(["iso_code", "country"])
df_owid_energy_data


iso_code,country
AFG,Afghanistan
OWID_AFR,Africa
ALB,Albania
DZA,Algeria
ASM,American Samoa
...,...
OWID_WRL,World
YEM,Yemen
,Yugoslavia
ZMB,Zambia


#### OWID CO2 Data

In [6]:

# read the OWID CO2 data
df = pd.read_csv("https://raw.githubusercontent.com/owid/co2-data/master/owid-co2-data.csv")
df


# retrieve all iso codes and countries available in the datafile
df = df[["iso_code", "country"]].drop_duplicates()
df

# set index as iso code and country
df_owid_co2_data = df.set_index(["iso_code", "country"])
df_owid_co2_data


# merge two country lists from energy data and co2 data
df_owid_data = pd.merge(df_owid_energy_data, df_owid_co2_data, how="outer", left_index=True, right_index=True)
df_owid_data


iso_code,country
ABW,Aruba
AFG,Afghanistan
AGO,Angola
AIA,Anguilla
ALB,Albania
...,...
,Wake Island
,Wallis and Futuna
,West Germany
,Western Africa


#### Standard Entity Names

In [7]:

# read the standard entity names file
df = pd.read_csv("https://raw.githubusercontent.com/owid/energy-data/master/scripts/input/shared/iso_codes.csv")
df


# drop duplicates
df = df.drop_duplicates(subset="iso_code")
df

# rename column
df.rename(columns={'Country': 'country'}, inplace=True)
df


# set index as iso code and country
df_owid_country = df.set_index(["iso_code", "country"])
df_owid_country

# merge the standard country dataframe with the owid energy+co2 dataframe
df_owid = pd.merge(df_owid_data, df_owid_country, how="outer",
                    left_index=True, right_index=True)
df_owid

# reset index
df_owid = df_owid.reset_index()
df_owid

# rename columns to merge with df_merged
df_owid.rename(columns={'country': 'Entity', 'iso_code': 'Code'}, inplace=True)
df_owid

# merge with df_merged
df = pd.merge(df_owid, df_merged, how="outer", on=["Entity", "Code"])
df

Unnamed: 0,Code,Entity,Continent,WHO region,World Region according to the World Bank,world-regions-according-to-the-united-nations
0,ABW,Aruba,North America,,Latin America and Caribbean,Latin America and Caribbean
1,AFG,Afghanistan,Asia,Eastern Mediterranean,South Asia,Central and Southern Asia
2,AGO,Angola,Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa
3,AIA,Anguilla,North America,,,Latin America and Caribbean
4,ALA,Åland Islands,Europe,,,
...,...,...,...,...,...,...
356,OWID_YGS,Yugoslavia,Europe,,,
357,OWID_ZAN,Zanzibar,Africa,,,
358,,Micronesia,,,East Asia and Pacific,Oceania
359,,Aland Islands,,,,Europe and Northern America


#### Process with Excel

In [8]:
# output to csv file to process with Excel
df.to_csv(entities_directory + "output/owid.csv")
df

Unnamed: 0,Code,Entity,Continent,WHO region,World Region according to the World Bank,world-regions-according-to-the-united-nations
0,ABW,Aruba,North America,,Latin America and Caribbean,Latin America and Caribbean
1,AFG,Afghanistan,Asia,Eastern Mediterranean,South Asia,Central and Southern Asia
2,AGO,Angola,Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa
3,AIA,Anguilla,North America,,,Latin America and Caribbean
4,ALA,Åland Islands,Europe,,,
...,...,...,...,...,...,...
356,OWID_YGS,Yugoslavia,Europe,,,
357,OWID_ZAN,Zanzibar,Africa,,,
358,,Micronesia,,,East Asia and Pacific,Oceania
359,,Aland Islands,,,,Europe and Northern America


In the Excel file, we selected every country without a Code and checked whether it was duplicated elsewhere in the file.
We also replaced every empty or NaN ISO code with a specific placeholder id (for example, `_empty_30`), so that missing ids are not matched with one another when merging with `df_wb` later.
The cleaning process with Excel was done in the [custom/ProcessBook.xlsx](/custom/ProcessBook.xlsx) file.
After processing with Excel, we reloaded the file to continue.
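
The placeholder step was done by hand in Excel. For readers who prefer to stay in pandas, a rough equivalent might look like the sketch below. The helper name `fill_missing_codes` and the numbering scheme are assumptions; only the `_empty_` prefix mirrors the ids visible in the reloaded file.

```python
import pandas as pd


def fill_missing_codes(df, column="Code", prefix="_empty_"):
    """Return a copy of `df` where each missing code gets a unique placeholder."""
    out = df.copy()
    missing = out[column].isna()
    # Number the placeholders 1..n in row order (assumed scheme).
    out.loc[missing, column] = [
        f"{prefix}{i}" for i in range(1, int(missing.sum()) + 1)
    ]
    return out


sample = pd.DataFrame({
    "Code": ["ABW", None, None],
    "Entity": ["Aruba", "Wake Island", "U.S. Territories"],
})
filled = fill_missing_codes(sample)
```

Because every placeholder is unique, rows without a real code can no longer be paired with each other in a later merge.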

In [9]:
# read the processed data from Excel
df_owid = pd.read_csv(entities_directory + "custom/owid.csv")
df_owid

Unnamed: 0,Code,Entity,Continent,WHO region,World Region according to the World Bank,world-regions-according-to-the-united-nations
0,ABW,Aruba,North America,,Latin America and Caribbean,Latin America and Caribbean
1,AFG,Afghanistan,Asia,Eastern Mediterranean,South Asia,Central and Southern Asia
2,AGO,Angola,Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa
3,AIA,Anguilla,North America,,,Latin America and Caribbean
4,ALA,Aland Islands,Europe,,,Europe and Northern America
...,...,...,...,...,...,...
318,_empty_30,U.S. Pacific Islands,,,,
319,_empty_31,U.S. Territories,,,,
320,_empty_32,Upper-middle-income countries,,,,
321,_empty_33,Wake Island,,,,


### Chapter 2: World Bank

In [10]:
# import python package
import world_bank_data as wb

# get countries available in World Bank database
df_wb = wb.get_countries()
df_wb

Unnamed: 0_level_0,iso2Code,name,region,adminregion,incomeLevel,lendingType,capitalCity,longitude,latitude
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
ABW,AW,Aruba,Latin America & Caribbean,,High income,Not classified,Oranjestad,-70.0167,12.5167
AFE,ZH,Africa Eastern and Southern,Aggregates,,Aggregates,Aggregates,,,
AFG,AF,Afghanistan,South Asia,South Asia,Low income,IDA,Kabul,69.1761,34.5228
AFR,A9,Africa,Aggregates,,Aggregates,Aggregates,,,
AFW,ZI,Africa Western and Central,Aggregates,,Aggregates,Aggregates,,,
...,...,...,...,...,...,...,...,...,...
XZN,A5,Sub-Saharan Africa excluding South Africa and ...,Aggregates,,Aggregates,Aggregates,,,
YEM,YE,"Yemen, Rep.",Middle East & North Africa,Middle East & North Africa (excluding high inc...,Low income,IDA,Sana'a,44.2075,15.3520
ZAF,ZA,South Africa,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Upper middle income,IBRD,Pretoria,28.1871,-25.7460
ZMB,ZM,Zambia,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IDA,Lusaka,28.2937,-15.3982


#### Merge World Bank data with OWID data

In [11]:
# save the World Bank's ISO code to another column before merging with df_owid
df_wb["WB"] = df_wb.index
df_wb

# Save the Code in the OWID column before merging with df_wb
df_owid["OWID"] = df_owid["Code"]
df_owid

# merge df_owid with df_wb and write the table to a csv file to process with Excel
df = pd.merge(df_owid, df_wb, how="outer", left_on="Code", right_index=True)
df.to_csv(entities_directory + "output/owidwb.csv")
df
# after this, we opened the file in Excel and cleaned the data manually

Unnamed: 0,Code,Entity,Continent,WHO region,World Region according to the World Bank,world-regions-according-to-the-united-nations,OWID,iso2Code,name,region,adminregion,incomeLevel,lendingType,capitalCity,longitude,latitude,WB
0.0,ABW,Aruba,North America,,Latin America and Caribbean,Latin America and Caribbean,ABW,AW,Aruba,Latin America & Caribbean,,High income,Not classified,Oranjestad,-70.0167,12.51670,ABW
1.0,AFG,Afghanistan,Asia,Eastern Mediterranean,South Asia,Central and Southern Asia,AFG,AF,Afghanistan,South Asia,South Asia,Low income,IDA,Kabul,69.1761,34.52280,AFG
2.0,AGO,Angola,Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa,AGO,AO,Angola,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IBRD,Luanda,13.2420,-8.81155,AGO
3.0,AIA,Anguilla,North America,,,Latin America and Caribbean,AIA,,,,,,,,,,
4.0,ALA,Aland Islands,Europe,,,Europe and Northern America,ALA,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
,TSS,,,,,,,T6,Sub-Saharan Africa (IDA & IBRD countries),Aggregates,,Aggregates,Aggregates,,,,TSS
,UMC,,,,,,,XT,Upper middle income,Aggregates,,Aggregates,Aggregates,,,,UMC
,WLD,,,,,,,1W,World,Aggregates,,Aggregates,Aggregates,,,,WLD
,XKX,,,,,,,XK,Kosovo,Europe & Central Asia,Europe & Central Asia (excluding high income),Upper middle income,IDA,Pristina,20.9260,42.56500,XKX


In the Excel file, we manually checked for conflicts between the World Bank dataset and the OWID dataset, and we also checked for duplicate countries.
The cleaning process with Excel was done in the [custom/ProcessBook.xlsx](/custom/ProcessBook.xlsx) file.
After checking and fixing the datasets, we reloaded the file to continue.
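
Part of this manual check could be narrowed down with a script before opening Excel. The sketch below flags entities that appear more than once and rows where the OWID name (`Entity`) and the World Bank name (`name`) disagree. The column names follow the merged table above; the sample rows are illustrative and the `_empty_1` code is a made-up placeholder.

```python
import pandas as pd

df = pd.DataFrame({
    "Code": ["ABW", "YEM", "ALA", "_empty_1"],
    "Entity": ["Aruba", "Yemen", "Aland Islands", "Aland Islands"],
    "name": ["Aruba", "Yemen, Rep.", None, None],
})

# Entities listed more than once (candidates for manual de-duplication).
duplicates = df[df.duplicated(subset="Entity", keep=False)]

# Rows where both sources give a name but the names disagree.
both_named = df["Entity"].notna() & df["name"].notna()
name_conflicts = df[both_named & (df["Entity"] != df["name"])]
```

A name difference such as "Yemen" versus "Yemen, Rep." is not necessarily an error, so the result is a review list rather than a list of fixes.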

In [12]:
df_owid_wb = pd.read_csv(entities_directory + "custom/owid_wb.csv")
df_owid_wb

Unnamed: 0,Code,Entity Name,Entity,Continent,WHO region,World Region according to the World Bank,world-regions-according-to-the-united-nations,OWID,WB,iso2Code3,name,region,adminregion,incomeLevel,lendingType,capitalCity,longitude,latitude
0,ABW,Aruba,Aruba,North America,,Latin America and Caribbean,Latin America and Caribbean,ABW,ABW,AW,Aruba,Latin America & Caribbean,,High income,Not classified,Oranjestad,-70.0167,12.51670
1,AFG,Afghanistan,Afghanistan,Asia,Eastern Mediterranean,South Asia,Central and Southern Asia,AFG,AFG,AF,Afghanistan,South Asia,South Asia,Low income,IDA,Kabul,69.1761,34.52280
2,AGO,Angola,Angola,Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa,AGO,AGO,AO,Angola,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IBRD,Luanda,13.2420,-8.81155
3,AIA,Anguilla,Anguilla,North America,,,Latin America and Caribbean,AIA,,,,,,,,,,
4,ALA,Aland Islands,Aland Islands,Europe,,,Europe and Northern America,ALA,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
391,,St. Kitts-Nevis-Anguilla,St. Kitts-Nevis-Anguilla,,,,,,,,,,,,,,,
392,,U.S. Pacific Islands,U.S. Pacific Islands,,,,,,,,,,,,,,,
393,,U.S. Territories,U.S. Territories,,,,,,,,,,,,,,,
394,,Wake Island,Wake Island,,,,,,,,,,,,,,,


### Chapter 3: IMF's World Economic Outlook

In [13]:
# import the Python client
import weo

# download the WEO dataset
path, url = weo.download(2022, 1, filename=entities_directory + "output/weo_2022_1.csv")

path
url

# read all countries available in the WEO dataset
df_weo: pd.DataFrame = weo.WEO(path).countries()
df_weo

Already downloaded 2022-Apr WEO dataset at entities\output\weo_2022_1.csv


Unnamed: 0,WEO Country Code,ISO,Country
0,512,AFG,Afghanistan
44,914,ALB,Albania
88,612,DZA,Algeria
132,171,AND,Andorra
176,614,AGO,Angola
...,...,...,...
8404,582,VNM,Vietnam
8448,487,WBG,West Bank and Gaza
8492,474,YEM,Yemen
8536,754,ZMB,Zambia


In [14]:
# select only the ISO column and Country column
# (copy so that adding a column later does not write to a slice of the original)
df_weo = df_weo[["ISO", "Country"]].copy()
df_weo


Unnamed: 0,ISO,Country
0,AFG,Afghanistan
44,ALB,Albania
88,DZA,Algeria
132,AND,Andorra
176,AGO,Angola
...,...,...
8404,VNM,Vietnam
8448,WBG,West Bank and Gaza
8492,YEM,Yemen
8536,ZMB,Zambia


#### Merge with WB+OWID dataset

In [15]:
# store the ISO column to the WEO column to prepare for merging with WB+OWID dataset 
df_weo["WEO"] = df_weo["ISO"]
df_weo

# merge df_weo with df_owid_wb
df = pd.merge(df_owid_wb, df_weo, left_on="Code", right_on="ISO", how="outer")
df

# write the table back to Excel to continue processing
df.to_csv(entities_directory + "output/owid_wb_weo.csv")
df
# after this, we opened the file in Excel and cleaned the data manually

Unnamed: 0,Code,Entity Name,Entity,Continent,WHO region,World Region according to the World Bank,world-regions-according-to-the-united-nations,OWID,WB,iso2Code3,...,region,adminregion,incomeLevel,lendingType,capitalCity,longitude,latitude,ISO,Country,WEO
0,ABW,Aruba,Aruba,North America,,Latin America and Caribbean,Latin America and Caribbean,ABW,ABW,AW,...,Latin America & Caribbean,,High income,Not classified,Oranjestad,-70.0167,12.51670,ABW,Aruba,ABW
1,AFG,Afghanistan,Afghanistan,Asia,Eastern Mediterranean,South Asia,Central and Southern Asia,AFG,AFG,AF,...,South Asia,South Asia,Low income,IDA,Kabul,69.1761,34.52280,AFG,Afghanistan,AFG
2,AGO,Angola,Angola,Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa,AGO,AGO,AO,...,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IBRD,Luanda,13.2420,-8.81155,AGO,Angola,AGO
3,AIA,Anguilla,Anguilla,North America,,,Latin America and Caribbean,AIA,,,...,,,,,,,,,,
4,ALA,Aland Islands,Aland Islands,Europe,,,Europe and Northern America,ALA,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
393,,U.S. Territories,U.S. Territories,,,,,,,,...,,,,,,,,,,
394,,Wake Island,Wake Island,,,,,,,,...,,,,,,,,,,
395,,Western Africa,Western Africa,,,,,,,,...,,,,,,,,,,
396,,,,,,,,,,,...,,,,,,,,UVK,Kosovo,UVK


In the Excel file, we checked for conflicts between WEO, WB, and OWID, and checked for duplicate countries in the file.
The cleaning process with Excel was done in the [custom/ProcessBook.xlsx](/custom/ProcessBook.xlsx) file.
After checking and fixing the datasets, we put the final processed dataset in the entities sheet.
We then reloaded the file to finalize.
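
The last row of the merged output above shows one kind of conflict that a code-based merge cannot resolve: Kosovo is left unmatched under the WEO code `UVK`, while the World Bank table lists it as `XKX`. A minimal sketch for surfacing such candidates is to take the rows left unmatched on each side and pair them by name. The two small tables below are illustrative and not part of the original notebook.

```python
import pandas as pd

owid_wb = pd.DataFrame({"Code": ["ABW", "XKX"], "name": ["Aruba", "Kosovo"]})
weo = pd.DataFrame({"ISO": ["ABW", "UVK"], "Country": ["Aruba", "Kosovo"]})

# Rows on each side whose code has no counterpart on the other side.
left_only = owid_wb[~owid_wb["Code"].isin(weo["ISO"])]
right_only = weo[~weo["ISO"].isin(owid_wb["Code"])]

# Pair the leftovers by name to list likely code conflicts for manual review.
candidates = pd.merge(left_only, right_only, left_on="name", right_on="Country")
```

Exact name matching only catches identical spellings, so the manual check remains necessary for names that differ between sources.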

### Final
Finally, we moved the final dataset, in the "entities" Excel sheet, to the front of the workbook and stored it in a CSV file.

In [16]:
# load the sheet back to df, and write the data to entities.csv
df = pd.read_excel(entities_directory + "custom/ProcessBook.xlsx", sheet_name="entities")
df.to_csv(entities_directory + "entities.csv", index=False)
df

Unnamed: 0,Code,Entity,OWID,OWID_Name,OWID_Continent,OWID_WHO_Region,OWID_WB_Region,OWID_UN_Region,WB,WB_ISO2,WB_Name,WB_region,WB_adminregion,WB_incomeLevel,WB_lendingType,WB_capitalCity,WB_longitude,WB_latitude,WEO,WEO_Country
0,ABW,Aruba,ABW,Aruba,North America,,Latin America and Caribbean,Latin America and Caribbean,ABW,AW,Aruba,Latin America & Caribbean,,High income,Not classified,Oranjestad,-70.0167,12.51670,ABW,Aruba
1,AFE,Africa Eastern and Southern,,,,,,,AFE,ZH,Africa Eastern and Southern,Aggregates,,Aggregates,Aggregates,,,,,
2,AFG,Afghanistan,AFG,Afghanistan,Asia,Eastern Mediterranean,South Asia,Central and Southern Asia,AFG,AF,Afghanistan,South Asia,South Asia,Low income,IDA,Kabul,69.1761,34.52280,AFG,Afghanistan
3,AFW,Africa Western and Central,,,,,,,AFW,ZI,Africa Western and Central,Aggregates,,Aggregates,Aggregates,,,,,
4,AGO,Angola,AGO,Angola,Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa,AGO,AO,Angola,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IBRD,Luanda,13.2420,-8.81155,AGO,Angola
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
391,,Wake Island,,Wake Island,,,,,,,,,,,,,,,,
392,,Non-OECD,,Non-OECD,,,,,,,,,,,,,,,,
393,,Oceania,,Oceania,,,,,,,,,,,,,,,,
394,,Asia,,Asia,,,,,,,,,,,,,,,,


## Data Dictionary (Info)

The script in this section converts the `info.xlsx` Excel workbook into a JSON file that can be loaded on the web.

In [17]:
import pandas as pd

info_directory = "info/"

df_info = pd.read_excel(info_directory + "info.xlsx", sheet_name="info")
df_info.set_index("value", drop=False, inplace=True)

df_order = pd.read_excel(info_directory + "info.xlsx", sheet_name="order")
df_order = df_order.apply(lambda x: x.dropna().tolist()).to_frame("order")

df = pd.merge(df_info, df_order, left_index=True, right_index=True, how="left")
df.agg(lambda x: x.dropna().to_dict(), axis=1).to_json(info_directory + "info.json")
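
To make the shape of the resulting JSON easier to picture, here is a minimal sketch of the same conversion on in-memory tables instead of `info.xlsx`. The column names `label`, `population`, and `gdp` and all values are hypothetical. The sketch also builds the per-column lists explicitly rather than through `DataFrame.apply`, since, as far as we can tell, `apply` may return a DataFrame instead of a Series when every column yields a list of the same length.

```python
import json

import pandas as pd

# Hypothetical stand-ins for the "info" and "order" sheets.
df_info = pd.DataFrame({
    "value": ["population", "gdp"],
    "label": ["Population", None],
}).set_index("value", drop=False)
df_order = pd.DataFrame({"population": ["a", "b"], "gdp": ["c", None]})

# One list of non-missing entries per column of the "order" sheet.
order = pd.Series(
    {col: df_order[col].dropna().tolist() for col in df_order.columns},
    name="order",
)
df = pd.merge(df_info, order.to_frame(), left_index=True, right_index=True,
              how="left")

# One JSON object per variable, with missing fields left out.
records = {
    key: {k: v for k, v in row.items() if isinstance(v, list) or pd.notna(v)}
    for key, row in df.iterrows()
}
info_json = json.dumps(records)
```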

## Final Energy Dataset

This section documents the entire creation of the final Countries Stats and Energy Use dataset.
The dataset is sourced from Our World in Data's Energy Data and the World Bank's datasets.

The variables and documentation for this dataset can be found in the `info.xlsx` Excel workbook in the `info` directory.

In [18]:
# import needed packages
import pandas as pd
import world_bank_data as wb

### Entities

In [19]:
# read entities dataset
df_entities = pd.read_csv("entities/entities.csv")
df_entities

Unnamed: 0,Code,Entity,OWID,OWID_Name,OWID_Continent,OWID_WHO_Region,OWID_WB_Region,OWID_UN_Region,WB,WB_ISO2,WB_Name,WB_region,WB_adminregion,WB_incomeLevel,WB_lendingType,WB_capitalCity,WB_longitude,WB_latitude,WEO,WEO_Country
0,ABW,Aruba,ABW,Aruba,North America,,Latin America and Caribbean,Latin America and Caribbean,ABW,AW,Aruba,Latin America & Caribbean,,High income,Not classified,Oranjestad,-70.0167,12.51670,ABW,Aruba
1,AFE,Africa Eastern and Southern,,,,,,,AFE,ZH,Africa Eastern and Southern,Aggregates,,Aggregates,Aggregates,,,,,
2,AFG,Afghanistan,AFG,Afghanistan,Asia,Eastern Mediterranean,South Asia,Central and Southern Asia,AFG,AF,Afghanistan,South Asia,South Asia,Low income,IDA,Kabul,69.1761,34.52280,AFG,Afghanistan
3,AFW,Africa Western and Central,,,,,,,AFW,ZI,Africa Western and Central,Aggregates,,Aggregates,Aggregates,,,,,
4,AGO,Angola,AGO,Angola,Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa,AGO,AO,Angola,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IBRD,Luanda,13.2420,-8.81155,AGO,Angola
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
391,,Wake Island,,Wake Island,,,,,,,,,,,,,,,,
392,,Non-OECD,,Non-OECD,,,,,,,,,,,,,,,,
393,,Oceania,,Oceania,,,,,,,,,,,,,,,,
394,,Asia,,Asia,,,,,,,,,,,,,,,,


### Our World in Data

In [20]:
# load the dataset
df_owid = pd.read_csv("https://raw.githubusercontent.com/owid/energy-data/master/owid-energy-data.csv")
df_owid

Unnamed: 0,iso_code,country,year,coal_prod_change_pct,coal_prod_change_twh,gas_prod_change_pct,gas_prod_change_twh,oil_prod_change_pct,oil_prod_change_twh,energy_cons_change_pct,...,solar_consumption,solar_elec_per_capita,solar_energy_per_capita,wind_share_elec,wind_cons_change_pct,wind_share_energy,wind_cons_change_twh,wind_consumption,wind_elec_per_capita,wind_energy_per_capita
0,AFG,Afghanistan,1900,,,,,,,,...,,,,,,,,,,
1,AFG,Afghanistan,1901,,0.000,,,,,,...,,,,,,,,,,
2,AFG,Afghanistan,1902,,0.000,,,,,,...,,,,,,,,,,
3,AFG,Afghanistan,1903,,0.000,,,,,,...,,,,,,,,,,
4,AFG,Afghanistan,1904,,0.000,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17234,ZWE,Zimbabwe,2016,-37.694,-12.257,,0.0,,0.0,-14.611,...,,0.713,,0.0,,,,,0.0,
17235,ZWE,Zimbabwe,2017,8.375,1.697,,,,,-1.564,...,,0.702,,0.0,,,,,0.0,
17236,ZWE,Zimbabwe,2018,14.336,3.148,,,,,3.409,...,,0.693,,0.0,,,,,0.0,
17237,ZWE,Zimbabwe,2019,-21.529,-5.405,,,,,4.052,...,,0.683,,0.0,,,,,0.0,


#### Transform data

In [21]:
# transform data

# convert year to the int data type
# so that iso_code + year form a primary key compatible with the World Bank dataset
df_owid["year"] = df_owid["year"].astype(int)
df_owid

# remove country column
# We already have defined the country names in the "entities" table
df_owid.drop(columns="country", inplace=True)
df_owid

Unnamed: 0,iso_code,year,coal_prod_change_pct,coal_prod_change_twh,gas_prod_change_pct,gas_prod_change_twh,oil_prod_change_pct,oil_prod_change_twh,energy_cons_change_pct,energy_cons_change_twh,...,solar_consumption,solar_elec_per_capita,solar_energy_per_capita,wind_share_elec,wind_cons_change_pct,wind_share_energy,wind_cons_change_twh,wind_consumption,wind_elec_per_capita,wind_energy_per_capita
0,AFG,1900,,,,,,,,,...,,,,,,,,,,
1,AFG,1901,,0.000,,,,,,,...,,,,,,,,,,
2,AFG,1902,,0.000,,,,,,,...,,,,,,,,,,
3,AFG,1903,,0.000,,,,,,,...,,,,,,,,,,
4,AFG,1904,,0.000,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17234,ZWE,2016,-37.694,-12.257,,0.0,,0.0,-14.611,-7.985,...,,0.713,,0.0,,,,,0.0,
17235,ZWE,2017,8.375,1.697,,,,,-1.564,-0.730,...,,0.702,,0.0,,,,,0.0,
17236,ZWE,2018,14.336,3.148,,,,,3.409,1.566,...,,0.693,,0.0,,,,,0.0,
17237,ZWE,2019,-21.529,-5.405,,,,,4.052,1.925,...,,0.683,,0.0,,,,,0.0,


##### Transform iso_code to Entities Code

In [22]:
# transform iso code to Entities Code

# df_entities Code and OWID
df_entities[["Code", "OWID"]]

#  merge df_owid with df_entities[["Code", "OWID"]]
df_owid = pd.merge(df_entities[["Code", "OWID"]], df_owid, how="right", left_on="Code", right_on="iso_code")
df_owid

# transform iso_code to standard df_entities Code
df_owid["iso_code"] = df_owid["Code"]
df_owid

# remove Code and OWID
# this cannot affect the original OWID variables, since no variable in the dataset is named "Code" or "OWID"
df_owid.drop(columns=["Code", "OWID"], inplace=True)
df_owid

Unnamed: 0,iso_code,year,coal_prod_change_pct,coal_prod_change_twh,gas_prod_change_pct,gas_prod_change_twh,oil_prod_change_pct,oil_prod_change_twh,energy_cons_change_pct,energy_cons_change_twh,...,solar_consumption,solar_elec_per_capita,solar_energy_per_capita,wind_share_elec,wind_cons_change_pct,wind_share_energy,wind_cons_change_twh,wind_consumption,wind_elec_per_capita,wind_energy_per_capita
0,AFG,1900,,,,,,,,,...,,,,,,,,,,
1,AFG,1901,,0.000,,,,,,,...,,,,,,,,,,
2,AFG,1902,,0.000,,,,,,,...,,,,,,,,,,
3,AFG,1903,,0.000,,,,,,,...,,,,,,,,,,
4,AFG,1904,,0.000,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
47177,ZWE,2016,-37.694,-12.257,,0.0,,0.0,-14.611,-7.985,...,,0.713,,0.0,,,,,0.0,
47178,ZWE,2017,8.375,1.697,,,,,-1.564,-0.730,...,,0.702,,0.0,,,,,0.0,
47179,ZWE,2018,14.336,3.148,,,,,3.409,1.566,...,,0.693,,0.0,,,,,0.0,
47180,ZWE,2019,-21.529,-5.405,,,,,4.052,1.925,...,,0.683,,0.0,,,,,0.0,
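
In the output above, the row index ends at 47180 after the merge, up from 17237 before it, so the merge appears to have added rows. One possible cause, not confirmed by this notebook, is that pandas matches null keys against each other when merging: a data row with a missing `iso_code` is paired with every lookup row that has a missing `Code`. The sketch below reproduces this on hypothetical toy frames (`entities`, `data`, and all values are made up for illustration) and shows one way to guard against it.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table: two entities have no Code (NaN).
entities = pd.DataFrame({
    "Code": ["AFG", np.nan, np.nan],
    "OWID": ["AFG", "Wake Island", "Non-OECD"],
})

# Hypothetical data table: one row has no iso_code (NaN).
data = pd.DataFrame({
    "iso_code": ["AFG", np.nan],
    "year": [1900, 1900],
    "value": [1.0, 2.0],
})

# Naive right merge: the NaN iso_code row matches both NaN Code rows.
naive = pd.merge(entities, data, how="right", left_on="Code", right_on="iso_code")

# Guarded merge: drop lookup rows without a Code first, and ask pandas to
# verify that the lookup keys are unique (validate="one_to_many").
lookup = entities.dropna(subset=["Code"])
guarded = pd.merge(
    lookup, data, how="right",
    left_on="Code", right_on="iso_code",
    validate="one_to_many",
)

print(len(data), len(naive), len(guarded))
```

If the guarded variant keeps the row count equal to the input, the extra rows came from null keys; whether that applies to the real `df_entities` would need to be checked against the actual data.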


#### Melt the dataset

In [23]:
# melt dataset and remove nan values

# melt the dataset
df_owid = pd.melt(df_owid, id_vars=["iso_code", "year"], var_name="attr", value_name="owid")
df_owid

# remove nan values
df_owid.dropna(inplace=True)
df_owid

Unnamed: 0,iso_code,year,attr,owid
50,AFG,1950,coal_prod_change_pct,180.000
51,AFG,1951,coal_prod_change_pct,7.143
52,AFG,1952,coal_prod_change_pct,13.333
53,AFG,1953,coal_prod_change_pct,-5.882
54,AFG,1954,coal_prod_change_pct,-6.250
...,...,...,...,...
5894875,OWID_WRL,2016,wind_energy_per_capita,324.182
5894876,OWID_WRL,2017,wind_energy_per_capita,377.678
5894877,OWID_WRL,2018,wind_energy_per_capita,413.472
5894878,OWID_WRL,2019,wind_energy_per_capita,455.268
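
As a small illustration of the wide-to-long step, the sketch below melts a hypothetical two-row frame (the values are placeholders, not OWID data). It also shows `dropna(subset=...)`, which drops only rows with a missing value, whereas the plain `dropna()` used above also drops rows with a missing `iso_code`.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: one column per variable.
wide = pd.DataFrame({
    "iso_code": ["AFG", "AFG"],
    "year": [1950, 1951],
    "coal_production": [0.1, np.nan],
    "oil_production": [np.nan, np.nan],
})

# One row per (iso_code, year, attr) combination.
long = pd.melt(wide, id_vars=["iso_code", "year"], var_name="attr", value_name="owid")

# Keep only the rows that actually carry a value.
long = long.dropna(subset=["owid"])

print(long)
```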


### World Bank

#### Sample Code

In [24]:
# get series
# the output is a Series with a MultiIndex (Country, Series, Year)
# id_or_value="id" makes Country hold the ISO code instead of the country name, and Series hold the indicator code instead of the indicator name
df_sample: pd.Series = wb.get_series(indicator="AG.LND.TOTL.K2", id_or_value="id")
df_sample

Country  Series          Year
AFE      AG.LND.TOTL.K2  1960             NaN
                         1961    1.477796e+07
                         1962    1.477796e+07
                         1963    1.477796e+07
                         1964    1.477796e+07
                                     ...     
ZWE      AG.LND.TOTL.K2  2017    3.868500e+05
                         2018    3.868500e+05
                         2019    3.868500e+05
                         2020    3.868500e+05
                         2021    3.868500e+05
Name: AG.LND.TOTL.K2, Length: 16492, dtype: float64

In [25]:
# transform the series to a dataframe whose value column is named "Value"; the name is kept once the index is reset into normal columns
df_sample = df_sample.to_frame("Value")
df_sample

Country,Series,Year,Value
AFE,AG.LND.TOTL.K2,1960,
AFE,AG.LND.TOTL.K2,1961,1.477796e+07
AFE,AG.LND.TOTL.K2,1962,1.477796e+07
AFE,AG.LND.TOTL.K2,1963,1.477796e+07
AFE,AG.LND.TOTL.K2,1964,1.477796e+07
...,...,...,...
ZWE,AG.LND.TOTL.K2,2017,3.868500e+05
ZWE,AG.LND.TOTL.K2,2018,3.868500e+05
ZWE,AG.LND.TOTL.K2,2019,3.868500e+05
ZWE,AG.LND.TOTL.K2,2020,3.868500e+05


In [26]:
# reset the index to turn the MultiIndex levels into normal columns
df_sample.reset_index(inplace=True)
df_sample

Unnamed: 0,Country,Series,Year,Value
0,AFE,AG.LND.TOTL.K2,1960,
1,AFE,AG.LND.TOTL.K2,1961,1.477796e+07
2,AFE,AG.LND.TOTL.K2,1962,1.477796e+07
3,AFE,AG.LND.TOTL.K2,1963,1.477796e+07
4,AFE,AG.LND.TOTL.K2,1964,1.477796e+07
...,...,...,...,...
16487,ZWE,AG.LND.TOTL.K2,2017,3.868500e+05
16488,ZWE,AG.LND.TOTL.K2,2018,3.868500e+05
16489,ZWE,AG.LND.TOTL.K2,2019,3.868500e+05
16490,ZWE,AG.LND.TOTL.K2,2020,3.868500e+05


In [27]:
# end sample code
del df_sample
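
The same reshaping can be tried without calling the World Bank API. The sketch below builds a small Series by hand with the same three index levels as the sample output; storing `Year` as a string is an assumption, suggested by the later conversion of `year` to `int`.

```python
import pandas as pd

# Hand-built stand-in for the Series returned in the sample above.
index = pd.MultiIndex.from_tuples(
    [
        ("AFE", "AG.LND.TOTL.K2", "1960"),
        ("AFE", "AG.LND.TOTL.K2", "1961"),
    ],
    names=["Country", "Series", "Year"],
)
series = pd.Series([float("nan"), 1.477796e07], index=index, name="AG.LND.TOTL.K2")

# to_frame names the value column; reset_index turns the index levels into columns.
table = series.to_frame("Value").reset_index()

print(list(table.columns))
```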

#### Collect data

In [28]:
# dictionary of indicator codes
# each key is a WB indicator code; each value is the name of that variable in our final dataset
IndicatorCodes = {
    "AG.LND.TOTL.K2": "area",
    "SP.POP.TOTL": "population",
    "NY.GDP.MKTP.CD": "gdp",
    "NY.GDP.MKTP.PP.CD": "gdp_ppp",
    "NY.GDP.PCAP.CD": "gdp_per_captia",
    "NY.GDP.PCAP.PP.CD": "gdp_ppp_per_capita",
}

# get all indicator codes datasets
list_df = []

for indicator_code in IndicatorCodes:
    # get series
    df: pd.Series = wb.get_series(indicator=indicator_code, id_or_value="id")
    # transform to dataframe with name Value
    df = df.to_frame("Value")
    # reset index to transform into a normal column format
    df.reset_index(inplace=True)
    # add to df list
    list_df.append(df)

df_wb = pd.concat(list_df)
df_wb

# then, delete variables that are no longer needed
del list_df, df, indicator_code

#### Transform data

In [29]:
# transform data

# map each "Series" indicator code to its variable name
# so that variable names follow the same convention as the OWID data
df_wb["Series"] = df_wb["Series"].transform(lambda x: IndicatorCodes[x])
df_wb

# remove nan values
df_wb.dropna(inplace=True)
df_wb

# rename column, prepare to join with df_owid
df_wb.rename(columns={
    "Country": "iso_code",
    "Year": "year",
    "Series": "attr",
    "Value": "wb"
}, inplace=True)
df_wb

# transform year to int data type
# so that iso_code + year forms a key compatible with the owid dataset
df_wb["year"] = df_wb["year"].transform(int)
df_wb

Unnamed: 0,iso_code,attr,year,wb
1,AFE,area,1961,1.477796e+07
2,AFE,area,1962,1.477796e+07
3,AFE,area,1963,1.477796e+07
4,AFE,area,1964,1.477796e+07
5,AFE,area,1965,1.477796e+07
...,...,...,...,...
16486,ZWE,gdp_ppp_per_capita,2016,2.806469e+03
16487,ZWE,gdp_ppp_per_capita,2017,3.795642e+03
16488,ZWE,gdp_ppp_per_capita,2018,4.017222e+03
16489,ZWE,gdp_ppp_per_capita,2019,3.783548e+03


#### Transform iso_code to Entity Code

In [30]:
# transform iso code to entity code

# df_entities Code and WB
df_entities[["Code", "WB"]]

# merge df_wb with df_entities[["Code", "WB"]]
df_wb = pd.merge(df_entities[["Code", "WB"]], df_wb, how="right", left_on="Code", right_on="iso_code")
df_wb

# transform iso_code to standard df_entities Code
df_wb["iso_code"] = df_wb["Code"]
df_wb

# remove Code and WB
df_wb.drop(columns=["Code", "WB"], inplace=True)
df_wb

# remove nan values
df_wb.dropna(inplace=True)
df_wb

Unnamed: 0,iso_code,attr,year,wb
0,AFE,area,1961,1.477796e+07
1,AFE,area,1962,1.477796e+07
2,AFE,area,1963,1.477796e+07
3,AFE,area,1964,1.477796e+07
4,AFE,area,1965,1.477796e+07
...,...,...,...,...
72254,ZWE,gdp_ppp_per_capita,2016,2.806469e+03
72255,ZWE,gdp_ppp_per_capita,2017,3.795642e+03
72256,ZWE,gdp_ppp_per_capita,2018,4.017222e+03
72257,ZWE,gdp_ppp_per_capita,2019,3.783548e+03


#### Merge OWID and WB data

In [31]:
# merge owid and wb data

# merge df_owid with df_wb
df = pd.merge(df_owid, df_wb, how="outer", on=["iso_code", "year", "attr"])
df

# combine the values from the two datasets into one column
# wb values take priority; owid values fill the gaps
df["value"] = df["wb"].fillna(df["owid"])
df

# remove owid and wb, only keep value
df = df[["iso_code", "year", "attr", "value"]]
df

Unnamed: 0,iso_code,year,attr,value
0,AFG,1950,coal_prod_change_pct,180.000000
1,AFG,1951,coal_prod_change_pct,7.143000
2,AFG,1952,coal_prod_change_pct,13.333000
3,AFG,1953,coal_prod_change_pct,-5.882000
4,AFG,1954,coal_prod_change_pct,-6.250000
...,...,...,...,...
736790,ZWE,2016,gdp_ppp_per_capita,2806.469032
736791,ZWE,2017,gdp_ppp_per_capita,3795.642431
736792,ZWE,2018,gdp_ppp_per_capita,4017.221716
736793,ZWE,2019,gdp_ppp_per_capita,3783.547898
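
To make the priority rule concrete, the sketch below runs the same outer merge and `fillna` on two hypothetical long tables (all numbers are placeholders). Where both sources have a value, the `wb` one is kept; where only one source has a value, that value is used.

```python
import pandas as pd

# Hypothetical OWID rows in long format.
owid = pd.DataFrame({
    "iso_code": ["AFG", "AFG"],
    "year": [2019, 2019],
    "attr": ["population", "coal_production"],
    "owid": [10.0, 1.5],
})

# Hypothetical WB rows in long format.
wb = pd.DataFrame({
    "iso_code": ["AFG", "AFG"],
    "year": [2019, 2019],
    "attr": ["population", "area"],
    "wb": [11.0, 500.0],
})

# The outer merge keeps keys that appear in either source.
merged = pd.merge(owid, wb, how="outer", on=["iso_code", "year", "attr"])

# Prefer wb; fall back to owid where wb is missing.
merged["value"] = merged["wb"].fillna(merged["owid"])

result = merged.set_index("attr")["value"].to_dict()
print(result)
```

`merged["wb"].combine_first(merged["owid"])` should give the same column and may read more clearly to some.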


### Entities Groups

In [32]:
# variables needed
# each variable name maps to its matching column in the Entities dataset
vars = {
    "iso_code": "Code",
    "country": "Entity",
    "group_OWID_Continent": "OWID_Continent",
    "group_UN_Region": "OWID_UN_Region",
    "group_WHO_Region": "OWID_WHO_Region",
    "group_WB_region": "WB_region",
    "group_WB_adminregion": "WB_adminregion",
    "group_WB_incomeLevel": "WB_incomeLevel",
    "group_WB_lendingType": "WB_lendingType",
    "group_WB_longitude": "WB_longitude",
    "group_WB_latitude": "WB_latitude"
}

# variables of group type
# each of these variables gets a twin variable in which the US's value is always "United States"
# so that the US can be highlighted in the visualisation
vars_group = [
    "group_OWID_Continent",
    "group_UN_Region",
    "group_WHO_Region",
    "group_WB_region",
    "group_WB_adminregion",
    "group_WB_incomeLevel",
    "group_WB_lendingType"
]


In [33]:

# create df_groups
df_groups = pd.DataFrame()

for k in vars:
    df_groups[k] = df_entities[vars[k]]

df_groups


Unnamed: 0,iso_code,country,group_OWID_Continent,group_UN_Region,group_WHO_Region,group_WB_region,group_WB_adminregion,group_WB_incomeLevel,group_WB_lendingType,group_WB_longitude,group_WB_latitude
0,ABW,Aruba,North America,Latin America and Caribbean,,Latin America & Caribbean,,High income,Not classified,-70.0167,12.51670
1,AFE,Africa Eastern and Southern,,,,Aggregates,,Aggregates,Aggregates,,
2,AFG,Afghanistan,Asia,Central and Southern Asia,Eastern Mediterranean,South Asia,South Asia,Low income,IDA,69.1761,34.52280
3,AFW,Africa Western and Central,,,,Aggregates,,Aggregates,Aggregates,,
4,AGO,Angola,Africa,Sub-Saharan Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IBRD,13.2420,-8.81155
...,...,...,...,...,...,...,...,...,...,...,...
391,,Wake Island,,,,,,,,,
392,,Non-OECD,,,,,,,,,
393,,Oceania,,,,,,,,,
394,,Asia,,,,,,,,,


In [34]:

# add twin attributes in which the US forms its own group
for k in vars_group:
    df_groups[k + "_exclude_US"] = df_groups[k]
    df_groups.loc[df_groups["country"] == "United States", k + "_exclude_US"] = "United States"

df_groups

Unnamed: 0,iso_code,country,group_OWID_Continent,group_UN_Region,group_WHO_Region,group_WB_region,group_WB_adminregion,group_WB_incomeLevel,group_WB_lendingType,group_WB_longitude,group_WB_latitude,group_OWID_Continent_exclude_US,group_UN_Region_exclude_US,group_WHO_Region_exclude_US,group_WB_region_exclude_US,group_WB_adminregion_exclude_US,group_WB_incomeLevel_exclude_US,group_WB_lendingType_exclude_US
0,ABW,Aruba,North America,Latin America and Caribbean,,Latin America & Caribbean,,High income,Not classified,-70.0167,12.51670,North America,Latin America and Caribbean,,Latin America & Caribbean,,High income,Not classified
1,AFE,Africa Eastern and Southern,,,,Aggregates,,Aggregates,Aggregates,,,,,,Aggregates,,Aggregates,Aggregates
2,AFG,Afghanistan,Asia,Central and Southern Asia,Eastern Mediterranean,South Asia,South Asia,Low income,IDA,69.1761,34.52280,Asia,Central and Southern Asia,Eastern Mediterranean,South Asia,South Asia,Low income,IDA
3,AFW,Africa Western and Central,,,,Aggregates,,Aggregates,Aggregates,,,,,,Aggregates,,Aggregates,Aggregates
4,AGO,Angola,Africa,Sub-Saharan Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IBRD,13.2420,-8.81155,Africa,Sub-Saharan Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IBRD
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
391,,Wake Island,,,,,,,,,,,,,,,,
392,,Non-OECD,,,,,,,,,,,,,,,,
393,,Oceania,,,,,,,,,,,,,,,,
394,,Asia,,,,,,,,,,,,,,,,


In [35]:

# add a variable that equals "United States" for the United States and "Other Entities" for every other entity
df_groups["group_is_USA"] = df_groups["country"].copy(deep=True).transform(lambda x: x if x == "United States" else "Other Entities")

df_groups

Unnamed: 0,iso_code,country,group_OWID_Continent,group_UN_Region,group_WHO_Region,group_WB_region,group_WB_adminregion,group_WB_incomeLevel,group_WB_lendingType,group_WB_longitude,group_WB_latitude,group_OWID_Continent_exclude_US,group_UN_Region_exclude_US,group_WHO_Region_exclude_US,group_WB_region_exclude_US,group_WB_adminregion_exclude_US,group_WB_incomeLevel_exclude_US,group_WB_lendingType_exclude_US,group_is_USA
0,ABW,Aruba,North America,Latin America and Caribbean,,Latin America & Caribbean,,High income,Not classified,-70.0167,12.51670,North America,Latin America and Caribbean,,Latin America & Caribbean,,High income,Not classified,Other Entities
1,AFE,Africa Eastern and Southern,,,,Aggregates,,Aggregates,Aggregates,,,,,,Aggregates,,Aggregates,Aggregates,Other Entities
2,AFG,Afghanistan,Asia,Central and Southern Asia,Eastern Mediterranean,South Asia,South Asia,Low income,IDA,69.1761,34.52280,Asia,Central and Southern Asia,Eastern Mediterranean,South Asia,South Asia,Low income,IDA,Other Entities
3,AFW,Africa Western and Central,,,,Aggregates,,Aggregates,Aggregates,,,,,,Aggregates,,Aggregates,Aggregates,Other Entities
4,AGO,Angola,Africa,Sub-Saharan Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IBRD,13.2420,-8.81155,Africa,Sub-Saharan Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,IBRD,Other Entities
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
391,,Wake Island,,,,,,,,,,,,,,,,,Other Entities
392,,Non-OECD,,,,,,,,,,,,,,,,,Other Entities
393,,Oceania,,,,,,,,,,,,,,,,,Other Entities
394,,Asia,,,,,,,,,,,,,,,,,Other Entities
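
The twin columns and the `group_is_USA` flag can also be written with `Series.mask` and `Series.where`. This is an alternative formulation, not the one used in the cells above, shown here on a hypothetical three-row frame.

```python
import pandas as pd

# Hypothetical group table with a single group variable.
groups = pd.DataFrame({
    "country": ["Canada", "United States", "Mexico"],
    "group_OWID_Continent": ["North America", "North America", "North America"],
})

is_us = groups["country"] == "United States"

# mask replaces the value where the condition is True.
groups["group_OWID_Continent_exclude_US"] = (
    groups["group_OWID_Continent"].mask(is_us, "United States")
)

# where keeps the value where the condition is True and replaces it elsewhere.
groups["group_is_USA"] = groups["country"].where(is_us, "Other Entities")

print(groups)
```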


In [36]:
# remove variables that are no longer needed in this script
del vars, vars_group, k, IndicatorCodes

### Final

In [37]:
# pivot dataset
df = pd.pivot(df, index=["iso_code", "year"], columns="attr", values="value")
df

# reset index to make iso_code and year normal columns again
df.reset_index(inplace=True)
df

# merge values dataset with groups dataset
df = pd.merge(df_groups, df, on="iso_code", how="right")
df

Unnamed: 0,iso_code,country,group_OWID_Continent,group_UN_Region,group_WHO_Region,group_WB_region,group_WB_adminregion,group_WB_incomeLevel,group_WB_lendingType,group_WB_longitude,...,solar_share_elec,solar_share_energy,wind_cons_change_pct,wind_cons_change_twh,wind_consumption,wind_elec_per_capita,wind_electricity,wind_energy_per_capita,wind_share_elec,wind_share_energy
0,ABW,Aruba,North America,Latin America and Caribbean,,Latin America & Caribbean,,High income,Not classified,-70.0167,...,,,,,,,,,,
1,ABW,Aruba,North America,Latin America and Caribbean,,Latin America & Caribbean,,High income,Not classified,-70.0167,...,,,,,,,,,,
2,ABW,Aruba,North America,Latin America and Caribbean,,Latin America & Caribbean,,High income,Not classified,-70.0167,...,,,,,,,,,,
3,ABW,Aruba,North America,Latin America and Caribbean,,Latin America & Caribbean,,High income,Not classified,-70.0167,...,,,,,,,,,,
4,ABW,Aruba,North America,Latin America and Caribbean,,Latin America & Caribbean,,High income,Not classified,-70.0167,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22306,ZWE,Zimbabwe,Africa,Sub-Saharan Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,Blend,31.0672,...,0.137,,,,,0.0,0.0,,0.0,
22307,ZWE,Zimbabwe,Africa,Sub-Saharan Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,Blend,31.0672,...,0.110,,,,,0.0,0.0,,0.0,
22308,ZWE,Zimbabwe,Africa,Sub-Saharan Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,Blend,31.0672,...,0.088,,,,,0.0,0.0,,0.0,
22309,ZWE,Zimbabwe,Africa,Sub-Saharan Africa,Africa,Sub-Saharan Africa,Sub-Saharan Africa (excluding high income),Lower middle income,Blend,31.0672,...,0.088,,,,,0.0,0.0,,0.0,
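
The long-to-wide step can be sketched on a hypothetical long table as follows (values are placeholders). `pivot` requires every `(iso_code, year, attr)` combination to be unique, so the sketch checks that first. Passing a list as `index` assumes a reasonably recent pandas version.

```python
import pandas as pd

# Hypothetical long table: one row per (iso_code, year, attr).
long = pd.DataFrame({
    "iso_code": ["AFG", "AFG", "ZWE"],
    "year": [2019, 2019, 2019],
    "attr": ["area", "population", "area"],
    "value": [500.0, 10.0, 300.0],
})

# pivot raises an error on duplicate keys, so check uniqueness up front.
assert not long.duplicated(["iso_code", "year", "attr"]).any()

# One column per attr, then iso_code and year back as normal columns.
wide = long.pivot(index=["iso_code", "year"], columns="attr", values="value").reset_index()
wide.columns.name = None

print(wide)
```

Combinations with no value in the long table, such as `population` for `ZWE` here, come out as NaN in the wide table.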


In [38]:
# write dataset to json
df.to_json("data.json", orient="records")

In [39]:
# write dataset to csv
df.to_csv("data.csv", index=False)

## Final processing

At this point the dataset is complete and can be loaded directly into the web application to render the visualisation.
This section of the script filters the dataset and transforms it into a format that the web application can load more easily.
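
As a toy illustration of the filtering in this section, the sketch below selects rows by ISO code and year and then keeps a subset of columns. The frame, the lists, and all values are hypothetical stand-ins for the real ones defined in the next cell.

```python
import pandas as pd

# Hypothetical stand-in for the full dataset.
data = pd.DataFrame({
    "iso_code": ["USA", "USA", "ABW"],
    "year": [2018, 2019, 2019],
    "population": [1.0, 2.0, 3.0],
    "gdp": [10.0, 20.0, 30.0],
})

needed_codes = ["USA", "DEU"]
needed_years = [2019]
needed_columns = ["iso_code", "year", "population"]

# A row is kept only if both its code and its year are in the needed lists.
mask = data["iso_code"].isin(needed_codes) & data["year"].isin(needed_years)
subset = data.loc[mask, needed_columns]

print(subset)
```

Codes in the list that are absent from the data (`DEU` here) are silently ignored, so comparing the filtered codes against the requested list can catch misspelt codes.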


In [40]:

# Final dataset processing

COUNTRIES_NEEDED_YEARS = [2019]

COUNTRIES_NEEDED_ISO_CODES = [
    "USA",
    "DEU",
    "CAN",
    "FRA",
    "ESP",
    "MEX",
    "ITA",
    "GBR",
    "ARG",
    "BRA",
    "NLD",
    "POL",
    "SWE",
    "BEL",
    "AUS",
    "AUT",
    "THA",
    "OWID_EUR",
    "IND",
    "CHN",
    "PRT",
    "JPN",
    "HUN",
    "ROU",
    "BGR",
    "PAK",
    "ZAF",
    "TUR",
    "CHL",
    "PER",
    "NOR",
    "GRC",
    "DNK",
    "VNM",
    "BGD",
    "EGY",
    "IRL",
    "KOR",
    "ECU",
    "TWN",
    "PHL",
    "TUN",
    "BOL",
    "MNG",
    "BDI",
    "COL",
    "IDN",
    "IRN",
    "MYS",
]

COUNTRIES_NEEDED_VARIABLES = [
    "iso_code",
    "country",
    "year",
    "group_is_USA",
    "group_OWID_Continent",
    "group_WHO_Region",
    "group_WB_incomeLevel",
    "group_WB_lendingType",
    "area",
    "population",
    "gdp",
    "gdp_per_captia",
    "coal_production",
    "electricity_demand",
    "electricity_generation",
    "fossil_fuel_consumption",
    "gas_production",
    "greenhouse_gas_emissions",
    "oil_production",
]

# filter datasets

df_countries = df[
    df["iso_code"].isin(COUNTRIES_NEEDED_ISO_CODES)
    &
    df["year"].isin(COUNTRIES_NEEDED_YEARS)
]

# write dataset to json
df_countries.to_json("final/data.json", orient="records")

# write dataset to csv
df_countries.to_csv("final/data.csv", index=False)

# write only the needed variables to json
df_countries[COUNTRIES_NEEDED_VARIABLES].to_json("final/countries.json", orient="records")

# write only the needed variables to csv
df_countries[COUNTRIES_NEEDED_VARIABLES].to_csv("final/countries.csv", index=False)

df_countries[COUNTRIES_NEEDED_VARIABLES]

Unnamed: 0,iso_code,country,year,group_is_USA,group_OWID_Continent,group_WHO_Region,group_WB_incomeLevel,group_WB_lendingType,area,population,gdp,gdp_per_captia,coal_production,electricity_demand,electricity_generation,fossil_fuel_consumption,gas_production,greenhouse_gas_emissions,oil_production
877,ARG,Argentina,2019,Other Entities,South America,Americas,Upper middle income,IBRD,2736690.0,44938710.0,451932400000.0,10056.63794,,143.15,132.46,792.53,416.158,46.54,335.349
1185,AUS,Australia,2019,Other Entities,Oceania,Western Pacific,High income,Not classified,7692020.0,25365740.0,1391953000000.0,54875.285956,3669.878,250.26,250.26,1493.91,1431.49,147.18,224.077
1307,AUT,Austria,2019,Other Entities,Europe,Europe,High income,Not classified,82520.0,8879920.0,445011900000.0,50114.40111,,73.99,70.86,275.456,,11.37,
1491,BDI,Burundi,2019,Other Entities,Africa,Africa,Low income,IDA,25680.0,11530580.0,2631434000.0,228.213589,,0.46,0.36,,,0.09,
1613,BEL,Belgium,2019,Other Entities,Europe,Europe,High income,Not classified,30280.0,11488980.0,535288700000.0,46591.491607,,90.45,92.3,579.424,,17.19,
1859,BGD,Bangladesh,2019,Other Entities,Asia,South-East Asia,Lower middle income,IDA,130170.0,163046200.0,302571300000.0,1855.740094,,86.39,79.6,452.465,252.844,42.68,
1981,BGR,Bulgaria,2019,Other Entities,Europe,Europe,Upper middle income,IBRD,108560.0,6975761.0,68915420000.0,9879.268533,56.722,38.01,43.82,145.296,,16.2,
2535,BOL,Bolivia,2019,Other Entities,South America,Americas,Lower middle income,IBRD,1083300.0,11513100.0,40895320000.0,3552.068143,,9.99,9.99,,149.503,3.15,
2657,BRA,Brazil,2019,Other Entities,South America,Americas,Upper middle income,IBRD,8358140.0,211049500.0,1877824000000.0,8897.552966,25.144,639.51,614.55,1873.529,257.465,76.58,1753.507
3209,CAN,Canada,2019,Other Entities,North America,Americas,High income,Not classified,8965590.0,37601230.0,1742015000000.0,46328.671841,316.326,586.7,633.71,2682.728,1689.858,84.62,3064.065
