# World Bank data on China, US, Russia, and Ukraine
The data comes from these sites:
* [China](https://data.worldbank.org/country/china?view=chart)
* [US](https://data.worldbank.org/country/us?view=chart)
* [Russia](https://data.worldbank.org/country/russian-federation?view=chart)
* [Ukraine](https://data.worldbank.org/country/ukraine?view=chart)

They also have the same data on [every other country in the world](https://data.worldbank.org/country)!

First we need to import the libraries we will use.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read in the data, but skip the 3 header rows
Keep the 4th row, though: it holds the column names!

In [None]:
china_df = pd.read_csv('data/china-worldbank-data/china_data.csv',skiprows=3)
us_df = pd.read_csv('data/us-worldbank-data/us_data.csv',skiprows=3)
russia_df = pd.read_csv('data/russia-worldbank-data/russia_data.csv',skiprows=3)
ukraine_df = pd.read_csv('data/ukraine-worldbank-data/ukraine_data.csv',skiprows=3)
us_df
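
To see what `skiprows=3` does, here is a minimal sketch on an in-memory CSV. The file contents below are made up for illustration and are not the real World Bank file; the point is only that the first three lines are dropped and the fourth becomes the header.

```python
import io
import pandas as pd

# Hypothetical miniature file: three preamble lines, then the header row, then data.
raw = (
    "preamble line 1,ignored\n"
    "preamble line 2,ignored\n"
    "preamble line 3,ignored\n"
    "Country Name,Indicator Name,1960,1961\n"
    "Exampleland,Example indicator,100,110\n"
)

# skiprows=3 discards the preamble, so the 4th line supplies the column names.
demo_df = pd.read_csv(io.StringIO(raw), skiprows=3)
print(list(demo_df.columns))
print(len(demo_df))
```

The year columns come back with string names such as `'1960'`, which is why the plotting code later slices with `'1960':'2020'` rather than with integers.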


# Use df.iloc[k] to look at the kth row of the data frame

In [None]:
row0 = china_df.iloc[0]
row0

In [None]:
row0 = china_df.iloc[0]
for key in china_df.columns:
    print(key,row0[key])

# Search for indicators of interest
The following function will return the row number and indicator name for all indicators containing a phrase.

In [None]:
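def find_rows(df,phrase):
    """Print the row number and name of every indicator containing phrase."""
    for i in range(len(df)):
        indicator = df.iloc[i]['Indicator Name']
        # Skip missing names: NaN is a float, so `phrase in indicator` would raise
        if isinstance(indicator,str) and phrase in indicator:
            print(i,indicator)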
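
A variant that returns the matches instead of printing them can be handy for further filtering. The sketch below uses a hypothetical helper name, `find_rows_df`, and pandas' `Series.str.contains` with `regex=False` so that the parentheses in phrases such as `'CO2 emissions (kt)'` are matched literally. The toy data frame is made up for illustration.

```python
import pandas as pd

def find_rows_df(df, phrase):
    """Hypothetical helper: return the rows whose 'Indicator Name' contains phrase."""
    # regex=False treats the phrase as plain text; na=False skips missing names.
    mask = df['Indicator Name'].str.contains(phrase, regex=False, na=False)
    return df.loc[mask, ['Indicator Name']]

# Made-up indicator names, including one missing value.
toy_df = pd.DataFrame({
    'Indicator Name': [
        'CO2 emissions (kt)',
        'Population, total',
        None,
        'CO2 emissions (metric tons per capita)',
    ]
})

matches = find_rows_df(toy_df, 'CO2 emissions')
print(matches)
```

With the default integer index that `pd.read_csv` produces, the index labels of the result are the same row numbers that `df.iloc` expects.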


# Comparing CO2 emissions
We can now look for World Bank data on CO2 emissions for each country.

In [None]:

find_rows(china_df,'CO2 emissions (kt)')

In [None]:
find_rows(us_df,'CO2 emissions (kt)')

In [None]:
find_rows(russia_df,'CO2 emissions (kt)')

In [None]:
find_rows(ukraine_df,'CO2 emissions (kt)')

# Plot the data
Once we know which rows the data appears on, we can select the values for the years 1960 to 2020 and plot them!

In [None]:
def plot_indicator(df,row,label):
    """Plot one indicator (given by its row number) for the years 1960-2020."""
    plt.plot(range(1960,2021),df.iloc[row]['1960':'2020'],label=label)

# The row numbers below come from the find_rows output for each country
plt.figure(figsize=(15,10))
plot_indicator(china_df,314,'china')
plot_indicator(us_df,1375,'us')
plot_indicator(russia_df,1336,'russia')
plot_indicator(ukraine_df,639,'ukraine')
plt.legend()
plt.grid()
plt.title('CO2 emissions (kt)')
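
Hard-coded row numbers such as 314 or 1375 depend on the exact layout of each file. As a rough sketch, the row can instead be looked up by its exact indicator name. The helper name `indicator_series` is hypothetical, the toy data is made up, and the sketch assumes each indicator name appears exactly once per file.

```python
import pandas as pd

def indicator_series(df, name, first_year=1960, last_year=2020):
    """Hypothetical helper: return one indicator's yearly values, indexed by integer year."""
    matches = df[df['Indicator Name'] == name]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one row named {name!r}, found {len(matches)}")
    # Year columns are stored under string names such as '1960'.
    years = [str(y) for y in range(first_year, last_year + 1)]
    series = pd.to_numeric(matches.iloc[0][years], errors='coerce')
    series.index = series.index.astype(int)
    return series

# Made-up values for two indicators over two years.
toy_df = pd.DataFrame({
    'Indicator Name': ['CO2 emissions (kt)', 'Population, total'],
    '1960': [10.0, 100.0],
    '1961': [12.0, 110.0],
})

co2 = indicator_series(toy_df, 'CO2 emissions (kt)', 1960, 1961)
print(co2)
```

With the real data frames, a call like `s = indicator_series(china_df, 'CO2 emissions (kt)')` followed by `plt.plot(s.index, s.values, label='china')` should draw the same curve without needing the row number.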
    

# Discussion
We see that Russian emissions dropped by half in 1990-91, but that's because the USSR broke up into 15 independent republics!

The US and Ukrainian CO2 emissions have actually been going down over the last 20 years, while China's rose rapidly from 1990 to 2010 and then flattened out.

We don't know why. A next step is to look at GDP per person and see whether the rising standard of living in China is what drove that increase.
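
GDP per person is simply total GDP divided by population, year by year. A minimal sketch of that arithmetic with pandas follows; the numbers are made up for illustration and are not World Bank values, and it assumes both series share the same year index.

```python
import pandas as pd

# Made-up numbers for illustration only; not World Bank values.
gdp = pd.Series([1000.0, 1500.0, 2400.0], index=[2000, 2010, 2020])
population = pd.Series([100.0, 120.0, 150.0], index=[2000, 2010, 2020])

# Division aligns the two series on their shared year index.
gdp_per_person = gdp / population
print(gdp_per_person)
```

Because pandas aligns on the index, a year missing from either series would come out as NaN rather than silently shifting the values.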

In [None]:
# GDP (constant 2015 US$) 
# US 898   China 47  Russia 394 Ukraine 701
find_rows(us_df,'GDP (constant 2015 US$)')

In [None]:
plt.figure(figsize=(15,10))
#plot_indicator(us_df,898,'us')
#plot_indicator(china_df,47,'china')
plot_indicator(russia_df,394,'russia')
plot_indicator(ukraine_df,701,'ukraine')
plt.legend()
plt.grid()
plt.title("GDP (constant 2015 US$)",fontsize=32)


In [None]:
china_df.iloc[47]