## CPA-01

## Dataset description
"This dataset is a collection of key metrics maintained by Our World in Data. It is updated regularly and includes data on energy consumption (primary energy, per capita, and growth rates), energy mix, electricity mix and other relevant metrics." (quoted directly from the link below)

Data link: https://www.kaggle.com/pralabhpoudel/world-energy-consumption

The page has a "Download" button; after logging in to Kaggle, you can download the dataset.

### Question 1
What is the trend in coal consumption among countries that rank in the top five by GDP?

### Question 2
What is the trend in fossil-fuel electricity generation among countries that rank in the top five by GDP? What about nuclear electricity?
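Both questions depend on which countries rank in the top five by GDP, while the cells below hard-code a country list. A minimal sketch of deriving such a list from the data itself is shown here. It assumes the CSV has `country`, `year`, `gdp`, and `iso_code` columns, and that aggregate rows ("World", continents) lack a plain three-letter ISO code; check both assumptions against `df.columns` and the actual `iso_code` values. The helper name `top_n_by_gdp` is hypothetical, and the toy numbers are made up.

```python
import pandas as pd

def top_n_by_gdp(data: pd.DataFrame, year: int, n: int = 5) -> list:
    """Return the n countries with the largest `gdp` in `year` (hypothetical helper).

    Assumes columns 'country', 'year', 'gdp', 'iso_code', and that aggregate
    regions ("World", continents) lack a plain 3-letter ISO code.
    """
    rows = data[(data["year"] == year) & data["gdp"].notna()]
    # Keep only rows whose iso_code looks like a 3-letter country code;
    # this drops missing codes and OWID-style aggregate codes such as "OWID_WRL".
    is_country = rows["iso_code"].fillna("").str.fullmatch(r"[A-Z]{3}")
    rows = rows[is_country]
    return rows.nlargest(n, "gdp")["country"].tolist()

# Hypothetical toy data (not real GDP figures), just to exercise the function.
toy = pd.DataFrame({
    "country": ["World", "A", "B", "C", "D", "E", "F"],
    "iso_code": [None, "AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
    "year": [2016] * 7,
    "gdp": [100.0, 30.0, 20.0, 15.0, 10.0, 5.0, 1.0],
})
print(top_n_by_gdp(toy, 2016))  # ['A', 'B', 'C', 'D', 'E']
```

On the real data this would be called as, for example, `top_n_by_gdp(df, 2016)`, where the year is an arbitrary choice that must exist in the data.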


In [None]:
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# read in data file
df = pd.read_csv("data/World Energy Consumption.csv")

In [None]:
# get a rough overview of the data
df.describe()

In [None]:
# show table itself
df

In [None]:
# show column names
df.columns

In [None]:
# boolean indexing: select the rows for China
df_china = df[df['country'] == 'China']
df_china

In [None]:
# plot China's change in energy_cons_change_pct
plt.plot(df_china['year'], df_china['energy_cons_change_pct'])
plt.xlabel("Year")
plt.ylabel("Annual Percentage Change")
plt.title("China: Annual percentage change in primary energy consumption")
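The plotted `energy_cons_change_pct` column is an annual percentage change, i.e. (E_t − E_{t−1}) / E_{t−1} × 100 for consumption E in year t. A minimal sketch of computing such a column with pandas, assuming a per-year consumption series for one country (the values below are made up; in the real data a column such as `primary_energy_consumption` would be the assumed input):

```python
import pandas as pd

# Hypothetical consumption values (TWh) for one country, indexed by year.
energy = pd.Series([100.0, 110.0, 99.0, 99.0], index=[2000, 2001, 2002, 2003])

# Annual percentage change: (E_t - E_{t-1}) / E_{t-1} * 100.
# The first year has no predecessor, so it comes out as NaN.
change_pct = energy.pct_change() * 100
print(change_pct.round(2).tolist())  # [nan, 10.0, -10.0, 0.0]
```

Comparing such a recomputed series with `df_china['energy_cons_change_pct']` is one way to confirm what the column measures.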

In [None]:
# Select a subset of countries for further analysis
country_li = ['China', 'Japan', 'United States', 'Germany']
df_c = df[df['country'].isin(country_li)]
df_c
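`isin` generalizes the single-country mask used for China: it returns `True` wherever the value matches any element of the list. The sketch below, on made-up rows, shows that it builds the same mask as chaining `==` comparisons with `|`.

```python
import pandas as pd

# Hypothetical toy rows.
toy = pd.DataFrame({
    "country": ["China", "India", "Japan", "Germany", "France"],
    "year": [2000] * 5,
})
countries = ["China", "Japan", "Germany"]

# isin builds one boolean mask: True where 'country' is any listed value.
mask_isin = toy["country"].isin(countries)
# The same mask written out with == and | (element-wise OR).
mask_or = (
    (toy["country"] == "China")
    | (toy["country"] == "Japan")
    | (toy["country"] == "Germany")
)

subset = toy[mask_isin]
print(subset["country"].tolist())  # ['China', 'Japan', 'Germany']
```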

In [None]:
# use pivot table
df_c_cc = pd.pivot_table(df_c, index='year', values='coal_consumption', columns='country')
df_c_cc.head()
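`pivot_table` reshapes the long table (one row per country and year) into a wide one (one row per year, one column per country), which is why `df_c_cc.plot()` draws one line per country. Its default `aggfunc` is the mean, so duplicate (country, year) rows would be averaged silently; `DataFrame.pivot` does the same reshape but raises an error on duplicates. A small sketch on made-up data:

```python
import pandas as pd

# Hypothetical long-format rows: one row per (country, year).
long_df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "coal_consumption": [1.0, 2.0, 3.0, 4.0],
})

# Wide format: one row per year, one column per country (mean of duplicates).
wide = pd.pivot_table(long_df, index="year", columns="country", values="coal_consumption")

# Stricter alternative: raises ValueError if a (year, country) pair repeats.
strict = long_df.pivot(index="year", columns="country", values="coal_consumption")
print(wide)
```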

In [None]:
# plot df_c_cc
df_c_cc.plot()
plt.xlabel('Year')
plt.ylabel('Terawatt-hours')
plt.title('Primary energy consumption from coal (China, Japan, United States, Germany)')
plt.legend(title='Country')

In [None]:
# See coal consumption per capita data
df_c_cpc = pd.pivot_table(df_c, index='year', columns='country', values='coal_cons_per_capita')
df_c_cpc.head()

In [None]:
# plot df_c_cpc
df_c_cpc.plot()
plt.xlabel('Year')
plt.ylabel('Kilowatt-hours per person')
plt.title('Per capita primary energy consumption from coal (China, Japan, United States, Germany)')
plt.legend(title='Country')

### Analysis of question 1
Both the coal consumption graph and the per capita coal consumption graph show that China's annual coal consumption began increasing rapidly from around 2000 onward. Meanwhile, coal consumption in Germany and the United States declined. This might reflect China's role as the "world's factory" and its plans to boost infrastructure development. Japan's annual coal consumption also rose steadily over the period; the factors behind this are worth exploring.
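The trends above are read off the plots. To put numbers on them, one could compare two years of a wide table such as `df_c_cc`, computing the total percentage change and the compound annual growth rate, CAGR = (end / start)^(1 / years) − 1. The helper `growth_between` is hypothetical, the toy values are made up, and both years must exist in the table's index.

```python
import pandas as pd

def growth_between(wide: pd.DataFrame, start: int, end: int) -> pd.DataFrame:
    """Total % change and compound annual growth rate (CAGR) per column.

    CAGR = (end_value / start_value) ** (1 / (end - start)) - 1
    """
    ratio = wide.loc[end] / wide.loc[start]
    return pd.DataFrame({
        "total_change_pct": (ratio - 1) * 100,
        "cagr_pct": (ratio ** (1 / (end - start)) - 1) * 100,
    })

# Hypothetical wide table shaped like df_c_cc (values are made up).
toy = pd.DataFrame({"X": [100.0, 400.0], "Y": [200.0, 100.0]}, index=[2000, 2010])
result = growth_between(toy, 2000, 2010)
print(result.round(2))
```

On the real table, a call such as `growth_between(df_c_cc, 2000, end_year)` would yield one row per country; a missing value in either year propagates as NaN.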

### Question 2

In [None]:
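# Graph for fossil and nuclear electricity
# keep only country-years that report positive fossil and nuclear generation
df_c_fe = df_c[(df_c['fossil_electricity'] > 0) & (df_c['nuclear_electricity'] > 0)]
# select several columns with a list (double brackets); a bare tuple fails in pandas >= 2.0
df_c_gb = df_c_fe.groupby('year')[['fossil_electricity', 'nuclear_electricity']].agg('sum')
df_c_gb.plot()
plt.xlabel("Year")
plt.ylabel("Terawatt-hours")
plt.title("Fossil vs. nuclear electricity (sum of China, Japan, USA, Germany)")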
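Two details of the grouped sum are worth making concrete. Selecting several columns after `groupby` takes a list (double brackets). And filtering rows before summing removes a country's whole row for that year: a country with zero or missing nuclear generation in some year also drops out of that year's fossil total. The sketch below, on made-up data, compares the filtered sum with a sum over all rows (where `sum` skips NaN by default).

```python
import pandas as pd

# Hypothetical rows: country P has no nuclear output in 1990.
toy = pd.DataFrame({
    "country": ["P", "Q", "P", "Q"],
    "year": [1990, 1990, 1991, 1991],
    "fossil_electricity": [10.0, 5.0, 12.0, 6.0],
    "nuclear_electricity": [0.0, 2.0, 1.0, 2.0],
})
cols = ["fossil_electricity", "nuclear_electricity"]

# Filtering first: P's 1990 row is dropped, so its fossil output vanishes too.
filtered = toy[(toy["fossil_electricity"] > 0) & (toy["nuclear_electricity"] > 0)]
filtered_sum = filtered.groupby("year")[cols].sum()

# Summing all rows keeps every country's output in every year.
full_sum = toy.groupby("year")[cols].sum()
print(filtered_sum)
print(full_sum)
```

Which version fits depends on the question; the filtered sum compares only country-years that generate both kinds of electricity.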

In [None]:
# Graph for nuclear electricity
df_c_n = df_c[df_c['nuclear_electricity'] > 0]
df_c_n.groupby(['year'])['nuclear_electricity'].agg('sum').fillna(0).plot()
plt.xlabel("Year")
plt.ylabel("Terawatt-hours")
plt.title("Electricity generation from nuclear power (sum of China, Japan, USA, Germany)")

In [None]:
# Graph for fossil electricity and its coal, gas, and oil components
df_c_tra = df_c[(df_c['fossil_electricity'] > 0) & (df_c['gas_electricity'] > 0) &
                (df_c['oil_electricity'] > 0) & (df_c['coal_electricity'] > 0)]
df_c_gb_tra = df_c_tra.groupby('year')[
    ['fossil_electricity', 'gas_electricity', 'coal_electricity', 'oil_electricity']
].agg('sum')
df_c_gb_tra.plot()
plt.xlabel("Year")
plt.ylabel("Terawatt-hours")
plt.title("Fossil electricity: coal, gas, and oil (sum of China, Japan, USA, Germany)")

### Analysis of question 2
The first graph in this section shows that from 1985 to 2020, most of the combined electricity generated by China, Japan, the United States, and Germany came from fossil fuels. The gap between fossil and nuclear electricity also widened over this period, so there is still a long way to go before clean energy makes up a larger share.
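Whether the gap "widens" can be checked numerically rather than by eye: compute fossil minus nuclear for each year and fit a straight line against year, where a positive slope means the gap grows. A sketch on made-up totals shaped like `df_c_gb`:

```python
import numpy as np
import pandas as pd

# Hypothetical yearly totals shaped like df_c_gb (made-up numbers).
totals = pd.DataFrame(
    {"fossil_electricity": [100.0, 120.0, 150.0], "nuclear_electricity": [20.0, 25.0, 30.0]},
    index=[1990, 2000, 2010],
)

# Gap between fossil and nuclear generation in each year.
gap = totals["fossil_electricity"] - totals["nuclear_electricity"]

# Least-squares slope of the gap against year (TWh per year); positive means widening.
slope, intercept = np.polyfit(gap.index.to_numpy(), gap.to_numpy(), deg=1)
print(round(slope, 6))
```

A linear fit summarizes the overall direction only; it does not capture shorter reversals that the plot may show.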

The second graph shows the change in nuclear electricity. A sharp drop occurred around 2011, the year of the Fukushima Daiichi nuclear disaster. Public alarm pushed governments to close some nuclear facilities, which reduced nuclear generation. In later years, however, nuclear electricity trended upward again.
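The timing of the drop can be located directly from the yearly totals: take the year-over-year difference and find the year with the most negative change. The sketch below uses made-up values; on the real data the input would be the series from `df_c_n.groupby('year')['nuclear_electricity'].sum()`. A decline spread over several years would show up as several negative entries in `yoy`.

```python
import pandas as pd

# Hypothetical yearly nuclear totals (TWh), made-up numbers.
nuclear = pd.Series([900.0, 950.0, 780.0, 800.0], index=[2009, 2010, 2011, 2012])

# Year-over-year change; the most negative entry marks the sharpest drop.
yoy = nuclear.diff()
drop_year = yoy.idxmin()
print(drop_year, yoy.min())  # 2011 -170.0
```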

The third graph shows that coal is the dominant source of fossil electricity.
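"Dominant" can be quantified as each source's share of fossil generation per year. The sketch below divides each component by the row sum of the three components (rather than by the `fossil_electricity` column), so each row adds up to 100%. The values are made up and shaped like `df_c_gb_tra`.

```python
import pandas as pd

# Hypothetical yearly totals shaped like df_c_gb_tra (made-up numbers).
fossil = pd.DataFrame(
    {
        "coal_electricity": [60.0, 70.0],
        "gas_electricity": [30.0, 25.0],
        "oil_electricity": [10.0, 5.0],
    },
    index=[2000, 2010],
)

# Each source's share of the fossil total, in percent; rows sum to 100.
shares = fossil.div(fossil.sum(axis=1), axis=0) * 100
print(shares)
```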