# Alex Huisman

## Research question/interests

I am interested in exploring trends in the global availability of clean energy. Beyond how overall availability has changed, I specifically want to analyze the adoption of individual clean energy sources. What is the most popular source of clean energy globally? Which energy sources show adoption patterns that differ by GDP? Who actually has access to each clean energy source? Are there discrepancies between the production and consumption of each source? In answering these questions, I also hope to gain a better understanding of the world's energy use and its progress in adopting renewable energy. By the end of this project, I will have built a dashboard that makes these questions easier to explore visually.

In [6]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

### Loading the Data

In [8]:
# Load the dataset from our repository
df = pd.read_csv('../data/raw/owid-energy-data.csv')
df

Unnamed: 0,country,year,iso_code,population,gdp,biofuel_cons_change_pct,biofuel_cons_change_twh,biofuel_cons_per_capita,biofuel_consumption,biofuel_elec_per_capita,...,solar_share_elec,solar_share_energy,wind_cons_change_pct,wind_cons_change_twh,wind_consumption,wind_elec_per_capita,wind_electricity,wind_energy_per_capita,wind_share_elec,wind_share_energy
0,Afghanistan,1900,AFG,4832414.0,,,,,,,...,,,,,,,,,,
1,Afghanistan,1901,AFG,4879685.0,,,,,,,...,,,,,,,,,,
2,Afghanistan,1902,AFG,4935122.0,,,,,,,...,,,,,,,,,,
3,Afghanistan,1903,AFG,4998861.0,,,,,,,...,,,,,,,,,,
4,Afghanistan,1904,AFG,5063419.0,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22338,Zimbabwe,2017,ZWE,14236599.0,2.194784e+10,,,,,22.477,...,0.137,,,,,0.0,0.0,,0.0,
22339,Zimbabwe,2018,ZWE,14438812.0,2.271535e+10,,,,,27.011,...,0.110,,,,,0.0,0.0,,0.0,
22340,Zimbabwe,2019,ZWE,14645473.0,,,,,,25.947,...,0.088,,,,,0.0,0.0,,0.0,
22341,Zimbabwe,2020,ZWE,14862927.0,,,,,,24.221,...,0.090,,,,,0.0,0.0,,0.0,


## Exploratory Data Analysis

The first step in answering my research questions is a preliminary analysis of the dataset: how much data there is, which columns it contains, and which countries it covers.

In [15]:
# Print the number of rows and columns in the dataset
print(f"The dataset contains {df.shape[0]} rows and {df.shape[1]} columns.")

The dataset contains 22343 rows and 128 columns.


In [17]:
# List all columns of the dataset
print(f"The possible columns are:\n {list(df.columns)}")

The possible columns are:
 ['country', 'year', 'iso_code', 'population', 'gdp', 'biofuel_cons_change_pct', 'biofuel_cons_change_twh', 'biofuel_cons_per_capita', 'biofuel_consumption', 'biofuel_elec_per_capita', 'biofuel_electricity', 'biofuel_share_elec', 'biofuel_share_energy', 'carbon_intensity_elec', 'coal_cons_change_pct', 'coal_cons_change_twh', 'coal_cons_per_capita', 'coal_consumption', 'coal_elec_per_capita', 'coal_electricity', 'coal_prod_change_pct', 'coal_prod_change_twh', 'coal_prod_per_capita', 'coal_production', 'coal_share_elec', 'coal_share_energy', 'electricity_demand', 'electricity_generation', 'energy_cons_change_pct', 'energy_cons_change_twh', 'energy_per_capita', 'energy_per_gdp', 'fossil_cons_change_pct', 'fossil_cons_change_twh', 'fossil_elec_per_capita', 'fossil_electricity', 'fossil_energy_per_capita', 'fossil_fuel_consumption', 'fossil_share_elec', 'fossil_share_energy', 'gas_cons_change_pct', 'gas_cons_change_twh', 'gas_consumption', 'gas_elec_per_capita', 'g

In [24]:
# Display unique countries represented by the data
pd.DataFrame(df['country'].unique())

Unnamed: 0,0
0,Afghanistan
1,Africa
2,Africa (BP)
3,Africa (Shift)
4,Albania
...,...
309,World
310,Yemen
311,Yugoslavia
312,Zambia


### Observations

Right off the bat we can see that this is a very in-depth dataset. At 22343 rows and 128 columns, it is large and contains detailed information on many different forms of energy use. However, many fields are *NaN* (missing), particularly in the earliest years, so while we can get a glimpse of what the data contains, there is clearly some cleaning and wrangling to be done. Also noteworthy is that the `country` column is not limited to countries: it includes regional aggregates such as "Africa" and "Africa (BP)", as well as "World", which represents global energy data.
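The two observations above (missing values and aggregate rows) can each be checked with a few lines of pandas. The following is a minimal sketch on a toy frame, not the real dataset: the numeric values are made up for illustration, and the `aggregates` list is an assumption built only from the names visible in the unique-country output, so it is not exhaustive.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; the numeric values are illustrative only.
toy = pd.DataFrame({
    "country": ["Afghanistan", "Africa", "World", "Zimbabwe"],
    "year": [2018, 2018, 2018, 2018],
    "gdp": [np.nan, np.nan, 1.0e14, 2.0e10],
    "wind_electricity": [np.nan, 10.0, 1200.0, 0.0],
})

# Fraction of missing values per column, largest first
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Assumed list of aggregate labels, taken from the names shown in the
# unique-country output; the full dataset likely contains more of them.
aggregates = ["Africa", "Africa (BP)", "Africa (Shift)", "World"]
is_aggregate = toy["country"].isin(aggregates)

# Split country-level rows from the global aggregate
countries_only = toy[~is_aggregate]
world_only = toy[toy["country"] == "World"]
```

On the real data, replacing `toy` with `df` would show which columns are sparse enough to drop and keep regional totals from being counted alongside individual countries.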

## Data Analysis Pipeline

### Load, Clean, Process, Wrangle Data

In [25]:
df = pd.read_csv('../data/raw/owid-energy-data.csv')

In [None]:
# Milestone 3 incomplete
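
One possible shape for the load, clean, process, and wrangle steps is a single method chain. The sketch below is only an illustration under stated assumptions: `clean_energy_data`, its parameters, and the year cutoff are hypothetical and not part of the project, and the sample frame mixes the Zimbabwe values shown earlier with two made-up rows so the filters have something to remove.

```python
import pandas as pd

def clean_energy_data(raw, value_cols, min_year):
    """Hypothetical helper: keep selected columns, filter by year,
    and drop rows that have no measurements at all."""
    id_cols = ["country", "year"]
    return (
        raw.loc[:, id_cols + list(value_cols)]           # select columns
        .loc[lambda d: d["year"] >= min_year]            # keep recent years
        .dropna(subset=list(value_cols), how="all")      # drop empty rows
        .reset_index(drop=True)
    )

# Small sample: the 2019 and 2020 rows use values from the table above;
# the 1999 and 2001 rows are made up to exercise the filters.
sample = pd.DataFrame({
    "country": ["Zimbabwe"] * 4,
    "year": [1999, 2001, 2019, 2020],
    "solar_share_elec": [0.0, None, 0.088, 0.090],
    "wind_electricity": [0.0, None, 0.0, 0.0],
})

cleaned = clean_energy_data(
    sample, ["solar_share_elec", "wind_electricity"], min_year=2000
)
print(cleaned)
```

Keeping each step as one link in the chain makes it easy to add or reorder steps later, and the same function could be applied to the frame returned by `pd.read_csv`.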