# 🏭Global Power Plant Data

`Tynan Purdy`

## ⏬Imports and Major Variables

In [101]:
import numpy as np
import pandas as pd

# Forward slashes also work on Windows and avoid invalid escape sequences such as "\g"
generation_path = "data/global-power-plants/global_power_plant_database.csv"
emission_path = "data/global-power-plants/global_power_emissions_database.xlsx"

### File Import to DataFrame

Import our power plant generation database

In [102]:
with open(generation_path, encoding='utf8') as fin:
    ppg = pd.read_csv(fin, low_memory=False)

ppg.head()

|   | country | country_long | name | gppd_idnr | capacity_mw | latitude | longitude | primary_fuel | other_fuel1 | other_fuel2 | ... | estimated_generation_gwh_2013 | estimated_generation_gwh_2014 | estimated_generation_gwh_2015 | estimated_generation_gwh_2016 | estimated_generation_gwh_2017 | estimated_generation_note_2013 | estimated_generation_note_2014 | estimated_generation_note_2015 | estimated_generation_note_2016 | estimated_generation_note_2017 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | Kajaki Hydroelectric Power Plant Afghanistan | GEODB0040538 | 33.0 | 32.322 | 65.119 | Hydro |  |  | ... | 123.77 | 162.9 | 97.39 | 137.76 | 119.5 | HYDRO-V1 | HYDRO-V1 | HYDRO-V1 | HYDRO-V1 | HYDRO-V1 |
| 1 | AFG | Afghanistan | Kandahar DOG | WKS0070144 | 10.0 | 31.67 | 65.795 | Solar |  |  | ... | 18.43 | 17.48 | 18.25 | 17.7 | 18.29 | SOLAR-V1-NO-AGE | SOLAR-V1-NO-AGE | SOLAR-V1-NO-AGE | SOLAR-V1-NO-AGE | SOLAR-V1-NO-AGE |
| 2 | AFG | Afghanistan | Kandahar JOL | WKS0071196 | 10.0 | 31.623 | 65.792 | Solar |  |  | ... | 18.64 | 17.58 | 19.1 | 17.62 | 18.72 | SOLAR-V1-NO-AGE | SOLAR-V1-NO-AGE | SOLAR-V1-NO-AGE | SOLAR-V1-NO-AGE | SOLAR-V1-NO-AGE |
| 3 | AFG | Afghanistan | Mahipar Hydroelectric Power Plant Afghanistan | GEODB0040541 | 66.0 | 34.556 | 69.4787 | Hydro |  |  | ... | 225.06 | 203.55 | 146.9 | 230.18 | 174.91 | HYDRO-V1 | HYDRO-V1 | HYDRO-V1 | HYDRO-V1 | HYDRO-V1 |
| 4 | AFG | Afghanistan | Naghlu Dam Hydroelectric Power Plant Afghanistan | GEODB0040534 | 100.0 | 34.641 | 69.717 | Hydro |  |  | ... | 406.16 | 357.22 | 270.99 | 395.38 | 350.8 | HYDRO-V1 | HYDRO-V1 | HYDRO-V1 | HYDRO-V1 | HYDRO-V1 |


Import our power plant emissions database

In [103]:
# header=1: the column names are on the sheet's second row
ppe = pd.read_excel(emission_path, sheet_name='GPED_v1.0_Plant Level', header=1)

ppe.head()

|   | No. | Country (or Region) | Plant Name | Number of Units | Total Plant Installed Capacity (MW) | Fuel Types | CO2 Emissions (Mg) | SO2 Emissions (Mg) | NOx Emissions (Mg) | PM2.5 Emissions (Mg) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | AFGHANISTAN | JARQODOQ | 6 | 15.0 | NG | 74478.17871 | 0.344389 | 141.061784 | 1.033167 |
| 1 | 2 | AFGHANISTAN | KHOJA GOGIRDAK FIELD | 2 | 4.0 | NG | 19862.281716 | 0.091844 | 37.619192 | 0.275531 |
| 2 | 3 | AFGHANISTAN | KODE BARQ FACTORY | 4 | 48.0 | NG | 236976.075524 | 1.095784 | 448.833049 | 3.287351 |
| 3 | 4 | AFGHANISTAN | BAGHDIS | 2 | 1.224 | OIL | 2848.936334 | 35.611704 | 6.588165 | 0.445146 |
| 4 | 5 | AFGHANISTAN | BAMYAN | 1 | 0.572 | OIL | 1331.71116 | 16.646389 | 3.079582 | 0.20808 |


I had trouble opening the generation CSV because of an encoding error that seemed to be triggered by a particular character.

The fix turned out to be specifying the encoding (`encoding='utf8'`) on the file object opened with `open()` and passed to `pd.read_csv`, rather than in the DataFrame call itself.

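The cell originally used to locate the offending character is not included here. A minimal sketch of one way to find it, assuming the file is meant to be UTF-8: decode the raw bytes and let the `UnicodeDecodeError` report the byte offset. The helper `find_bad_byte` and the sample bytes are hypothetical, not part of the original notebook.

```python
def find_bad_byte(raw: bytes, encoding: str = "utf-8"):
    """Return (offset, context) for the first undecodable byte, or None if all is well."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        lo = max(err.start - 20, 0)
        # Decode the surrounding bytes leniently so the context can be printed
        context = raw[lo:err.end + 20].decode(encoding, errors="replace")
        return err.start, context
    return None

# Hypothetical sample: a Latin-1 "é" (byte 0xE9) inside otherwise UTF-8 text
sample = "name,country\nUsine ".encode("utf-8") + b"\xe9" + " Nord,FRA\n".encode("utf-8")
print(find_bad_byte(sample))

# On the real file (path taken from the first cell):
# with open(generation_path, "rb") as f:
#     print(find_bad_byte(f.read()))
```

Reading in binary mode (`"rb"`) sidesteps the decode step entirely, so the scan itself cannot fail on the bad byte.
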
## ❌Removing Unwanted Data

In [104]:
unwanted_columns = ['latitude',
                    'longitude',
                    'other_fuel1',
                    'other_fuel2',
                    'other_fuel3',
                    'commissioning_year',
                    'gppd_idnr',
                    'owner',
                    'source',
                    'url',
                    'geolocation_source',
                    'wepp_id',
                    'year_of_capacity_data',
                    'generation_data_source',
                    'generation_gwh_2018',
                    'generation_gwh_2019',
                    'estimated_generation_note_2013',
                    'estimated_generation_note_2014',
                    'estimated_generation_note_2015',
                    'estimated_generation_note_2016',
                    'estimated_generation_note_2017']
ppg.drop(unwanted_columns, axis=1, inplace=True)
ppg.head()

|   | country | country_long | name | capacity_mw | primary_fuel | generation_gwh_2013 | generation_gwh_2014 | generation_gwh_2015 | generation_gwh_2016 | generation_gwh_2017 | estimated_generation_gwh_2013 | estimated_generation_gwh_2014 | estimated_generation_gwh_2015 | estimated_generation_gwh_2016 | estimated_generation_gwh_2017 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | Kajaki Hydroelectric Power Plant Afghanistan | 33.0 | Hydro |  |  |  |  |  | 123.77 | 162.9 | 97.39 | 137.76 | 119.5 |
| 1 | AFG | Afghanistan | Kandahar DOG | 10.0 | Solar |  |  |  |  |  | 18.43 | 17.48 | 18.25 | 17.7 | 18.29 |
| 2 | AFG | Afghanistan | Kandahar JOL | 10.0 | Solar |  |  |  |  |  | 18.64 | 17.58 | 19.1 | 17.62 | 18.72 |
| 3 | AFG | Afghanistan | Mahipar Hydroelectric Power Plant Afghanistan | 66.0 | Hydro |  |  |  |  |  | 225.06 | 203.55 | 146.9 | 230.18 | 174.91 |
| 4 | AFG | Afghanistan | Naghlu Dam Hydroelectric Power Plant Afghanistan | 100.0 | Hydro |  |  |  |  |  | 406.16 | 357.22 | 270.99 | 395.38 | 350.8 |

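As an alternative to dropping columns after loading, `pd.read_csv` accepts a `usecols` argument, so unwanted columns are never read into memory. A minimal sketch on a tiny in-memory CSV; the sample rows are made up, and on the real file `unwanted` would be the `unwanted_columns` list above.

```python
import io
import pandas as pd

# Hypothetical miniature of the generation CSV
csv_text = """country,name,latitude,longitude,capacity_mw,primary_fuel
AFG,Plant A,32.3,65.1,33.0,Hydro
AFG,Plant B,31.6,65.8,10.0,Solar
"""

unwanted = ['latitude', 'longitude']
# usecols accepts a callable that is evaluated against each column name
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col not in unwanted)
print(df.columns.tolist())  # ['country', 'name', 'capacity_mw', 'primary_fuel']
```
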

In [105]:
unwanted_columns = ['No.', 'Number of Units', 'Total Plant Installed Capacity (MW)']
ppe.drop(unwanted_columns, axis=1, inplace=True)
ppe.head()

|   | Country (or Region) | Plant Name | Fuel Types | CO2 Emissions (Mg) | SO2 Emissions (Mg) | NOx Emissions (Mg) | PM2.5 Emissions (Mg) |
|---|---|---|---|---|---|---|---|
| 0 | AFGHANISTAN | JARQODOQ | NG | 74478.17871 | 0.344389 | 141.061784 | 1.033167 |
| 1 | AFGHANISTAN | KHOJA GOGIRDAK FIELD | NG | 19862.281716 | 0.091844 | 37.619192 | 0.275531 |
| 2 | AFGHANISTAN | KODE BARQ FACTORY | NG | 236976.075524 | 1.095784 | 448.833049 | 3.287351 |
| 3 | AFGHANISTAN | BAGHDIS | OIL | 2848.936334 | 35.611704 | 6.588165 | 0.445146 |
| 4 | AFGHANISTAN | BAMYAN | OIL | 1331.71116 | 16.646389 | 3.079582 | 0.20808 |


## 📊Aggregating Data

### ⚡Global Power Generation

There are two sets of columns with generation data: `generation_gwh_[YEAR]` and `estimated_generation_gwh_[YEAR]`. For now we make a separate average column for each source; afterwards we keep whichever one is present and, where both are present, decide between them.

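`DataFrame.mean(axis=1)` averages across each row and skips missing values by default (`skipna=True`), so a plant that reported only some years is averaged over the years it has, and a plant with no reported years gets `NaN`. A small sketch with made-up numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical yearly generation (GWh); NaN marks a year with no data
df = pd.DataFrame({
    'generation_gwh_2016': [10.0, np.nan, np.nan],
    'generation_gwh_2017': [20.0, 30.0, np.nan],
})

# Row-wise mean; NaNs are skipped, an all-NaN row stays NaN
avg = df.mean(axis=1)
print(avg.tolist())  # [15.0, 30.0, nan]
```
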
In [106]:
avgs = ['generation_gwh_2013',
        'generation_gwh_2014',
        'generation_gwh_2015',
        'generation_gwh_2016',
        'generation_gwh_2017']
ppg['AVG_GENERATION'] = ppg[avgs].mean(axis=1)
ppg['AVG_GENERATION'].iloc[500:510]

500           NaN
501      1.202778
502      5.124444
503           NaN
504           NaN
505           NaN
506    177.273944
507           NaN
508      0.528722
509           NaN
Name: AVG_GENERATION, dtype: float64

In [107]:
avgs = ['estimated_generation_gwh_2013',
        'estimated_generation_gwh_2014',
        'estimated_generation_gwh_2015',
        'estimated_generation_gwh_2016',
        'estimated_generation_gwh_2017']
ppg['AVG_EST_GENERATION'] = ppg[avgs].mean(axis=1)
ppg['AVG_EST_GENERATION'].head()

0    128.264
1     18.030
2     18.332
3    196.120
4    356.110
Name: AVG_EST_GENERATION, dtype: float64

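Now we merge the two average columns into a single column, keeping the larger value where both are present, and remove any rows that have no generation data. Note that although the merged column is named `GENERATION_MW`, the source columns are annual energy in GWh, so its values are average annual generation in GWh rather than megawatts.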

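`np.fmax` takes the element-wise maximum but, unlike `np.maximum`, returns the non-NaN operand when only one side is missing, and returns NaN only when both are. It works on whole columns at once, and `dropna` then removes plants with no data from either source. A sketch on made-up values (the `GENERATION` column name is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical per-plant averages from the two sources
df = pd.DataFrame({
    'AVG_GENERATION':     [np.nan, 5.0, 7.0, np.nan],
    'AVG_EST_GENERATION': [128.3, 4.0, np.nan, np.nan],
})

# np.maximum propagates NaN
print(np.maximum(df['AVG_GENERATION'], df['AVG_EST_GENERATION']).tolist())  # [nan, 5.0, nan, nan]

# np.fmax ignores a NaN when the other operand is a number
df['GENERATION'] = np.fmax(df['AVG_GENERATION'], df['AVG_EST_GENERATION'])
print(df['GENERATION'].tolist())  # [128.3, 5.0, 7.0, nan]

# Drop plants where neither source reported anything
df = df.dropna(subset=['GENERATION'])
print(len(df))  # 3
```
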
In [108]:
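ppg['GENERATION_MW'] = np.fmax(ppg['AVG_GENERATION'], ppg['AVG_EST_GENERATION'])  # larger value; NaN only if both missing
ppg.dropna(subset=['GENERATION_MW'], inplace=True)  # drop plants with no generation data
ppg[['GENERATION_MW', 'AVG_GENERATION', 'AVG_EST_GENERATION']].head()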

|   | GENERATION_MW | AVG_GENERATION | AVG_EST_GENERATION |
|---|---|---|---|
| 0 | 128.264 |  | 128.264 |
| 1 | 18.03 |  | 18.03 |
| 2 | 18.332 |  | 18.332 |
| 3 | 196.12 |  | 196.12 |
| 4 | 356.11 |  | 356.11 |


Remove the unaggregated columns, which are no longer needed.

In [109]:
unwanted_columns = ['AVG_GENERATION',
                    'AVG_EST_GENERATION',
                    'generation_gwh_2013',
                    'generation_gwh_2014',
                    'generation_gwh_2015',
                    'generation_gwh_2016',
                    'generation_gwh_2017',
                    'estimated_generation_gwh_2013',
                    'estimated_generation_gwh_2014',
                    'estimated_generation_gwh_2015',
                    'estimated_generation_gwh_2016',
                    'estimated_generation_gwh_2017']
ppg.drop(unwanted_columns, axis=1, inplace=True)
ppg.head()

|   | country | country_long | name | capacity_mw | primary_fuel | GENERATION_MW |
|---|---|---|---|---|---|---|
| 0 | AFG | Afghanistan | Kajaki Hydroelectric Power Plant Afghanistan | 33.0 | Hydro | 128.264 |
| 1 | AFG | Afghanistan | Kandahar DOG | 10.0 | Solar | 18.03 |
| 2 | AFG | Afghanistan | Kandahar JOL | 10.0 | Solar | 18.332 |
| 3 | AFG | Afghanistan | Mahipar Hydroelectric Power Plant Afghanistan | 66.0 | Hydro | 196.12 |
| 4 | AFG | Afghanistan | Naghlu Dam Hydroelectric Power Plant Afghanistan | 100.0 | Hydro | 356.11 |


In [110]:
generation_dist = ppg.groupby('primary_fuel')['GENERATION_MW'].sum().sort_values(ascending=False)
generation_dist

primary_fuel
Coal              9.960694e+06
Gas               6.304916e+06
Hydro             3.755360e+06
Nuclear           2.943568e+06
Wind              7.094421e+05
Oil               5.360917e+05
Solar             3.486002e+05
Geothermal        6.083382e+04
Waste             5.570859e+04
Biomass           3.368912e+04
Other             1.761465e+04
Petcoke           7.943088e+03
Cogeneration      3.271753e+03
Storage           9.778734e+02
Wave and Tidal    0.000000e+00
Name: GENERATION_MW, dtype: float64

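Because the source columns are annual energy in GWh, `GENERATION_MW` holds average annual generation in GWh. If an average power in MW is wanted instead, divide by the hours in a year: 1 GWh/yr = 1000 MWh / 8760 h ≈ 0.114 MW. A small sketch, assuming a 365-day (8760-hour) year; the helper name is hypothetical.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year, leap years ignored

def gwh_per_year_to_avg_mw(gwh: float) -> float:
    """Convert annual energy (GWh/yr) into average power (MW)."""
    return gwh * 1000 / HOURS_PER_YEAR  # GWh -> MWh, then spread over the year's hours

# Kajaki (row 0 of ppg): 128.264 GWh/yr with a 33 MW nameplate capacity
avg_mw = gwh_per_year_to_avg_mw(128.264)
print(round(avg_mw, 2))         # 14.64
print(round(avg_mw / 33.0, 3))  # share of nameplate capacity: 0.444
```
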
### 💨Global Emissions

In [111]:
emission_cols = ['CO2 Emissions (Mg)', 'SO2 Emissions (Mg)', 'NOx Emissions (Mg)', 'PM2.5 Emissions (Mg)']
emission_dist = ppe.groupby('Fuel Types')[emission_cols].sum()
emission_dist

| Fuel Types | CO2 Emissions (Mg) | SO2 Emissions (Mg) | NOx Emissions (Mg) | PM2.5 Emissions (Mg) |
|---|---|---|---|---|
| BIOMASS | 119606700.0 | 179237.6 | 174493.0 | 37133.16 |
| COAL | 8880799000.0 | 29734190.0 | 18480380.0 | 2508320.0 |
| NG | 2518846000.0 | 52287.39 | 3440040.0 | 41447.6 |
| OIL | 737013900.0 | 8689335.0 | 2723445.0 | 93044.16 |
| OTHER | 274978500.0 | 150082.7 | 338207.3 | 10422.5 |


## 🙏Combining Tables

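Create a master table of power sources and their corresponding emissions.

First we must make the indexes of both tables match: upper-case the generation fuel names, drop the zero-generation `WAVE AND TIDAL` row, fold petcoke, waste, cogeneration, storage, and nuclear into `OTHER`, and rename `NG` to `GAS` in the emissions table.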

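One alternative way to do the folding step is to map each fuel name to its reporting category and group by that mapping; `groupby` then sums every source, including any pre-existing `OTHER` entry, into its bucket in one pass. A minimal sketch: the fold list mirrors the notebook's `gen_other`, but the numbers are a made-up miniature of `generation_dist`.

```python
import pandas as pd

# Hypothetical miniature of generation_dist (GWh)
gen = pd.Series({'Coal': 100.0, 'Gas': 60.0, 'Nuclear': 30.0,
                 'Waste': 5.0, 'Other': 2.0, 'Wave and Tidal': 0.0})

# Fuels folded into OTHER, matching the notebook's list
to_other = {'PETCOKE', 'WASTE', 'COGENERATION', 'STORAGE', 'NUCLEAR'}

upper = gen.rename(index=str.upper).drop('WAVE AND TIDAL')
# A function passed to groupby is applied to each index label to pick its group
aligned = upper.groupby(lambda fuel: 'OTHER' if fuel in to_other else fuel).sum()
print(aligned.to_dict())  # {'COAL': 100.0, 'GAS': 60.0, 'OTHER': 37.0}
```

Note that `groupby` sorts the resulting index alphabetically, so a `sort_values(ascending=False)` afterwards would be needed to restore the largest-first ordering.
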
In [112]:
generation_dist.index = generation_dist.index.str.upper()
generation_dist.drop('WAVE AND TIDAL', axis=0, inplace=True)

# Fold these sources into OTHER, adding to the existing OTHER total instead of overwriting it
gen_other = ['PETCOKE', 'WASTE', 'COGENERATION', 'STORAGE', 'NUCLEAR']
generation_dist['OTHER'] += generation_dist[gen_other].sum()
generation_dist.drop(gen_other, axis=0, inplace=True)

emission_dist = emission_dist.rename(index={'NG': 'GAS'})
print(emission_dist.index)
print(generation_dist.index)

Index(['BIOMASS', 'COAL', 'GAS', 'OIL', 'OTHER'], dtype='object', name='Fuel Types')
Index(['COAL', 'GAS', 'HYDRO', 'WIND', 'OIL', 'SOLAR', 'GEOTHERMAL', 'BIOMASS',
       'OTHER'],
      dtype='object', name='primary_fuel')


In [113]:
power = pd.concat([generation_dist, emission_dist], axis=1)
power = power.fillna(0)
power

|   | GENERATION_MW | CO2 Emissions (Mg) | SO2 Emissions (Mg) | NOx Emissions (Mg) | PM2.5 Emissions (Mg) |
|---|---|---|---|---|---|
| COAL | 9960694.0 | 8880799000.0 | 29734190.0 | 18480380.0 | 2508320.0 |
| GAS | 6304916.0 | 2518846000.0 | 52287.39 | 3440040.0 | 41447.6 |
| HYDRO | 3755360.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| WIND | 709442.1 | 0.0 | 0.0 | 0.0 | 0.0 |
| OIL | 536091.7 | 737013900.0 | 8689335.0 | 2723445.0 | 93044.16 |
| SOLAR | 348600.2 | 0.0 | 0.0 | 0.0 | 0.0 |
| GEOTHERMAL | 60833.82 | 0.0 | 0.0 | 0.0 | 0.0 |
| BIOMASS | 33689.12 | 119606700.0 | 179237.6 | 174493.0 | 37133.16 |
| OTHER | 3029084.0 | 274978500.0 | 150082.7 | 338207.3 | 10422.5 |

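With both tables side by side, a rough emission intensity can be derived by dividing a pollutant column by generation. Since 1 Mg = 10^6 g and 1 GWh = 10^6 kWh, Mg per GWh is numerically equal to grams per kWh. A sketch using the COAL and GAS rows from the table above; it assumes both columns describe comparable annual totals, which the notebook does not confirm, and the two databases cover different sets of plants, so treat the ratio as indicative only. The `CO2 (g/kWh)` column name is hypothetical.

```python
import numpy as np
import pandas as pd

# COAL and GAS rows copied from the combined `power` table
power = pd.DataFrame(
    {'GENERATION_MW': [9960694.0, 6304916.0],
     'CO2 Emissions (Mg)': [8880799000.0, 2518846000.0]},
    index=['COAL', 'GAS'])

# Mg / GWh == g / kWh; zero generation becomes NaN to avoid dividing by zero
gen = power['GENERATION_MW'].replace(0, np.nan)
power['CO2 (g/kWh)'] = power['CO2 Emissions (Mg)'] / gen
print(power['CO2 (g/kWh)'].round(1))
```

Replacing zeros with NaN matters on the full table, where rows such as HYDRO have emissions of 0 and could otherwise produce `inf` or `NaN` from `0 / 0` without any indication.
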