# Assignment 2 - Pandas

**Tanvir, Ahmed, 20075186**

## The Story

Use Markdown cells to write a brief summary of the data analysis you are planning to undertake:

  - What is the goal of this work?
    
  - What kind of data is analyzed in this work? 
    
  - What summary statistics are obtained in this work?

  
This part is worth 3 marks. I recommend writing this part once you have completed all the remaining parts of this assignment.

#### Brief introduction and objective of the analysis

1. Two DataFrames were constructed from World Bank data.

2. The first DataFrame, 'dataframe_1', describes the economy of the selected countries.

3. The second DataFrame, 'dataframe_2', describes the countries' electricity consumption, access to electricity, and how electricity is produced (fossil fuels, nuclear, renewables).

4. The goal was to identify which factors are most strongly associated with renewable energy usage or a shift towards it.

Please note that a few extra indicators were initially chosen to get a feel for the economy and the energy consumption; not all of them were used in the analysis.

The data analysed is numeric and structured, with proper labels.

**Summary**

No clear relationship was observed between the size or strength of an economy and its reliance on renewable sources for electricity generation.

Among the selected countries, France and Canada generate the largest share of their electricity from non-fossil sources (solar, wind, hydro and nuclear), and the European countries have the largest share of solar and wind generation.

## Data Preparation

### Countries

In [1]:
# Codes for the chosen countries

country_codes = ["CAN", "CHN", "DEU", "EGY", "FRA", "GBR", "IND", "JPN", "NGA", "USA", "ZAF"]

In [2]:
# Creating a dictionary in the country code:country name format:

# Step 1: Creating a list of country names in the same order as the country_codes list
# (country_codes is sorted by code, so the names must follow that order, not alphabetical order by name)

country_proper_names = ['Canada', 'China', 'Germany', 'Egypt', 'France', 'United Kingdom', 'India', 'Japan', 
                        'Nigeria', 'United States', 'South Africa']

# Step 2: Pairing each code with its name
country_names = dict(zip(country_codes, country_proper_names))

# The final dictionary
country_names

{'CAN': 'Canada',
 'CHN': 'China',
 'DEU': 'Germany',
 'EGY': 'Egypt',
 'FRA': 'France',
 'GBR': 'United Kingdom',
 'IND': 'India',
 'JPN': 'Japan',
 'NGA': 'Nigeria',
 'USA': 'United States',
 'ZAF': 'South Africa'}

In [3]:
# Grouping the countries in their respective continents in a dictionary

country_groups = {'EGY':'Africa', 'NGA':'Africa', 'ZAF':'Africa', 'CHN':'Asia', 'IND':'Asia', 'JPN':'Asia', 
                  'FRA':'Europe', 'DEU':'Europe', 'GBR':'Europe', 'CAN':'North America', 'USA': 'North America'}

country_groups

{'EGY': 'Africa',
 'NGA': 'Africa',
 'ZAF': 'Africa',
 'CHN': 'Asia',
 'IND': 'Asia',
 'JPN': 'Asia',
 'FRA': 'Europe',
 'DEU': 'Europe',
 'GBR': 'Europe',
 'CAN': 'North America',
 'USA': 'North America'}

### Indicators

In [4]:
import numpy as np
import pandas as pd
import wbgapi as wb
import matplotlib.pyplot as plt
%matplotlib inline

In [5]:
# Creating a list of Indicator IDs for my first DataFrame

indicator_ids_1 = ['SP.POP.TOTL', 'SL.TLF.TOTL.IN', 'NY.GDP.MKTP.CD', 'NY.GDP.MKTP.KD.ZG', 'GC.DOD.TOTL.GD.ZS', 
                   'FI.RES.TOTL.CD', 'BX.GSR.GNFS.CD', 'BM.GSR.GNFS.CD']

# Indicator IDs (indicator_ids_2) for my second DataFrame is done in a similar method

### DataFrames

In [6]:
# Creating a Pandas DataFrame from World Bank data

my_dataframe_1 = wb.data.DataFrame(indicator_ids_1, country_codes, time=range(2011, 2016)) 

# time=range(2011, 2016) selects the years 2011-2015; it replaces mrv=5 (the 5 most recent years)

df = my_dataframe_1.unstack().stack(level=0) # using the unstack and stack methods to get the dataframe into my desired shape

# unstack() moves the indicators from a sub-level of the rows (under each country) to a sub-level of the year columns

# stack(level=0) then moves the year columns to a sub-level of the rows

# How it looks
df.head(10)

Unnamed: 0_level_0,series,BM.GSR.GNFS.CD,BX.GSR.GNFS.CD,FI.RES.TOTL.CD,GC.DOD.TOTL.GD.ZS,NY.GDP.MKTP.CD,NY.GDP.MKTP.KD.ZG,SL.TLF.TOTL.IN,SP.POP.TOTL
economy,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
CAN,YR2011,568459600000.0,546777000000.0,65818990000.0,,1793327000000.0,3.146881,19147395.0,34339330.0
CAN,YR2012,589479800000.0,554961500000.0,68546340000.0,,1828366000000.0,1.762223,19322866.0,34714220.0
CAN,YR2013,589064600000.0,560082500000.0,71937090000.0,,1846597000000.0,2.329123,19546552.0,35082950.0
CAN,YR2014,589626500000.0,573305500000.0,74699960000.0,,1805750000000.0,2.870036,19629145.0,35437440.0
CAN,YR2015,534721000000.0,496137300000.0,79753520000.0,,1556509000000.0,0.659177,19747709.0,35702910.0
CHN,YR2011,1826949000000.0,2008852000000.0,3254674000000.0,,7551500000000.0,9.550832,778977720.0,1345035000.0
CHN,YR2012,1943247000000.0,2175092000000.0,3387513000000.0,,8532230000000.0,7.863736,782865417.0,1354190000.0
CHN,YR2013,2120215000000.0,2355595000000.0,3880368000000.0,,9570406000000.0,7.76615,786673270.0,1363240000.0
CHN,YR2014,2241603000000.0,2462902000000.0,3900039000000.0,,10475680000000.0,7.425764,791323527.0,1371860000.0
CHN,YR2015,2002282000000.0,2360152000000.0,3405253000000.0,,11061550000000.0,7.041329,795251107.0,1379860000.0
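
The reshaping step above can be tried on a small hand-made frame. The sketch below assumes the same layout as the frame used in this notebook (rows indexed by economy and series, one column per year). The frame `wide` and its numbers are invented purely for illustration.

```python
import pandas as pd

# Toy frame mimicking the layout above: (economy, series) rows, one column per year.
# The values are invented for illustration only.
rows = pd.MultiIndex.from_product(
    [["CAN", "CHN"], ["NY.GDP.MKTP.CD", "SP.POP.TOTL"]],
    names=["economy", "series"],
)
wide = pd.DataFrame(
    {"YR2011": [1.0, 2.0, 3.0, 4.0], "YR2012": [5.0, 6.0, 7.0, 8.0]},
    index=rows,
)

# unstack() moves the innermost row level (series) under each year column;
# stack(level=0) then moves the year level of the columns down into the rows.
reshaped = wide.unstack().stack(level=0)

print(reshaped)
```

The result has one row per (economy, year) pair and one column per indicator, which is the shape used for the rest of the analysis.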


In [7]:
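# Multiindexing the rows

# Reversing the column order so that the columns run from SP.POP.TOTL to BM.GSR.GNFS.CD
dataframe_1 = df.iloc[:, ::-1]

# Replacing the (economy, 'YRxxxx') row labels with (Country, Year) labels;
# this assumes the rows are already sorted by country code and year, as in the output above
index = pd.MultiIndex.from_product([country_codes, [2011, 2012, 2013, 2014, 2015]],
                                   names=['Country', 'Year'])

dataframe_1.index = index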
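
Overwriting the index with `MultiIndex.from_product` relies on the rows already being sorted by country code and year. A possible alternative, sketched here on made-up data, derives the new index from the existing labels so the order cannot drift. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Toy frame with the same kind of row labels as above: (economy, 'YRxxxx').
old_index = pd.MultiIndex.from_product(
    [["CAN", "CHN"], ["YR2011", "YR2012"]], names=["economy", None]
)
toy = pd.DataFrame({"SP.POP.TOTL": [34.3, 34.7, 1345.0, 1354.2]}, index=old_index)

# Keep the country codes as they are and turn 'YR2011' into the integer 2011.
countries = toy.index.get_level_values(0)
years = toy.index.get_level_values(1).str.replace("YR", "", regex=False).astype(int)

toy.index = pd.MultiIndex.from_arrays([countries, years], names=["Country", "Year"])

print(toy)
```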

In [8]:
# Multiindexing the columns
dataframe_1.columns = pd.MultiIndex.from_tuples([('Population', 'Total'), ('Population', 'Total labor force'), 
                                                 ('GDP','Growth (annual %)'), ('GDP', 'Gross (USD)'),
                                                 ('Economic strength', 'Central government debt (% of GDP)'), 
                                                 ('Economic strength', 'Total reserves (USD)'), 
                                                 ('Commerce', 'Exports (USD)'), ('Commerce', 'Imports (USD)')])
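
Assigning the new column labels by position works only as long as the column order is known. As a hedged alternative, the labels can be looked up from the indicator IDs, so a change in column order does not mislabel the data. The dictionary `labels` and the frame `toy` below are hypothetical and cover only two indicators.

```python
import pandas as pd

# Hypothetical lookup from indicator ID to (group, label); only two indicators shown.
labels = {
    "SP.POP.TOTL": ("Population", "Total"),
    "NY.GDP.MKTP.CD": ("GDP", "Gross (USD)"),
}

# Toy frame whose columns are indicator IDs (values invented for illustration).
toy = pd.DataFrame({"NY.GDP.MKTP.CD": [1.0, 2.0], "SP.POP.TOTL": [3.0, 4.0]})

# Build the two-level column index from the lookup, whatever the column order is.
toy.columns = pd.MultiIndex.from_tuples([labels[code] for code in toy.columns])

print(toy)
```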

In [9]:
# Re-arranging the columns using a variable called 't'
t = list(dataframe_1.columns)     # creates a list of column names 

# I want to swap positions of column 3 and 4 
t[2], t[3] = t[3], t[2]

dataframe_1 = dataframe_1[t]

# This can also be done manually as below.
#
# dataframe_1 = dataframe_1[[('Population', 'Total'), ('Population', 'Total labor force'),
#                            ('GDP', 'Gross (USD)'), ('GDP','Growth (annual %)'), 
#                            ('Economic strength', 'Central government debt (% of GDP)'), 
#                            ('Economic strength', 'Total reserves (USD)'), 
#                            ('Commerce', 'Exports (USD)'), ('Commerce', 'Imports (USD)')]]

In [10]:
# My first DataFrame

dataframe_1.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,Population,Population,GDP,GDP,Economic strength,Economic strength,Commerce,Commerce
Unnamed: 0_level_1,Unnamed: 1_level_1,Total,Total labor force,Gross (USD),Growth (annual %),Central government debt (% of GDP),Total reserves (USD),Exports (USD),Imports (USD)
Country,Year,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2
CAN,2011,34339330.0,19147395.0,1793327000000.0,3.146881,,65818990000.0,546777000000.0,568459600000.0
CAN,2012,34714220.0,19322866.0,1828366000000.0,1.762223,,68546340000.0,554961500000.0,589479800000.0
CAN,2013,35082950.0,19546552.0,1846597000000.0,2.329123,,71937090000.0,560082500000.0,589064600000.0
CAN,2014,35437440.0,19629145.0,1805750000000.0,2.870036,,74699960000.0,573305500000.0,589626500000.0
CAN,2015,35702910.0,19747709.0,1556509000000.0,0.659177,,79753520000.0,496137300000.0,534721000000.0
CHN,2011,1345035000.0,778977720.0,7551500000000.0,9.550832,,3254674000000.0,2008852000000.0,1826949000000.0
CHN,2012,1354190000.0,782865417.0,8532230000000.0,7.863736,,3387513000000.0,2175092000000.0,1943247000000.0
CHN,2013,1363240000.0,786673270.0,9570406000000.0,7.76615,,3880368000000.0,2355595000000.0,2120215000000.0
CHN,2014,1371860000.0,791323527.0,10475680000000.0,7.425764,,3900039000000.0,2462902000000.0,2241603000000.0
CHN,2015,1379860000.0,795251107.0,11061550000000.0,7.041329,,3405253000000.0,2360152000000.0,2002282000000.0


In [11]:
# The dataframe has a column that is mostly NaN (it does contain a few values, so let's see if it will be useful). 

# Please note: 
# some of the columns will be excluded from this analysis; they are presented for informational purposes 
# and possible exploratory data analysis

# I'll keep this dataframe as is for now and drop the unnecessary columns, as well as the NaN column, when needed.

In [12]:

# Creating the second DataFrame following the same steps as above

indicator_ids_2 = ['SP.POP.TOTL', 'EG.ELC.ACCS.ZS', 'EG.USE.ELEC.KH.PC', 'EG.ELC.LOSS.ZS', 
                   'EG.ELC.FOSL.ZS', 'EG.ELC.RNWX.ZS', 'EG.ELC.HYRO.ZS', 'EG.ELC.NUCL.ZS']

my_dataframe_2 = wb.data.DataFrame(indicator_ids_2, country_codes, time=range(2011, 2016))

df2 = my_dataframe_2.unstack().stack(level=0) # using unstack and stack method to get the dataframe to my desired shape

dataframe_2 = df2.iloc[:, ::-1]

index_2 = pd.MultiIndex.from_product([country_codes, [2011, 2012, 2013, 2014, 2015]],
                                   names=['Country', 'Year'])

dataframe_2.index = index_2

# How the dataframe looks
dataframe_2.head(10)

Unnamed: 0_level_0,series,SP.POP.TOTL,EG.USE.ELEC.KH.PC,EG.ELC.RNWX.ZS,EG.ELC.NUCL.ZS,EG.ELC.LOSS.ZS,EG.ELC.HYRO.ZS,EG.ELC.FOSL.ZS,EG.ELC.ACCS.ZS
Country,Year,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
CAN,2011,34339330.0,15644.540278,3.298016,14.707805,8.800576,59.040234,22.543932,100.0
CAN,2012,34714220.0,15336.624857,3.507259,14.900157,8.438532,59.723302,21.424611,100.0
CAN,2013,35082950.0,15750.811633,4.409954,15.549008,8.466956,58.888079,20.769341,100.0
CAN,2014,35437440.0,15588.487146,5.570376,16.119075,8.711767,57.254617,20.763125,100.0
CAN,2015,35702910.0,,6.267257,15.546561,,56.744193,21.06718,100.0
CHN,2011,1345035000.0,3295.784868,2.13764,1.835336,5.740233,14.62413,81.174003,99.848724
CHN,2012,1354190000.0,3466.019539,2.657515,1.953846,5.810062,17.308734,77.859893,99.961929
CHN,2013,1363240000.0,3757.185088,3.564878,2.053005,5.77701,16.731349,77.424467,99.996445
CHN,2014,1371860000.0,3905.317598,4.05666,2.339286,5.471266,18.552494,74.822887,100.0
CHN,2015,1379860000.0,,4.857004,,,19.069813,72.962076,100.0


In [13]:
# Multiindexing the columns again

dataframe_2.columns = pd.MultiIndex.from_tuples([('Population', 'Total'), 
                                                 ('Electricity T&D', 'Electricity consumption (kWh/capita)'), 
                                                 ('Electricity production source (% of total)','Solar & Wind'), 
                                                 ('Electricity production source (% of total)','Nuclear'),
                                                 ('Electricity T&D', 'Trans & Dist loss (% of output)'), 
                                                 ('Electricity production source (% of total)','Hydro'), 
                                                 ('Electricity production source (% of total)','Fossil fuels'), 
                                                 ('Population', 'Access to electricity (% of population)')])

dataframe_2 = dataframe_2[[('Population', 'Total'),
                           ('Population', 'Access to electricity (% of population)'),
                            ('Electricity T&D', 'Electricity consumption (kWh/capita)'),
                            ('Electricity T&D', 'Trans & Dist loss (% of output)'),
                            ('Electricity production source (% of total)','Solar & Wind'), 
                            ('Electricity production source (% of total)','Nuclear'),
                            ('Electricity production source (% of total)','Hydro'),
                            ('Electricity production source (% of total)','Fossil fuels')]]

dataframe_2.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,Population,Population,Electricity T&D,Electricity T&D,Electricity production source (% of total),Electricity production source (% of total),Electricity production source (% of total),Electricity production source (% of total)
Unnamed: 0_level_1,Unnamed: 1_level_1,Total,Access to electricity (% of population),Electricity consumption (kWh/capita),Trans & Dist loss (% of output),Solar & Wind,Nuclear,Hydro,Fossil fuels
Country,Year,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2
CAN,2011,34339330.0,100.0,15644.540278,8.800576,3.298016,14.707805,59.040234,22.543932
CAN,2012,34714220.0,100.0,15336.624857,8.438532,3.507259,14.900157,59.723302,21.424611
CAN,2013,35082950.0,100.0,15750.811633,8.466956,4.409954,15.549008,58.888079,20.769341
CAN,2014,35437440.0,100.0,15588.487146,8.711767,5.570376,16.119075,57.254617,20.763125
CAN,2015,35702910.0,100.0,,,6.267257,15.546561,56.744193,21.06718
CHN,2011,1345035000.0,99.848724,3295.784868,5.740233,2.13764,1.835336,14.62413,81.174003
CHN,2012,1354190000.0,99.961929,3466.019539,5.810062,2.657515,1.953846,17.308734,77.859893
CHN,2013,1363240000.0,99.996445,3757.185088,5.77701,3.564878,2.053005,16.731349,77.424467
CHN,2014,1371860000.0,100.0,3905.317598,5.471266,4.05666,2.339286,18.552494,74.822887
CHN,2015,1379860000.0,100.0,,,4.857004,,19.069813,72.962076


## Data Analysis 

Use Pandas ``groupby()`` and ``pivot_table()`` methods to construct 8 different summary statistics. They must include the following Pandas techniques:

- ``groupby()`` combined with ``aggregate()``, ``filter()``, ``transform()``, and ``apply()`` methods.


- ``groupby()`` using an external key, the dictionary ``country_groups`` you have constructed above.


- at least one summary statistics must use the ``pivot_table()`` method. 


- at least two summary statistics must use data from both DataFrames.

The necessary Pandas techniques are explained in Notebooks 2.8 and 2.9.

**Important:** Make sure your summary statistics make sense and tell a story. This story must be summarized in the first part of this assignment, "The Story".


This part is worth 10 marks: 1 mark for Python code for each summary statistic and 2 marks for comments explaining the Python code and the summary statistics.

In [14]:
# Application of groupby: average GDP per country over 2011-2015, sorted from largest to smallest

gdp_mean = dataframe_1.groupby(level='Country')[[('GDP', 'Gross (USD)')]].mean()
gdp_mean.sort_values(by=[('GDP', 'Gross (USD)')], ascending=False)

Unnamed: 0_level_0,GDP
Unnamed: 0_level_1,Gross (USD)
Country,Unnamed: 1_level_2
USA,16857980000000.0
CHN,9438274000000.0
JPN,5411953000000.0
DEU,3651388000000.0
GBR,2848216000000.0
FRA,2731172000000.0
IND,1930025000000.0
CAN,1766110000000.0
NGA,480533600000.0
ZAF,404279300000.0


***The USA, China and Japan had a higher average GDP than the rest of the countries between 2011 and 2015.***

In [15]:
# GDP growth of the countries using groupby.filter()

# Keep only the countries whose average annual GDP growth is more than 5%

# Alternative with a 3% threshold:
#dataframe_1.groupby('Country').filter(lambda x: x[('GDP','Growth (annual %)')].mean() > 3)

growth = dataframe_1.groupby('Country').filter(lambda x: x[('GDP','Growth (annual %)')].mean() > 5)

growth.groupby('Country')[[('GDP','Growth (annual %)')]].mean().sort_values([('GDP','Growth (annual %)')], ascending=False)

Unnamed: 0_level_0,GDP
Unnamed: 0_level_1,Growth (annual %)
Country,Unnamed: 1_level_2
CHN,7.929562
IND,6.498058
NGA,5.034347


**China had the highest average annual GDP growth between 2011 and 2015.**
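
As a minimal sketch of how `groupby().filter()` behaves, the toy example below keeps every row of a group when the group's mean passes the threshold and drops the whole group otherwise. The country codes `AAA` and `BBB` and the growth figures are invented for illustration.

```python
import pandas as pd

# Toy data: two hypothetical countries with two yearly growth figures each.
toy = pd.DataFrame(
    {
        "Country": ["AAA", "AAA", "BBB", "BBB"],
        "Growth": [6.0, 8.0, 1.0, 3.0],
    }
)

# filter() evaluates the function once per group and keeps the group's rows
# only when it returns True (here: mean growth above 5).
fast = toy.groupby("Country").filter(lambda g: g["Growth"].mean() > 5)

print(fast)
```

Because `filter()` returns the original rows rather than one row per group, a second `groupby()` is still needed to summarise the surviving countries, as done in the cell above.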

In [16]:
# Labor force as a percentage of the total population, using the apply() method

df_3 = dataframe_1[[('Population', 'Total')]].droplevel(level=0, axis=1)
df_3 = df_3.reset_index()

df_4 = dataframe_1[[('Population', 'Total labor force')]].droplevel(level=0, axis=1)
df_4 = df_4.reset_index()

def ratio(x):
    x['Total labor force'] /= df_3['Total']/100
    return x

labor_force_percentage = df_4.groupby('Country').apply(ratio)

labor_force_percentage.columns = ['Country', 'Year', 'labor force percentage']

labor_force_percentage.groupby('Country')['labor force percentage'].mean()


Country
CAN    55.567898
CHN    57.749414
DEU    52.180651
EGY    32.302633
FRA    45.881623
GBR    51.592087
IND    36.040963
JPN    51.508466
NGA    31.700477
USA    50.113420
ZAF    37.488959
Name: labor force percentage, dtype: float64

**China, Canada, Germany, the United Kingdom, Japan and the US have more than 50 percent of their population in the labor force.**
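
The same percentage can also be computed without a helper function, because pandas aligns the two columns row by row. The sketch below is an alternative under that assumption, not the method used in the cell above. The frame `toy` and its values are invented.

```python
import pandas as pd

# Toy data: two hypothetical countries, two years each.
toy = pd.DataFrame(
    {
        "Total": [100.0, 200.0, 50.0, 50.0],
        "Total labor force": [50.0, 110.0, 20.0, 30.0],
    },
    index=pd.MultiIndex.from_product(
        [["AAA", "BBB"], [2011, 2012]], names=["Country", "Year"]
    ),
)

# Row-wise share of the population that is in the labor force, in percent.
share = 100 * toy["Total labor force"] / toy["Total"]

# Average share per country over the available years.
mean_share = share.groupby(level="Country").mean()

print(mean_share)
```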

In [17]:
# Application of groupby.aggregate method: minimum, mean and maximum exports per country

export_by_countries = dataframe_1.groupby(level='Country')[[('Commerce', 'Exports (USD)')]].aggregate(['min', 
                                                                                                       'mean', 
                                                                                                       'max'])
export_by_countries.sort_values(by=[('Commerce', 'Exports (USD)', 'max')], ascending=False)

Unnamed: 0_level_0,Commerce,Commerce,Commerce
Unnamed: 0_level_1,Exports (USD),Exports (USD),Exports (USD)
Unnamed: 0_level_2,min,mean,max
Country,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3
CHN,2008852000000.0,2272519000000.0,2462902000000.0
USA,2143556000000.0,2275359000000.0,2392613000000.0
DEU,1575247000000.0,1672834000000.0,1773618000000.0
JPN,784710800000.0,864459500000.0,930660400000.0
GBR,803703000000.0,828452500000.0,867943000000.0
FRA,777544700000.0,817320000000.0,853503000000.0
CAN,496137300000.0,546252700000.0,573305500000.0
IND,428630900000.0,454541600000.0,485583000000.0
ZAF,96346520000.0,113061100000.0,126935000000.0
NGA,49047770000.0,86803040000.0,102437500000.0


In [18]:
# The same aggregation for imports

import_by_countries = dataframe_1.groupby(level='Country')[[('Commerce', 'Imports (USD)')]].aggregate(['min', 
                                                                                                       'mean', 
                                                                                                       'max'])
import_by_countries.sort_values(by=[('Commerce', 'Imports (USD)', 'max')], ascending=False)

Unnamed: 0_level_0,Commerce,Commerce,Commerce
Unnamed: 0_level_1,Imports (USD),Imports (USD),Imports (USD)
Unnamed: 0_level_2,min,mean,max
Country,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3
USA,2698073000000.0,2775889000000.0,2876564000000.0
CHN,1826949000000.0,2026859000000.0,2241603000000.0
DEU,1320209000000.0,1446217000000.0,1515877000000.0
JPN,807986700000.0,948009900000.0,1014813000000.0
GBR,847531000000.0,867950900000.0,922197600000.0
FRA,787378200000.0,851547800000.0,889731800000.0
CAN,534721000000.0,574270300000.0,589626500000.0
IND,491880100000.0,547762400000.0,579908600000.0
ZAF,100802800000.0,117217300000.0,123559500000.0
NGA,71947440000.0,81343360000.0,90793630000.0


In [19]:
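# Average trade balance: mean exports minus mean imports (positive = surplus, negative = deficit)
trade_balance = export_by_countries[('Commerce', 'Exports (USD)', 'mean')] - import_by_countries[('Commerce', 'Imports (USD)', 'mean')]
trade_balance.sort_values(ascending=False)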

Country
CHN    2.456596e+11
DEU    2.266173e+11
NGA    5.459673e+09
ZAF   -4.156178e+09
EGY   -2.245146e+10
CAN   -2.801757e+10
FRA   -3.422773e+10
GBR   -3.949841e+10
JPN   -8.355042e+10
IND   -9.322084e+10
USA   -5.005302e+11
dtype: float64

**China, the USA and Germany are the top 3 exporters and importers of goods and services among the selected countries.**

**China, Germany and Nigeria ran an average trade surplus (exports exceeded imports); the other countries ran a deficit.**
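
The aggregation and the subtraction can be combined in one small example. The sketch below uses plain column names and invented numbers. It shows that `agg()` with a list of function names produces a two-level column index, from which the two means are then selected.

```python
import pandas as pd

# Toy data: two hypothetical countries, two years each.
toy = pd.DataFrame(
    {"Exports": [10.0, 14.0, 5.0, 7.0], "Imports": [8.0, 12.0, 9.0, 11.0]},
    index=pd.MultiIndex.from_product(
        [["AAA", "BBB"], [2011, 2012]], names=["Country", "Year"]
    ),
)

# One row per country; columns become (variable, statistic) pairs.
summary = toy.groupby(level="Country").agg(["min", "mean", "max"])

# Mean exports minus mean imports: positive is a surplus, negative a deficit.
balance = summary[("Exports", "mean")] - summary[("Imports", "mean")]

print(balance.sort_values(ascending=False))
```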

In [20]:
# Total reserves by continent using dataframe_1

# To groupby() using the dictionary country_groups (an external key), the country codes must be in a
# single-level index, so I first reset the index of the multi-indexed dataframe.

reset_df1 = dataframe_1.reset_index()

# Now setting the index to the newly created column 'Country' and assigning the result to a new dataframe
country_idx_df_1 = reset_df1.set_index(['Country'])

# Calling groupby on the new dataframe with country_groups: each country code in the index is mapped to its continent.
# Note: sum() adds up the reserves of all five years (2011-2015) for every country in the group.

reserves_by_continent = country_idx_df_1.groupby(country_groups)[[('Economic strength', 'Total reserves (USD)')]].sum()

reserves_by_continent

Unnamed: 0_level_0,Economic strength
Unnamed: 0_level_1,Total reserves (USD)
Country,Unnamed: 1_level_2
Africa,507628400000.0
Asia,25728060000000.0
Europe,2447299000000.0
North America,2738945000000.0


In [21]:
reserves_list = dataframe_1.groupby('Country')[[('Economic strength', 'Total reserves (USD)')]].max()

reserves_list.sort_values(by=[('Economic strength', 'Total reserves (USD)')], ascending=False)

Unnamed: 0_level_0,Economic strength
Unnamed: 0_level_1,Total reserves (USD)
Country,Unnamed: 1_level_2
CHN,3900039000000.0
JPN,1295839000000.0
USA,574268100000.0
IND,353319100000.0
DEU,248856500000.0
FRA,184521800000.0
GBR,148109300000.0
CAN,79753520000.0
ZAF,50688080000.0
NGA,43830640000.0


**Summed over 2011-2015, the selected Asian countries hold far larger reserves than the other groups, because China and Japan have the two largest reserves in the table.**
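
Grouping by an external key means that pandas looks up each index label in the dictionary and groups rows by the value it finds. A minimal sketch with a shortened, hypothetical mapping and invented reserve figures:

```python
import pandas as pd

# Shortened mapping from country code to continent (a subset, for illustration).
groups = {"EGY": "Africa", "NGA": "Africa", "CHN": "Asia"}

# Toy data indexed by country code only; several rows per country stand for several years.
toy = pd.DataFrame(
    {"Reserves": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.Index(["EGY", "EGY", "NGA", "CHN", "CHN"], name="Country"),
)

# Each index label is replaced by its dictionary value before grouping.
by_continent = toy.groupby(groups)["Reserves"].sum()

print(by_continent)
```

Note that `sum()` adds every row in the group, so with one row per year the total covers all years. Taking a per-country mean first would give a figure comparable to a single year.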

In [22]:
# Which continents were utilising the most solar and wind [dataframe_2]?

country_idx_df_2 = dataframe_2.reset_index()

country_idx_df_2 = country_idx_df_2.set_index(['Country'])

# Groupby below using country_groups 

solar_wind = country_idx_df_2.groupby(country_groups)[[('Electricity production source (% of total)','Solar & Wind')]].mean()

solar_wind

Unnamed: 0_level_0,Electricity production source (% of total)
Unnamed: 0_level_1,Solar & Wind
Country,Unnamed: 1_level_2
Africa,0.536075
Asia,4.535155
Europe,13.502376
North America,5.393571


**Among the selected countries, those in Europe generate a larger proportion of their electricity from solar and wind energy than those on the other continents.**

In [23]:
# Which countries used more non-fossil sources (solar & wind, nuclear, hydro) than fossil fuels to generate electricity?
# Note: nuclear is low-carbon but not renewable; it is included in the sum below.

# .copy() avoids the SettingWithCopyWarning raised when a column is added to a slice of dataframe_2
renewable = dataframe_2[[('Electricity production source (% of total)','Solar & Wind'), 
                         ('Electricity production source (% of total)','Nuclear'), 
                         ('Electricity production source (% of total)','Hydro')]].copy()


renewable[('Electricity production source (% of total)','Fossil')] = dataframe_2[('Electricity production source (% of total)', 
                                                                                   'Fossil fuels')]
# removing the top level of the multi-index from the columns

renewable = renewable.droplevel(level=0, axis=1)


renewable = renewable.reset_index()

renewable['Sum of renewable (% of total electricity prod.)'] = renewable[['Solar & Wind', 'Nuclear', 'Hydro']].sum(axis=1)


In [24]:
# groupby('Country') works directly on the 'Country' column, so no set_index() call is needed

renewable.groupby('Country')[['Sum of renewable (% of total electricity prod.)']].mean().sort_values(by=
                              ['Sum of renewable (% of total electricity prod.)'], ascending=False)


Unnamed: 0_level_0,Sum of renewable (% of total electricity prod.)
Country,Unnamed: 1_level_1
FRA,92.360655
CAN,78.305179
DEU,40.370703
GBR,35.661474
USA,31.704813
CHN,22.348338
NGA,19.140297
IND,18.718404
JPN,15.535208
EGY,8.843665


**France and Canada generate most of their electricity from non-fossil sources. Note that this total includes nuclear power, which is low-carbon but not renewable.**

In [25]:
# Using pivot_table() method

fossil_fuel_use = dataframe_2.droplevel(level=0 ,axis=1).pivot_table('Fossil fuels', index='Country', aggfunc='mean')

fossil_fuel_use.sort_values(by='Fossil fuels', ascending=False)

Unnamed: 0_level_0,Fossil fuels
Country,Unnamed: 1_level_1
ZAF,93.674061
EGY,91.156335
NGA,80.859703
IND,80.685692
JPN,79.925226
CHN,76.848665
USA,67.9292
GBR,63.655315
DEU,58.223146
CAN,21.313638


**Most countries rely very heavily on fossil fuels for electricity production; the only exceptions are Canada and France, as seen in the previous summary.**
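
`pivot_table()` can do more than reproduce a `groupby().mean()`. The sketch below, on invented data, shows the single-column form used above and a second form that spreads the years across the columns. The country codes and values are hypothetical.

```python
import pandas as pd

# Toy data in long form: one row per (country, year).
toy = pd.DataFrame(
    {
        "Country": ["AAA", "AAA", "BBB", "BBB"],
        "Year": [2011, 2012, 2011, 2012],
        "Fossil fuels": [80.0, 70.0, 20.0, 30.0],
    }
)

# Same pattern as above: the mean share per country.
by_country = toy.pivot_table("Fossil fuels", index="Country", aggfunc="mean")

# Adding columns="Year" gives a country-by-year grid instead of a single average.
by_year = toy.pivot_table("Fossil fuels", index="Country", columns="Year")

print(by_country)
print(by_year)
```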

In [26]:
# Population vs Energy Consumption

df_comparison_1 = dataframe_1.loc[:, ('Population', 'Total')]
comparison_1 = pd.DataFrame(df_comparison_1).join(pd.DataFrame(dataframe_2.loc[:, ('Electricity T&D', 
                                                                    'Electricity consumption (kWh/capita)')]))

comp = comparison_1.groupby('Country')[[('Population', 'Total'), 
                                        ('Electricity T&D', 'Electricity consumption (kWh/capita)')]].max()

comp = comp.sort_values(('Electricity T&D', 'Electricity consumption (kWh/capita)'), ascending=False)

comp

Unnamed: 0_level_0,Population,Electricity T&D
Unnamed: 0_level_1,Total,Electricity consumption (kWh/capita)
Country,Unnamed: 1_level_2,Unnamed: 2_level_2
CAN,35702910.0,15750.811633
USA,320739000.0,13245.881928
JPN,127833000.0,8099.598695
FRA,66548270.0,7367.843768
DEU,81686610.0,7281.272174
GBR,65116220.0,5471.933475
ZAF,55386370.0,4566.323754
CHN,1379860000.0,3905.317598
EGY,92442550.0,1685.818794
IND,1310152000.0,804.516349


**Per capita electricity usage is very high in Canada and the US. The table shows no obvious relationship between population size and per capita electricity consumption: the two most populous countries, China and India, are near the bottom.** 

In [27]:
# GDP vs Sustainable energy: using 2 dataframes

df_comparison_2 = dataframe_1.loc[:, ('GDP', 'Gross (USD)')]

comparison_2 = pd.DataFrame(df_comparison_2).join(pd.DataFrame(dataframe_2.loc[:, 
                                                            [('Electricity production source (% of total)','Solar & Wind'), 
                                                            ('Electricity production source (% of total)','Nuclear'), 
                                                            ('Electricity production source (% of total)','Hydro')]]))

comparison_2

Unnamed: 0_level_0,Unnamed: 1_level_0,GDP,Electricity production source (% of total),Electricity production source (% of total),Electricity production source (% of total)
Unnamed: 0_level_1,Unnamed: 1_level_1,Gross (USD),Solar & Wind,Nuclear,Hydro
Country,Year,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
CAN,2011,1793327000000.0,3.298016,14.707805,59.040234
CAN,2012,1828366000000.0,3.507259,14.900157,59.723302
CAN,2013,1846597000000.0,4.409954,15.549008,58.888079
CAN,2014,1805750000000.0,5.570376,16.119075,57.254617
CAN,2015,1556509000000.0,6.267257,15.546561,56.744193
CHN,2011,7551500000000.0,2.13764,1.835336,14.62413
CHN,2012,8532230000000.0,2.657515,1.953846,17.308734
CHN,2013,9570406000000.0,3.564878,2.053005,16.731349
CHN,2014,10475680000000.0,4.05666,2.339286,18.552494
CHN,2015,11061550000000.0,4.857004,,19.069813


In [28]:
# Using the transform() method

# Normalising each column by dividing it by its maximum value within each country

comparison_2.groupby('Country').transform(lambda x: x/x.max())


Unnamed: 0_level_0,Unnamed: 1_level_0,GDP,Electricity production source (% of total),Electricity production source (% of total),Electricity production source (% of total)
Unnamed: 0_level_1,Unnamed: 1_level_1,Gross (USD),Solar & Wind,Nuclear,Hydro
Country,Year,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
CAN,2011,0.971152,0.52623,0.912447,0.988563
CAN,2012,0.990127,0.559616,0.92438,1.0
CAN,2013,1.0,0.70365,0.964634,0.986015
CAN,2014,0.97788,0.888806,1.0,0.958665
CAN,2015,0.842906,1.0,0.964482,0.950118
CHN,2011,0.68268,0.440115,0.784571,0.766873
CHN,2012,0.771341,0.547151,0.835232,0.907651
CHN,2013,0.865196,0.733966,0.87762,0.877374
CHN,2014,0.947035,0.835219,1.0,0.972872
CHN,2015,1.0,1.0,,1.0


**The normalised dataframe above shows that the use of solar and wind energy increased gradually in all countries except Egypt.**
**No data on Nigeria is available for this category.**
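
Unlike `aggregate()`, `transform()` returns a result with the same shape as its input, so each yearly value can be compared with its own country's maximum. A minimal sketch on invented data:

```python
import pandas as pd

# Toy data: two hypothetical countries, two years each.
toy = pd.DataFrame(
    {
        "Country": ["AAA", "AAA", "BBB", "BBB"],
        "Share": [2.0, 4.0, 3.0, 6.0],
    }
)

# Divide every value by the maximum of its own group; the output keeps one row per input row.
normalised = toy.groupby("Country")["Share"].transform(lambda s: s / s.max())

print(normalised)
```

A value of 1.0 marks the year in which a country reached its peak, which is why a steadily rising column ends at 1.0 in the last year.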

In [29]:
comparison_2.groupby('Country')[[('Electricity production source (% of total)', 'Solar & Wind')]].mean().round(3)

Unnamed: 0_level_0,Electricity production source (% of total)
Unnamed: 0_level_1,Solar & Wind
Country,Unnamed: 1_level_2
CAN,4.611
CHN,3.455
DEU,21.335
EGY,0.96
FRA,4.767
GBR,14.405
IND,4.819
JPN,5.332
NGA,0.0
USA,6.177


**Germany is leading in harnessing solar and wind energy followed by the United Kingdom.**
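
The Story states the goal in terms of correlation, but no correlation coefficient is computed in the notebook. As a hedged sketch, the per-country means printed by cells 14 and 29 can be combined to compute one. Only the nine countries that appear in both printed tables are used, so this is an illustration rather than a definitive result.

```python
import pandas as pd

# Average GDP (USD), copied from the output of cell 14.
gdp = pd.Series(
    {
        "USA": 16857980000000.0, "CHN": 9438274000000.0, "JPN": 5411953000000.0,
        "DEU": 3651388000000.0, "GBR": 2848216000000.0, "FRA": 2731172000000.0,
        "IND": 1930025000000.0, "CAN": 1766110000000.0, "NGA": 480533600000.0,
    }
)

# Average solar & wind share (% of total), copied from the output of cell 29.
solar_wind = pd.Series(
    {
        "CAN": 4.611, "CHN": 3.455, "DEU": 21.335, "FRA": 4.767, "GBR": 14.405,
        "IND": 4.819, "JPN": 5.332, "NGA": 0.0, "USA": 6.177,
    }
)

# Series.corr() aligns the two series on their index labels before computing.
pearson = gdp.corr(solar_wind)

# Correlating the ranks gives a Spearman-style coefficient, which is less
# sensitive to the very large GDP values of the USA and China.
spearman = gdp.rank().corr(solar_wind.rank())

print(f"Pearson: {pearson:.3f}, Spearman (rank-based): {spearman:.3f}")
```

With only nine data points, either coefficient should be read as a rough indication, not as evidence for or against a relationship.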

---