
IMPORTING USEFUL LIBRARIES

In [None]:
import os 
import zipfile
import pandas as pd
import numpy as np
from tensorflow import keras # for downloading the data

DATA FETCHING

Fetch the data from the World Bank archive on climate change. The Keras helper keras.utils.get_file downloads the archive, and zipfile.ZipFile extracts it.

In [None]:
url = "https://api.worldbank.org/v2/en/topic/19?downloadformat=csv" # data repository

# a function for downloading and extracting data
def download_extract_data():
  basepath = os.path.join(os.getcwd(), "data") # directory for the archive and the extracted files
  download_path = os.path.join(basepath, "data.zip") # the archive is a zip file, so name it accordingly
  os.makedirs(basepath, exist_ok=True)
  # fetching the data (skipped when the archive is already on disk)
  if not os.path.exists(download_path):
    keras.utils.get_file(download_path, url)
  # extracting the data
  with zipfile.ZipFile(download_path, "r") as zp:
    zp.extractall(basepath)


download_extract_data()
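The later cells hard-code the extracted file name, which includes a release number. As a minimal sketch, the helper below extracts an archive and looks the data file up by its prefix instead. The function name find_data_csv is hypothetical, and the naming rule (data file starts with "API_", metadata files start with "Metadata_") is an assumption about the archive layout, not something the notebook states.

```python
import os
import zipfile


def find_data_csv(zip_path, out_dir):
    """Extract zip_path into out_dir and return the path of the main data CSV."""
    with zipfile.ZipFile(zip_path, "r") as zp:
        zp.extractall(out_dir)
        names = zp.namelist()
    # Assumption: exactly one member is named API_*.csv; the others are metadata.
    candidates = [
        name for name in names
        if os.path.basename(name).startswith("API_") and name.endswith(".csv")
    ]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one data CSV, found {candidates}")
    return os.path.join(out_dir, candidates[0])


# Hypothetical usage once the archive has been downloaded:
# csv_path = find_data_csv(download_path, "data")
```

Raising an error when zero or several candidates are found keeps a changed archive layout from silently loading the wrong file.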
    

CATEGORIZING THE DATASET

The extracted CSV file is preprocessed for easier manipulation. Pandas, a library for manipulating structured data such as CSV and Excel files, is used to clean the data. The first four rows are skipped because they contain no tabular data and would otherwise cause a parsing error. We define a function that returns two objects: a dataframe with the years as columns, and the column of country names.

In [None]:
# load the full dataset once; the analysis cells below reuse it
dataframe = pd.read_csv("data/API_19_DS2_en_csv_v2_3931355.csv", skiprows=4)

# a function which splits the original dataframe into the year columns
# and the country-name column
def categorize_df(csv_path):
  original_df = pd.read_csv(csv_path, skiprows=4)
  df_by_year = original_df.iloc[:, 4:] # dataframe with years as columns
  df_by_country = original_df.iloc[:, 0] # Series holding the 'Country Name' column
  return df_by_year, df_by_country

df_year, df_country = categorize_df("data/API_19_DS2_en_csv_v2_3931355.csv") # year columns and country names respectively
df_country = df_country.drop_duplicates() # keep one entry per country
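If a layout with countries as columns is needed for a single indicator, it can be derived from the wide table by filtering on the indicator code and transposing. The sketch below uses a tiny synthetic frame with made-up values; the helper name countries_as_columns is hypothetical and not part of the notebook.

```python
import pandas as pd

# Synthetic frame in the same wide layout as the World Bank file (values are made up).
wide = pd.DataFrame({
    "Country Name": ["Kenya", "Kenya", "Germany", "Germany"],
    "Country Code": ["KEN", "KEN", "DEU", "DEU"],
    "Indicator Name": ["Forest area", "Agricultural land"] * 2,
    "Indicator Code": ["AG.LND.FRST.ZS", "AG.LND.AGRI.ZS"] * 2,
    "2000": [6.0, 46.0, 32.0, 48.0],
    "2001": [6.1, 47.0, 32.5, 48.5],
})


def countries_as_columns(df, indicator_code):
    """Return one indicator with years as rows and countries as columns."""
    subset = df[df["Indicator Code"] == indicator_code]
    by_year = subset.set_index("Country Name").drop(
        columns=["Country Code", "Indicator Name", "Indicator Code"])
    return by_year.T  # transpose so that the years become the index


forest_by_country = countries_as_columns(wide, "AG.LND.FRST.ZS")
print(forest_by_country)
```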

In [None]:
dataframe[dataframe['Country Name'] == "World"].head() # A dataframe on the values of the world

In [None]:
df_year.head(10) # visualizing the top 10 rows on dataframe with years i.e 1996,1998,2000 as columns

In [None]:
df_country.tail(10) # Visualizing the last 10 rows on dataframe with country Name as the column

STATISTICAL ANALYSIS OF INDICATORS

Here we analyze a pair of related indicators, for example the % of land under agriculture and the % of land under forest, within one country, between two countries, and between a country and the world. We compare and contrast the two indicators to see how they relate to each other.

In [None]:
indicator_codes = [
                   "AG.LND.FRST.ZS","AG.LND.AGRI.ZS"
                   ] # Land covered by forest, Agricultural land
countries = df_country.to_numpy()

# A function which takes country names and a list of indicator codes.
# It returns a dictionary containing each country's time series data on those indicators

def get_indicators_data(
        nations, 
        indicators_codes):
  nations_indicators_data = {}
  for nation in nations:
    indicators = {}
    for code in indicators_codes:
      df = dataframe[dataframe['Country Name'] == nation]
      df = df[df['Indicator Code'] == code].iloc[:,4:].to_numpy()[0][:60]
      indicators[code] = df
    nations_indicators_data[nation] = indicators
  return nations_indicators_data

c_indicators = get_indicators_data(
    countries,
    ["AG.LND.FRST.ZS","AG.LND.AGRI.ZS"])
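The function above filters the whole dataframe once per country and indicator. As a hedged alternative, the sketch below builds the same nested dictionary in a single pass with groupby. It runs on a synthetic frame with made-up values, the helper name build_lookup is hypothetical, and it assumes (as the notebook does) that the year columns start at position 4.

```python
import pandas as pd

# Synthetic frame in the World Bank wide layout (values are made up).
wide = pd.DataFrame({
    "Country Name": ["Kenya", "Kenya", "Germany", "Germany"],
    "Country Code": ["KEN", "KEN", "DEU", "DEU"],
    "Indicator Name": ["Forest area", "Agricultural land"] * 2,
    "Indicator Code": ["AG.LND.FRST.ZS", "AG.LND.AGRI.ZS"] * 2,
    "1960": [6.0, 46.0, 32.0, 48.0],
    "1961": [6.1, 47.0, 32.5, 48.5],
})


def build_lookup(df, indicator_codes, n_years=60):
    """Map country name -> indicator code -> array of yearly values."""
    wanted = df[df["Indicator Code"].isin(indicator_codes)]
    lookup = {}
    groups = wanted.groupby(["Country Name", "Indicator Code"])
    for (country, code), rows in groups:
        # Columns 0-3 are identifiers; the remaining columns are the years.
        values = rows.iloc[0, 4:].to_numpy(dtype=float)[:n_years]
        lookup.setdefault(country, {})[code] = values
    return lookup


lookup = build_lookup(wide, ["AG.LND.FRST.ZS", "AG.LND.AGRI.ZS"])
print(lookup["Kenya"]["AG.LND.FRST.ZS"])
```

Countries that lack an indicator simply have no entry for it, rather than raising an IndexError during the lookup.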

In [None]:
# world indicator statistics
world_indicator_ds = {}
for code in indicator_codes:
  w_ds = dataframe[dataframe['Country Name'] == 'World']
  w_ds = w_ds[w_ds['Indicator Code'] == code].iloc[:,4:].to_numpy()[0][:60]
  world_indicator_ds[code] = w_ds
world_indicator_ds

VISUALIZATION

1. WORLD STATISTICS

A plot of the two indicators with 'World' as the Country Name. We use Matplotlib for the visualization.

In [None]:
from matplotlib import pyplot as plt
import matplotlib as mpl

indexes = np.arange(1960,2020,1)
# a function for plotting the two indicators of a country
def plot(
        y1, 
        y2, 
        title):
    fig, ax = plt.subplots(figsize=(3, 3))
    ax.set_title(title, color='C0')
    ax.plot(indexes, y1, 'C1', label="LAND ON AGRIC")
    ax.plot(indexes, y2, 'C2', label="LAND ON FOREST")
    ax.legend()
plot(
    world_indicator_ds['AG.LND.AGRI.ZS'], 
    world_indicator_ds['AG.LND.FRST.ZS'], 
    "World Statistics")

ALL THE CODE HAS BEEN WRITTEN ACCORDING TO THE PEP-8 GUIDELINES

https://peps.python.org/pep-0008/

THE CODE IS WELL STRUCTURED AND EASY TO MAINTAIN BECAUSE IT FOLLOWS THE PEP-8 GUIDELINES. IT IS THEREFORE NOT SPAGHETTI CODE

2. DESIRED COUNTRY STATISTICS

A visualization of the two indicators with 'Kenya' as the Country Name

In [None]:
country_indicator = c_indicators['Kenya']
Agric_Indicator = country_indicator['AG.LND.AGRI.ZS']
Forest_Indicator = country_indicator['AG.LND.FRST.ZS']
plot(Agric_Indicator,Forest_Indicator,"Kenya Statistics")
 

COVARIANCE BETWEEN THE % OF LAND UNDER AGRICULTURE AND % OF LAND UNDER FOREST

The results show both positive and negative covariance between the two indicators. In some periods, an increase in the % of land under forest is accompanied by an increase in the % of land under agriculture, i.e. the two change in the same direction. In other periods, a decrease in the % of land under forest coincides with an increase in the % of land under agriculture, suggesting that some forest areas are being cleared to provide more space for agricultural activity.

In [None]:
# we slice before getting the covariance because of missing data
cov = np.cov(Agric_Indicator[40:59], Forest_Indicator[40:59])
cov
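np.cov returns a 2x2 matrix: the diagonal holds the variance of each series and the off-diagonal entry is their covariance. The sketch below reads that entry and, instead of a hand-picked slice, drops the years in which either series is missing. The helper name pairwise_cov is hypothetical and the sample values are made up for illustration.

```python
import numpy as np


def pairwise_cov(x, y):
    """Covariance of x and y over the positions where both values are present."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = ~np.isnan(x) & ~np.isnan(y)
    if mask.sum() < 2:
        return np.nan  # a covariance needs at least two paired observations
    # [0, 1] is the off-diagonal entry: cov(x, y)
    return np.cov(x[mask], y[mask])[0, 1]


# Made-up series: agriculture rises while forest falls, with one missing year.
agric = np.array([np.nan, 40.0, 42.0, 44.0, 46.0])
forest = np.array([np.nan, 10.0, 9.0, 8.0, 7.0])
print(pairwise_cov(agric, forest))
```

A negative value means the two series moved in opposite directions over the years that were compared.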

COMPARISON BETWEEN THE WORLD AND A DESIRED COUNTRY

A visualization of each indicator for 'World' and 'Kenya'. We then go further by finding the covariance, i.e. how the two series vary with each other.

In [None]:
# A function for plotting an indicator between two countries
def plot2(
        y1, 
        y2, 
        title, 
        label_1, 
        label_2):
    fig, ax = plt.subplots(figsize=(3, 3))
    ax.set_title(title, color='C0')
    ax.plot(indexes, y1, 'C1', label=label_1)
    ax.plot(indexes, y2, 'C2', label=label_2)
    ax.legend()

In [None]:
# Land under Agriculture
plot2(
    world_indicator_ds['AG.LND.AGRI.ZS'], 
    country_indicator['AG.LND.AGRI.ZS'], 
    "Land Under Agriculture", 
    "World", 
    "Kenya")

In [None]:
# Land under Forest
plot2(
    world_indicator_ds['AG.LND.FRST.ZS'], 
    country_indicator['AG.LND.FRST.ZS'], 
    "Land Under Forest", 
    "World", 
    "Kenya")

In [None]:
# Again we slice because of missing data
c_w_cov = np.cov(
    world_indicator_ds['AG.LND.FRST.ZS'][40:50], 
    country_indicator['AG.LND.FRST.ZS'][40:50]) # a country vs world covariance on land under forest
c_w_cov
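Covariance depends on the scale of both series, so its magnitude is hard to compare across country pairs. A possible complement, not used in the notebook, is the Pearson correlation coefficient, which divides the covariance by both standard deviations and always lies between -1 and 1. The sketch below uses made-up values and a hypothetical helper name, pearson_r.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation over the positions where both values are present."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = ~np.isnan(x) & ~np.isnan(y)
    cov = np.cov(x[mask], y[mask])
    # cov[0, 0] and cov[1, 1] are the variances of x and y.
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])


# Made-up forest shares: one series falls steadily while the other rises.
world_forest = np.array([31.0, 30.9, 30.8, 30.7, np.nan])
country_forest = np.array([7.0, 7.2, 7.4, 7.6, 7.8])
print(pearson_r(world_forest, country_forest))
```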

COMPARISON BETWEEN TWO COUNTRIES

Here we also find the covariance of one indicator between two countries. The arrays are sliced because some years have 'nan' values.

In [None]:
first_country = 'Kenya'
second_country = 'Pakistan'
fc_indicator = c_indicators[first_country] # first country's indicator
sc_indicator = c_indicators[second_country] # second country's indicator

In [None]:
plot2(
    fc_indicator['AG.LND.FRST.ZS'], 
    sc_indicator['AG.LND.FRST.ZS'], 
    "Land under Forest", 
    first_country, 
    second_country)
# again we slice due to missing data
c_c_cov = np.cov(fc_indicator['AG.LND.FRST.ZS'][40:50], sc_indicator['AG.LND.FRST.ZS'][40:50]) # country vs country covariance on land under forest
c_c_cov

THE STORY
1. ARABLE LAND VS FOREST AREA

From 1990 to 2010, forest area in Germany increased, showing that the deforestation rate is decreasing, while arable land changed intermittently. In Kenya, arable land has been increasing over the past 10 years while the area under forest has been decreasing, showing a deforestation trend.

In [None]:
states = [
          'Kenya','Pakistan','World','Germany'
          ]
arable_forest_indicators = [
                            'AG.LND.ARBL.ZS','AG.LND.FRST.ZS'
                            ]

arable_forest_data = get_indicators_data(
    states, 
    arable_forest_indicators)

def plot3(
        y1, 
        y2, 
        title, 
        label_1, 
        label_2):
    fig, ax = plt.subplots(figsize=(3, 3))
    ax.set_title(title, color='C0')
    ax.plot(indexes, y1, 'C1', label=label_1)
    ax.plot(indexes, y2, 'C2', label=label_2)
    ax.legend()

plot3(
    arable_forest_data['Germany']['AG.LND.ARBL.ZS'], 
    arable_forest_data['Germany']['AG.LND.FRST.ZS'], 
    "Germany", 
    "Arable Land", 
    "Forest Land")

plot3(
    arable_forest_data['Kenya']['AG.LND.ARBL.ZS'], 
    arable_forest_data['Kenya']['AG.LND.FRST.ZS'], 
    "Kenya", 
    "Arable Land", 
    "Forest Land")

2. ELECTRIC POWER CONSUMPTION VS ACCESS TO ELECTRICITY VS OVERALL ENERGY 

Electricity in Kenya became progressively more accessible from 2010 to 2020, and electric power consumption has therefore increased. In Germany, power consumption has been rising over the last ten years, but access to electricity has remained fairly constant.

In [None]:
electricity_indicator = [
                         'EG.ELC.ACCS.ZS', 'EG.USE.ELEC.KH.PC'
                         ]

electricity_data = get_indicators_data(
    states, 
    electricity_indicator)

plot3(
    electricity_data['Kenya']['EG.ELC.ACCS.ZS'], 
    electricity_data['Kenya']['EG.USE.ELEC.KH.PC'], 
    "Kenya", 
    "Electricity Access", 
    "Power Consumption")

plot3(
    electricity_data['Germany']['EG.ELC.ACCS.ZS'], 
    electricity_data['Germany']['EG.USE.ELEC.KH.PC'], 
    "Germany", 
    "Electricity Access", 
    "Power Consumption")


3. METHANE PRODUCTION VS POVERTY HEAD COUNT

The rate of methane emission is decreasing in Germany while the poverty head count has remained relatively constant. This suggests that methane emission has little impact on the poverty head count in Germany.

In [None]:
methane_pov_hc_indicator = [
                            'EN.ATM.METH.ZG',
                            'SI.POV.DDAY'
                            ]
methane_pov_hc_data = get_indicators_data(
    states,
    methane_pov_hc_indicator)

plot3(
    methane_pov_hc_data['Germany']['EN.ATM.METH.ZG'], 
    methane_pov_hc_data['Germany']['SI.POV.DDAY']*50, # scaled by 50, presumably so the series is visible next to methane
    "Germany", 
    "Methane Emission", 
    "Poverty Head Count"
    )
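Several of the pairs plotted above use different units (for example a percentage next to kWh per capita), and the poverty series is multiplied by 50 to make it visible. One alternative, sketched here under the assumption that a second y-axis is acceptable for the figure, is Matplotlib's twinx, which gives each series its own scale. The function name plot_twin and the sample values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import numpy as np
from matplotlib import pyplot as plt


def plot_twin(years, y_left, y_right, title, label_left, label_right):
    """Plot two series that share the x-axis but have separate y-axes."""
    fig, ax_left = plt.subplots(figsize=(4, 3))
    ax_right = ax_left.twinx()  # second y-axis sharing the same x-axis
    line_left, = ax_left.plot(years, y_left, "C1", label=label_left)
    line_right, = ax_right.plot(years, y_right, "C2", label=label_right)
    ax_left.set_title(title, color="C0")
    ax_left.set_ylabel(label_left)
    ax_right.set_ylabel(label_right)
    # One legend covering the lines from both axes.
    ax_left.legend([line_left, line_right], [label_left, label_right])
    return fig, ax_left, ax_right


# Made-up values, for illustration only.
years = np.arange(2010, 2015)
access = np.array([19.0, 25.0, 30.0, 36.0, 41.0])
consumption = np.array([150.0, 155.0, 158.0, 164.0, 170.0])
fig, ax_left, ax_right = plot_twin(
    years, access, consumption, "Example", "Access (%)", "kWh per capita")
```

Each series keeps its original units this way, so the axis ticks can be read directly without undoing a scaling factor.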