Data Source: https://www.kaggle.com/worldbank/world-development-indicators <br> Folder: 'world-development-indicators'

# Pull notebook directory from GitHub remote repository to Google Colaboratory (run the following code cell one time only)

In [3]:
import os

repo_url = "http://github.com/Data-Science-and-Data-Analytics-Courses/UCSanDiegoX---Python-for-Data-Science-03-Jan-2019-audit-".strip("/")
repo_to_notebook_dir = "Week 05 Data Visualization/Visualization"

# Create local repository
repo_name = os.path.basename(repo_url)
!git init "$repo_name"
%cd "$repo_name"

# Set up git for pull
!git remote add origin "$repo_url" # remote repository
!git config core.sparsecheckout true 
!echo "$repo_to_notebook_dir" >> .git/info/sparse-checkout # notebook directory to download

# Pull notebook directory to local repository
!git pull --all
!git branch -r
!git checkout master
%cd "$repo_to_notebook_dir"

/content/UCSanDiegoX---Python-for-Data-Science-03-Jan-2019-audit-/Week 05 Data Visualization/Visualization


# Step 0: Download the Dataset from Kaggle (using Kaggle API)
To use the Kaggle API, sign up for a Kaggle account at https://www.kaggle.com. Then go to the 'Account' tab of your user profile and select 'Create API Token'. This will trigger the download of kaggle.json, a file containing your API credentials.

For more information about using the Kaggle API, see https://github.com/Kaggle/kaggle-api.
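Before running the upload cell, it can help to confirm that the token file is well formed and private. The sketch below is a minimal, non-authoritative check: it assumes kaggle.json is a JSON object with `username` and `key` fields and a POSIX file system. The helper name `check_kaggle_token` and the placeholder credentials are hypothetical and are not part of the Kaggle API.

```python
import json
import os
import stat
import tempfile


def check_kaggle_token(path):
    """Return a list of problems found in a kaggle.json file (empty if none)."""
    problems = []
    with open(path) as f:
        token = json.load(f)
    # Assumption: the token is a JSON object with "username" and "key" fields.
    for field in ("username", "key"):
        if not token.get(field):
            problems.append(f"missing field: {field}")
    # chmod 600 leaves no permission bits set for group or others.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append(f"file is accessible to other users (mode {oct(mode)})")
    return problems


# Demonstration on a throwaway file with placeholder (fake) credentials.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "kaggle.json")
    with open(demo_path, "w") as f:
        json.dump({"username": "example-user", "key": "0000"}, f)
    os.chmod(demo_path, 0o644)
    problems_before = check_kaggle_token(demo_path)
    os.chmod(demo_path, 0o600)  # same effect as `chmod 600`
    problems_after = check_kaggle_token(demo_path)

print(problems_before)
print(problems_after)
```

The check reports a problem while the file is world-readable and none once the permissions are tightened, which is why the next cell runs `chmod 600` on the token.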

In [4]:
# Run this cell and select the downloaded kaggle.json file

from google.colab import files
import zipfile

# Select kaggle.json file to upload to current working directory
files.upload()
!ls -lha kaggle.json

# Place kaggle.json in directory ~/.kaggle
!mkdir -p ~/.kaggle # -p avoids an error if the directory already exists
!mv kaggle.json ~/.kaggle/

# Secure API credentials
!chmod 600 ~/.kaggle/kaggle.json

# Download Dataset
!kaggle datasets download -d worldbank/world-development-indicators
!mkdir -p 'world-development-indicators' # data files will be extracted into this directory
data_filenames = ['Indicators.csv']
with zipfile.ZipFile('world-development-indicators.zip') as z:
  for name in z.namelist(): # zipped files already flattened (no top directory) 
    if name in data_filenames:
      z.extract(name, 'world-development-indicators')
!ls -sh 'world-development-indicators' # list extracted data files
!rm 'world-development-indicators.zip' # remove zip file

Saving kaggle.json to kaggle.json
-rw-r--r-- 1 root root 63 Jan  8 12:42 kaggle.json
Downloading world-development-indicators.zip to /content/UCSanDiegoX---Python-for-Data-Science-03-Jan-2019-audit-/Week 05 Data Visualization/Visualization
 98% 377M/385M [00:04<00:00, 57.2MB/s]
100% 385M/385M [00:04<00:00, 86.3MB/s]
total 548M
548M Indicators.csv


# Using Folium Library for Geographic Overlays

### Further exploring CO2 Emissions per capita in the World Development Indicators Dataset


In [0]:
import folium
import pandas as pd

### Country coordinates for plotting

Source: https://github.com/python-visualization/folium/blob/master/examples/data/world-countries.json <br>
Raw file for download: https://raw.githubusercontent.com/python-visualization/folium/588670cf1e9518f159b0eee02f75185301327342/examples/data/world-countries.json

In [0]:
country_geo = 'geo/world-countries.json'

In [0]:
# Read in the World Development Indicators Database
data = pd.read_csv('world-development-indicators/Indicators.csv')
data.shape

In [0]:
data.head()

Pull out CO2 emissions per capita for every country in 2011.

In [0]:
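# Select CO2 emissions for all countries in 2011.
# str.contains treats the pattern as a regular expression, so the "(" is escaped;
# the raw string keeps the backslash from being read as a Python string escape.
hist_indicator = r'CO2 emissions \(metric'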
hist_year = 2011

mask1 = data['IndicatorName'].str.contains(hist_indicator) 
mask2 = data['Year'].isin([hist_year])

# apply our mask
stage = data[mask1 & mask2]
stage.head()
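The filter above depends on the parenthesis in the pattern being escaped, because `str.contains` interprets its pattern as a regular expression by default. The sketch below illustrates the same matching rule with the standard `re` module only. The three indicator names are sample strings chosen for illustration, not a listing taken from the dataset.

```python
import re

# Sample indicator names, chosen for illustration.
indicator_names = [
    "CO2 emissions (metric tons per capita)",
    "CO2 emissions (kt)",
    "CO2 emissions from liquid fuel consumption (kt)",
]

# Raw string: the backslash reaches the regex engine unchanged,
# so "\(" matches a literal "(".
pattern = r"CO2 emissions \(metric"
matches = [name for name in indicator_names if re.search(pattern, name)]

# re.escape builds an equivalent pattern from the plain text.
escaped_matches = [
    name for name in indicator_names
    if re.search(re.escape("CO2 emissions (metric"), name)
]

# Without the escape, "(" opens a group that is never closed,
# so the pattern is rejected as invalid.
try:
    re.compile("CO2 emissions (metric")
    unescaped_is_valid = True
except re.error:
    unescaped_is_valid = False

print(matches)
print(unescaped_is_valid)
```

In pandas, passing `regex=False` to `str.contains` is another way to match the text literally, in which case no escaping is needed.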

### Set up our data for plotting

Create a data frame with just the country codes and the values we want plotted.

In [0]:
plot_data = stage[['CountryCode','Value']]
plot_data.head()

In [0]:
# label for the legend
hist_indicator = stage.iloc[0]['IndicatorName']

## Visualize CO2 emissions per capita using Folium

Folium provides interactive maps and can layer data overlays, such as choropleths, on top of them.

In [0]:
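# Set up a Folium map zoomed out to show the whole world.
# location is [latitude, longitude]; latitude must lie between -90 and 90,
# so the original value of 100 was out of range. [0, 0] centers the map on the equator.
map = folium.Map(location=[0, 0], zoom_start=1.5)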

In [0]:
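# Choropleth maps bind a pandas DataFrame to JSON geometries: each row is matched
# to a country shape through key_on, and the shape is colored by the row's value.
# Note: recent Folium releases no longer provide Map.choropleth; there, the
# equivalent call is folium.Choropleth(...).add_to(map) with the same arguments.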
map.choropleth(geo_data=country_geo, data=plot_data,
             columns=['CountryCode', 'Value'],
             key_on='feature.id',
             fill_color='YlGnBu', fill_opacity=0.7, line_opacity=0.2,
             legend_name=hist_indicator)
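The role of `key_on='feature.id'` can be illustrated without Folium. The sketch below is a simplified model of the matching step, not Folium's implementation: it assumes each GeoJSON feature stores a three-letter country code in its `id` field, and it uses a tiny stand-in for world-countries.json with dummy geometries. The emission values and the helper `resolve` are made up for illustration.

```python
import json

# Tiny stand-in for world-countries.json (dummy geometries).
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": code, "properties": {"name": name},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}}
        for code, name in [("USA", "United States of America"),
                           ("FRA", "France"),
                           ("DEU", "Germany")]
    ],
})
geo = json.loads(geojson_text)

# Rows shaped like plot_data: (CountryCode, Value). Values are made up.
# "WLD" stands for an aggregate row that has no country shape.
rows = [("USA", 17.0), ("FRA", 5.2), ("WLD", 4.9)]


def resolve(feature, key_on):
    """Follow a dotted path such as 'feature.id' into one GeoJSON feature."""
    value = feature
    for part in key_on.split(".")[1:]:  # skip the leading 'feature'
        value = value[part]
    return value


geo_ids = {resolve(f, "feature.id") for f in geo["features"]}
data_codes = {code for code, _ in rows}

matched = {code: value for code, value in rows if code in geo_ids}
rows_without_shape = sorted(data_codes - geo_ids)
shapes_without_data = sorted(geo_ids - data_codes)

print(matched)
print(rows_without_shape)
print(shapes_without_data)
```

Only rows whose code appears as a feature `id` can be drawn, so regional aggregates in the data are left off the map, and countries with no 2011 value have no data to color them.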

In [0]:
# Save the map as a standalone interactive HTML file
map.save('plot_data.html')

In [0]:
# Display the saved interactive HTML file inline
from IPython.display import HTML
HTML('<iframe src="plot_data.html" width="700" height="450"></iframe>')

More Folium examples can be found at:<br>
http://python-visualization.github.io/folium/docs-v0.5.0/quickstart.html#Getting-Started <br>

Documentation at:<br>
http://python-visualization.github.io/folium/docs-v0.5.0/modules.html