# Working Example of a Data Loading Notebook
For many indicators, we will not be able to pull data dynamically from an API. In these cases, we will need to load the data into the database manually. This notebook provides a working example of how to do this.

## Step 0: Connecting to Live Databases

We need to be able to work with the live, running instances of our databases. Right now, we're using MongoDB to store our semi-structured data. There are two flavors of live databases: (1) the databases which require authentication to access. We protect these databases so that

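```python
import sys

import numpy as np
import pandas as pd

# Make the project root importable so the database_connector package can be found
sys.path.append('../')

from database_connector.SSPIDatabaseConnector import SSPIDatabaseConnector

# Connection handle used by all of the loading calls below
database = SSPIDatabaseConnector()
```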
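Step 0 mentions databases that require authentication. The notebook doesn't show how `SSPIDatabaseConnector` handles credentials, so the following is only a sketch of one common pattern, assuming credentials live in environment variables with the hypothetical names `SSPI_DB_USER` and `SSPI_DB_PASSWORD`. Building the MongoDB connection string at runtime keeps passwords out of the saved notebook. The PyMongo documentation recommends percent-escaping usernames and passwords with `urllib.parse.quote_plus`, which the sketch does.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(host, env=None):
    """Build a MongoDB connection string from credentials stored in the environment.

    SSPI_DB_USER and SSPI_DB_PASSWORD are hypothetical variable names used for
    illustration; they are not part of the SSPI connector.
    """
    env = os.environ if env is None else env
    user = env.get("SSPI_DB_USER")
    password = env.get("SSPI_DB_PASSWORD")
    if not user or not password:
        # Fail loudly instead of silently attempting an unauthenticated connection
        raise RuntimeError("Set SSPI_DB_USER and SSPI_DB_PASSWORD before connecting")
    # Percent-escape credentials so characters like '@' or ':' don't break the URI
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}/"
```

For example, `build_mongo_uri("localhost:27017", {"SSPI_DB_USER": "analyst", "SSPI_DB_PASSWORD": "p@ss:word"})` returns `"mongodb://analyst:p%40ss%3Aword@localhost:27017/"`, a string that could then be passed to `pymongo.MongoClient`.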


## Step 1: Import the Data

Our goal in moving the SSPI Data Pipeline to a code-only framework is to guarantee the reproducibility of our results, and that starts with how we work with our raw data. The **best practice** for working with raw data is to keep a completely unaltered copy of what you've downloaded and to work directly from that copy. Occasionally this won't be feasible, but in almost all cases we will be working from the originals. We save all of our raw data downloads in the SSPI Google Drive, which we can link to directly from here. I'll be implementing a direct connection to our Google Drive through its API down the line, but that isn't necessary to get started. For now, link to the raw file you've downloaded in your data loading notebook.
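One way to confirm that you really are working from an unaltered original is to record a checksum when the file is first saved and check it again before every load. This is a minimal sketch using only the standard library; `sha256_of_file` and `verify_raw_file` are hypothetical helpers, not part of the SSPI codebase.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads don't need to fit in memory
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_raw_file(path, expected_sha256):
    """Raise if the raw download no longer matches the checksum recorded when it was saved."""
    actual = sha256_of_file(path)
    if actual != expected_sha256:
        raise ValueError(f"{path} has changed: expected {expected_sha256}, got {actual}")
    return Path(path)
```

In practice you would paste the digest next to the Google Drive link in the notebook and call `verify_raw_file` before reading the file with pandas, so any accidental edit to the original is caught immediately.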

For example, the data we're using for the EPI come from 


```python
# Two hand-entered observations used to test the loading workflow
test_data = [
    {
        "CountryCode": "AFG",
        "IndicatorCode": "BIODIV",
        "Score": 0.33,
        "Unit": "Aggregate",
        "Value": 0.33,
        "Year": 2004,
    },
    {
        "CountryCode": "AFG",
        "IndicatorCode": "BIODIV",
        "Score": 0.34,
        "Unit": "Aggregate",
        "Value": 0.34,
        "Year": 2005,
    },
]

test_data = pd.DataFrame(test_data)
test_data
```


|   | CountryCode | IndicatorCode | Score | Unit | Value | Year |
|---|---|---|---|---|---|---|
| 0 | AFG | BIODIV | 0.33 | Aggregate | 0.33 | 2004 |
| 1 | AFG | BIODIV | 0.34 | Aggregate | 0.34 | 2005 |
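Because this data is entered by hand, it can help to check each observation before sending it to the database. The sketch below assumes the six fields in the test observations are the required schema and that country codes are three uppercase letters; both are inferences from this example, not a documented SSPI rule. It works on the list of dicts before it is turned into a DataFrame. The compact serializer reproduces the shape of the payload printed by the load calls below, which resembles pandas' `DataFrame.to_json(orient="records")`; whether the connector actually uses that is a guess.

```python
import json
import re

# Field names taken from the test observations; treating them as required is an assumption
REQUIRED_FIELDS = {"CountryCode", "IndicatorCode", "Score", "Unit", "Value", "Year"}
ISO3_PATTERN = re.compile(r"^[A-Z]{3}$")


def validate_records(records):
    """Return a list of problems found in a list of observation dicts (empty if none)."""
    problems = []
    for i, record in enumerate(records):
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
            continue  # later checks would fail with KeyError on missing fields
        if not ISO3_PATTERN.match(str(record["CountryCode"])):
            problems.append(f"record {i}: CountryCode {record['CountryCode']!r} is not a three-letter code")
        if not isinstance(record["Year"], int):
            problems.append(f"record {i}: Year must be an integer")
        for field in ("Score", "Value"):
            if not isinstance(record[field], (int, float)):
                problems.append(f"record {i}: {field} must be numeric")
    return problems


def to_compact_json(records):
    """Serialize records as a JSON array with no whitespace between tokens."""
    return json.dumps(records, separators=(",", ":"))
```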


```python
database.load_data_local(test_data, "TESTNG")
```

```
[{"CountryCode":"AFG","IndicatorCode":"BIODIV","Score":0.33,"Unit":"Aggregate","Value":0.33,"Year":2004},{"CountryCode":"AFG","IndicatorCode":"BIODIV","Score":0.34,"Unit":"Aggregate","Value":0.34,"Year":2005}]

<Response [403]>
```
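`<Response [403]>` is how the `requests` library displays a `Response` object, so `load_data_local` appears to return the server's response instead of raising when the upload is refused. HTTP 403 means Forbidden: the server understood the request but declined it, which may relate to the authentication discussed in Step 0, though the output doesn't say why. A small check makes such failures impossible to miss. This sketch assumes the return value has a `status_code` attribute; with a real `requests.Response` you could also call its `raise_for_status()` method.

```python
from http import HTTPStatus


def check_upload(response):
    """Raise if an upload response does not have a 2xx status code.

    Works with any object exposing a ``status_code`` attribute, such as a
    requests.Response. check_upload is a hypothetical helper, not part of
    SSPIDatabaseConnector.
    """
    status = response.status_code
    if not 200 <= status < 300:
        # HTTPStatus supplies the standard reason phrase, e.g. 403 -> "Forbidden"
        try:
            reason = HTTPStatus(status).phrase
        except ValueError:
            reason = "Unknown status"
        raise RuntimeError(f"Upload failed with HTTP {status} ({reason})")
    return response
```

Usage would look like `check_upload(database.load_data_local(test_data, "TESTNG"))`.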

```python
database.load_data_remote(test_data, "TESTNG")
```

```
[{"CountryCode":"AFG","IndicatorCode":"BIODIV","Score":0.33,"Unit":"Aggregate","Value":0.33,"Year":2004},{"CountryCode":"AFG","IndicatorCode":"BIODIV","Score":0.34,"Unit":"Aggregate","Value":0.34,"Year":2005}]

ValueError: Cannot set verify_mode to CERT_NONE when check_hostname is enabled.
```
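This `ValueError` message comes from Python's standard `ssl` module: an `ssl.SSLContext` refuses `verify_mode = ssl.CERT_NONE` while its `check_hostname` attribute is still `True`. The notebook doesn't show how `load_data_remote` builds its TLS connection, so the sketch below only illustrates the standard-library behavior; `make_client_context` is a hypothetical helper. Keeping verification on, and pointing it at the right CA bundle if the server's certificate isn't trusted by default, is usually a better fix than turning verification off.

```python
import ssl


def make_client_context(verify=True, cafile=None):
    """Build a client-side SSLContext.

    verify=True (recommended) checks certificates and hostnames, optionally
    against a custom CA bundle given by cafile. verify=False disables both
    checks and should only be used for local debugging.
    """
    if verify:
        return ssl.create_default_context(cafile=cafile)
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Order matters: hostname checking must be turned off before verify_mode can
    # be set to CERT_NONE; reversing these two lines raises the ValueError above.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context
```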