This script automatically downloads the contents of a KoboToolbox account.

This is useful for backing up and archiving GEMS data.

CREDITS: Bernhard Metz for setting up the GOST user account. 

Import the libraries used in this notebook:

In [1]:
import os
import ast

import pandas as pd
import requests

Basic settings. If you need the username, password and API token for the GOST GEMS account, contact @BenStewart or @BernhardMetz.

In [86]:
username = ''
password = ''
token = r''
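Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. A minimal alternative is to read them from environment variables. The variable names KOBO_USERNAME, KOBO_PASSWORD and KOBO_TOKEN are hypothetical choices for this sketch, not something the KoboToolbox API requires:

```python
import os


def load_kobo_credentials():
    """Read KoboToolbox credentials from environment variables.

    The variable names are hypothetical. An empty string is returned for any
    variable that is not set, mirroring the blank defaults above.
    """
    return (
        os.environ.get("KOBO_USERNAME", ""),
        os.environ.get("KOBO_PASSWORD", ""),
        os.environ.get("KOBO_TOKEN", ""),
    )


username, password, token = load_kobo_credentials()
```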

We can test these credentials immediately by sending a single request to the API's data endpoint:

In [76]:
url = 'https://kc.kobotoolbox.org/api/v1/data'
headers = {"Authorization": "Token %s" % token, "Content-Type": "application/x-www-form-urlencoded"}
params = {"grant_type": "password", "username": username, "password": password}
res = requests.get(url, headers=headers, data=params)
res.raise_for_status()  # raise an HTTPError on 4xx/5xx responses, e.g. an invalid token
a = res.json()  # list of projects visible to this account

The returned JSON object, a, is a list with one entry per project (form) that this account can access:

In [65]:
a

[{'id': 132599,
  'id_string': 'a2NWYnE3DStCiTXDEvp7eL',
  'title': 'Clone of HRRP-Quantified Supervisory Checklist Final',
  'description': 'Clone of HRRP-Quantified Supervisory Checklist Final',
  'url': 'https://kc.kobotoolbox.org/api/v1/data/132599'},
 {'id': 171956,
  'id_string': 'abqA3JAHbG8C5fJ2B4HkPY',
  'title': 'PAPD v2 - EVALUATION DES BESOINS GEOLOCALISES DANS LES ZONES D’INTERVENTION',
  'description': 'PAPD v2 - EVALUATION DES BESOINS GEOLOCALISES DANS LES ZONES D’INTERVENTION',
  'url': 'https://kc.kobotoolbox.org/api/v1/data/171956'},
 {'id': 155719,
  'id_string': 'aginZWr8xjkPhDHFEp3DiX',
  'title': 'API Test',
  'description': 'API Test',
  'url': 'https://kc.kobotoolbox.org/api/v1/data/155719'},
 {'id': 252660,
  'id_string': 'aiRBmL3QZeFVJfBMbFNJQs',
  'title': 'TEST Experimental Question Types',
  'description': 'TEST Experimental Question Types',
  'url': 'https://kc.kobotoolbox.org/api/v1/data/252660'},
 {'id': 155763,
  'id_string': 'aYCum2c6QtC98YXRHCHQDN',
 

Each project has its own url: an API endpoint that we can query to retrieve that project's submissions.
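To pick a project by id or name rather than by its position in a, the list can be turned into a lookup table. This small sketch relies only on the 'id', 'title' and 'url' keys visible in the listing above; the function name project_urls is hypothetical:

```python
def project_urls(projects):
    """Map each project's numeric id to a (title, url) pair from the /api/v1/data listing."""
    return {p["id"]: (p["title"], p["url"]) for p in projects}


# Two entries copied from the listing above
sample = [
    {"id": 155719, "title": "API Test",
     "url": "https://kc.kobotoolbox.org/api/v1/data/155719"},
    {"id": 252660, "title": "TEST Experimental Question Types",
     "url": "https://kc.kobotoolbox.org/api/v1/data/252660"},
]

for project_id, (title, url) in project_urls(sample).items():
    print(project_id, title, url)
```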

Here we define a function, pull, that takes a URL, fetches its contents as JSON and returns the parsed result to the caller:

In [15]:
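def pull(url):
    """Fetch a KoboToolbox API URL and return the parsed JSON response."""
    headers = {"Authorization": "Token %s" % token, "Content-Type": "application/x-www-form-urlencoded"}
    params = {"grant_type": "password", "username": username, "password": password}
    res = requests.get(url, headers=headers, data=params)
    res.raise_for_status()  # raise an HTTPError on 4xx/5xx responses instead of returning an error body
    return res.json()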
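When many projects are downloaded in one run, reusing a single connection and retrying transient server errors can help. The following is a sketch of an alternative to pull built on requests.Session with a urllib3 Retry policy. It assumes that token authentication alone is sufficient for the kc.kobotoolbox.org v1 endpoints (the username and password fields are omitted), and the retry values are illustrative, not recommendations from KoboToolbox. The names make_session and pull_with_session are hypothetical:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(token, retries=3):
    """Create a Session that sends the Kobo token on every request and
    retries transient server errors with exponential backoff."""
    session = requests.Session()
    session.headers.update({"Authorization": "Token %s" % token})
    retry = Retry(
        total=retries,
        backoff_factor=1,  # waits between attempts grow exponentially
        status_forcelist=[500, 502, 503, 504],
    )
    # Replace the default HTTPS adapter with one that applies the retry policy
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


def pull_with_session(session, url, timeout=60):
    """Fetch a URL with the shared session and return the parsed JSON body."""
    res = session.get(url, timeout=timeout)
    res.raise_for_status()
    return res.json()
```

Usage would look like session = make_session(token) followed by pull_with_session(session, a[0]['url']).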

We can now use this to pull the data for a single project (here the first one in a, i = 0) and load it into a DataFrame:

In [88]:
i = 0
downloaded_data = pull(a[i]['url'])
downloaded_data_df = pd.DataFrame(downloaded_data)

We can also display the first five rows of the DataFrame:

In [89]:
downloaded_data_df.head(5)

,ANC_1_Last_Quarter,ANC_4_Last_Quarter,Boma_Health_Initiati_e_Citizen_Engagement,Communication,County_If_Akobo,County_If_Boma_,County_If_Central_Upper_Nile,County_If_Fashoda_,County_If_Jonglei_,County_If_Northern_Upper_Nile,...,end,formhub/uuid,meta/deprecatedID,meta/instanceID,phonenumber,simserial,start,subscriberid,today,username
0,0,0,boma_health_te,internet_syste,,,,,duk_payuel,,...,2018-05-02T17:12:16.775+03,e7b18869fb704e08ada5c5659c658a8e,,uuid:e20bd30c-03f6-4ebc-862f-859fa6e9e3fd,916285677.0,8.921101009146798e+18,2018-05-02T17:01:28.405+03,659060014679875.0,2018-05-02,hrrp_ssd_collector
1,20,10,boma_health_te,telephone,,,,,twic_south,,...,2017-11-09T14:28:49.853+08,e7b18869fb704e08ada5c5659c658a8e,,uuid:24d610fd-aa06-4a59-8303-5a33d356bbe8,916258584.0,8.921101009146541e+18,2017-11-09T13:37:21.980+08,659060014654048.0,2017-11-09,hrrp_ssd_collector
2,15,19,boma_health_te,telephone,,,,,twic_central,,...,2017-11-17T11:11:33.208+08,e7b18869fb704e08ada5c5659c658a8e,,uuid:4233353b-707a-4961-a0ab-bc395902ba91,916258584.0,8.921101009146541e+18,2017-11-17T10:48:34.015+08,659060014654048.0,2017-11-17,hrrp_ssd_collector
3,153,6,boma_health_te,internet_syste,,,,,twic_north,,...,2017-11-02T16:34:41.045+08,e7b18869fb704e08ada5c5659c658a8e,,uuid:0c6ea310-d601-403f-9d7d-44232993f56f,,,2017-11-01T19:29:59.013+08,,2017-11-01,hrrp_ssd_collector
4,0,0,boma_health_te,internet_syste telephone radio,,,,,duk_payuel,,...,2018-05-18T12:23:47.169+03,e7b18869fb704e08ada5c5659c658a8e,,uuid:4862c88f-bfe9-45c4-9d2f-27e2f10521ed,916285677.0,8.921101009146798e+18,2018-05-18T12:04:10.102+03,659060014679875.0,2018-05-18,hrrp_ssd_collector


These DataFrames could be saved to disk automatically by looping over every project in a.
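A minimal sketch of such a loop follows. It takes the fetching function as an argument (for example, pull from above) so it can be exercised without network access, and writes one CSV per project named after the project id. The function name save_all_projects, the file-naming scheme and the out_dir argument are illustrative choices, not part of the original notebook:

```python
import os

import pandas as pd


def save_all_projects(projects, fetch, out_dir):
    """Download every project in `projects` and write each to <out_dir>/<id>.csv.

    `fetch` is any callable that takes a project URL and returns a list of
    submission dicts (for example, the pull function defined above).
    Returns the list of file paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for project in projects:
        df = pd.DataFrame(fetch(project["url"]))
        path = os.path.join(out_dir, "%s.csv" % project["id"])
        df.to_csv(path, index=False)
        written.append(path)
    return written
```

With the objects defined earlier, a call would look like save_all_projects(a, pull, 'out').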

Importantly, each of these DataFrames contains an '_attachments' column. Each entry is a list of attachment records whose 'filename' field points to an image taken in the field by operatives; rows without a photo have an empty list. These images can be saved to a separate folder.

In the block below, we iterate through a DataFrame (dataframe) and download the first image attached to each submission to the outputFolder location.

In [82]:
# modify this to send the images to a different location
outputFolder = r'C:\Users\charl\Documents\GitHub\GOST_PublicGoods\Implementations\Kobo Toolbox - API downloader\out'
os.makedirs(outputFolder, exist_ok=True)  # create the folder if it does not exist yet

In [72]:
# select which project to download from list a
i = 0

dataset = a[i]
data = pull(dataset['url'])
dataframe = pd.DataFrame(data)

links = {}
header = r'https://kc.kobotoolbox.org/attachment/original?media_file='

# put the first attachment link of each row into a dictionary, keyed by file name without extension.
# Use a separate variable name so the project list `a` is not overwritten; reusing `a` here
# replaced the list with a dict, which raised KeyError: 0 when the cell was re-run.
if len(dataframe) > 0 and '_attachments' in dataframe.columns:
    for attachments in dataframe['_attachments']:
        # each entry is a list of attachment dicts (empty, or NaN, if the row has no photo)
        if isinstance(attachments, list) and len(attachments) > 0 and isinstance(attachments[0], dict):
            z = header + attachments[0]['filename']
            links[z.split('/')[-1][:-4]] = z

# iterate through the extracted dictionary, and download each image to the folder
for ID, imgUrl in links.items():
    imgData = requests.get(imgUrl, headers=headers)
    imgData.raise_for_status()
    fileName = os.path.join(outputFolder, '%s_%s.jpeg' % (dataset['id'], ID))
    with open(fileName, 'wb') as output:
        output.write(imgData.content)

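The download block keeps only the first attachment of each submission. If a form collects several photos per submission, a variant along the following lines would collect all of them. It assumes, as the notebook code does, that each '_attachments' entry is a list of dicts with a 'filename' key. The function name all_attachment_links is hypothetical, and it strips the extension with os.path.splitext rather than dropping the last four characters, so names such as .jpeg are handled too:

```python
import os

import pandas as pd

ATTACHMENT_URL = "https://kc.kobotoolbox.org/attachment/original?media_file="


def all_attachment_links(df, base=ATTACHMENT_URL):
    """Return {file stem: download URL} for every attachment in df['_attachments'].

    Rows without attachments hold an empty list (or NaN when the key was
    absent from the JSON) and are skipped.
    """
    links = {}
    if "_attachments" not in df.columns:
        return links
    for attachments in df["_attachments"]:
        if not isinstance(attachments, list):
            continue
        for att in attachments:
            if isinstance(att, dict) and "filename" in att:
                stem = os.path.splitext(os.path.basename(att["filename"]))[0]
                links[stem] = base + att["filename"]
    return links
```

Its result has the same shape as links in the download loop, so it can be used in place of that dictionary.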