## Transfer Clean Data

Transfer the datasets used by TJI's Tableau Public dashboards:
 - Pre-trial deaths in custody
 
### Datasets used

* Input:
  * `tji/deaths-in-custody/cleaned_custodial_death_reports.csv`

* Output:
  * Google Sheet with query results
  
##### Author: James Babyak (james.babyak@gmail.com)

## Steps
**1. Setup**
- 1a. Configuration and imports
    - Libraries
    
**2. Download data from data.world**
- 2a. Query data based on saved

**3. Save to location**
- 3a. Google Drive

---

## 1. Imports

In [1]:
# Import ALL the things

import os
import sys
import json
import boto3
import datetime
import numpy as np
import pandas as pd
import datadotworld as dw
import pygsheets

from io import StringIO

# Add ../data_cleaning to the import path before importing the local helpers
sys.path.append(os.getcwd() + '/../data_cleaning')
from lib.cleaning_tools import *

pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)

%load_ext watermark
%watermark -a "James Babyak" -d -t -z -w -p numpy,pandas,datadotworld

James Babyak 2020-06-28 12:13:40 CDT 

numpy 1.15.4
pandas 0.23.4
datadotworld 1.7.0
watermark 2.0.2


## 2. Download Data

In [2]:
# SQL query used for pre-trial deaths (In [4] applies a similar filter in pandas)
query = "\
SELECT * \
FROM cleaned_custodial_death_reports \
WHERE (type_of_custody LIKE 'JAIL%' OR type_of_custody = 'PRIVATE FACILITY') \
AND NOT were_the_charges = 'CONVICTED';"
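To make the query's selection logic concrete, here is a minimal sketch that runs the same SQL against a few toy rows in an in-memory SQLite table. SQLite stands in for data.world's SQL engine, which is an assumption, and the row values are hypothetical rather than real TJI records.

```python
import sqlite3

# Hypothetical toy rows: (type_of_custody, were_the_charges). Not real TJI data.
rows = [
    ("JAIL - COUNTY", "NOT FILED"),
    ("JAIL - MUNICIPAL", "CONVICTED"),
    ("PRIVATE FACILITY", "FILED"),
    ("PRISON", "FILED"),
    ("JAIL - COUNTY", None),  # missing charge status
]

query = """
SELECT *
FROM cleaned_custodial_death_reports
WHERE (type_of_custody LIKE 'JAIL%' OR type_of_custody = 'PRIVATE FACILITY')
AND NOT were_the_charges = 'CONVICTED';
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cleaned_custodial_death_reports "
    "(type_of_custody TEXT, were_the_charges TEXT)"
)
conn.executemany(
    "INSERT INTO cleaned_custodial_death_reports VALUES (?, ?)", rows
)
matched = conn.execute(query).fetchall()
conn.close()

for row in matched:
    print(row)
```

In this sketch the row with a missing charge status is dropped, because SQL evaluates `NOT NULL = 'CONVICTED'` as unknown. A pandas comparison such as `!= 'CONVICTED'` keeps such rows, so the two approaches can differ when `were_the_charges` has missing values.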

In [3]:
# Get the dataset of cleaned custodial deaths from data.world and load it as a DataFrame
dataset = dw.load_dataset('tji/deaths-in-custody')
cdr = dataset.dataframes['cleaned_custodial_death_reports']



In [4]:
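# Filter the DataFrame for pre-trial deaths in custody:
# charges not 'CONVICTED', custody type containing 'JAIL' or 'PRIVATE'
# (na=False treats missing custody values as non-matches)
boolean = (cdr['were_the_charges'] != 'CONVICTED') & \
          (cdr['type_of_custody'].str.contains('JAIL', na=False) | cdr['type_of_custody'].str.contains('PRIVATE', na=False))
df_custodial_deaths = cdr.loc[boolean]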
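The pandas filter above matches `'JAIL'` and `'PRIVATE'` anywhere in `type_of_custody`, while the SQL query in In [2] requires the value to start with `JAIL` or to equal `PRIVATE FACILITY` exactly. The sketch below compares the two on a toy frame. The custody values are hypothetical, chosen only to show where the masks can disagree, and `na=False` is used so missing values count as non-matches.

```python
import pandas as pd

# Hypothetical custody values, not taken from the real dataset.
toy = pd.DataFrame({
    "type_of_custody": [
        "JAIL - COUNTY",
        "PRIVATE FACILITY",
        "PRISON",
        "POLICE CUSTODY (PRE-BOOKING JAIL)",
        None,
    ],
})
custody = toy["type_of_custody"]

# Substring match, as in the notebook's pandas filter.
contains_mask = (
    custody.str.contains("JAIL", na=False)
    | custody.str.contains("PRIVATE", na=False)
)

# Closer mirror of the SQL: LIKE 'JAIL%' OR = 'PRIVATE FACILITY'.
sql_like_mask = (
    custody.str.startswith("JAIL", na=False)
    | custody.eq("PRIVATE FACILITY")
)

# Rows where the two filters disagree.
print(toy[contains_mask != sql_like_mask])
```

If the real data has no values with `JAIL` or `PRIVATE` in the middle of the string, the two masks select the same rows; checking `cdr['type_of_custody'].unique()` would confirm that.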

## 3. Save to Location

In [5]:
# Authorize pygsheets with the service account credentials
gc = pygsheets.authorize(service_account_file='../automation/client_secret_store.json')

In [6]:
gc.spreadsheet_titles()

['tableau-dashboard-pre-conviction-deaths-in-texas-QueryResult']

In [7]:
# Open the Google spreadsheet by title (the title listed by gc.spreadsheet_titles() above)
sh = gc.open('tableau-dashboard-pre-conviction-deaths-in-texas-QueryResult')

In [8]:
# Select the first worksheet
wks = sh[0]

In [9]:
# Write df to the first worksheet, starting at cell A1 (row 1, column 1)
wks.set_dataframe(df_custodial_deaths,(1,1))
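The start position passed to `set_dataframe` is a 1-indexed `(row, column)` tuple, so `(1,1)` is the sheet's top-left cell. As a quick sanity check, a small helper can translate such tuples into A1-style labels. The helper `to_a1` is hypothetical, written for illustration only, and is not part of pygsheets.

```python
def to_a1(cell):
    """Convert a 1-indexed (row, column) tuple to an A1-style label.

    Hypothetical helper for illustration; not a pygsheets function.
    """
    row, col = cell
    if row < 1 or col < 1:
        raise ValueError("row and column must both be >= 1")
    letters = ""
    while col > 0:
        # Columns use a bijective base-26 scheme: A..Z, AA..AZ, ...
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row}"


print(to_a1((1, 1)))   # start used in the cell above
print(to_a1((2, 2)))   # the tuple that would start one row down, one column right
```

Writing from the top-left cell means the DataFrame's header row lands in row 1 of the sheet, which is what a Tableau connection to the sheet would typically expect.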