<a href="https://colab.research.google.com/github/vivacitylabs/data-toolkit/blob/master/classified_counts_bulk_download_generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classified Counts - Bulk Download Generator



## Generate a csv file of Classified Counts data over multiple days

This notebook is a tool for accessing VivaCity data via the API. It is intended as an **interim solution** while we work on new dashboard developments. Contact your Customer Success Manager for more information.

Use this notebook only if you need one of the T14 classes that can't be downloaded via the dashboard.

#### How it works

This notebook will run you through all the necessary steps and will save the output csv file in your Google Drive.

You will simply need to fill in a few details and then hit the run button next to the code cells.

**What you will need**

- Google account
- VivaCity API login credentials
- Countline ids you want to download data for


#### Output format

The output file will look something like this.

| Local Datetime | CountlineId | countlineName | direction | car | cyclist | motorbike | taxi |
|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
| 2022-10-06 00:00:00 | 21876 | RBK3_ClarenceSt_S3 | in | 8 | 3 | 1 | 0 |
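
To make this layout concrete, here is a minimal sketch of how per-class counts end up as one column per class. It uses only the standard library, and the helper name `to_wide` and the tuple shape of `records` are assumptions made for illustration; the notebook itself does the reshaping with pandas in Stage 3.

```python
from collections import defaultdict

def to_wide(records):
    """Pivot long records into one entry per (datetime, countline, direction).

    `records` is an iterable of (local_datetime, countline_id, direction,
    class_name, count) tuples. This shape is assumed for illustration only.
    """
    rows = defaultdict(dict)
    for local_dt, countline_id, direction, class_name, count in records:
        key = (local_dt, countline_id, direction)
        # Each class becomes its own column; repeated entries are summed.
        rows[key][class_name] = rows[key].get(class_name, 0) + count
    return dict(rows)

# Values taken from the example row in the table above.
records = [
    ("2022-10-06 00:00:00", 21876, "in", "car", 8),
    ("2022-10-06 00:00:00", 21876, "in", "cyclist", 3),
    ("2022-10-06 00:00:00", 21876, "in", "motorbike", 1),
    ("2022-10-06 00:00:00", 21876, "in", "taxi", 0),
]
wide = to_wide(records)
print(wide[("2022-10-06 00:00:00", 21876, "in")])
```

Each key of `wide` corresponds to one row of the output file, and each inner dictionary holds the class columns for that row.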

## Stage 1: Getting Started
Let's begin by importing the packages we'll need and creating some useful functions!

Hit the run button (▶) in the top left corner.

In [None]:
import requests
import getpass
import json
import pandas as pd
from datetime import date, datetime, timedelta
import csv
import time
import pytz
from IPython.display import Markdown, display
def printmd(string):
    display(Markdown(string))
from ipywidgets import interact, interactive, fixed, interact_manual, Layout, Box
import ipywidgets as widgets
from google.colab import drive
drive.mount('/content/drive')

def get_date_range(start_date, end_date):
    """Split the period into consecutive one-day (start, end) windows.

    Both arguments are ISO 8601 strings. The day that starts at end_date
    is included, so the last window ends one day after end_date.
    """
    timestamp_format = '%Y-%m-%dT%H:%M:%S.000Z'
    start_dates = []
    end_dates = []

    start_date = datetime.fromisoformat(start_date)
    end_date = datetime.fromisoformat(end_date)
    while True:
        start_dates.append(start_date.strftime(timestamp_format))
        end_dates.append((start_date+timedelta(days=1)).strftime(timestamp_format))
        start_date = start_date+timedelta(days=1)
        if start_date > end_date:
            break
    date_range = list(zip(start_dates, end_dates))
    return date_range

## Stage 2: Data Import

First, we'll enter the API username and password. Contact your Customer Success Manager if you don't have these details.

We will then request all countlines the user has access to. If some countlines are missing, get in touch with us. 

Finally, you will select the date period, countlines and classes to request data for. 

### Authentication
Now you will need your API login details, i.e. a username and a password. If you don't have them, please contact your Customer Success Manager.

1.   Enter the username into the field on the right, then hit the run button (▶).
2.   Input the password in the box that appears below it and hit "enter" on your keyboard. 


In [None]:
#@title  { run: "auto", vertical-output: true, display-mode: "form" }
#@markdown Insert your login credentials
username = "API-username" #@param {type:"string"}

auth_body = {}
auth_body['username'] = username
auth_body['password'] = getpass.getpass()

#### Available Countlines

We now use the username and password to get an access token, then request all countlines the user has access to.
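
If the login details are wrong, the token response will not contain an access token. The sketch below shows one way to turn a token response into request headers with a readable error message. The helper name `bearer_headers` and the example payload are made up for illustration; the only thing taken from this notebook is that the response body carries `access_token` and `refresh_token` keys.

```python
def bearer_headers(token_payload):
    """Build request headers from a token response body.

    Assumes the body is a dict with an 'access_token' key, as the
    notebook's own code does, and raises a clear error otherwise.
    """
    token = token_payload.get("access_token")
    if not token:
        raise ValueError("No access token returned; check the username and password.")
    return {"Authorization": "Bearer " + token}

# Hypothetical payload standing in for auth_response.json()
example_payload = {"access_token": "abc123", "refresh_token": "def456"}
print(bearer_headers(example_payload))
```

The resulting dictionary has the same shape as the `headers` variable built in the cell below.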

In [None]:
print("Authorising...")
auth_response = requests.post("https://api.vivacitylabs.com/get-token", data=auth_body, headers={'Content-Type':'application/x-www-form-urlencoded'})
headers = {}
headers['Authorization'] = "Bearer " + auth_response.json()['access_token']
refresh_body = {}
refresh_body['refresh_token'] = auth_response.json()['refresh_token']
start = time.time()
print("Done.")

#get countline meta data
api_url_base = 'https://api.vivacitylabs.com'
countline_request = requests.get(f'{api_url_base}/countline', headers=headers)
countlines = countline_request.json()

#convert to dataframe
df_countlines = pd.DataFrame.from_dict(countlines).rename(columns={"id": "countline_id", "name":"countline_name", "location":"countline_location", "direction":"countline_direction"})
print(len(df_countlines["countline_id"].unique()), " countlines available")

### Select countlines and date range for querying the API
Run this cell!

Then select the classes and countlines from the lists, and pick the start and end dates. Ensure that the start date is not later than the end date.

In [None]:
box_layout = Layout(display='flex', flex_flow='column',
                    align_items='stretch', border=None, width='28%')

start_date_input = widgets.DatePicker(description="Start date",layout=Layout(width='55%'))
end_date_input = widgets.DatePicker(description="End date",layout=Layout(width='55%'))
timezone = widgets.Dropdown(options=['Europe/London', "Europe/Berlin", "Australia/Sydney"],description="Timezone",layout=Layout(width='55%'))

class_input = widgets.SelectMultiple(
    options=[ "cyclist", "motorbike", "car", "pedestrian", "taxi", "van", "minibus", "bus", "rigid", "truck", "emergency_car", "emergency_van", "fire_engine", "escooter"],
    description='Class',  disabled=False,
    layout=Layout(width='55%', height='230px')
)
countlines_input = widgets.SelectMultiple(
    options=df_countlines["countline_name"].unique(),
    description='Countlines',
    disabled=False,
    layout=Layout(width='auto', height='200px')
)
items = [start_date_input, end_date_input, timezone, class_input,countlines_input]
box = Box(children=items, layout=box_layout)
printmd("**Select date period and countlines**")
printmd("Hold  `Ctrl` (or `Cmd`) or `Shift`  to select multiple classes or countlines")
print("")
box

Run the cell below to set the input parameters for the API request. Check that they look alright.
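
The cell converts the selected local dates to UTC before calling the API. As a rough illustration of that step, the sketch below does the same conversion with the standard library only. The helper name `local_midnight_to_utc` is made up, and the notebook itself uses pandas and `pytz` instead.

```python
import datetime as dt
from zoneinfo import ZoneInfo

def local_midnight_to_utc(day, tz_name):
    """Return the UTC instant at which `day` starts in the given timezone."""
    local_midnight = dt.datetime(day.year, day.month, day.day, tzinfo=ZoneInfo(tz_name))
    return local_midnight.astimezone(dt.timezone.utc)

# London is on British Summer Time (UTC+1) on this date,
# so the local day starts at 23:00 UTC on the previous day.
utc_start = local_midnight_to_utc(dt.date(2022, 10, 6), "Europe/London")
print(utc_start.strftime("%Y-%m-%dT%H:%M:%S.000Z"))  # 2022-10-05T23:00:00.000Z
```

This is why the request timestamps can start on the day before the selected start date.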

In [None]:
params = {}
params['countline'] = df_countlines[df_countlines["countline_name"].isin(countlines_input.value)]["countline_id"].to_list()
params['class'] = list(class_input.value)
params["includeZeroCounts"] = True

#convert local datetime to UTC datetime
start_date_utc = str(pd.to_datetime(start_date_input.value).tz_localize(timezone.value).astimezone(pytz.utc))
end_date_utc = str(pd.to_datetime(end_date_input.value).tz_localize(timezone.value).astimezone(pytz.utc))

#check if dates are in correct order
if start_date_input.value > end_date_input.value:
  print("Start date is after end date, please correct your date selection")
else:
  date_range = get_date_range(start_date_utc, end_date_utc)
  printmd("**Check your selection:**\n")
  print("Dates:", start_date_input.value, "to", end_date_input.value, "\nClass:", class_input.value, "\nCountlines:", countlines_input.value )

### Getting the data

We now query Classified Counts data from the API. 
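
For orientation, the response parsed by the cell below appears to be nested by countline, then by time bucket, with a list of per-class counts inside each bucket. The sketch below flattens a response of that shape into one row per countline, bucket and class. The nesting and field names are inferred from this notebook's own parsing code; the helper name `flatten_counts`, the bucket key format and all sample values are made up for illustration.

```python
def flatten_counts(json_counts):
    """Flatten {countline: {bucket: {"from", "to", "counts": [...]}}} into rows."""
    rows = []
    for countline_id, buckets in json_counts.items():
        for bucket in buckets.values():
            for count in bucket["counts"]:
                rows.append({
                    "countline_id": countline_id,
                    "timefrom": bucket["from"],
                    "timeto": bucket["to"],
                    "class": count["class"],
                    "in": count["countIn"],
                    "out": count["countOut"],
                })
    return rows

# Made-up sample with the assumed structure (one countline, one bucket).
sample = {
    "21876": {
        "2022-10-05T23:00:00.000Z": {
            "from": "2022-10-05T23:00:00.000Z",
            "to": "2022-10-05T23:05:00.000Z",
            "counts": [
                {"class": "car", "countIn": 8, "countOut": 5},
                {"class": "cyclist", "countIn": 3, "countOut": 2},
            ],
        }
    }
}
flat_rows = flatten_counts(sample)
print(len(flat_rows), flat_rows[0]["class"], flat_rows[0]["in"])
```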

The output will tell you how many requests are made and what the progress is.
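
The number of requests equals the number of days in your selection, because the period is split into one-day windows and each window is requested separately. Here is a minimal sketch of that splitting, with the end date included. The function name `daily_windows` and the variable names are made up; `get_date_range` from Stage 1 is what the notebook actually uses.

```python
import datetime as dt

def daily_windows(start_utc, end_utc):
    """Return (from, to) timestamp pairs, one per day, end day included."""
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    windows = []
    current = start_utc
    while current <= end_utc:
        next_day = current + dt.timedelta(days=1)
        windows.append((current.strftime(fmt), next_day.strftime(fmt)))
        current = next_day
    return windows

# Three local days in London (6, 7 and 8 October 2022), expressed in UTC.
first_day = dt.datetime(2022, 10, 5, 23, tzinfo=dt.timezone.utc)
last_day = dt.datetime(2022, 10, 7, 23, tzinfo=dt.timezone.utc)
windows = daily_windows(first_day, last_day)
print(len(windows))  # 3
print(windows[0])
```

A three-day selection therefore shows progress as `1/3`, `2/3` and `3/3`.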

In [None]:
data = []
for i, window in enumerate(date_range):
  # refresh the access token before it expires
  time_elapsed = (time.time() - start)
  if time_elapsed > 500:
    print("Reauthorising...")
    auth_response = requests.post("https://api.vivacitylabs.com/refresh-token", data=refresh_body, headers={'Content-Type':'application/x-www-form-urlencoded'})
    headers = {}
    headers['Authorization'] = "Bearer " + auth_response.json()['access_token']
    refresh_body = {}
    refresh_body['refresh_token'] = auth_response.json()['refresh_token']
    start = time.time()
    print("Done.")
  # request one day of counts
  params["timeFrom"] = window[0]
  params["timeTo"] = window[1]
  response = requests.get('https://api.vivacitylabs.com/counts', params=params, headers=headers)
  print(str(i+1) + "/" + str(len(date_range)) + ": " + str(response.status_code) + " " + response.reason)
  if response.status_code == 200:
    json_counts = response.json()
    df_request = {"countline_id" : [], "timefrom": [], "timeto": [], "classes": [], "in": [], "out":[]}
    # flatten the nested response: countline -> time bucket -> per-class counts
    for item in json_counts:
        for bucket in json_counts[item]:
            for count in json_counts[item][bucket]["counts"]:
                df_request["countline_id"].append(item)
                df_request["timefrom"].append(json_counts[item][bucket]["from"])
                df_request["timeto"].append(json_counts[item][bucket]["to"])
                df_request["classes"].append(count['class'])
                df_request["in"].append(count['countIn'])
                df_request["out"].append(count['countOut'])

    #convert to dataframe
    df_request = pd.DataFrame.from_dict(df_request)
    data.append(df_request)
  else:
    print("Data missing for " + params["timeFrom"].split("T")[0] + " to " + params["timeTo"].split("T")[0])
  time.sleep(1)

#combine the daily results (pd.concat raises an error on an empty list)
if data:
  data = pd.concat(data, axis=0)
else:
  print("No data was returned, please check your selection and try again")

## Stage 3: Data Processing
Now we process the raw data output and put it into a nice format. Here you can select a desired time bucket.
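
Choosing a time bucket means that every timestamp is rounded down to the start of its bucket and the counts within each bucket are added up. The sketch below illustrates the idea with the standard library only. The helper names `floor_to_bucket` and `aggregate` and the sample values are made up; the notebook does this step with `pd.Grouper`.

```python
import datetime as dt
from collections import Counter

def floor_to_bucket(timestamp, bucket_minutes):
    """Round a datetime down to the start of its bucket within the day."""
    minutes_into_day = timestamp.hour * 60 + timestamp.minute
    floored = (minutes_into_day // bucket_minutes) * bucket_minutes
    return timestamp.replace(hour=floored // 60, minute=floored % 60,
                             second=0, microsecond=0)

def aggregate(rows, bucket_minutes):
    """Sum (timestamp, count) pairs per bucket."""
    totals = Counter()
    for timestamp, count in rows:
        totals[floor_to_bucket(timestamp, bucket_minutes)] += count
    return dict(totals)

# Made-up sample counts for a single countline, class and direction.
rows = [
    (dt.datetime(2022, 10, 6, 0, 0), 2),
    (dt.datetime(2022, 10, 6, 0, 15), 3),
    (dt.datetime(2022, 10, 6, 0, 55), 1),
    (dt.datetime(2022, 10, 6, 1, 5), 4),
]
hourly = aggregate(rows, 60)
print(hourly[dt.datetime(2022, 10, 6, 0, 0)])  # 6
print(hourly[dt.datetime(2022, 10, 6, 1, 0)])  # 4
```

With `bucket_minutes` set to 15 or 1440, the same function gives the equivalent of the `15Min` and `24H` options.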


In [None]:
#@title  {vertical-output: true, display-mode: "form" }
#@markdown Select time bucket and run data processing
timebucket = "1H" #@param ['15Min', "1H", "24H"]

export = data.rename(columns={"countline_id": "CountlineId", "timefrom":"UTC Datetime"}).drop(columns="timeto")
export["Local Datetime"] = pd.to_datetime([pd.to_datetime(x).tz_convert(timezone.value).strftime('%Y-%m-%d %H:%M:%S') for x in export["UTC Datetime"]])
export = export.drop(columns="UTC Datetime")

#add countline names
export["countlineName"] = export["CountlineId"].map(dict(zip(df_countlines["countline_id"], df_countlines["countline_name"])))

#reshape
export = export.groupby(['Local Datetime', 'CountlineId',  'countlineName','classes']).sum().stack().to_frame().reset_index().rename(columns={"level_4":"direction", 0:"counts"})
export = export.pivot(index=[ 'Local Datetime', 'CountlineId', 'countlineName','direction'], columns='classes', values='counts').reset_index()

#aggregate into desired time buckets
export = export.groupby([pd.Grouper(key='Local Datetime', freq=timebucket), 'CountlineId', 'countlineName','direction']).sum().reset_index()

## Stage 4: Data Export
Now let's write this to a .csv file and save it to the top-level folder (`My Drive`) of your Google Drive.

Set the `filename` to the right and then hit run.
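
If the filename is left empty, the file is saved as just `.csv`, which is easy to lose track of. As a possible safeguard, the sketch below builds the output path with a fallback name and avoids a doubled `.csv` extension. The helper `build_output_path` and the default name `classified_counts` are made up for illustration and are not part of this notebook.

```python
from pathlib import PurePosixPath

def build_output_path(folder, filename, default_name="classified_counts"):
    """Join folder and filename, adding '.csv' once and filling in a default."""
    name = filename.strip()
    if name.lower().endswith(".csv"):
        name = name[:-4]
    name = name or default_name
    return str(PurePosixPath(folder) / (name + ".csv"))

print(build_output_path("/content/drive/My Drive/", ""))
print(build_output_path("/content/drive/My Drive/", "october_counts.csv"))
```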

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown Set the output filename
filename = "" #@param {type:"string"}

path = '/content/drive/My Drive/'
export.to_csv(path + filename +".csv", index = False)