<a href="https://colab.research.google.com/github/vivacitylabs/data-toolkit/blob/master/journey_times_bulk_download_generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Journey Times - Bulk Download Generator



## Generate a csv file of Journey Times data over multiple days

This notebook is a tool for accessing VivaCity data via the API. It is intended as an **interim solution** while we work on new dashboard developments. Contact your Customer Success Manager for more information.

This notebook only works for sensors that have [Journey Times](https://vivacitylabs.customerly.help/vivacity-dashboard/journey-times) enabled.

#### How it works

This notebook will run you through all the necessary steps and will save a csv file in your Google Drive.

You will need to fill in a few details and then hit the run button (▶) next to the code cells.

If you want to make changes to the code and save them, you will first need to save a copy of this notebook to your Google Drive.

**What you will need**

- Google account
- VivaCity API login credentials
- Countline IDs you want to download data for

#### Output format

You will receive the median journey times between two sensors in 1-hour time buckets for your selected date range. Dates and times are given in the **local time** of the timezone you select.



| Date | Start time | End time | Origin sensor | Destination sensor | Median journey time in seconds | Number of journeys |
|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
| 02/03/2022 | 10:00:00 | 11:00:00 | RBK1 ClarenceSt | RBK2 WheatfieldWay | 1995.581055 | 1 |
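
Once the file has been exported, it can be read back with Python's standard `csv` module. The snippet below is a minimal sketch that parses one sample row held in a string rather than a real file. The column names follow the header written in Stage 4; the sample values and the helper name `read_journey_times` are illustrative assumptions.

```python
import csv
import io

# Illustrative sample only: one header line and one data row in the layout
# written by the export cell in Stage 4.
sample_csv = (
    "Date,Start time,End time,Origin sensor,Destination sensor,"
    "Median journey time in seconds,Number of journeys\n"
    "2022-03-02,10:00:00,11:00:00,RBK1 ClarenceSt,RBK2 WheatfieldWay,1995,1\n"
)

def read_journey_times(handle):
    """Parse exported rows and convert the numeric columns to numbers."""
    rows = []
    for row in csv.DictReader(handle):
        row["Median journey time in seconds"] = float(row["Median journey time in seconds"])
        row["Number of journeys"] = int(row["Number of journeys"])
        rows.append(row)
    return rows

rows = read_journey_times(io.StringIO(sample_csv))
print(rows[0]["Origin sensor"], "->", rows[0]["Destination sensor"])
```

To read a real export, pass an open file handle instead of the `io.StringIO` object.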

## Stage 1: Getting Started
Let's begin by importing the packages we'll need and creating some useful functions!

Hit the run button (▶) in the top left corner.

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to import functions and connect to Google Drive
import requests
import getpass
import json
import pandas as pd
from datetime import date, datetime, timedelta
import csv
import time
import pytz
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from google.colab import drive
drive.mount('/content/drive')

def get_date_range(start_date, end_date):
    start_dates = []
    end_dates = []

    start_date = datetime.fromisoformat(start_date)
    end_date = datetime.fromisoformat(end_date)

    while True:
        start_dates.append(start_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'))
        end_dates.append((start_date+timedelta(days=1)).strftime('%Y-%m-%dT%H:%M:%S.000Z'))
        start_date = start_date+timedelta(days=1)
        if start_date > end_date:
            break
    date_range = list(zip(start_dates, end_dates))
    return date_range

## Stage 2: Data Import

1. First, we authenticate the API user to get access to the data. If the user isn't set up properly, this will throw an error. Check that you have the correct username and password.

2. Second, we get all the sensors available to the API user. We use this to retrieve the ``deviceuid`` for each sensor, which is needed to request Journey Times data from the API.

3. Lastly, you select the sensors you want data for from a dropdown, then select a date range to query. This results in multiple API responses, which are handled in the next stage (Data Processing).

ℹ  Sensor names are derived from countline names, so they can differ slightly if the countlines are not named consistently.
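
The date range is split into one request per day, which is why a single selection produces several API responses. The sketch below shows one way to build those daily windows from local dates. It is not the notebook's own `get_date_range` helper, which steps forward 24 hours at a time in UTC. `daily_utc_windows` is a hypothetical name, and the sketch uses the standard-library `zoneinfo` module (Python 3.9+) so that each window starts at local midnight even across a daylight saving change.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

# Timestamp layout used for timeFrom/timeTo in this notebook's requests.
API_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.000Z"

def daily_utc_windows(start_day, end_day, tz_name):
    """Return (time_from, time_to) UTC strings, one pair per local day, end day inclusive."""
    if start_day > end_day:
        raise ValueError("start_day must not be after end_day")
    tz = ZoneInfo(tz_name)
    windows = []
    day = start_day
    while day <= end_day:
        # Local midnight at the start and at the end of this calendar day.
        local_start = datetime.combine(day, time.min, tzinfo=tz)
        local_end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
        windows.append((
            local_start.astimezone(timezone.utc).strftime(API_TIME_FORMAT),
            local_end.astimezone(timezone.utc).strftime(API_TIME_FORMAT),
        ))
        day += timedelta(days=1)
    return windows

windows = daily_utc_windows(date(2022, 3, 26), date(2022, 3, 27), "Europe/London")
for time_from, time_to in windows:
    print(time_from, "->", time_to)
```

The second window in this example is only 23 hours long, because clocks in the UK went forward on 27 March 2022.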


### Authentication
Now you will need your API login details, i.e. a username and a password. If you don't have them, please contact your Customer Success Manager.

1.   Enter the username into the field on the right, then hit the run button (▶).
2.   Input the password in the box that appears below it and hit "enter" on your keyboard. 

In [None]:
#@title  { run: "auto", vertical-output: true, display-mode: "form" }
#@markdown Insert your login credentials
username = "api-username" #@param {type:"string"}

auth_body = {}
auth_body['username'] = username
auth_body['password'] = getpass.getpass()

### Available Sensors

We'll now get an access token using the username and password set above and get all sensors the API user has access to.
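
The token response is turned into an `Authorization` header that is sent with every later request. A minimal sketch of that step is shown below, working on a plain dictionary rather than a live response. The `access_token` and `refresh_token` keys are the ones this notebook reads; `build_auth_state` is a hypothetical helper name, and the explicit error message rests on the assumption that a failed login response does not contain the token fields.

```python
def build_auth_state(token_response):
    """Build request headers and a refresh body from a decoded token response."""
    missing = [key for key in ("access_token", "refresh_token") if key not in token_response]
    if missing:
        # Assumption: a failed login response lacks the token fields.
        raise ValueError("Login failed, response is missing: " + ", ".join(missing))
    headers = {"Authorization": "Bearer " + token_response["access_token"]}
    refresh_body = {"refresh_token": token_response["refresh_token"]}
    return headers, refresh_body

# Made-up token values, for illustration only.
headers, refresh_body = build_auth_state({"access_token": "abc", "refresh_token": "xyz"})
print(headers)
```

Checking for the keys up front gives a readable message instead of a bare `KeyError` when the credentials are wrong.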

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to get all available sensors
print("Authorising...")
auth_response = requests.post("https://api.vivacitylabs.com/get-token", data=auth_body, headers={'Content-Type':'application/x-www-form-urlencoded'})
headers = {}
headers['Authorization'] = "Bearer " + auth_response.json()['access_token']
refresh_body = {}
refresh_body['refresh_token'] = auth_response.json()['refresh_token']
start = time.time()
print("Done.")

#get sensor meta data
api_url_base = 'https://api.vivacitylabs.com'
sensor_request = requests.get(f'{api_url_base}/sensor', headers=headers)
sensors = sensor_request.json()

#get countline meta data
countline_request = requests.get(f'{api_url_base}/countline', headers=headers)
countlines = countline_request.json()

#convert to dataframe
df_sensors = pd.DataFrame.from_dict(sensors).explode('countlines').reset_index(drop=True).rename(columns={"id":"deviceuid", "location": "sensor_location", "countlines":"countline_id"})
df_countlines = pd.DataFrame.from_dict(countlines).rename(columns={"id": "countline_id", "name":"countline_name", "location":"countline_location", "direction":"countline_direction"})
df_meta = pd.merge(df_sensors, df_countlines[["countline_id", "countline_name"]], left_on=df_sensors["countline_id"], right_on=df_countlines["countline_id"], 
                      how="left").drop(columns=["key_0", "countline_id_y", "sensor_location", "availableClasses"]).rename(columns={"countline_id_x":"countline_id"})
df_meta = df_meta[~df_meta["countline_name"].isna()]
df_meta['sensor_name'] = df_meta['countline_name'].str.split('_').str[0] + " " + df_meta['countline_name'].str.split('_').str[1]
df_meta = df_meta.drop_duplicates(subset='deviceuid', keep='first').sort_values(by='sensor_name',ascending=True)

#get clean list of sensors for dropdown
sensor_dropdown = df_meta['sensor_name'].dropna().to_list()
print(len(sensor_dropdown), " sensors available")

### Select sensors and date range for querying journey times
Run this cell!

Then select an origin sensor and a destination sensor from the dropdowns. Use the VivaCity Dashboard to check that the sensors have Journey Times enabled and form a useful pairing.

Also select the start and end dates. Ensure that the start date is before the end date.  
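
An unset ipywidgets `DatePicker` has the value `None`, so it helps to check the selection before building the request. The function below is a small sketch of such a check; `validate_selection` is a hypothetical helper and is not part of the notebook.

```python
from datetime import date

def validate_selection(start_day, end_day):
    """Return a list of problems with the selected dates (empty if the selection is fine)."""
    problems = []
    if start_day is None:
        problems.append("No start date selected")
    if end_day is None:
        problems.append("No end date selected")
    if start_day is not None and end_day is not None and start_day > end_day:
        problems.append("Start date is after end date")
    return problems

print(validate_selection(date(2022, 3, 2), date(2022, 3, 1)))
```

In the notebook, the two arguments would be `start_date_input.value` and `end_date_input.value`.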

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this and then make your selection

originSensor = widgets.Dropdown(options=sensor_dropdown)
destinationSensor = widgets.Dropdown(options=sensor_dropdown)
print("Select Origin Sensor:")
display(originSensor)
print(" ")
print("Select Destination Sensor:")
display(destinationSensor)
print(" ")

start_date_input = widgets.DatePicker()
end_date_input = widgets.DatePicker()

print("Select Start Date: ")
display(start_date_input)
print(" ")
print("Select End Date:")
display(end_date_input)
print(" ")
timezone = widgets.Dropdown(options=['Europe/London', "Europe/Berlin", "Australia/Sydney"])
print("Select timezone:")
display(timezone)

Run the cell below to set the input parameters for the API request. Check that they look alright.

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to set the API request parameters and check your selection again
params = {}
params['originSensor'] = df_meta[df_meta["sensor_name"]==originSensor.value]["deviceuid"].iloc[0]
params['destinationSensor'] = df_meta[df_meta["sensor_name"]==destinationSensor.value]["deviceuid"].iloc[0]
params['timeBucketSize'] = "60"

#check that both dates are set and in the correct order before converting them
if start_date_input.value is None or end_date_input.value is None:
  print("Please select both a start date and an end date")
elif start_date_input.value > end_date_input.value:
  print("Start date is after end date, please correct your date selection")
else:
  #convert local datetime to UTC datetime
  start_date_utc = str(pd.to_datetime(start_date_input.value).tz_localize(timezone.value).astimezone(pytz.utc))
  end_date_utc = str(pd.to_datetime(end_date_input.value).tz_localize(timezone.value).astimezone(pytz.utc))
  date_range = get_date_range(start_date_utc, end_date_utc)
  print("Check your selection:\n")
  print("Origin sensor:", originSensor.value,"\nDestination sensor:", destinationSensor.value, "\nDates:", start_date_input.value, "to", end_date_input.value)

### Getting the data

We now query Journey Times data from the API. 

The output will tell you how many requests are made and what the progress is.
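
The request loop can be sketched without network access by passing in the function that performs each request. In the sketch below, `collect_daily_data`, `fetch_day` and `fake_fetch` are hypothetical names. `fetch_day` stands in for the call to the Journey Times endpoint and is assumed to return a status code and the decoded JSON body; the response values are made up.

```python
def collect_daily_data(date_range, fetch_day):
    """Call fetch_day for each (time_from, time_to) pair and keep successful results."""
    data, missing = [], []
    for index, (time_from, time_to) in enumerate(date_range, start=1):
        status_code, body = fetch_day(time_from, time_to)
        print(f"{index}/{len(date_range)}: {status_code}")
        if status_code == 200:
            data.append(body)
        else:
            # Remember which day failed so it can be reported or retried later.
            missing.append(time_from.split("T")[0])
    return data, missing

def fake_fetch(time_from, time_to):
    # Stand-in for the real API call: the second day fails on purpose.
    if time_from.startswith("2022-03-03"):
        return 500, None
    return 200, {time_from: {"medianJourneyTimeInSeconds": 120.0, "numberOfJourneys": 3}}

date_range = [
    ("2022-03-02T00:00:00.000Z", "2022-03-03T00:00:00.000Z"),
    ("2022-03-03T00:00:00.000Z", "2022-03-04T00:00:00.000Z"),
]
data, missing = collect_daily_data(date_range, fake_fetch)
print("Missing days:", missing)
```

Keeping a list of failed days makes it easy to see which dates are absent from the final csv file.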

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to request data from the API (can take a bit)
data = []
for i,date in enumerate(date_range):
  time_elapsed = (time.time() - start)
  if time_elapsed > 500:
    print("Reauthorising...")
    auth_response = requests.post("https://api.vivacitylabs.com/refresh-token", data=refresh_body, headers={'Content-Type':'application/x-www-form-urlencoded'})
    headers = {}
    headers['Authorization'] = "Bearer " + auth_response.json()['access_token']
    refresh_body = {}
    refresh_body['refresh_token'] = auth_response.json()['refresh_token']
    start = time.time()
    print("Done.")
  params["timeFrom"] = date[0]  
  params["timeTo"] = date[1] 
  response = requests.get("https://api.vivacitylabs.com/journey-times/arriving/bucketed", params=params, headers=headers)
  print(str(i+1) + "/" + str(len(date_range)) + ": " + str(response.status_code) + " " + response.reason)
  if response.status_code == 200:
    data.append(response.json())
  else:
    print("Data missing for " + params["timeFrom"].split("T")[0] + " to " + params["timeTo"].split("T")[0])
  time.sleep(1)

## Stage 3: Data Processing
Now we process the raw data output and put it into a nice format.
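
Each time bucket in the response is keyed by a UTC timestamp, which is converted back to local time for the export. The sketch below shows that conversion for a single bucket using the standard-library `zoneinfo` module (Python 3.9+) instead of the pandas calls in the cell. `bucket_to_local_row` is a hypothetical helper name and the timestamp is made up.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def bucket_to_local_row(time_bucket, tz_name, bucket_minutes=60):
    """Convert a UTC bucket start such as '2022-07-01T09:00:00.000Z' to local date and times."""
    start_utc = datetime.strptime(time_bucket, "%Y-%m-%dT%H:%M:%S.000Z").replace(tzinfo=timezone.utc)
    end_utc = start_utc + timedelta(minutes=bucket_minutes)
    tz = ZoneInfo(tz_name)
    start_local = start_utc.astimezone(tz)
    end_local = end_utc.astimezone(tz)
    # Local date, local start time and local end time, all as strings.
    return [str(start_local.date()), str(start_local.time()), str(end_local.time())]

row = bucket_to_local_row("2022-07-01T09:00:00.000Z", "Europe/London")
print(row)  # ['2022-07-01', '10:00:00', '11:00:00']
```

London is one hour ahead of UTC in July, so the 09:00 UTC bucket appears as 10:00 to 11:00 local time.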


In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to process and clean the data

export = []
for date_data in data:
  day_export = []
  for time_bucket, time_data in date_data.items():
    start_time = datetime.strptime(time_bucket,'%Y-%m-%dT%H:%M:%S.000Z')
    end_time = datetime.strptime(time_bucket,'%Y-%m-%dT%H:%M:%S.000Z')+timedelta(minutes=60)
    row = [#convert dates back to local datetime from UTC
            str(pd.to_datetime(start_time).tz_localize('utc').astimezone(timezone.value).date()),
            str(pd.to_datetime(start_time).tz_localize('utc').astimezone(timezone.value).time()),
            str(pd.to_datetime(end_time).tz_localize('utc').astimezone(timezone.value).time()),
           originSensor.value,
           destinationSensor.value,
           int(time_data['medianJourneyTimeInSeconds']),
           time_data['numberOfJourneys']]
    day_export.append(row)
  export += day_export

## Stage 4: Data Export
Now let's write this to a .csv file and save it to the top-level folder (`My Drive`) in Google Drive.

Set the `filename` to the right and then hit run.

In [None]:
#@title  {vertical-output: true, display-mode: "form" }
#@markdown Set the output filename
filename = "test-outfile" #@param {type:"string"}

path = '/content/drive/My Drive/'
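header = ['Date', 'Start time', 'End time', 'Origin sensor', 'Destination sensor', 'Median journey time in seconds', 'Number of journeys']
# newline='' stops the csv module from writing blank lines between rows on some platforms
with open(path + filename + '.csv', 'w', newline='') as f: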
  writer = csv.writer(f)
  writer.writerow(header)
  writer.writerows(export)