<a href="https://colab.research.google.com/github/vivacitylabs/data-toolkit/blob/master/notebooks/zonal_speeds_bulk_download_generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Zonal Speeds - Bulk Download Generator



## Generate a CSV file of zonal speed data over multiple days

This notebook is a tool for accessing VivaCity data via the API. It is intended as an **interim solution** while we work on new dashboard developments. Contact your Customer Success Manager for more information.

#### How it works

This notebook walks you through all the necessary steps and saves a CSV file locally or to your Google Drive.

You will need to fill in a few details and then hit the run button (▶) next to the code cells.

If you want to make changes to the code and save them, you will first need to save a copy of this notebook to your Google Drive.



**What you will need**

- VivaCity API login credentials
- The sensors and zones you want to download data for



ℹ  Note: the notebook only works for zones that have [zonal speed](https://vivacitylabs.customerly.help/vivacity-dashboard/speed-feature) enabled.


#### Output format


You will receive zonal speed measures in selected time buckets in the following format:


| Date | Start time | End time | zone_id | zone_description | class | mean_speed | 50perc_speed | 85perc_speed | 95perc_speed | speed_variance | mean_occupancy |
|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
| 2022-10-17 | 00:00:00 | 00:15:00 | 2245 | S4 GrandDrive - Zone 1 (2245) | car | 19.9 | 20.2 | 22.2 | 23.6 | 4.9 | 1.1 |



## Stage 1: Getting Started
Let's begin by importing the packages we'll need and creating some useful functions!

Hit the run button (▶) in the top left corner.

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to import packages and define helper functions
import requests
import getpass
import json
from datetime import date, datetime, timedelta
import pytz
import pandas as pd
import csv
import time
from IPython.display import Markdown, display
def printmd(string):
    display(Markdown(string))
from ipywidgets import interact, interactive, fixed, interact_manual, Layout, Box
import ipywidgets as widgets
import warnings
warnings.filterwarnings('ignore')

def get_date_range(start_date, end_date):
    start_dates = []
    end_dates = []

    start_date = datetime.fromisoformat(start_date)
    end_date = datetime.fromisoformat(end_date)
    while True:
        start_dates.append(start_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'))
        end_dates.append((start_date+timedelta(days=1)).strftime('%Y-%m-%dT%H:%M:%S.000Z'))
        start_date = start_date+timedelta(days=1)
        if start_date > end_date:
            break
    date_range = list(zip(start_dates, end_dates))
    return date_range

## Stage 2: Data Import
At the end of this process we will have requested data from the API for each day in the date range that you select in the **Select zones and dates** step.

The resulting JSON responses will then be converted into a data table in the **Data Processing** step.

Authentication is handled at this stage.
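
The date range is split into one request window per day. As a rough illustration of that idea, the sketch below mirrors the logic of `get_date_range` from Stage 1. The name `daily_windows` is made up for this example and is not part of the notebook.

```python
from datetime import datetime, timedelta

def daily_windows(start_iso, end_iso):
    """Split a date range into consecutive 24-hour (from, to) windows."""
    fmt = '%Y-%m-%dT%H:%M:%S.000Z'
    current = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    windows = []
    # The day that starts on the end date is included, as in get_date_range
    while current <= end:
        next_day = current + timedelta(days=1)
        windows.append((current.strftime(fmt), next_day.strftime(fmt)))
        current = next_day
    return windows

windows = daily_windows("2022-10-17 00:00:00+00:00", "2022-10-18 00:00:00+00:00")
for window in windows:
    print(window)
```

Selecting 17 to 18 October therefore results in two requests, one per day, each covering 24 hours.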

### Authentication
Now you will need your API login details, i.e. a username and a password. If you don't have them, please contact your Customer Success Manager.

1.   Enter the username into the field on the right, then hit the run button (▶).
2.   Input the password in the box that appears below it and hit "enter" on your keyboard. 


In [None]:
#@title  { run: "auto", vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Insert your login credentials, then run, enter password + hit enter
username = "api-username" #@param {type:"string"}

auth_body = {}
auth_body['username'] = username
auth_body['password'] = getpass.getpass()

### Retrieve available sensors and zones

We'll now get an access token using the username and password set above, then retrieve all sensors and zones that the API user has access to.
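
As a minimal sketch of what happens with the token response: the code cell below reads the `access_token` and `refresh_token` fields and turns them into a request header and a refresh body. The helper name `build_auth_state` and the token values are placeholders invented for this illustration, and no request is sent.

```python
def build_auth_state(token_json):
    """Turn a token response into (request headers, body for a later refresh)."""
    headers = {"Authorization": "Bearer " + token_json["access_token"]}
    refresh_body = {"refresh_token": token_json["refresh_token"]}
    return headers, refresh_body

# Placeholder values, not real tokens
example_response = {"access_token": "abc123", "refresh_token": "def456"}
example_headers, example_refresh_body = build_auth_state(example_response)
print(example_headers)
```

The headers go with every data request, while the refresh body is kept so that a new access token can be requested once the current one is close to expiring.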

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to get authorized access to the API

print("Authorising...")
auth_response = requests.post("https://api.vivacitylabs.com/get-token", data=auth_body, headers={'Content-Type':'application/x-www-form-urlencoded'})
if auth_response.status_code == 401:
  print("\n!Error: Can't connect to the API. Check your username and password again.\nIf issues persist, ask your CSM to check with technical support if your user is set up correctly on the API\n")
else:
  headers = {}
  headers['Authorization'] = "Bearer " + auth_response.json()['access_token']
  refresh_body = {}
  refresh_body['refresh_token'] = auth_response.json()['refresh_token']
  start = time.time()
  print("Done. Successfully retrieved access token.")

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to retrieve sensors and their zones available to you from the API

#get hardware meta data
print("\nRequesting metadata ...")
api_url_base = 'https://beta.api.vivacitylabs.com'
hardware_request = requests.get(f'{api_url_base}/hardware/metadata', headers=headers)
if hardware_request.status_code == 401:
  print("\n!Error: Can't access the data. Ask your CSM to check with technical support if your user is set up correctly on API 3\n")
hardware = hardware_request.json()

# Get hardware info
dict_hard = { "hardware_id" : [], "sensor_name" : [], "zone_id": [], "zone_name" : [], "zonal_speed" : [] }
for id in hardware:
  for lens in hardware[id]["view_points"]:
    for entity in hardware[id]["view_points"][lens]:
      #derive sensor name from first countline name
      if len(list(hardware[id]["view_points"][lens]["countlines"].keys())) == 0:
        cname = "Unknown"
      else:
        cid = list(hardware[id]["view_points"][lens]["countlines"].keys())[0]
        cname = hardware[id]["view_points"][lens]["countlines"][cid]['name']
      for zone_id in hardware[id]["view_points"][lens]["zones"]:
        dict_hard["hardware_id"].append(id)
        dict_hard["sensor_name"].append(cname)
        dict_hard["zone_id"].append(zone_id)
        dict_hard["zone_name"].append(hardware[id]["view_points"][lens]["zones"][zone_id]['name'])
        dict_hard["zonal_speed"].append(hardware[id]["view_points"][lens]["zones"][zone_id]['is_speed'])

#turn into dataframe and clean up
df_hard = pd.DataFrame.from_dict(dict_hard)
df_hard = df_hard[df_hard["zonal_speed"] == True].drop_duplicates().reset_index(drop=True)
#keep the first two underscore-separated parts of the countline name as the sensor name
df_hard["sensor_name"] = df_hard["sensor_name"].str.split("_").apply(lambda parts: " ".join(parts[:2]))

#create dropdown labels
df_hard["zones_dropdown"] = (df_hard["sensor_name"].astype(str) + " - " + df_hard["zone_name"].astype(str)
                             + " (" +  df_hard["zone_id"].astype(str) + ")")
print("Done. Successfully retrieved sensor metadata.")

### Select zones and dates
Choose one or more zones, the classes you want data for, the date range, and the time bucket (15m, 1h, or 24h).

**Note:** The sensor name is derived from countline names (e.g. countline name: S4_harleyRd_crossing => extracted sensor name: S4 harleyRd). Countlines are sometimes named inconsistently, which results in odd sensor names.
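
A small sketch of that derivation, assuming the rule used in the metadata cell (keep the first two underscore-separated parts and join them with a space). The function name `derive_sensor_name` is hypothetical.

```python
def derive_sensor_name(countline_name):
    """Keep the first two underscore-separated parts of a countline name."""
    parts = countline_name.split("_")
    return " ".join(parts[:2])

print(derive_sensor_name("S4_harleyRd_crossing"))
print(derive_sensor_name("Unknown"))
```

A countline name without underscores is kept whole, which is one way an odd-looking sensor name can end up in the dropdown.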

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this and then make your selection below
box_layout = Layout(display='flex', flex_flow='column', align_items='stretch', border=None, width='28%')

start_date_input = widgets.DatePicker(description="Start date",layout=Layout(width='55%'))
end_date_input = widgets.DatePicker(description="End date",layout=Layout(width='55%'))
timezone = widgets.Dropdown(options=['Europe/London', "Europe/Berlin", "Australia/Sydney"],description="Timezone",layout=Layout(width='55%'))
timebucket_input = widgets.Dropdown(options=['15m', "1h", "24h"],description="Time bucket",layout=Layout(width='55%'))
zones_input = widgets.SelectMultiple(
    options=df_hard["zones_dropdown"].unique(),
    description='Zones',
    disabled=False,
    layout=Layout(width='auto', height='170px'))
class_input = widgets.SelectMultiple(
    options=[ "cyclist", "motorbike", "car", "pedestrian", "taxi", "van", "minibus", "bus", "rigid", "truck", "emergency_car", "emergency_van", "fire_engine", "escooter"],
    description='Class',  disabled=False,
    layout=Layout(width='55%', height='235px')
)

items = [start_date_input, end_date_input, timezone,timebucket_input, class_input, zones_input]
box = Box(children=items, layout=box_layout)
printmd("**Select date period and zones**")
printmd("Hold  `Ctrl`  to select multiple classes or zones")
print("")
box

### Getting the data


We now query zonal speed data from the API.

The output will tell you how many requests are made and what the progress is.
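
Each response is nested by zone, then time bucket, then class. The sketch below shows how such a structure can be flattened into one row per zone, bucket, and class. The sample response is invented for illustration: its shape is inferred from the parsing code in this notebook, it includes only two of the measures, and the values are not real data.

```python
# Hypothetical response, shaped like the structure the notebook parses
sample_response = {
    "2245": [
        {
            "from": "2022-10-17T00:00:00.000Z",
            "to": "2022-10-17T00:15:00.000Z",
            "zone_speeds": {
                "car": {"mean_speed": 8.9, "mean_occupancy": 1.1},
                "cyclist": {"mean_speed": 4.2, "mean_occupancy": 0.3},
            },
        }
    ]
}

rows = []
for zone_id, buckets in sample_response.items():
    for bucket in buckets:
        for vehicle_class, stats in bucket["zone_speeds"].items():
            # One flat row per (zone, time bucket, class)
            rows.append({"zone_id": zone_id, "from": bucket["from"],
                         "to": bucket["to"], "class": vehicle_class, **stats})

for row in rows:
    print(row)
```

The notebook collects the same fields into column lists and builds a pandas DataFrame per day, then concatenates the daily tables.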


In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to set the API request parameters and check your selection again
params = {}
params['zone_ids'] = df_hard[df_hard["zones_dropdown"].isin(zones_input.value)]["zone_id"].to_list()
params['classes'] = list(class_input.value)
print(params['classes'])
params["time_bucket"] = timebucket_input.value
params['fill_nulls'] = "true"

#convert local datetime to UTC datetime
start_date_utc = str(pd.to_datetime(start_date_input.value).tz_localize(timezone.value).astimezone(pytz.utc))
end_date_utc = str(pd.to_datetime(end_date_input.value).tz_localize(timezone.value).astimezone(pytz.utc))

#check if dates are in correct order
if start_date_input.value > end_date_input.value:
  print("Start date is after end date, please correct your date selection")
else:
  date_range = get_date_range(start_date_utc, end_date_utc)
  printmd("**Check your selection:**\n")
  print("Dates:", start_date_input.value, "to", end_date_input.value, "\nClass:", list(class_input.value), 
        "\nZones:", list(zones_input.value) , "\nTimebucket:", timebucket_input.value)

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to request data from the API (can take a bit)

data = []
for i,date in enumerate(date_range):
  time_elapsed = (time.time() - start)
  if time_elapsed > 500:
    print("Reauthorising...")
    auth_response = requests.post("https://api.vivacitylabs.com/refresh-token", data=refresh_body, headers={'Content-Type':'application/x-www-form-urlencoded'})
    headers = {}
    headers['Authorization'] = "Bearer " + auth_response.json()['access_token']
    refresh_body = {}
    refresh_body['refresh_token'] = auth_response.json()['refresh_token']
    start = time.time()
    print("Done.")
  params["from"] = date[0]  
  params["to"] = date[1] 
  response = requests.get('https://beta.api.vivacitylabs.com/zone/speeds', params=params, headers=headers)
  print(str(i+1) + "/" + str(len(date_range)) + ": " + str(response.status_code) + " " + response.reason)
  if response.status_code == 200:
    zspeed_json = response.json()
    zspeed_dict = {"zone_id": [], "from" : [], "to" : [], "class" : [], "mean_speed" : [], "50perc_speed" : [],
           "85perc_speed" : [], "95perc_speed" : [], "speed_variance" : [], "mean_occupancy" : []
             }

    for zone, items in zspeed_json.items():
            for time_bucket in items:
                for _class in time_bucket['zone_speeds'].keys():
                    zspeed_dict["zone_id"].append(zone)
                    zspeed_dict["from"].append(datetime.strptime(time_bucket['from'],'%Y-%m-%dT%H:%M:%S.000Z'))
                    zspeed_dict["to"].append(datetime.strptime(time_bucket['to'],'%Y-%m-%dT%H:%M:%S.000Z'))
                    zspeed_dict["class"].append(_class)
                    zspeed_dict["mean_speed"].append(time_bucket['zone_speeds'][_class]["mean_speed"])
                    zspeed_dict["50perc_speed"].append(time_bucket['zone_speeds'][_class]["50_percentile_speed"])
                    zspeed_dict["85perc_speed"].append(time_bucket['zone_speeds'][_class]["85_percentile_speed"])
                    zspeed_dict["95perc_speed"].append(time_bucket['zone_speeds'][_class]["95_percentile_speed"])
                    zspeed_dict["speed_variance"].append(time_bucket['zone_speeds'][_class]["speed_variance"])
                    zspeed_dict["mean_occupancy"].append(time_bucket['zone_speeds'][_class]["mean_occupancy"])
    zspeed_df = pd.DataFrame.from_dict(zspeed_dict)
    data.append(zspeed_df)

  else:
    print("Data missing for " + params["from"].split("T")[0] + " to " + params["to"].split("T")[0])
  time.sleep(1)

data = pd.concat(data, axis=0, ignore_index=True)  

## Stage 3: Data Processing
Now we process the raw data: timestamps are converted back to your local timezone and speeds are converted into the unit you select.

Select a unit you want the speed to be processed as (mph, km/h or m/s).
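
The conversion is a simple multiplication. The sketch below uses the same factors as the code cell and assumes, as the factor of 1.0 for m/s implies, that the API reports speeds in metres per second. The names `FACTORS` and `convert_speed` are made up for this example.

```python
# Conversion factors from metres per second, as used in the code cell
FACTORS = {"m/s": 1.0, "km/h": 3.6, "mph": 2.236936}

def convert_speed(speed_ms, unit):
    """Convert a speed in m/s into the requested unit."""
    return speed_ms * FACTORS[unit]

for unit in FACTORS:
    print(unit, round(convert_speed(10.0, unit), 2))
```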

In [None]:
#@title  { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Select a speed unit and hit run to process the data

unit = "mph" #@param [ "mph", "km/h", "m/s"]

#convert back to local datetime
data["Date"] = [ str(pd.to_datetime(i).tz_localize('utc').astimezone(timezone.value).date()) for i in data["from"]]
data["Start time"] = [ str(pd.to_datetime(i).tz_localize('utc').astimezone(timezone.value).time()) for i in data["from"]]
data["End time"] = [ str(pd.to_datetime(i).tz_localize('utc').astimezone(timezone.value).time()) for i in data["to"]]

#get factor
if unit == "mph":
  factor = 2.236936
elif unit == "km/h":
  factor = 3.6
else:
  factor = 1.0

#apply factor
speed_cols = ['mean_speed', '50perc_speed','85perc_speed', '95perc_speed']
data[speed_cols] = data[speed_cols].multiply(factor)
#variance scales with the square of the factor (assuming the API reports it in (m/s)^2)
data['speed_variance'] = data['speed_variance'] * factor**2

#merge in zone name
data = pd.merge(data, df_hard[["zone_id", "zones_dropdown"]], left_on="zone_id", right_on="zone_id", how='left')

#rename and reorder columns
data = data.rename(columns={"zones_dropdown":"zone_description"})
data = data[['Date','Start time', 'End time', 'zone_id', 'zone_description', 'class', 'mean_speed', '50perc_speed','85perc_speed', '95perc_speed', 'speed_variance', 'mean_occupancy']]

## Stage 4: Data Export
Now let's write this to a .csv file. You can either save the file locally (it will appear in your Downloads folder) or save it to your Google Drive.


* **Local Downloads Folder:** This might not work if your browser or computer is blocking downloads.
* **Google Drive:** If you want to save it in Google Drive, you will be asked for permission to connect to your Google Account.

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown Select where to save the csv file
download_location = "Local folder" #@param [ "Local folder", "Google Drive"]
#@markdown Name your file
filename = "zonal-speeds-test" #@param {type:"string"}
#@markdown Hit run (▶)

if download_location == "Local folder":
  from google.colab import files
  data.to_csv(filename + ".csv", index = False)
  files.download(filename + ".csv")
else:
  from google.colab import drive
  drive.mount('/content/drive')
  path = '/content/drive/My Drive/'
  data.to_csv(path + filename +".csv", index = False)