<a href="https://colab.research.google.com/github/vivacitylabs/data-toolkit/blob/master/notebooks/countline_speeds_bulk_download_generator_v3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Countline Speeds - Bulk Download Generator V3


## Generate a CSV file of countline speed data over multiple days

This notebook only works for countlines that have countline speeds enabled.

#### How to do it

This notebook walks you through all the necessary steps and saves the output CSV file locally or to your Google Drive.

You will simply need to fill in a few details and then hit the run button next to the code cells.

What you will need:

- VivaCity API login credentials


#### Output format

You will receive countline speed data in the format shown in the table below.

Note that direction is reported as In and Out, and also as an aggregate of the two (Both). If you sum counts without filtering by direction, volumes will be doubled.

Speed values (mean, 85th percentile, speed buckets) are provided in the chosen unit (mph, km/h or m/s).



| Countline | Start Date | Time from | Time to |Class | Direction |	total_counts |	mean |	85percentile | 0-5 |	5-10 |	10-15|	15-20 | 20-25 |25-30| 30-35 | 35-40 | 40-45 | 45-50 |50-55 |
|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
| 40284| 02/03/2022 |10:00:00	| 11:00:00|	Car| In |  10	| 24.576	|26.842 | 0|0|2	|5|	2|	1 | 0 | 0  |0| 0 |0 |
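
To illustrate the double-counting warning above, here is a minimal sketch with made-up rows. It assumes the exported file has been loaded into a pandas DataFrame and that the columns are named `direction` and `total_counts`, as in the processing code later in this notebook.

```python
import pandas as pd

# Made-up rows in the shape of the exported file: one row per direction,
# plus the aggregated "Both" row for the same countline, hour and class.
df = pd.DataFrame({
    "countline_id": ["40284", "40284", "40284"],
    "class": ["Car", "Car", "Car"],
    "direction": ["In", "Out", "Both"],
    "total_counts": [10, 6, 16],
})

# Summing every row counts each road user twice.
naive_total = int(df["total_counts"].sum())

# Keep either the per-direction rows or the "Both" rows, never a mix.
per_direction_total = int(df.loc[df["direction"].isin(["In", "Out"]), "total_counts"].sum())
both_total = int(df.loc[df["direction"] == "Both", "total_counts"].sum())

print(naive_total, per_direction_total, both_total)
```

Filtering on `direction` before summing gives the same total whichever of the two views you keep.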






## Stage 1: Getting Started
Let's begin by importing the packages we'll need and creating some useful functions!

Hit the run button in the top left corner.

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Run this to import functions
import requests
import getpass
import json
from datetime import date, datetime, timedelta
import pytz
import pandas as pd
import numpy as np
import csv
import time
from IPython.display import Markdown, display
def printmd(string):
    display(Markdown(string))
from ipywidgets import interact, interactive, fixed, interact_manual, Layout, Box
import ipywidgets as widgets

def get_date_range(start_date, end_date):
    start_dates = []
    end_dates = []

    start_date = datetime.fromisoformat(start_date)
    end_date = datetime.fromisoformat(end_date)
    while True:
        start_dates.append(start_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'))
        end_dates.append((start_date+timedelta(days=1)).strftime('%Y-%m-%dT%H:%M:%S.000Z'))
        start_date = start_date+timedelta(days=1)
        if start_date > end_date:
            break
    date_range = list(zip(start_dates, end_dates))
    return date_range
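
As a quick check of what `get_date_range` produces, the sketch below repeats the same logic under the hypothetical name `daily_windows` so that it runs on its own. The input strings are example values in the format the notebook passes in later (a UTC timestamp converted to a string).

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S.000Z"


def daily_windows(start_date, end_date):
    """Hypothetical stand-in for get_date_range: one (from, to) pair per day,
    including the day that starts at end_date."""
    current = datetime.fromisoformat(start_date)
    last = datetime.fromisoformat(end_date)
    windows = []
    while True:
        next_day = current + timedelta(days=1)
        windows.append((current.strftime(FMT), next_day.strftime(FMT)))
        current = next_day
        if current > last:
            break
    return windows


windows = daily_windows("2022-03-01 00:00:00+00:00", "2022-03-03 00:00:00+00:00")
for window in windows:
    print(window)
```

Each pair becomes the `from` and `to` of one API request, so a three-day selection results in three requests.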

## Stage 2: Data Import
First, we'll input the API username and password. Contact customer support (support@vivacitylabs.com) if you don't have these details.

We will then request all countlines the user has access to. If some countlines are missing, get in touch with us.

Finally, you will select the date period, countlines and classes to request data for.

### Authentication
Now you will need your API login details, i.e. a username and a password. If you don't have them, please contact customer support (support@vivacitylabs.com).

1.   Enter the username into the field on the right, then hit the run button (▶).
2.   Input the password in the box that appears below it and hit "enter" on your keyboard.

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown Insert your login credentials
username = "api-username" #@param {type:"string"}

auth_body = {}
auth_body['username'] = username
auth_body['password'] = getpass.getpass()

#### Available Countlines

Get an access token using your username and password, then retrieve the countlines your user has access to.

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to get authorized access to the API

print("Authorising...")
auth_response = requests.post("https://api.vivacitylabs.com/get-token", data=auth_body, headers={'Content-Type':'application/x-www-form-urlencoded'})
if auth_response.status_code == 401:
  print("\n!Error: Can't connect to the API. Check your username and password again.\nIf the issue persists, ask customer support whether your user is set up correctly on the API\n")
else:
  headers = {}
  headers['Authorization'] = "Bearer " + auth_response.json()['access_token']
  refresh_body = {}
  refresh_body['refresh_token'] = auth_response.json()['refresh_token']
  start = time.time()
  print("Done. Successfully retrieved access token.")

In [None]:
#@title { vertical-output: false, display-mode: "form" }
#@markdown **Code cell:** Run this to retrieve sensors and countlines available to you from the API

#get hardware meta data
print("\nRequesting metadata ...")
api_url_base = 'https://beta.api.vivacitylabs.com'
hardware_request = requests.get(f'{api_url_base}/hardware/metadata', headers=headers)
if hardware_request.status_code == 401:
  print("\n!Error: Can't access the data. Ask customer support whether your user is set up correctly on API 3\n")
hardware = hardware_request.json()

# Get hardware info: one row per countline on each view point of each device
dict_hard = { "hardware_id" : [], "countline_id" : [], "countline_name" : [] }
for hardware_id in hardware:
  for lens in hardware[hardware_id]["view_points"]:
    countlines = hardware[hardware_id]["view_points"][lens].get("countlines", {})
    for countline_id in countlines:
      dict_hard["hardware_id"].append(hardware_id)
      dict_hard["countline_id"].append(countline_id)
      dict_hard["countline_name"].append(countlines[countline_id]['name'])

#turn into dataframe and clean up
df_hard = pd.DataFrame.from_dict(dict_hard)
# sensor_name keeps the first two characters of the countline name
df_hard["sensor_name"] = df_hard["countline_name"].str[:2]
df_hard["countline_name_display"] = df_hard["countline_name"] + " (" + df_hard["countline_id"] + ")"
df_hard = df_hard.drop_duplicates()
print(len(df_hard["countline_id"].unique()), " countlines available")
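
To make the nesting easier to follow, here is a minimal sketch that flattens a made-up metadata payload. The payload contains only the keys the cell above reads (`view_points`, `countlines`, `name`). The identifiers and names in it are hypothetical, and the real response may carry more fields.

```python
import pandas as pd

# Hypothetical payload: hardware -> view point -> countlines -> name.
sample_hardware = {
    "hw-1": {
        "view_points": {
            "lens-0": {
                "countlines": {
                    "40284": {"name": "High Street North"},
                    "40285": {"name": "High Street South"},
                }
            }
        }
    }
}

rows = []
for hardware_id, hardware_item in sample_hardware.items():
    for view_point in hardware_item["view_points"].values():
        for countline_id, countline in view_point.get("countlines", {}).items():
            rows.append({
                "hardware_id": hardware_id,
                "countline_id": countline_id,
                "countline_name": countline["name"],
            })

df = pd.DataFrame(rows)
# Same display label as the notebook: "<name> (<id>)".
df["countline_name_display"] = df["countline_name"] + " (" + df["countline_id"] + ")"
print(df)
```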

#### Select countlines and date range for querying the API


After running the code below, select the classes and countlines from the lists and pick the start and end dates. Make sure the start date is not after the end date.

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Run this and then  make your selections

#@markdown Note: max speed values cannot be above 160 (for km/h), 100 (for mph) and 44 (for m/s)

box_layout = Layout(display='flex', flex_flow='column',
                    align_items='stretch', border=None, width='28%')

start_date_input = widgets.DatePicker(description="Start date",layout=Layout(width='55%'))
end_date_input = widgets.DatePicker(description="End date",layout=Layout(width='55%'))
timezone = widgets.Dropdown(options=['Europe/London', "Europe/Berlin", "Australia/Sydney"],description="Timezone",layout=Layout(width='55%'))
timebucket_input = widgets.Dropdown(options=['15m', "1h", "24h"],description="Time bucket",layout=Layout(width='55%'))
unit_input = widgets.Dropdown(options=["km/h",'mph', "m/s"],description="Speed unit",layout=Layout(width='55%'))
speed_bucket_input = widgets.Dropdown(options=[1,5,10],description="Speed bucket",layout=Layout(width='55%'))
max_speed_input = widgets.IntText(value=80, description="Max speed", layout=Layout(width='55%'))

class_input = widgets.SelectMultiple(
    options=[ "cyclist", "motorbike", "car", "pedestrian", "taxi", "van", "minibus", "bus", "rigid", "truck", "emergency_car", "emergency_van", "fire_engine", "escooter"],
    description='Class',  disabled=False,
    layout=Layout(width='55%', height='230px')
)
countlines_input = widgets.SelectMultiple(
    options=df_hard["countline_name_display"].sort_values().unique(),
    description='Countlines',
    disabled=False,
    layout=Layout(width='auto', height='200px')
)
items = [start_date_input, end_date_input, timezone, timebucket_input, unit_input, speed_bucket_input, max_speed_input, class_input,countlines_input]
box = Box(children=items, layout=box_layout)
printmd("**Select date period and countlines**")
printmd("Hold  `Ctrl + Shift`  to select multiple classes or countlines")
print("")
box

Run the cell below to set the input parameters for the API request, then check that they look correct.
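
The cell below does two pieces of preparation that are easy to miss: it rounds the maximum speed up to a whole number of speed buckets, and it converts the selected local dates to UTC. The sketch below reproduces both steps with example values. The helper names `speed_bucket_params` and `local_midnight_to_utc` are hypothetical and do not appear in the notebook.

```python
import math
from datetime import date

import pandas as pd


def speed_bucket_params(max_speed, bucket_size):
    """Hypothetical helper: round max_speed up to a multiple of bucket_size
    and return (rounded max speed, number of buckets)."""
    rounded_max = int(math.ceil(max_speed / bucket_size) * bucket_size)
    return rounded_max, rounded_max // bucket_size


def local_midnight_to_utc(day, tz_name):
    """Hypothetical helper: midnight of `day` in `tz_name`, expressed in UTC."""
    return pd.to_datetime(day).tz_localize(tz_name).tz_convert("UTC")


print(speed_bucket_params(80, 5))   # already a multiple of the bucket size
print(speed_bucket_params(83, 5))   # rounded up to the next multiple
print(local_midnight_to_utc(date(2022, 1, 10), "Europe/London"))
print(local_midnight_to_utc(date(2022, 7, 10), "Europe/London"))
```

During British Summer Time, local midnight in London falls at 23:00 UTC on the previous day, so the daily request windows start at 23:00 UTC rather than 00:00 UTC.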

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Run this and check your selection again.
params = {}
params['countline_ids'] = df_hard[df_hard["countline_name_display"].isin(countlines_input.value)]["countline_id"].to_list()
params['classes'] = list(class_input.value)
params['time_bucket'] = timebucket_input.value

#set fixed parameters
max_speed_calc = int(np.ceil(max_speed_input.value/speed_bucket_input.value)*speed_bucket_input.value)
params['max_speed'] = str(max_speed_calc)
params['min_speed'] = "0"
params['speed_bucket_number'] = str(int(max_speed_calc/speed_bucket_input.value))
params['fill_zeros'] = "true"
params['units'] = unit_input.value

#convert local datetime to UTC datetime
start_date_utc = str(pd.to_datetime(start_date_input.value).tz_localize(timezone.value).astimezone(pytz.utc))
end_date_utc = str(pd.to_datetime(end_date_input.value).tz_localize(timezone.value).astimezone(pytz.utc))

#check if dates are in correct order
if start_date_input.value > end_date_input.value:
  print("Start date is after end date, please correct your date selection")
else:
  date_range = get_date_range(start_date_utc, end_date_utc)
  printmd("**Check your selection:**\n")
  print("Dates:", start_date_input.value,
        "to", end_date_input.value,
        "\nTimebucket:", timebucket_input.value,
        "\nSpeed unit:", unit_input.value ,
         "\nMax speed:", max_speed_input.value ,
         "\nSpeed bucket size:", speed_bucket_input.value ,
        "\nClass:", class_input.value,
        "\nCountlines:", countlines_input.value )

#### Getting the data

Here's the API call to get the countline speed data.

The output will tell you how many requests are made and what the progress is.
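
The request cell below leaves error handling as a TODO. One possible approach is sketched here with a hypothetical helper, `safe_json`, which returns `None` instead of raising when a request fails or the body is not JSON. It assumes the object passed in behaves like a `requests.Response` (attributes `ok` and `status_code`, method `json()`). `FakeResponse` exists only so the sketch runs without network access.

```python
import json


def safe_json(response):
    """Return the decoded JSON body, or None if the request failed
    or the body could not be decoded."""
    if not response.ok:
        print("Request failed with status", response.status_code)
        return None
    try:
        return response.json()
    except ValueError:  # JSON decode errors are subclasses of ValueError
        print("Response body was not valid JSON")
        return None


class FakeResponse:
    """Tiny stand-in for requests.Response, for illustration only."""

    def __init__(self, status_code, body):
        self.status_code = status_code
        self.ok = status_code < 400
        self._body = body

    def json(self):
        return json.loads(self._body)


good = safe_json(FakeResponse(200, '{"40284": []}'))
bad_status = safe_json(FakeResponse(500, ""))
bad_body = safe_json(FakeResponse(200, "not json"))
print(good, bad_status, bad_body)
```

In the request loop, a `None` result could be used to skip that day and continue with the next one.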

In [None]:
#@title  {vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Run this to get data from the API

#request data
print("Requesting data ...")
df_request_all = []
for i,date in enumerate(date_range):
  time_elapsed = (time.time() - start)
  if time_elapsed > 500:
    print("Reauthorising...")
    auth_response = requests.post("https://api.vivacitylabs.com/refresh-token", data=refresh_body, headers={'Content-Type':'application/x-www-form-urlencoded'})
    headers = {}
    headers['Authorization'] = "Bearer " + auth_response.json()['access_token']
    refresh_body = {}
    refresh_body['refresh_token'] = auth_response.json()['refresh_token']
    start = time.time()
    print("Done. Got new access token")
  params["from"] = date[0]
  params["to"] = date[1]
  response = requests.get("https://beta.api.vivacitylabs.com/countline/speed", params=params, headers=headers)
  response_json = response.json()

  # TODO: Add Try-Error for bad API requests
  #from json import JSONDecodeError
  #except JSONDecodeError:
    #print("no data for",date[0], "to", date[1])

  #turn into dataframe
  df_request = {"countline_id" : [], "date":[], "timefrom": [], "timeto": [], "classes": [],
                "direction": [], "mean":[], "85percentile":[], "speed_buckets":[]}
  for countline in response_json:
      for buckets in response_json[countline]:
          for _class in buckets["clockwise"].keys():
              df_request["countline_id"].append(countline)
              df_request["date"].append(pd.to_datetime(buckets["from"]).tz_convert(timezone.value).date())
              df_request["timefrom"].append(pd.to_datetime(buckets["from"]).tz_convert(timezone.value).time())
              df_request["timeto"].append(pd.to_datetime(buckets["to"]).tz_convert(timezone.value).time())
              df_request["classes"].append(_class)
              df_request["direction"].append("In")
              df_request["mean"].append(buckets["clockwise"][_class]["mean"])
              df_request["85percentile"].append(buckets["clockwise"][_class]["p85"])
              df_request["speed_buckets"].append(buckets["clockwise"][_class]["speed_bins"])
          for _class in buckets["anti_clockwise"].keys():
              df_request["countline_id"].append(countline)
              df_request["date"].append(pd.to_datetime(buckets["from"]).tz_convert(timezone.value).date())
              df_request["timefrom"].append(pd.to_datetime(buckets["from"]).tz_convert(timezone.value).time())
              df_request["timeto"].append(pd.to_datetime(buckets["to"]).tz_convert(timezone.value).time())
              df_request["classes"].append(_class)
              df_request["direction"].append("Out")
              df_request["mean"].append(buckets["anti_clockwise"][_class]["mean"])
              df_request["85percentile"].append(buckets["anti_clockwise"][_class]["p85"])
              df_request["speed_buckets"].append(buckets["anti_clockwise"][_class]["speed_bins"])
  df_request = pd.DataFrame.from_dict(df_request)
  df_request_all.append(df_request)   #append all dataframes

  #print progress
  print(str(i+1) + "/" + str(len(date_range)) + ": " + str(response.status_code) + " " + response.reason)
  time.sleep(1)

print("Converting into dataframe ...")
#return single dataframe
df_request_all = pd.concat(df_request_all, axis=0).reset_index(drop=True)
df_request_all = df_request_all.join(pd.json_normalize(df_request_all['speed_buckets'])) #expand data to columns
df_request_all = df_request_all.drop(columns="speed_buckets")

#error when no data returned
if len(df_request_all)==0:
  print("Note: No data returned for set parameters (date, countlines, classes). Please run a different request.")
else:
  print("Done")
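
The sketch below shows how a single response is flattened into rows, using a made-up payload. It contains only the keys the loop above reads (`from`, `to`, `clockwise`, `anti_clockwise`, `mean`, `p85`, `speed_bins`). The labels inside `speed_bins` are an assumption about the API's format, inferred from how the notebook later renames the bucket columns.

```python
import pandas as pd

# Made-up response for one countline and one hourly time bucket.
sample_response = {
    "40284": [
        {
            "from": "2022-03-02T10:00:00.000Z",
            "to": "2022-03-02T11:00:00.000Z",
            "clockwise": {
                "car": {"mean": 24.5, "p85": 26.8, "speed_bins": {"0.0": 0, "5.0": 3}}
            },
            "anti_clockwise": {
                "car": {"mean": 22.0, "p85": 25.1, "speed_bins": {"0.0": 1, "5.0": 2}}
            },
        }
    ]
}

# The notebook labels clockwise crossings "In" and anti-clockwise ones "Out".
DIRECTIONS = {"clockwise": "In", "anti_clockwise": "Out"}
local_tz = "Europe/London"

rows = []
for countline_id, buckets in sample_response.items():
    for bucket in buckets:
        start = pd.to_datetime(bucket["from"]).tz_convert(local_tz)
        end = pd.to_datetime(bucket["to"]).tz_convert(local_tz)
        for api_direction, label in DIRECTIONS.items():
            for class_name, stats in bucket[api_direction].items():
                rows.append({
                    "countline_id": countline_id,
                    "date": start.date(),
                    "timefrom": start.time(),
                    "timeto": end.time(),
                    "classes": class_name,
                    "direction": label,
                    "mean": stats["mean"],
                    "85percentile": stats["p85"],
                    **stats["speed_bins"],  # one column per speed bucket
                })

df = pd.DataFrame(rows)
print(df)
```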

## Stage 3: Data Processing
Now we process the raw data output and calculate the average speeds per time bucket as well as counts in speed bins.

You can choose to collapse detailed classes into 8 road user groups as displayed on the dashboard. Depending on the size of your data, this step may take a while.

In [None]:
#@title { vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Select if you want to collapse classes and run to clean data for export.
collapse_classes = "yes" #@param [ "yes", "no"]
df_speed = df_request_all.copy()

print("Processing ...")

# get speed bucket columns
bucket_columns = df_speed.columns[8:]

# calculate total counts and summed mean/85percentile
df_speed["total_counts"] = df_speed[bucket_columns].sum(axis=1)
df_speed["mean_summed"] = df_speed["mean"] * df_speed["total_counts"]
df_speed["85percentile_summed"] = df_speed["85percentile"] * df_speed["total_counts"]

# dictionary to hold method of aggregation
agg_dict = {"mean_summed": "sum", "85percentile_summed": "sum", "total_counts": "sum"}
for bucket_column in bucket_columns:
  agg_dict[bucket_column] = "sum"

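#create new column with collapsed classes and sum data
#("escooter" is the spelling used in the class selector above; "e-scooter" is kept as well)
collapsed_classes = {"pedestrian":"Pedestrian","cyclist":"Cyclist","escooter":"Pedestrian","e-scooter":"Pedestrian", "motorbike":"Motorbike","car":"Car","taxi":"Car",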
                     "emergency_car":"Car","van":"LGV","emergency_van":"LGV",
                     "bus":"Bus","minibus":"Bus","rigid":"OGV1","fire_engine":"OGV1","truck":"OGV2",
                     "total":"Total",}
if collapse_classes == "yes":
  df_speed["class"] = df_speed["classes"].map(collapsed_classes)
  df_speed = df_speed.groupby(
      ['countline_id', 'date', 'timefrom', 'timeto', 'class', 'direction']).agg(agg_dict).reset_index()
else:
  df_speed = df_speed.rename(columns={"classes":"class"})
  df_speed = df_speed.drop(columns=["mean", "85percentile"])

# add values for both directions and add to dataframe
df_speed_dir = df_speed.groupby(
    ['countline_id','date', 'timefrom', 'timeto', 'class']).agg(agg_dict).reset_index()
df_speed_dir["direction"] =  "Both"
df_speed = pd.concat(
    [df_speed,df_speed_dir], axis=0).sort_values(
        by=['countline_id', 'date', 'timefrom', 'timeto', 'class']).reset_index(drop=True)

# convert summed aggregates back to average
df_speed["mean_summed"] = np.round(df_speed["mean_summed"] / df_speed["total_counts"],3)
df_speed["85percentile_summed"] = np.round(df_speed["85percentile_summed"] / df_speed["total_counts"],3)
df_speed = df_speed.rename(columns={"mean_summed": "mean", "85percentile_summed": "85percentile"})

print("Cleaning up ...")

# order speed bucket columns by value
colname_dict = {}
for bucket_column in bucket_columns:
  colname_dict[bucket_column] = int(float(bucket_column))
df_speed = df_speed.rename(columns=colname_dict)
column_order = ['countline_id', 'date', 'timefrom', 'timeto', 'class', 'direction','total_counts', 'mean', '85percentile'] + sorted(colname_dict.values())
df_speed = df_speed[column_order]

# rename speed bucket columns
colname_dict_final = {}
for bucket_column in sorted(colname_dict.values()):
  colname_dict_final[bucket_column] = str(bucket_column)+ "-"+ str(bucket_column + speed_bucket_input.value)
df_speed = df_speed.rename(columns=colname_dict_final)

print("Done. File ready for export.")
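
The processing above combines rows by weighting each mean by its count, which the following sketch illustrates with made-up numbers. Note that applying the same weighting to the 85th percentile, as the cell does, gives an approximation and not the true 85th percentile of the combined sample.

```python
import pandas as pd

# Made-up per-direction rows for one countline, time bucket and class.
df = pd.DataFrame({
    "direction": ["In", "Out"],
    "total_counts": [10, 30],
    "mean": [20.0, 40.0],
})

# Count-weighted mean: multiply by counts, sum, divide by total counts.
weighted_mean = (df["mean"] * df["total_counts"]).sum() / df["total_counts"].sum()

# An unweighted average would treat both directions as equally busy.
simple_mean = df["mean"].mean()

print(weighted_mean, simple_mean)
```

The weighted value sits closer to the busier direction, which is why the notebook carries `mean_summed` and `total_counts` through the aggregation before dividing.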

## Stage 4: Data Export
Now let's write this to a .csv file. You can either save the file locally (it will appear in your Downloads folder) or save it to Google Drive.


* **Local Downloads Folder:** This might not work if your browser or computer is blocking downloads.
* **Google Drive:** If you want to save it in Google Drive, you will be asked for permission to connect to your Google Account.

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown Select where to save the csv file
download_location = "Local folder" #@param [ "Local folder", "Google Drive"]
#@markdown Name your file
filename = "countline-speeds" #@param {type:"string"}
#@markdown Hit run (>)
df_speed_final = df_speed.copy()

if download_location == "Local folder":
  from google.colab import files
  df_speed_final.to_csv(filename + ".csv", index = False)
  files.download(filename + ".csv")
else:
  from google.colab import drive
  drive.mount('/content/drive')
  path = '/content/drive/My Drive/'
  df_speed_final.to_csv(path + filename +".csv", index = False)