<a href="https://colab.research.google.com/github/vivacitylabs/data-toolkit/blob/master/countline_speeds_bulk_download_generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Countline Speeds - Bulk Download Generator



## Generate a csv file of countline speed data over multiple days

This notebook is a tool for accessing VivaCity data via the API. It is intended as an **interim solution** while we work on new dashboard developments. Contact your Customer Success Manager for more information.

#### How it works

This notebook walks you through all the necessary steps and saves a CSV file to your Google Drive.

You will need to fill in a few details and then hit the run button (▶) next to the code cells.

If you want to make changes to the code and save them, you will first need to save a copy of this notebook to your Google Drive.



**What you will need**

- Google account
- VivaCity API login credentials
- IDs of the countlines you want to download data for



ℹ Note: the notebook only works for countlines that have [countline speed](https://vivacitylabs.customerly.help/vivacity-dashboard/countline-speed) enabled.


#### Output format

You will receive the mean countline speed and the count per class, in 15-minute time buckets, in the following format:



| Start date | Start time | End time | Countline | Direction | Car (mph) | Taxi (mph) | LGV (mph) | Bus (mph) | OGV (mph) | Motorbike (mph) | Cyclist (mph) | Pedestrian (mph) | Car count | Taxi count | LGV count | Bus count | OGV count | Motorbike count | Cyclist count | Pedestrian count |
|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
| 02/03/2022 | 10:30:00 | 10:45:00 | 40284 | In | 14.57616361 | n/a | 13.421616 | n/a | 6.710808 | 11.18468 | n/a | n/a | 5 | 0 | 3 | 0 | 2 | 7 | 0 | 0 |



## Stage 1: Getting Started
Let's begin by importing the packages we'll need and creating some useful functions!

Hit the run button (▶) in the top left corner.

In [None]:
#@title
import requests
import getpass
import json
from datetime import date, datetime, timedelta
import pytz
import pandas as pd
import csv
import time
from google.colab import drive
drive.mount('/content/drive')

def get_date_range(start_date, end_date):
    """Split the period into one-day (from, to) pairs of UTC timestamp strings.

    The day starting at end_date is included, so the range is inclusive.
    """
    start_dates = []
    end_dates = []

    start_date = datetime.fromisoformat(start_date)
    end_date = datetime.fromisoformat(end_date)
    while True:
        #each request window starts at start_date and ends 24 hours later
        start_dates.append(start_date.strftime('%Y-%m-%dT%H:%M:%S.000Z'))
        end_dates.append((start_date+timedelta(days=1)).strftime('%Y-%m-%dT%H:%M:%S.000Z'))
        start_date = start_date+timedelta(days=1)
        if start_date > end_date:
            break
    date_range = list(zip(start_dates, end_dates))
    return date_range

## Stage 2: Data Import
At the end of this stage, we will have requested data from the API for each day in the date range you set in the **Countline Details** step.

The resulting JSON responses will then be converted into a data table in the **Data Processing** step.

Authentication is handled at this stage.

### Countline Details
Choose one or more countlines (separate multiple countline ids with commas), the classes you want data for, and the date range. The end date is included in the download.

Other parameters (e.g. time buckets, speed buckets) are set by default.
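
The sketch below illustrates how a local date range can be turned into one-day UTC request windows, which is what the cell below and `get_date_range` do together. It uses only the standard library rather than pandas and pytz, and the name `daily_utc_windows` is hypothetical, so treat it as an illustration under those assumptions rather than the notebook's own code.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

API_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.000Z"


def daily_utc_windows(start_date, end_date, tz_name):
    """Return one (from, to) pair of UTC strings per day, end date inclusive."""
    tz = ZoneInfo(tz_name)
    # Local midnight on each date, converted to UTC.
    start = datetime.fromisoformat(start_date).replace(tzinfo=tz).astimezone(timezone.utc)
    end = datetime.fromisoformat(end_date).replace(tzinfo=tz).astimezone(timezone.utc)
    windows = []
    while start <= end:
        # Each window covers exactly 24 hours in UTC.
        next_start = start + timedelta(days=1)
        windows.append((start.strftime(API_TIME_FORMAT),
                        next_start.strftime(API_TIME_FORMAT)))
        start = next_start
    return windows


windows = daily_utc_windows("2022-07-18", "2022-07-19", "Europe/London")
for window in windows:
    print(window)
# ('2022-07-17T23:00:00.000Z', '2022-07-18T23:00:00.000Z')
# ('2022-07-18T23:00:00.000Z', '2022-07-19T23:00:00.000Z')
```

London is one hour ahead of UTC in July, so each local day starts at 23:00 UTC on the previous day. Because the windows advance in fixed 24-hour steps, a range that spans a clock change will drift by one hour relative to local midnight.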

In [None]:
#@title { run: "auto", vertical-output: true ,  display-mode: "form" }
countline_ids = "23248" #@param {type:"string"}
classes = "car,taxi,van,bus,rigid,motorbike,cyclist,pedestrian" #@param {type:"string"}
start_date = "2022-07-18" #@param {type:"date"}
end_date = "2022-07-22" #@param {type:"date"}
timezone = "Europe/London" #@param ['Europe/London', "Europe/Berlin", "Australia/Sydney"]

#convert local datetime to UTC datetime
start_date_utc = str(pd.to_datetime(start_date).tz_localize(timezone).astimezone(pytz.utc))
end_date_utc = str(pd.to_datetime(end_date).tz_localize(timezone).astimezone(pytz.utc))

params = {}
params['countline_ids'] = countline_ids
params['classes'] = classes
date_range = get_date_range(start_date_utc, end_date_utc)
params['time_bucket'] = "15m"
params['speed_bucket_number'] = "98"
params['max_speed'] = "44"
params['min_speed'] = "0"
params['fill_zeros'] = "true"

### Authentication
Now you will need your API login details, i.e. a username and a password. If you don't have them, please contact your Customer Success Manager.

1.   Enter the username into the field on the right, then hit the run button (▶).
2.   Input the password in the box that appears below it and hit "enter" on your keyboard. 


In [None]:
#@title Insert your login credentials { run: "auto", vertical-output: true, display-mode: "form" }
username = "api-username" #@param {type:"string"}

auth_body = {}
auth_body['username'] = username
auth_body['password'] = getpass.getpass()

### Getting the data


Now that the login details are set, we will try to get authorised access to the API. If this is unsuccessful, you will receive an error message at the Authorising stage.

We will then request countline speed data for the parameters you set under **Countline Details** (date range, countline ids, classes).

The output will tell you how many requests are made and what the progress is.
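
One way to make a failed login easier to diagnose is to check the token response before reading from it. The sketch below is a minimal illustration: the helper name `extract_tokens` is hypothetical, the payload values are made up, and it assumes a successful response has a 2xx status and carries the `access_token` and `refresh_token` fields that the cell below reads.

```python
def extract_tokens(status_code, payload):
    """Return (access_token, refresh_token) or raise a readable error."""
    # Assumption: any status outside 2xx means the login was rejected.
    if not 200 <= status_code < 300:
        raise RuntimeError(f"Authorisation failed with HTTP status {status_code}")
    missing = [key for key in ("access_token", "refresh_token") if key not in payload]
    if missing:
        raise RuntimeError("Authorisation response is missing: " + ", ".join(missing))
    return payload["access_token"], payload["refresh_token"]


# Made-up payload for illustration; real values would come from auth_response.json().
access_token, refresh_token = extract_tokens(
    200, {"access_token": "abc", "refresh_token": "def"}
)
headers = {"Authorization": "Bearer " + access_token}
refresh_body = {"refresh_token": refresh_token}
print(headers)
```

In the notebook this would be called as `extract_tokens(auth_response.status_code, auth_response.json())`, both for the initial login and for each token refresh.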

In [None]:
print("Authorising...")
auth_response = requests.post("https://api.vivacitylabs.com/get-token", data=auth_body, headers={'Content-Type':'application/x-www-form-urlencoded'})
#stop here with a clear HTTP error if the login was rejected
auth_response.raise_for_status()
headers = {}
headers['Authorization'] = "Bearer " + auth_response.json()['access_token']
refresh_body = {}
refresh_body['refresh_token'] = auth_response.json()['refresh_token']
start = time.time()
print("Done.")
data = []
#named day_range so the loop does not shadow the imported datetime.date
for i,day_range in enumerate(date_range):
  #refresh the token once more than 500 seconds have passed since the last authorisation
  time_elapsed = (time.time() - start)
  if time_elapsed > 500:
    print("Reauthorising...")
    auth_response = requests.post("https://api.vivacitylabs.com/refresh-token", data=refresh_body, headers={'Content-Type':'application/x-www-form-urlencoded'})
    auth_response.raise_for_status()
    headers = {}
    headers['Authorization'] = "Bearer " + auth_response.json()['access_token']
    refresh_body = {}
    refresh_body['refresh_token'] = auth_response.json()['refresh_token']
    start = time.time()
    print("Done.")
  params["from"] = day_range[0]
  params["to"] = day_range[1]
  response = requests.get("https://beta.api.vivacitylabs.com/countline/speed", params=params, headers=headers)
  print(str(i+1) + "/" + str(len(date_range)) + ": " + str(response.status_code) + " " + response.reason)
  #only keep successful responses so that Stage 3 never has to parse an error body
  if response.ok:
    data.append(response.json())
  time.sleep(1)

## Stage 3: Data Processing
Now we process the raw data output and calculate the average speeds per time bucket. 

Select the unit in which the speeds should be reported (mph, km/h or m/s).
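
The averaging can be illustrated with a small worked example. The sketch below assumes, as the processing cell further down does, that each class maps speed values in m/s (as strings) to vehicle counts. The mean is the count-weighted average of those speeds, multiplied by the unit factor. The function name `mean_speed` and the example bucket are made up for illustration.

```python
# Conversion factors from m/s, matching the ones used in the processing cell.
MPS_TO_UNIT = {"mph": 2.236936, "km/h": 3.6, "m/s": 1}


def mean_speed(speed_counts, unit="mph"):
    """Count-weighted mean of a {speed: count} histogram, or 'n/a' if it is empty."""
    total_count = sum(speed_counts.values())
    if total_count == 0:
        return "n/a"
    weighted_sum = sum(float(speed) * count for speed, count in speed_counts.items())
    return weighted_sum * MPS_TO_UNIT[unit] / total_count


# Made-up bucket: 2 vehicles at 5 m/s and 3 vehicles at 10 m/s.
example = {"5.0": 2, "10.0": 3}
print(mean_speed(example, "m/s"))  # 8.0, i.e. (2*5 + 3*10) / 5
print(mean_speed(example, "mph"))
print(mean_speed({"5.0": 0}, "mph"))  # n/a
```

A class with no detections in a time bucket has no meaningful mean, which is why the output shows `n/a` next to a count of 0.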

In [None]:
#@title  { vertical-output: false, display-mode: "form" }
#@markdown Select a speed unit

unit = "mph" #@param [ "mph", "km/h", "m/s"]

In [None]:
#get factor
if unit == "mph":
  factor = 2.236936
elif unit == "km/h":
  factor = 3.6
else:
  factor = 1

#process data
export = []
for date_data in data:
  day_export = []
  for countline, countline_data in date_data.items():
    direction_data = {}
    for time_data in countline_data:
      start_time = datetime.strptime(time_data['from'],'%Y-%m-%dT%H:%M:%S.000Z')
      end_time = datetime.strptime(time_data['to'],'%Y-%m-%dT%H:%M:%S.000Z')
      direction_data['out'] = time_data['anti_clockwise']
      direction_data['in'] = time_data['clockwise']
      counts = {}
      averages = {}
      for direction_label, class_data in direction_data.items():
        counts[direction_label] = {}
        averages[direction_label] = {}
        for class_label, speed_data in class_data.items():
          count_total = 0
          speed_total = 0
          for speed, count in speed_data.items():
            count_total += count
            speed_total += count*float(speed)
          #compute the mean once per class, after all speed buckets have been summed
          if count_total != 0:
            counts[direction_label][class_label] = str(count_total)
            averages[direction_label][class_label] = speed_total* factor /count_total
          else:
            counts[direction_label][class_label] = '0'
            averages[direction_label][class_label] = 'n/a'
        row = [#convert dates back to local datetime from UTC
              str(pd.to_datetime(start_time).tz_localize('utc').astimezone(timezone).date()),
              str(pd.to_datetime(start_time).tz_localize('utc').astimezone(timezone).time()),
              str(pd.to_datetime(end_time).tz_localize('utc').astimezone(timezone).time()),
              countline,
              direction_label
              ]
        #add averages and counts for all selected classes
        for selected_class in classes.split(","):
          row.append(averages[direction_label][selected_class])
        for selected_class in classes.split(","):
          row.append(counts[direction_label][selected_class])
        day_export.append(row)
  export += day_export

## Stage 4: Data Export
Now let's write this to a `.csv` file and save it to the top-level folder (`My Drive`) in Google Drive.

Fill in the `filename` to the right and then hit run (▶). It may take a few minutes for the file to appear in your Google Drive.

In [None]:
#@title Set the output filename { vertical-output: true, display-mode: "form" }
filename = "speed-test_2022-10-07" #@param {type:"string"}

path = '/content/drive/My Drive/'

header = ['date', 'start_time', 'end_time', 'countline_id', 'direction']
for selected_class in classes.split(","):
  #label the speed columns with the unit selected in Stage 3
  header.append(selected_class + " (" + unit + ")")
for selected_class in classes.split(","):
  header.append(selected_class + " count")
#newline='' stops the csv module from writing blank rows on some platforms
with open(path + filename + '.csv', 'w', newline='') as f:
  writer = csv.writer(f)
  writer.writerow(header)
  writer.writerows(export)
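
Once the file exists, it can be loaded back for a quick check. The sketch below is a minimal illustration using only the standard library: `read_speed_rows` is a hypothetical helper, and the sample file is a made-up stand-in written to a temporary folder so the example runs anywhere. In Colab you would pass the path of the exported file on your Drive instead.

```python
import csv
import os
import tempfile


def read_speed_rows(csv_path):
    """Read the exported file into dicts, turning 'n/a' speeds into None."""
    rows = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            rows.append({key: (None if value == "n/a" else value)
                         for key, value in row.items()})
    return rows


# Stand-in file with a reduced set of columns, for illustration only.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "start_time", "end_time", "countline_id",
                     "direction", "car (mph)", "car count"])
    writer.writerow(["2022-07-18", "10:30:00", "10:45:00", "23248", "in", "n/a", "0"])

rows = read_speed_rows(sample_path)
print(rows[0]["car (mph)"], rows[0]["car count"])  # None 0
```

All values are read back as strings, so convert the speed and count columns to numbers before doing any arithmetic on them.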