<a href="https://colab.research.google.com/github/vivacitylabs/data-toolkit/blob/master/notebooks/classified_counts_bulk_download_generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classified Counts - Bulk Download Generator



## Generate a CSV file of Classified Counts data over multiple days

This notebook is a tool for accessing VivaCity data via the API. It is intended as an **interim solution** while we work on new dashboard developments. If you have any issues, contact customer support (support@vivacitylabs.com) or raise a ticket on the [Customer Help Portal](https://vivacitylabs.atlassian.net/servicedesk/customer/portal/16).

Use this notebook only if you need one of the T14 classes that can't be downloaded via the dashboard.

#### How it works

This notebook walks you through all the necessary steps and saves the output CSV file either locally or to your Google Drive.

You will simply need to fill in a few details and then hit the run button next to the code cells.

**What you will need**

- A VivaCity API key
- The countlines you want to download data for


#### Output format

The output file will look something like this (the file also has a `UTC Datetime` column before `Local Datetime`):

| Local Datetime | countlineId | countlineName | direction | car | cyclist | motorbike | taxi |
|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
| 2023-02-20 00:00:00 | 53241 | S1_highstreet | in | 8 | 3 | 1 | 0 |




## Stage 1: Getting Started
Let's begin by importing the packages we'll need and creating some useful functions!

Hit the run button (▶) in the top left corner.

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Run this to import functions
import csv
import getpass
import json
import time
from collections import deque
from datetime import date, datetime, timedelta

import ipywidgets as widgets
import pandas as pd
import pytz
import requests
from IPython.display import Markdown, display
from ipywidgets import interact, interactive, fixed, interact_manual, Layout, Box
from tqdm import tqdm


def printmd(string):
    """Render a string as Markdown in the cell output."""
    display(Markdown(string))


def get_date_range(start_date, end_date):
    """Split the period into consecutive 24-hour windows, one per API request.

    Both arguments are ISO 8601 strings in UTC. The end date is inclusive, so
    the same start and end date gives a single window.
    Returns a list of (from, to) tuples in the format the API expects.
    """
    fmt = '%Y-%m-%dT%H:%M:%S.000Z'
    start_date = datetime.fromisoformat(start_date)
    end_date = datetime.fromisoformat(end_date)
    date_range = []
    while True:
        date_range.append((start_date.strftime(fmt), (start_date + timedelta(days=1)).strftime(fmt)))
        start_date += timedelta(days=1)
        if start_date > end_date:
            break
    return date_range
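`get_date_range` splits the selected period into consecutive 24-hour UTC windows, one per API request, and includes the end date. As a rough sketch, the same windows could also be built with `pandas.date_range`; `daily_windows` is a hypothetical helper for illustration, not part of the notebook.

```python
import pandas as pd


def daily_windows(start_utc: str, end_utc: str) -> list[tuple[str, str]]:
    """Hypothetical equivalent of get_date_range built on pandas.date_range."""
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"  # same string format the notebook sends to the API
    # date_range includes both endpoints by default, so the end date is kept
    starts = pd.date_range(start=start_utc, end=end_utc, freq="D")
    return [(s.strftime(fmt), (s + pd.Timedelta(days=1)).strftime(fmt)) for s in starts]


windows = daily_windows("2023-02-20 00:00:00+00:00", "2023-02-22 00:00:00+00:00")
print(len(windows))  # 3
print(windows[0])
```

Selecting 20 to 22 February therefore sends three requests, the last one ending at midnight UTC on the 23rd.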

## Stage 2: Data Import

First, you'll enter your API key. Contact customer support (support@vivacitylabs.com) if you don't have one.

We will then request all countlines you have access to. If some countlines are missing, get in touch with us.

Finally, you will select the date period, countlines and classes to request data for.

### Authentication
You will now need your VivaCity API key. If you don't have one, please contact customer support (support@vivacitylabs.com).

1.   Input your API Key into the box.
1.   Hit the run button (▶).


In [None]:
#@title  { run: "auto", vertical-output: true, display-mode: "form" }
#@markdown Insert your API key
api_key = "" #@param {type:"string"}

#### Available Countlines

Use your API key to request the metadata for all countlines you have access to.

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Run this to retrieve all countlines

headers = {'x-vivacity-api-key': api_key}

# Get countline metadata
print("\nRequesting metadata ...")
api_url_base = 'https://api.vivacitylabs.com'
countline_request = requests.get(f'{api_url_base}/countline/metadata', headers=headers, timeout=30)
countline_request.raise_for_status()  # stop here if the API key is wrong or the request fails
countlines = countline_request.json()

# Convert to a dataframe with one row per countline
dict_countlines = {"countline_id": [], "countline_name": [], "countline_direction": []}
for countline_id, metadata in countlines.items():
    dict_countlines["countline_id"].append(countline_id)
    dict_countlines["countline_name"].append(metadata["name"])
    dict_countlines["countline_direction"].append(metadata["direction"])
df_countlines = pd.DataFrame.from_dict(dict_countlines)
print(df_countlines["countline_id"].nunique(), "countlines available")
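The loop above assumes the metadata response is a dictionary keyed by countline id, with at least `name` and `direction` in each entry. Under that assumption, a more compact alternative is `DataFrame.from_dict(..., orient="index")`, sketched here with a made-up response (the second countline and both direction values are hypothetical).

```python
import pandas as pd

# Made-up response in the shape the cell above expects from /countline/metadata
countlines = {
    "53241": {"name": "S1_highstreet", "direction": "northbound"},
    "53242": {"name": "S2_highstreet", "direction": "southbound"},
}

# orient="index" turns each top-level key (the countline id) into a row
df_countlines = (
    pd.DataFrame.from_dict(countlines, orient="index")[["name", "direction"]]
    .rename_axis("countline_id")
    .reset_index()
    .rename(columns={"name": "countline_name", "direction": "countline_direction"})
)
print(df_countlines)
```

With this approach, a countline that lacks a `direction` field gets `NaN` instead of raising a `KeyError`.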

#### Select countlines and date range for querying the API


After running the cell below, select the classes and countlines from the lists, then pick the start date, end date and timezone. The start date must not be after the end date; both dates are included in the download.

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Run this and then make your selections

box_layout = Layout(display='flex', flex_flow='column',
                    align_items='stretch', border=None, width='28%')

start_date_input = widgets.DatePicker(description="Start date",layout=Layout(width='55%'))
end_date_input = widgets.DatePicker(description="End date",layout=Layout(width='55%'))
timezone = widgets.Dropdown(options=['Europe/London', "Europe/Berlin", "America/New_York",
                                     "Australia/Adelaide","Australia/Brisbane", "Australia/Darwin",
                                     "Australia/Melbourne", "Australia/Perth", "Australia/Sydney"
                                     ],description="Timezone",layout=Layout(width='55%'))

class_input = widgets.SelectMultiple(
    options=[ "cyclist", "motorbike", "car", "pedestrian", "taxi", "van", "minibus", "bus", "rigid", "truck", "emergency_car", "emergency_van", "fire_engine", "escooter"],
    description='Class',  disabled=False,
    layout=Layout(width='55%', height='230px')
)
countlines_input = widgets.SelectMultiple(
    options=df_countlines.sort_values("countline_name")["countline_name"].unique(),
    description='Countlines',
    disabled=False,
    layout=Layout(width='auto', height='200px')
)
items = [start_date_input, end_date_input, timezone, class_input,countlines_input]
box = Box(children=items, layout=box_layout)
printmd("**Select date period and countlines**")
printmd("Hold `Ctrl` (`Cmd` on macOS) to select multiple classes or countlines, or `Shift` to select a range")
box

Run the cell below to set the input parameters for the API request, then check that they look correct.
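`requests` encodes the `params` dictionary into the URL query string. List values, such as the selected classes and countline ids, are sent as repeated keys, and `True` is sent as the string `True`. The standard library's `urllib.parse.urlencode` with `doseq=True` produces the same encoding, so it can preview what the API receives; the parameter values below are examples only.

```python
from urllib.parse import urlencode

# Example parameters in the same shape the cell builds (values are made up)
params = {
    "countline_ids": ["53241", "53242"],
    "class": ["car", "cyclist"],
    "includeZeroCounts": True,
}

# doseq=True expands each list into repeated key=value pairs
query = urlencode(params, doseq=True)
print(query)
# countline_ids=53241&countline_ids=53242&class=car&class=cyclist&includeZeroCounts=True
```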

In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Run this and check your selection again.

params = {}
params['countline_ids'] = df_countlines[df_countlines["countline_name"].isin(countlines_input.value)]["countline_id"].to_list()
params['class'] = list(class_input.value)
params["includeZeroCounts"] = True

if start_date_input.value is None or end_date_input.value is None:
    print("Please select both a start date and an end date")
elif not params['countline_ids'] or not params['class']:
    print("Please select at least one class and one countline")
# Check that the dates are in the correct order
elif start_date_input.value > end_date_input.value:
    print("Start date is after end date, please correct your date selection")
else:
    # Convert local midnight on each selected date to a UTC datetime
    start_date_utc = str(pd.to_datetime(start_date_input.value).tz_localize(timezone.value).astimezone(pytz.utc))
    end_date_utc = str(pd.to_datetime(end_date_input.value).tz_localize(timezone.value).astimezone(pytz.utc))
    date_range = get_date_range(start_date_utc, end_date_utc)
    printmd("**Check your selection:**\n")
    print("Dates:", start_date_input.value, "to", end_date_input.value,
          "\nTimezone:", timezone.value,
          "\nClass:", class_input.value,
          "\nCountlines:", countlines_input.value)
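The cell converts local midnight on the chosen dates to UTC before building the request windows. A standalone sketch with two of the dropdown timezones shows why this matters: in February, midnight in Europe/London is also midnight UTC, while midnight in Australia/Sydney (daylight saving time, UTC+11) is 13:00 UTC on the previous day. `local_midnight_to_utc` is a hypothetical helper that mirrors the conversion steps.

```python
from datetime import date

import pandas as pd


def local_midnight_to_utc(day: date, tz: str) -> pd.Timestamp:
    """Localise midnight of `day` in timezone `tz`, then convert it to UTC."""
    return pd.Timestamp(day).tz_localize(tz).tz_convert("UTC")


for tz in ["Europe/London", "Australia/Sydney"]:
    print(tz, local_midnight_to_utc(date(2023, 2, 20), tz))
```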

### Getting the data

We now query Classified Counts data from the API.

The output shows the status of each request (one request per day) and the overall progress.
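The download cell keeps to at most `max_requests_per_minute` requests in any 60-second window. The sketch below isolates that sliding-window idea in a small class (hypothetical, not part of the notebook) that takes the current time as an argument, so its behaviour can be checked without actually waiting.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `window` seconds (hypothetical helper)."""

    def __init__(self, max_calls: int, window: float = 60.0):
        self.max_calls = max_calls
        self.window = window
        self.calls = deque()  # timestamps of recent calls, oldest first

    def delay_needed(self, now: float) -> float:
        """Seconds to wait before a call at time `now` stays within the limit."""
        # Drop calls that have left the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        # Otherwise wait until the oldest call in the window expires
        return self.window - (now - self.calls[0])

    def record(self, now: float) -> None:
        self.calls.append(now)


limiter = SlidingWindowLimiter(max_calls=2, window=60.0)
limiter.record(0.0)
limiter.record(10.0)
print(limiter.delay_needed(30.0))  # 30.0: a third call must wait until t=60
```

In the notebook, `time.time()` plays the role of `now` and `time.sleep()` waits out the delay.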

In [None]:
#@title  {vertical-output: true, display-mode: "form" }
#@markdown **Code cell:** Run this to get data from the API

max_requests_per_minute=290
data = []
start_time = time.time()

# Initialize request tracking
request_times = deque(maxlen=max_requests_per_minute)
min_request_interval = 60 / max_requests_per_minute  # seconds between requests

def wait_for_rate_limit():
    """Ensure we don't exceed max_requests_per_minute"""
    if len(request_times) >= max_requests_per_minute:
        # Calculate time since oldest request
        elapsed = time.time() - request_times[0]
        if elapsed < 60:  # If less than a minute has passed
            sleep_time = 60 - elapsed
            time.sleep(sleep_time)
        # Clear old requests outside the 1-minute window
        while request_times and time.time() - request_times[0] > 60:
            request_times.popleft()

    # Ensure minimum interval between requests
    if request_times:
        time_since_last = time.time() - request_times[-1]
        if time_since_last < min_request_interval:
            time.sleep(min_request_interval - time_since_last)

# Use tqdm for progress tracking
for i, (window_from, window_to) in enumerate(tqdm(date_range, desc="Collecting data")):
    # Wait for rate limit before making request
    wait_for_rate_limit()

    # Update parameters for this request
    current_params = params.copy()
    current_params["from"] = window_from
    current_params["to"] = window_to
    current_params["time_bucket"] = "15m"
    current_params["vivacityNotebookSource"] = "counts"

    try:
        # Record request time
        request_times.append(time.time())

        response = requests.get(
            'https://api.vivacitylabs.com/countline/counts',
            params=current_params,
            headers=headers,
            timeout=30
        )

        # Log progress
        print(f"{i+1}/{len(date_range)}: {response.status_code} {response.reason}")

        if response.status_code == 200:
            json_counts = response.json()

            df_request = {
                "countline_id": [],
                "timefrom": [],
                "timeto": [],
                "classes": [],
                "out": [],
                "in": []
            }

            for countline_id, timeframes in json_counts.items():
                for timeframe in timeframes:
                    clockwise = timeframe.get("clockwise", {})
                    anti_clockwise = timeframe.get("anti_clockwise", {})
                    # Include classes counted in either direction, not just anti-clockwise
                    vehicle_classes = set(clockwise) | set(anti_clockwise)

                    for vehicle_class in vehicle_classes:
                        df_request["countline_id"].append(countline_id)
                        df_request["timefrom"].append(timeframe["from"])
                        df_request["timeto"].append(timeframe["to"])
                        df_request["classes"].append(vehicle_class)
                        df_request["out"].append(anti_clockwise.get(vehicle_class, 0))
                        df_request["in"].append(clockwise.get(vehicle_class, 0))

            if any(len(v) > 0 for v in df_request.values()):
                df_request = pd.DataFrame.from_dict(df_request)
                data.append(df_request)
            else:
                print(f"Warning: Empty data for {window_from} to {window_to}")
        else:
            print(f"Data missing for {window_from.split('T')[0]} to {window_to.split('T')[0]}")

    except requests.exceptions.RequestException as e:
        print(f"Error during request: {e}")
        print(f"Failed request parameters: {current_params}")
        continue

# Combine all DataFrames
if data:
    final_df = pd.concat(data, ignore_index=True)

    # Convert datetime columns
    for col in ['timefrom', 'timeto']:
        final_df[col] = pd.to_datetime(final_df[col])

    # Log completion
    elapsed_time = time.time() - start_time
    print(f"\nData collection completed in {elapsed_time:.2f} seconds")
    print(f"Total records collected: {len(final_df)}")

else:
    print("No data was collected successfully")
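The parsing loop assumes the `/countline/counts` response is keyed by countline id, with one entry per 15-minute bucket holding `from`, `to` and per-class counts under `clockwise` (exported as `in`) and `anti_clockwise` (exported as `out`). The response below is inferred from that code and the numbers are made up; the sketch shows how one bucket becomes one row per class, in the long format that Stage 3 pivots, taking the union of classes seen in either direction.

```python
import pandas as pd

# Made-up response in the shape the parsing loop expects
json_counts = {
    "53241": [
        {
            "from": "2023-02-20T00:00:00.000Z",
            "to": "2023-02-20T00:15:00.000Z",
            "clockwise": {"car": 8, "cyclist": 3},
            "anti_clockwise": {"car": 5, "taxi": 1},
        }
    ]
}

rows = []
for countline_id, timeframes in json_counts.items():
    for timeframe in timeframes:
        clockwise = timeframe.get("clockwise", {})
        anti_clockwise = timeframe.get("anti_clockwise", {})
        # Classes seen in either direction; a missing count defaults to 0
        for vehicle_class in sorted(set(clockwise) | set(anti_clockwise)):
            rows.append({
                "countline_id": countline_id,
                "timefrom": timeframe["from"],
                "timeto": timeframe["to"],
                "classes": vehicle_class,
                "out": anti_clockwise.get(vehicle_class, 0),
                "in": clockwise.get(vehicle_class, 0),
            })

df_long = pd.DataFrame(rows)
print(df_long)
```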

## Stage 3: Data Processing
Now we process the raw data into a tidy table. Select the time bucket to aggregate the counts to (15 minutes, 1 hour or 24 hours).


In [None]:
#@title  {vertical-output: true, display-mode: "form" }
#@markdown Select time bucket and run data processing
time_bucket = "1h" #@param ["15m", "1h", "24h"]

# Work on a copy so this cell can be re-run with a different time bucket
processed = final_df.copy()

# Convert UTC to local time (using the timezone selected in Stage 2)
processed['Local Datetime'] = pd.to_datetime(processed['timefrom']).dt.tz_convert(timezone.value)

# Aggregate to the selected time bucket (the API data is already in 15-minute buckets)
if time_bucket in ("1h", "24h"):
    # Flag rows in daylight saving time, so that flooring resolves the
    # repeated hour when the clocks go back
    is_dst = processed['Local Datetime'].map(lambda t: t.dst() != timedelta(0)).to_numpy(dtype=bool)
    # Round down to the start of the local hour or day
    freq = 'h' if time_bucket == "1h" else 'D'
    processed['Local Datetime'] = processed['Local Datetime'].dt.floor(freq, ambiguous=is_dst)
    # Derive UTC from the floored local time so both columns describe the same bucket
    processed['timefrom'] = processed['Local Datetime'].dt.tz_convert('UTC')

groupby_cols = ['timefrom', 'Local Datetime', 'countline_id']

# Create pivot table for each direction
out_pivot = processed.pivot_table(
    index=groupby_cols,
    columns='classes',
    values='out',
    aggfunc='sum',
    fill_value=0
).reset_index()

in_pivot = processed.pivot_table(
    index=groupby_cols,
    columns='classes',
    values='in',
    aggfunc='sum',
    fill_value=0
).reset_index()

# Add direction column
out_pivot['direction'] = 'out'
in_pivot['direction'] = 'in'

# Combine the directions
export = pd.concat([out_pivot, in_pivot], ignore_index=True)

# Rename columns
export = export.rename(columns={
    'timefrom': 'UTC Datetime',
    'countline_id': 'countlineId'
})

# Look up countline names from the metadata retrieved in Stage 2
export['countlineName'] = export['countlineId'].map(
    df_countlines.set_index('countline_id')['countline_name']
)

# Sort columns in desired order
first_cols = ['UTC Datetime', 'Local Datetime', 'countlineId', 'countlineName', 'direction']
class_cols = [col for col in export.columns if col not in first_cols]
export = export[first_cols + sorted(class_cols)]

# Sort by datetime and countlineId
export = export.sort_values(['UTC Datetime', 'countlineId', 'direction']).reset_index(drop=True)

# Preview the first rows
print(f"Preview of export DataFrame (time bucket: {time_bucket}):")
print(export.head(100))
print("\nShape:", export.shape)
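Aggregating to a coarser bucket works by rounding each 15-minute start time down (`floor`) and summing. Rounding the local time first and deriving UTC from it keeps the two datetime columns aligned; flooring them independently can split one local hour into two groups for a timezone whose offset is not a whole number of hours, such as Australia/Adelaide. A minimal sketch with made-up counts:

```python
import pandas as pd

# Made-up 15-minute car counts for one countline, timestamps in UTC
df = pd.DataFrame({
    "timefrom": pd.to_datetime([
        "2023-02-19T13:30:00Z", "2023-02-19T13:45:00Z",
        "2023-02-19T14:00:00Z", "2023-02-19T14:15:00Z",
    ]),
    "car": [2, 4, 1, 3],
})

tz = "Australia/Adelaide"  # UTC+10:30 in February (daylight saving time)
local = df["timefrom"].dt.tz_convert(tz)  # 00:00 to 00:45 local on 20 February

# Floor the local time to the hour, then derive UTC from it so both columns agree
df["Local Datetime"] = local.dt.floor("h")
df["UTC Datetime"] = df["Local Datetime"].dt.tz_convert("UTC")

hourly = df.groupby(["UTC Datetime", "Local Datetime"], as_index=False)["car"].sum()
print(hourly)

# Flooring the UTC times instead splits the same local hour into two groups
print(df["timefrom"].dt.floor("h").nunique())  # 2
```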
print("\nShape:", export.shape)

## Stage 4: Data Export
Now let's write this to a .csv file. You can either save the file locally (it will appear in your Downloads folder) or save it to your Google Drive.

- **Local Downloads Folder:** This might not work if your browser or computer is blocking downloads.

- **Google Drive:** If you want to save it in Google Drive, you will be asked for permission to connect to your Google Account.




In [None]:
#@title  { vertical-output: true, display-mode: "form" }
#@markdown Select where to save the csv file
download_location = "Local folder" #@param [ "Local folder", "Google Drive"]
#@markdown Name your file
filename = "" #@param {type:"string"}
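#@markdown Hit run (▶)

# Use a default name if the box is left empty, and add .csv only if it is missing
csv_name = filename.strip() or "classified_counts"
if not csv_name.lower().endswith(".csv"):
    csv_name += ".csv"

if download_location == "Local folder":
    from google.colab import files
    export.to_csv(csv_name, index=False)
    files.download(csv_name)
else:
    from google.colab import drive
    drive.mount('/content/drive')
    path = '/content/drive/My Drive/'
    export.to_csv(path + csv_name, index=False)
    print("Saved to", path + csv_name)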
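To load an exported file back into pandas later, note that the `Local Datetime` column is written with its UTC offset, which can differ between rows when the period spans a daylight saving change. Parsing with `utc=True` gives one timezone-aware column that can then be converted back to local time. A sketch with two made-up rows either side of the UK clock change on 26 March 2023 (with a real file, pass its path to `pd.read_csv` instead of the in-memory text):

```python
import io

import pandas as pd

# Two made-up rows in the export layout (class columns trimmed to "car")
csv_text = """UTC Datetime,Local Datetime,countlineId,countlineName,direction,car
2023-03-26 00:00:00+00:00,2023-03-26 00:00:00+00:00,53241,S1_highstreet,in,8
2023-03-26 02:00:00+00:00,2023-03-26 03:00:00+01:00,53241,S1_highstreet,in,5
"""

df = pd.read_csv(io.StringIO(csv_text))
# Mixed offsets: parse to UTC first, then convert back to the local timezone
df["UTC Datetime"] = pd.to_datetime(df["UTC Datetime"], utc=True)
df["Local Datetime"] = pd.to_datetime(df["Local Datetime"], utc=True).dt.tz_convert("Europe/London")
print(df[["UTC Datetime", "Local Datetime", "car"]])
```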