# Introduction

At several locations in the Brussels Region, traffic is measured using magnetic loops or cameras. To access this data, we can use the API or the geowebservices. For the moment, only real-time data is available; historical data will follow.

In this project, each time the code is run, we get the latest livestream data (the last available 1-minute interval) for all detectors, by traverse or by lane. Among other things, we extract the number of vehicles that passed between the start and end times, as well as the average speed of those vehicles.

For more information on the API, please visit the [Brussels open datastore documentation](https://data-mobility.brussels/traffic/api/counts/).

As a reminder, our ultimate goal is to display traffic data as well as other mobility data on a dynamic map of Brussels.
You can check the latest version of our map on [our Tableau Public link](https://public.tableau.com/profile/remy2092#!/vizhome/TrafficinBrussels/TrafficinBrussels). 

# Traffic counts API

Here are the 2 types of HTTP GET requests we can perform with the API:

- `devices`: List with name and location of the traverses and their detectors by lane.
- `live`: The latest livestream data for all detectors, by traverse or by lane. The data is updated every minute.

## Devices request (traverses and their detectors)

We extract the data and parse the JSON response into a Python dictionary.

In [1]:
# We use the 'traverse_' prefix to describe the devices.

import requests
import json

traverse_devices_response = requests.get("http://data-mobility.brussels/traffic/api/counts/?request=devices")
traverse_devices_status_code = traverse_devices_response.status_code
traverse_devices_content = traverse_devices_response.content
decoded_traverse_devices_content = traverse_devices_content.decode('utf-8') # Decode using the utf-8 encoding
json_traverse_devices_content = json.loads(decoded_traverse_devices_content)
json_traverse_devices_content

{'requestDate': '2019/10/23 12:37:11',
 'type': 'FeatureCollection',
 'totalFeatures': 66,
 'features': [{'type': 'Feature',
   'id': 'traverse.16838',
   'geometry': {'type': 'Point',
    'coordinates': [4.35695853200681, 50.8365913344471],
    'geometry_name': 'geom'},
   'properties': {'traverse_name': 'LOU_TD1',
    'descr_nl': 'Louizatunnel - inrit : Basiliek > Zuid + Kameren',
    'descr_fr': 'Tunnel Louise - entrée : Basilique > Midi + Cambre',
    'descr_en': 'Tunnel Louise - entrée : Basilique > Midi + Cambre',
    'orientation': 50,
    'number_of_lanes': 2,
    'detectors': ['LOU_TD1_1', 'LOU_TD1_2']}},
  {'type': 'Feature',
   'id': 'traverse.16839',
   'geometry': {'type': 'Point',
    'coordinates': [4.36852282769307, 50.8457551811458],
    'geometry_name': 'geom'},
   'properties': {'traverse_name': 'ARL_103',
    'descr_nl': 'Kunst-Wettunnel > Zuid',
    'descr_fr': 'Tunnel Arts-Loi > Midi',
    'descr_en': 'Tunnel Arts-Loi > Midi',
    'orientation': 20,
    'number_of

We are interested in the `features` key, where all the attributes of each traverse are stored.
We first create an empty DataFrame to store this information.

In [2]:
import pandas as pd

traverse_devices_df = pd.DataFrame(columns = ["traverse_request_date", "traverse_id", "traverse_name", "traverse_descr_nl", 
                                              "traverse_descr_fr", "traverse_descr_en", "traverse_longitude", 
                                              "traverse_latitude", "traverse_orientation", "traverse_number_of_lanes", 
                                              "detector_1", "detector_2", "detector_3", "detector_4", "detector_5"])

We extract the content of the JSON object to fill our DataFrame.

In [3]:
traverse_request_date = json_traverse_devices_content["requestDate"]

i = 0

for item in json_traverse_devices_content['features']:
    traverse_id = item["id"]
    traverse_longitude = item["geometry"]["coordinates"][0]
    traverse_latitude = item["geometry"]["coordinates"][1]
    traverse_name = item["properties"]["traverse_name"]
    traverse_descr_nl = item["properties"]["descr_nl"]
    traverse_descr_fr = item["properties"]["descr_fr"]
    traverse_descr_en = item["properties"]["descr_en"]
    traverse_orientation = item["properties"]["orientation"]
    traverse_number_of_lanes = item["properties"]["number_of_lanes"]
    
    detector_dict = dict.fromkeys(["detector_1", "detector_2", "detector_3", "detector_4", "detector_5"])
    detector_list = ["detector_1", "detector_2", "detector_3", "detector_4", "detector_5"]
    det_count = 0
    for detector in item["properties"]["detectors"]:
        detector_dict[detector_list[det_count]] = detector
        det_count += 1
    traverse_devices_df.loc[i] = [traverse_request_date, traverse_id, traverse_name, traverse_descr_nl, traverse_descr_fr, 
                                  traverse_descr_en, traverse_longitude, traverse_latitude, traverse_orientation, 
                                  traverse_number_of_lanes, detector_dict["detector_1"], detector_dict["detector_2"], 
                                  detector_dict["detector_3"], detector_dict["detector_4"], detector_dict["detector_5"]]  
    i += 1
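
The row-by-row extraction above can also be written as a small function that turns one feature into a flat dictionary. This is only a sketch: `flatten_feature` and `max_detectors` are hypothetical names, and the sample feature is copied from the response shown earlier.

```python
def flatten_feature(feature, request_date, max_detectors=5):
    """Turn one traverse feature into a flat dict (one table row)."""
    props = feature["properties"]
    longitude, latitude = feature["geometry"]["coordinates"]
    row = {
        "traverse_request_date": request_date,
        "traverse_id": feature["id"],
        "traverse_name": props["traverse_name"],
        "traverse_descr_nl": props["descr_nl"],
        "traverse_descr_fr": props["descr_fr"],
        "traverse_descr_en": props["descr_en"],
        "traverse_longitude": longitude,
        "traverse_latitude": latitude,
        "traverse_orientation": props["orientation"],
        "traverse_number_of_lanes": props["number_of_lanes"],
    }
    detectors = props["detectors"]
    # Pad with None so every row has the same detector_1..detector_N keys.
    for position in range(max_detectors):
        key = f"detector_{position + 1}"
        row[key] = detectors[position] if position < len(detectors) else None
    return row


# Sample feature copied from the devices response shown earlier.
sample_feature = {
    "type": "Feature",
    "id": "traverse.16838",
    "geometry": {
        "type": "Point",
        "coordinates": [4.35695853200681, 50.8365913344471],
        "geometry_name": "geom",
    },
    "properties": {
        "traverse_name": "LOU_TD1",
        "descr_nl": "Louizatunnel - inrit : Basiliek > Zuid + Kameren",
        "descr_fr": "Tunnel Louise - entrée : Basilique > Midi + Cambre",
        "descr_en": "Tunnel Louise - entrée : Basilique > Midi + Cambre",
        "orientation": 50,
        "number_of_lanes": 2,
        "detectors": ["LOU_TD1_1", "LOU_TD1_2"],
    },
}

sample_row = flatten_feature(sample_feature, "2019/10/23 12:37:11")
print(sample_row["traverse_name"], sample_row["detector_1"], sample_row["detector_3"])
```

A list of such dictionaries can be passed to `pd.DataFrame(...)` in a single call, which avoids growing the DataFrame one row at a time.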

In order for the `traverse_longitude` and `traverse_latitude` columns to be recognized as geographical data by Tableau, we need to convert them to strings and replace "." with ",".

In [4]:
coordinates = ["traverse_longitude", "traverse_latitude"]

for coord in coordinates:
    # regex=False makes "." a literal dot rather than a regular-expression wildcard.
    traverse_devices_df[coord] = traverse_devices_df[coord].astype(str).str.replace(".", ",", regex=False)

traverse_devices_df[["traverse_longitude", "traverse_latitude"]]

  traverse_longitude traverse_latitude
0   4,35695853200681  50,8365913344471
1   4,36852282769307  50,8457551811458
2   4,34920696698087  50,8579974792338
3   4,38682173223087  50,8418347237431
4   4,35756294930224  50,8930000247203
5   4,39448722891204  50,8402207235509
6   4,39448383899941  50,8401254712678
7   4,39855753338851   50,848036684476
8   4,38057760894757  50,7990396839219
9   4,38083487018981  50,8402353941657
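
To make the conversion concrete, here is a minimal sketch of the same transformation on plain Python values, using the coordinates of the first traverse from the response above. The helper names `to_decimal_comma` and `from_decimal_comma` are hypothetical and not part of the notebook.

```python
def to_decimal_comma(value):
    """Format a number as text with a comma as the decimal separator."""
    # str.replace on a plain Python string is always a literal replacement.
    return str(value).replace(".", ",")


def from_decimal_comma(text):
    """Parse text written with a decimal comma back into a float."""
    return float(text.replace(",", "."))


longitude_text = to_decimal_comma(4.35695853200681)
latitude_text = to_decimal_comma(50.8365913344471)
print(longitude_text, latitude_text)
```

Note that the resulting columns hold text, so any numeric work in Python has to happen before this step or after converting back.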


We drop `traverse_descr_en` as it contains the same information as `traverse_descr_fr`.

In [5]:
traverse_devices_df.drop("traverse_descr_en", axis=1, inplace=True)

In [6]:
traverse_devices_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 66 entries, 0 to 65
Data columns (total 14 columns):
traverse_request_date       66 non-null object
traverse_id                 66 non-null object
traverse_name               66 non-null object
traverse_descr_nl           66 non-null object
traverse_descr_fr           66 non-null object
traverse_longitude          66 non-null object
traverse_latitude           66 non-null object
traverse_orientation        66 non-null object
traverse_number_of_lanes    66 non-null object
detector_1                  66 non-null object
detector_2                  54 non-null object
detector_3                  7 non-null object
detector_4                  3 non-null object
detector_5                  1 non-null object
dtypes: object(14)
memory usage: 7.7+ KB


We overwrite the existing file only when the number of traverses has changed.

In [7]:
# TODO: uncomment the line below and set the path where the file should be saved.
# my_path = ""

import os

if not os.path.isfile(my_path):
    # First run: there is no file yet, so we write it.
    traverse_devices_df.to_csv(my_path, sep=";")
else:
    old_traverse_devices_df = pd.read_csv(my_path, delimiter=";")
    # Rewrite the file only when the number of traverses has changed.
    if traverse_devices_df.shape[0] != old_traverse_devices_df.shape[0]:
        traverse_devices_df.to_csv(my_path, sep=";")
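
Comparing row counts does not detect the case where one traverse is replaced by another, since the count stays the same. A possible alternative, shown here only as a sketch with a hypothetical helper `devices_changed`, is to compare the sets of traverse names.

```python
def devices_changed(old_names, new_names):
    """Return True when the two collections do not hold the same traverse names."""
    return set(old_names) != set(new_names)


# Same number of traverses, but one was swapped for another.
old_names = ["LOU_TD1", "ARL_103", "CIN_TD1"]
new_names = ["LOU_TD1", "ARL_103", "CIN_TD2"]

same_count = len(old_names) == len(new_names)
changed = devices_changed(old_names, new_names)
print(same_count, changed)
```

With the DataFrames used here, the two arguments could be the `traverse_name` columns of the old and new tables.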

We create the list of traverse names as we'll need it to extract live data.

In [8]:
list_of_traverse_name = [item["properties"]["traverse_name"]
                         for item in json_traverse_devices_content["features"]]

list_of_traverse_name

['LOU_TD1',
 'ARL_103',
 'SB020_BBin',
 'LOI_103',
 'SB0236_BHout',
 'CIN_TD1',
 'CIN_TD2',
 'RCE_TD1',
 'SUL62_BHin',
 'BE_TD1',
 'SB121_BBin',
 'ARL_203',
 'HAL_292',
 'TRO_203',
 'SB125_BBout',
 'BOT_TD2',
 'HAL_191',
 'DEL_103_12',
 'MON_TD1',
 'LOU_110',
 'STE_TD3',
 'SUL62_BGin',
 'MAD_103',
 'VLE_103',
 'ROG_TD1',
 'PNA_203',
 'DEL_103_6',
 'SUL62_BDout',
 'LOU_TD2',
 'VP_103',
 'STE_TD2',
 'BOI_203',
 'GH_103',
 'ROG_TD2',
 'SGN02_BBout',
 'TER_TD1',
 'BEL_TD4',
 'RME_TD1',
 'SB1201_BAout',
 'STE_TD1',
 'SB0246_BAout',
 'TRO_TD2',
 'GH_203',
 'SGN02_BAout',
 'SB020_BCin',
 'PNA_103',
 'SUL62_BHout',
 'SUL62_BGout',
 'MAD_203',
 'VLE_203',
 'BET_TD3',
 'VP_203',
 'SUL62_BA1out',
 'SUL62_BDin',
 'SB0246_BXout',
 'SB020_BDout',
 'BAI_TD2',
 'BAI_TD1',
 'LOI_109',
 'SB020_BAout',
 'MON_TD2',
 'SB0236_BCout',
 'TRO_TD1',
 'BET_TD2_12',
 'TER_TD2',
 'BEL_TD5']

## Live request

Here are the parameters of the live request:

- featureID: Optional parameter for live requests. The value should be the name of a traverse or of a lane detector.
- interval: Optional parameter for live requests. It filters the results by measurement interval, expressed in minutes. Possible values are '1', '5', '15', '60' and 'all'. The default value, 'all', returns the data for every interval.
- includeLanes: Optional parameter for live requests. If the parameter is set to 'true', the response also includes the data by lane. The default value is 'false'.
- singleValue: Optional parameter for live requests. If the parameter is set to 'true', the response contains only the last timestamp value. The default value is 'false'.
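
The parameters listed above can be assembled with a small helper that rejects unsupported intervals before any request is sent. This is a sketch: `build_live_params` is a hypothetical name, and leaving out a parameter that has its default value is an assumption about how the API behaves.

```python
from urllib.parse import urlencode

BASE_URL = "http://data-mobility.brussels/traffic/api/counts/"
VALID_INTERVALS = {"1", "5", "15", "60", "all"}


def build_live_params(feature_id=None, interval="all", include_lanes=False, single_value=False):
    """Build the query parameters of a live request, leaving out default values."""
    interval = str(interval)
    if interval not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {sorted(VALID_INTERVALS)}, got {interval!r}")
    params = {"request": "live"}
    if feature_id is not None:
        params["featureID"] = feature_id
    if interval != "all":
        params["interval"] = interval
    if include_lanes:
        params["includeLanes"] = "true"
    if single_value:
        params["singleValue"] = "true"
    return params


# Last 1-minute interval, last timestamp only.
params = build_live_params(interval=1, single_value=True)
url = BASE_URL + "?" + urlencode(params)
print(url)
```

The resulting dictionary can be passed directly as the `params` argument of `requests.get`.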

Below we extract the real-time data of all the traverses and parse the JSON response into a Python dictionary.

We will only focus on the last minute (`interval`: 1) and we'll take the last timestamp (`singleValue`: true).

In [9]:
parameters = {'request': 'live', 'interval': '1', 'singleValue': 'true'}
traverse_live_response = requests.get("http://data-mobility.brussels/traffic/api/counts/", params=parameters)
traverse_live_status_code = traverse_live_response.status_code
traverse_live_content = traverse_live_response.content
decoded_traverse_live_content = traverse_live_content.decode('utf-8') # Decode using the utf-8 encoding
json_traverse_live_content = json.loads(decoded_traverse_live_content)
json_traverse_live_content

{'requestDate': '2019/10/23 12:37:17',
 'data': {'ARL_103': {'results': {'1m': {'count': 41,
     'speed': 24.0,
     'occupancy': 50.0,
     'start_time': '2019/10/23 12:35',
     'end_time': '2019/10/23 12:36'}}},
  'ARL_203': {'results': {'1m': {'count': 19,
     'speed': 16.5,
     'occupancy': 36.5,
     'start_time': '2019/10/23 12:35',
     'end_time': '2019/10/23 12:36'}}},
  'BAI_TD1': {'results': {'1m': {'count': 18,
     'speed': 46.0,
     'occupancy': 12.5,
     'start_time': '2019/10/23 12:35',
     'end_time': '2019/10/23 12:36'}}},
  'BAI_TD2': {'results': {'1m': {'count': 28,
     'speed': 48.0,
     'occupancy': 20.0,
     'start_time': '2019/10/23 12:35',
     'end_time': '2019/10/23 12:36'}}},
  'BEL_TD4': {'results': {'1m': {'count': 16,
     'speed': 76.0,
     'occupancy': 6.0,
     'start_time': '2019/10/23 12:35',
     'end_time': '2019/10/23 12:36'}}},
  'BEL_TD5': {'results': {'1m': {'count': 37,
     'speed': 45.5,
     'occupancy': 22.0,
     'start_time': 

We are interested in the `data` key, where the real-time data of each traverse is stored.
We first create an empty DataFrame to store this information.

In [10]:
traverse_live_df = pd.DataFrame(columns = ['traverse_live_request_date', 'traverse_name', 'traverse_interval', 'traverse_count', 
                                           'traverse_speed', 'traverse_occupancy','traverse_start_time', 'traverse_end_time'])

We extract the content of the JSON object to fill our DataFrame.

In [11]:
traverse_live_request_date = json_traverse_live_content["requestDate"]
traverse_interval = '1m'

for i, traverse_name in enumerate(list_of_traverse_name):
    # Measurements of this traverse for the selected interval.
    results = json_traverse_live_content["data"][traverse_name]["results"][traverse_interval]
    traverse_count = results["count"]
    traverse_speed = results["speed"]
    traverse_occupancy = results["occupancy"]
    traverse_start_time = results["start_time"]
    traverse_end_time = results["end_time"]

    traverse_live_df.loc[i] = [traverse_live_request_date, traverse_name, traverse_interval, traverse_count, traverse_speed,
                               traverse_occupancy, traverse_start_time, traverse_end_time]
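
The loop above assumes that every traverse name is present in the live response. Whether the API can omit a traverse is not shown here, so the following is only a defensive sketch: the hypothetical function `flatten_live` skips any traverse without a measurement instead of raising a `KeyError`. The sample payload is copied from the response above.

```python
def flatten_live(payload, traverse_names, interval="1m"):
    """Return one flat dict per traverse that has a measurement for the interval."""
    rows = []
    data = payload.get("data") or {}
    for name in traverse_names:
        # .get() returns None instead of raising KeyError when a level is missing.
        entry = data.get(name) or {}
        measurement = (entry.get("results") or {}).get(interval)
        if not measurement:
            continue  # nothing reported for this traverse in this response
        rows.append({
            "traverse_live_request_date": payload["requestDate"],
            "traverse_name": name,
            "traverse_interval": interval,
            "traverse_count": measurement.get("count"),
            "traverse_speed": measurement.get("speed"),
            "traverse_occupancy": measurement.get("occupancy"),
            "traverse_start_time": measurement.get("start_time"),
            "traverse_end_time": measurement.get("end_time"),
        })
    return rows


# Two entries copied from the live response shown above.
sample_payload = {
    "requestDate": "2019/10/23 12:37:17",
    "data": {
        "ARL_103": {"results": {"1m": {"count": 41, "speed": 24.0, "occupancy": 50.0,
                                       "start_time": "2019/10/23 12:35",
                                       "end_time": "2019/10/23 12:36"}}},
        "ARL_203": {"results": {"1m": {"count": 19, "speed": 16.5, "occupancy": 36.5,
                                       "start_time": "2019/10/23 12:35",
                                       "end_time": "2019/10/23 12:36"}}},
    },
}

# "MISSING_TRAVERSE" is a made-up name used to exercise the skip branch.
live_rows = flatten_live(sample_payload, ["ARL_103", "MISSING_TRAVERSE", "ARL_203"])
print([row["traverse_name"] for row in live_rows])
```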

We start by dropping the rows with null values, as they won't be used in our visualizations.

In [12]:
traverse_live_df.dropna(inplace = True)

In order for the `traverse_speed` and `traverse_occupancy` columns to be read as decimal numbers by Tableau, we need to convert them to strings and replace "." with ",". Negative values are first set to 0.

In [13]:
measures = ["traverse_speed", "traverse_occupancy"]

for measure in measures:
    traverse_live_df.loc[traverse_live_df[measure] < 0, measure] = 0
    # regex=False makes "." a literal dot rather than a regular-expression wildcard.
    traverse_live_df[measure] = traverse_live_df[measure].astype(str).str.replace(".", ",", regex=False)

traverse_live_df[["traverse_speed", "traverse_occupancy"]]

   traverse_speed traverse_occupancy
0            50,5               26,5
1            24,0               50,0
2            30,5                9,0
3            58,0               12,5
5            54,0               28,5
6            65,5               14,0
7            44,0               23,0
9         60,3333            25,3333
11           16,5               36,5
12           28,0                6,5


In order for the `traverse_live_request_date`, `traverse_start_time` and `traverse_end_time` columns to be recognized as dates by Tableau, we need to parse them as datetimes. We also split `traverse_end_time` into separate date and time columns.

In [14]:
# requestDate includes seconds (e.g. '2019/10/23 12:37:17'), so its format needs %S.
traverse_live_df["traverse_live_request_date"] = pd.to_datetime(traverse_live_df["traverse_live_request_date"], 
                                                                format='%Y/%m/%d %H:%M:%S')
traverse_live_df["traverse_start_time"] = pd.to_datetime(traverse_live_df["traverse_start_time"], format='%Y/%m/%d %H:%M', 
                                                         errors = 'coerce')
traverse_live_df["traverse_end_time"] = pd.to_datetime(traverse_live_df["traverse_end_time"], format='%Y/%m/%d %H:%M', 
                                                       errors = 'coerce')
traverse_live_df["traverse_end_date"] = traverse_live_df["traverse_end_time"].dt.date
traverse_live_df["traverse_end_hour"] = traverse_live_df["traverse_end_time"].dt.time
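
In the sample response, `requestDate` includes seconds while `start_time` and `end_time` stop at the minute, so the two kinds of values need different format strings. The sketch below illustrates this with the standard library only; the constant names are hypothetical.

```python
from datetime import datetime

REQUEST_DATE_FORMAT = "%Y/%m/%d %H:%M:%S"  # e.g. '2019/10/23 12:37:17'
INTERVAL_TIME_FORMAT = "%Y/%m/%d %H:%M"    # e.g. '2019/10/23 12:36'

request_date = datetime.strptime("2019/10/23 12:37:17", REQUEST_DATE_FORMAT)
end_time = datetime.strptime("2019/10/23 12:36", INTERVAL_TIME_FORMAT)

# Split the end time into a date part and a time part.
end_date = end_time.date()
end_hour = end_time.time()
print(request_date, end_date, end_hour)

# A strict parser rejects a value with seconds when the format has none.
try:
    datetime.strptime("2019/10/23 12:37:17", INTERVAL_TIME_FORMAT)
    strict_parse_failed = False
except ValueError:
    strict_parse_failed = True
print(strict_parse_failed)
```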

In [15]:
traverse_live_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 54 entries, 0 to 65
Data columns (total 10 columns):
traverse_live_request_date    54 non-null datetime64[ns]
traverse_name                 54 non-null object
traverse_interval             54 non-null object
traverse_count                54 non-null object
traverse_speed                54 non-null object
traverse_occupancy            54 non-null object
traverse_start_time           54 non-null datetime64[ns]
traverse_end_time             54 non-null datetime64[ns]
traverse_end_date             54 non-null object
traverse_end_hour             54 non-null object
dtypes: datetime64[ns](3), object(7)
memory usage: 4.6+ KB


We append to the existing .csv file only the rows that are more recent than the last measurement already stored for each traverse.

In [16]:
# TODO: uncomment the line below and set the path where the file should be saved.
# my_path = ""

if not os.path.isfile(my_path):
    traverse_live_df.to_csv(my_path, sep=";")
else:
    old_traverse_live_df = pd.read_csv(my_path, delimiter=";")

    # Latest end time already stored for each traverse.
    old_end_times = pd.to_datetime(old_traverse_live_df["traverse_end_time"])
    old_last_update = old_end_times.groupby(old_traverse_live_df["traverse_name"]).max()

    # Traverses that are not in the file yet map to NaT, so their rows are kept.
    previous_update = traverse_live_df["traverse_name"].map(old_last_update)
    is_new = previous_update.isna() | (traverse_live_df["traverse_end_time"] > previous_update)
    traverse_live_df = traverse_live_df[is_new]

    traverse_live_df.to_csv(my_path, sep=";", mode='a', header=False)
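
The rule applied before appending can be summarized as: keep a row only if its end time is later than the last end time already stored for the same traverse. Here is a minimal sketch of that rule on plain Python data. The function `select_new_rows` is hypothetical, and keeping rows of traverses that are not in the file yet is an assumption about the desired behavior.

```python
from datetime import datetime


def select_new_rows(new_rows, last_stored):
    """Keep the rows that are newer than the last stored end time of their traverse."""
    kept = []
    for row in new_rows:
        previous = last_stored.get(row["traverse_name"])  # None for an unseen traverse
        if previous is None or row["traverse_end_time"] > previous:
            kept.append(row)
    return kept


# Last end time already stored per traverse (illustrative values).
last_stored = {
    "ARL_103": datetime(2019, 10, 23, 12, 36),
    "ARL_203": datetime(2019, 10, 23, 12, 35),
}
new_rows = [
    {"traverse_name": "ARL_103", "traverse_end_time": datetime(2019, 10, 23, 12, 36)},  # already stored
    {"traverse_name": "ARL_203", "traverse_end_time": datetime(2019, 10, 23, 12, 36)},  # newer
    {"traverse_name": "BAI_TD1", "traverse_end_time": datetime(2019, 10, 23, 12, 36)},  # not stored yet
]

kept_names = [row["traverse_name"] for row in select_new_rows(new_rows, last_stored)]
print(kept_names)
```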