## Requirements for this notebook
To run this notebook you need to install the following packages:
- requests
- aiohttp
- pandas

### Asynchronous programming
Asynchronous programming is a way of writing code that lets several tasks make progress concurrently. This is useful when a task spends a long time waiting, and you want to do other things until it finishes. For example, while you wait for a file to download from the internet, you can already request another file. Modern web browsers use asynchronous programming to download multiple elements at the same time, so that the page loads faster (by default they typically download up to 6 elements at the same time from one host).

Python has full support for asynchronous programming. In this notebook we will use the `requests` library to make HTTP requests, and the `aiohttp` library to make asynchronous HTTP requests.

The trick with asynchronous programming is that you have to write your code in a special way, using the `async` and `await` keywords. The `async` keyword defines a function as asynchronous. The `await` keyword waits for an asynchronous function to finish, and lets other tasks run in the meantime.

An asynchronous function is a function that returns a `coroutine` object when it is called. A `coroutine` object represents work that has not run yet: it starts running only when it is awaited or scheduled as a task in an `event loop`.

Asynchronous programming might sound complicated, but it is actually quite simple, as long as you follow a few rules.
 - You define your asynchronous functions with the `async` keyword.
 - You can only use the `await` keyword inside an asynchronous function (Jupyter notebooks also allow `await` at the top level of a cell, which the examples in this notebook rely on).
 - You need to create your tasks beforehand.
 - You need to execute your tasks in an `event loop`.
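The rules above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook: the names `greet` and `main` are hypothetical. It uses `asyncio.run`, which is what a plain Python script needs in order to start an event loop; inside a notebook cell the loop is already running, so you would write `results = await main()` instead.

```python
import asyncio


async def greet(name: str) -> str:
    # `await` hands control back to the event loop while we wait
    await asyncio.sleep(0.01)
    return f"hello {name}"


async def main() -> list[str]:
    # calling an async function only builds a coroutine object; nothing runs yet
    coro = greet("a")
    assert asyncio.iscoroutine(coro)

    # create_task schedules the coroutine on the running event loop
    task = asyncio.create_task(greet("b"))

    first = await coro   # runs the coroutine to completion
    second = await task  # waits for the scheduled task
    return [first, second]


# in a script: start an event loop and run main() in it
# (in a notebook cell, use `results = await main()` instead)
results = asyncio.run(main())
print(results)  # ['hello a', 'hello b']
```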

Thankfully, recent versions of Python have made asynchronous programming much easier than it was in older versions.

> Note: A more in-depth explanation of asynchronous programming, with examples, can be found in the [Trio tutorial](https://trio.readthedocs.io/en/stable/tutorial.html).

Here is an example of an asynchronous function:



In [4]:
import asyncio
import time

tasks = set()


async def sleep(task_id: str, sleep_time: int):
    """Sleep without blocking the event loop, then report how long it took."""
    print('%s going to sleep!' % task_id)
    task_start = time.time()
    await asyncio.sleep(sleep_time)
    print('%s woke up after %.4f seconds' % (task_id, time.time() - task_start))


start_time = time.time()
# schedule the tasks on the event loop and add them to the set
for task_id, sleep_time in (('task1', 3), ('task2', 2), ('task3', 1)):
    tasks.add(asyncio.create_task(sleep(task_id, sleep_time)))

# collect results as they come back (top-level await works inside a notebook cell)
for result in asyncio.as_completed(tasks):
    await result

print('all done in %.4f seconds' % (time.time() - start_time))

task1 going to sleep!
task2 going to sleep!
task3 going to sleep!
task3 woke up after 1.0012 seconds
task2 woke up after 2.0014 seconds
task1 woke up after 3.0018 seconds
all done in 3.0023 seconds


In [3]:
# a synchronous version of the same example
import time

start_time = time.time()

def sleep(task_id:str,sleep_time:int):
    print('%s going to sleep!' % task_id)
    start_time = time.time()
    time.sleep(sleep_time)
    print('%s woke up after %.4f seconds' % (task_id,time.time() - start_time))

for task_id,sleep_time in (('task1',3), ('task2',2), ('task3',1)):
    sleep(task_id,sleep_time)

print('all done in %.4f seconds' % (time.time() - start_time) )

task1 going to sleep!
task1 woke up after 3.0031 seconds
task2 going to sleep!
task2 woke up after 2.0020 seconds
task3 going to sleep!
task3 woke up after 1.0011 seconds
all done in 6.0068 seconds
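
The two timings above follow from simple arithmetic: sequential sleeps add up (3 + 2 + 1 = 6 seconds), while concurrent sleeps overlap, so the total is roughly the longest one (3 seconds). The sketch below reproduces the comparison with delays scaled down by a factor of ten. The helper names `nap`, `sequential`, `concurrent` and `timed` are illustrative, and `asyncio.gather` is used here as an alternative to `as_completed` that returns results in submission order. As before, `asyncio.run` is for plain scripts; in a notebook cell you would `await` the coroutines directly.

```python
import asyncio
import time

DELAYS = (0.3, 0.2, 0.1)  # scaled-down versions of the 3, 2, 1 second sleeps


async def nap(delay: float) -> float:
    await asyncio.sleep(delay)
    return delay


async def sequential() -> list[float]:
    # each nap finishes before the next one starts: total is about sum(DELAYS)
    return [await nap(d) for d in DELAYS]


async def concurrent() -> list[float]:
    # all naps wait at the same time: total is about max(DELAYS)
    return await asyncio.gather(*(nap(d) for d in DELAYS))


def timed(coro_fn):
    """Run an async function in a fresh event loop and measure the wall time."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_result, seq_elapsed = timed(sequential)
con_result, con_elapsed = timed(concurrent)
print(f"sequential: {seq_elapsed:.2f}s, concurrent: {con_elapsed:.2f}s")
```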


To apply the same concept, we want to make asynchronous HTTP requests. Unfortunately, the `requests` library is not built to support that architecture, but the `aiohttp` library is. `aiohttp` lets you make asynchronous HTTP requests, and its API (functions and architecture) is very similar to that of the `requests` library.

In [13]:
# if the script below starts to act funny,
# try to decrease the number of concurrent requests by lowering the semaphore value

import time
from datetime import datetime
import asyncio
import aiohttp
import pandas as pd
from itertools import product

URL_BASE = "https://glasgow-cctv.ubdc.ac.uk/api/"
start_time = time.time()
tasks = set()

# the semaphore limits the number of concurrent requests.
# every time a request starts, it acquires the semaphore, which decreases its internal counter.
# once the counter reaches zero, further requests wait until a running request finishes and releases it.
sem = asyncio.Semaphore(50) 

# which cameras are we interested in?
cameras = [54, 40, 68]


async def get_records(session, page: int = 1, camera_id: int | None = None):
    """Get one page of records for one camera from the API."""
    async with sem:  # wait here if too many requests are already in flight
        request_start = time.time()
        print(f"Getting (page-camera) {page}-{camera_id} at time {datetime.now().isoformat()}")
        async with session.get(URL_BASE + "records/", params={"page": page, "camera_id": camera_id}) as resp:
            data = await resp.json()
            print(f"Got (page-camera) {page}-{camera_id} at time {datetime.now().isoformat()}. Elapsed time: {time.time() - request_start}")
            return data


frames: list[pd.DataFrame] = []

async with aiohttp.ClientSession() as session:
    # schedule one request per (page, camera) pair
    for page, camera_id in product(range(1, 100), cameras):
        tasks.add(asyncio.create_task(get_records(session, page, camera_id)))

    # collect the pages as they come back
    for result in asyncio.as_completed(tasks):
        data = await result
        frames.append(pd.DataFrame.from_records(data["results"], index="id"))

# concatenate once at the end instead of growing the DataFrame inside the loop
df = pd.concat(frames)
print(time.time() - start_time)

Getting (page-camera) 1-54 at time 2023-02-25T23:04:34.218848
Getting (page-camera) 1-40 at time 2023-02-25T23:04:34.219374
Getting (page-camera) 1-68 at time 2023-02-25T23:04:34.219661
Getting (page-camera) 2-54 at time 2023-02-25T23:04:34.219989
Getting (page-camera) 2-40 at time 2023-02-25T23:04:34.220248
Getting (page-camera) 2-68 at time 2023-02-25T23:04:34.220816
Getting (page-camera) 3-54 at time 2023-02-25T23:04:34.221185
Getting (page-camera) 3-40 at time 2023-02-25T23:04:34.221730
Getting (page-camera) 3-68 at time 2023-02-25T23:04:34.222119
Getting (page-camera) 4-54 at time 2023-02-25T23:04:34.222521
Getting (page-camera) 4-40 at time 2023-02-25T23:04:34.222899
Getting (page-camera) 4-68 at time 2023-02-25T23:04:34.223259
Getting (page-camera) 5-54 at time 2023-02-25T23:04:34.223636
Getting (page-camera) 5-40 at time 2023-02-25T23:04:34.224498
Getting (page-camera) 5-68 at time 2023-02-25T23:04:34.224731
Getting (page-camera) 6-54 at time 2023-02-25T23:04:34.225051
Getting 
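
To see what the `asyncio.Semaphore(50)` in the cell above does, here is a small sketch that needs no network access. It is an illustration under stated assumptions, not the notebook's own code: `worker`, `main` and the `state` dictionary are hypothetical names, and `asyncio.sleep` stands in for the HTTP request. Ten jobs are started at once, but the semaphore lets only three of them run at any moment. In a notebook cell, replace `asyncio.run(main())` with `await main()`.

```python
import asyncio


async def worker(sem: asyncio.Semaphore, state: dict, delay: float) -> None:
    async with sem:  # waits here while `limit` workers are already inside
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(delay)  # stand-in for the HTTP request
        state["running"] -= 1


async def main(limit: int = 3, jobs: int = 10) -> int:
    sem = asyncio.Semaphore(limit)
    state = {"running": 0, "peak": 0}
    await asyncio.gather(*(worker(sem, state, 0.01) for _ in range(jobs)))
    return state["peak"]  # highest number of workers that ran at the same time


peak = asyncio.run(main())
print(peak)  # 3
```

Lowering the limit trades speed for a lighter load on the server, which is why the comment at the top of the cell suggests reducing it if requests start to fail.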

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4455 entries, 1348590 to 1478380
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   camera       4455 non-null   int64 
 1   timestamp    4455 non-null   object
 2   cars         4455 non-null   int64 
 3   persons      4455 non-null   int64 
 4   bicycles     4455 non-null   int64 
 5   trucks       4455 non-null   int64 
 6   motorcycles  4455 non-null   int64 
 7   buses        4455 non-null   int64 
 8   camera_ok    0 non-null      object
dtypes: int64(7), object(2)
memory usage: 348.0+ KB


In [106]:
df.describe()

       camera      cars   persons  bicycles    trucks  motorcycles     buses
count  1485.0    1485.0    1485.0    1485.0    1485.0       1485.0    1485.0
mean      3.0  2.327273  6.807407  0.052525   0.07138     0.020202  0.277441
std       0.0  1.848534  5.480234  0.248856  0.275252     0.140738  0.607508
min       3.0       0.0       0.0       0.0       0.0          0.0       0.0
25%       3.0       1.0       2.0       0.0       0.0          0.0       0.0
50%       3.0       2.0       6.0       0.0       0.0          0.0       0.0
75%       3.0       3.0      11.0       0.0       0.0          0.0       0.0
max       3.0      11.0      25.0       2.0       2.0          1.0       4.0
