<img src="https://fsdl.me/logo-720-dark-horizontal">

This notebook shows you how to load-test the text recognizer endpoint you deployed via AWS Lambda. We'll use an open-source Python framework called Locust for this purpose. 

From the [Locust documentation](https://docs.locust.io/en/stable/what-is-locust.html):

> Locust is an easy to use, scriptable and scalable performance testing tool.

If you aren't familiar with Locust, be sure to check [this video](https://www.youtube.com/watch?t=163&v=Ok4x2LIbEEY&feature=emb_imp_woyt) out. 

## Setup

We just have a single Python dependency to install if you're running this on Colab. 

In [None]:
!pip install -q locust

## Define Variables

Here, we define paths for the

* Locust script that will be used for conducting the load test. 
* Configuration file that will be used by the above script. 

In [None]:
load_test_http_user = "locust_http_user.py"
locust_conf = "locust_fast_configs.conf"

## Locust Script for Load Testing

In [None]:
%%writefile {load_test_http_user}

from locust import HttpUser, constant, task
import json
import requests

IMAGE_URI = "https://fsdl-public-assets.s3-us-west-2.amazonaws.com/paragraphs/a01-077.png"


class TextRecognizerUser(HttpUser):
    wait_time = constant(1)
    headers = {"Content-type": "application/json"}
    payload = json.dumps({"image_url": IMAGE_URI})

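    @task
    def predict(self):
        response = self.client.post("/", data=self.payload, headers=self.headers)
        # Reading "pred" doubles as a sanity check: a non-JSON body or a missing
        # key raises here, and Locust reports the error for this task.
        pred = response.json()["pred"]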

We start with the `TextRecognizerUser` class, which inherits from the `HttpUser` class. From the [documentation](https://docs.locust.io/en/stable/writing-a-locustfile.html#writing-a-locustfile):

> [...] from HttpUser which gives each user a client attribute, which is an instance of HttpSession, that can be used to make HTTP requests to the target system that we want to load test. When a test starts, locust will create an instance of this class for every user that it simulates, and each of these users will start running within their own green gevent thread.

Inside `TextRecognizerUser`, we define the task (decorated with `@task`) that we want to load test. In our case, this corresponds to calling our AWS Lambda endpoint and retrieving the predictions. We can define multiple tasks like this (each decorated with `@task`) and even weight them differently so that some run more often than others.

We have also defined a `wait_time` inside `TextRecognizerUser` that simulates a delay (in seconds) between a user's consecutive requests. More sophisticated delay configurations, such as randomized waits, are also possible. For more information, consult [this resource](https://docs.locust.io/en/stable/writing-a-locustfile.html#wait-time).

## Define Locust Configuration

In [None]:
%%writefile {locust_conf}

locustfile = "locust_http_user.py"
headless = true
users = 10
spawn-rate = 1
run-time = 5m
host = https://3akxma777p53w57mmdika3sflu0fvazm.lambda-url.us-west-1.on.aws
html = locust_report.html
csv = locust_report

Below are brief descriptions of each of the variables we defined above:

* `locustfile`: The Locust script that will be used for conducting the load test.
* `headless`: When set to `true`, Locust runs the load test without the web UI. When set to `false`, Locust starts a web server on port 8089.
* `users`: Peak number of concurrent Locust users. We kept it to only 10 because we're using a public endpoint here. Heavier traffic could effectively amount to a [DoS](https://en.wikipedia.org/wiki/Denial-of-service_attack) attack on it.
* `spawn-rate`: Number of users to spawn per second. 
* `run-time`: Total duration for running the load test. After the time has elapsed, Locust will stop. 
* `host`: Base URL of the host that serves the text recognizer endpoint.
* `html`: Path to an HTML file that will contain aggregate information after the load test is complete.
* `csv`: Prefix of CSV files to store current request stats.

Locust provides many more options and you can read about them [here](https://docs.locust.io/en/stable/configuration.html).
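
Each key in the configuration file corresponds to a long command-line flag of the same name, so the same run can also be launched without a config file. The sketch below builds that command line from a dictionary. The helper `to_cli_args` and the host value are illustrative assumptions (the host is a placeholder, not the endpoint used above).

```python
import shlex

# Mirrors the configuration file above; the host is a placeholder.
config = {
    "locustfile": "locust_http_user.py",
    "headless": True,
    "users": 10,
    "spawn-rate": 1,
    "run-time": "5m",
    "host": "https://example.invalid",
    "html": "locust_report.html",
    "csv": "locust_report",
}


def to_cli_args(config):
    """Turn config keys into `--key value` pairs; booleans become bare flags."""
    args = ["locust"]
    for key, value in config.items():
        if value is True:
            args.append(f"--{key}")
        elif value is False:
            continue  # an unset flag is simply omitted
        else:
            args.extend([f"--{key}", str(value)])
    return args


command = shlex.join(to_cli_args(config))
print(command)
```

Pasting the printed command into a shell (or a `!` cell) should behave like `locust --config=...` with the file defined earlier.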


## Start the Load Test

In [None]:
%%capture
!locust --config={locust_conf}

## Results

After the load test is complete, you should see the following files:

In [None]:
!ls -lh locust_report*

You can just download the `locust_report.html` file and take a look at the charts it provides. Below we see a table from the HTML file showing statistics on the request payloads and response times:

![](https://i.ibb.co/997KF6R/aggregate-stats.png)

The endpoint was load-tested with a total of 554 requests. The table also provides other useful information, such as:

* Requests per second (`RPS`)
* Average latency (`Average (ms)`)
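
As a rough sanity check, the request count can be related back to the configuration. The sketch below assumes the 554 requests were spread over the full 5-minute run with all 10 users active (it ignores the 10-second ramp-up) and that each user alternates between one request and a 1-second wait. The variable names are illustrative.

```python
total_requests = 554      # from the aggregate table
run_time_s = 5 * 60       # run-time = 5m
users = 10                # users = 10
wait_time_s = 1.0         # wait_time = constant(1)

# Overall throughput across all simulated users.
rps = total_requests / run_time_s

# Each user completes one request per cycle, so cycle length = users / rps.
cycle_s = users / rps

# A cycle is one request plus one wait, so the rest is the implied response time.
implied_latency_s = cycle_s - wait_time_s

print(f"RPS: {rps:.2f}")
print(f"Implied mean response time: {implied_latency_s:.2f} s")
```

Under these assumptions, the throughput comes out at roughly 1.85 requests per second and the implied response time at roughly 4.4 seconds, which can be compared against the `Average (ms)` column in the table.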

In most real production scenarios, you'd want the average latency to be as low as possible and the RPS to be as high as possible. The HTML file also includes charts like these:

![](https://i.ibb.co/MZfy6VR/charts.png)

These charts help us see whether there were any request failures and how the response time of the endpoint changed over time. In our case, there were no request failures. However, request failures can happen when an endpoint is hit with heavy traffic.

If you want to derive the above stats programmatically, you can do so using the generated CSV files. For the purpose of this notebook, we'll only use `locust_report_stats_history.csv`. Some of the code in the sections below has been taken from [this notebook](https://github.com/sayakpaul/deploy-hf-tf-vision-models/blob/main/locust/load_test_results_vit_gke.ipynb).

We start by importing `pandas` and `matplotlib` and then loading the CSV file into a pandas DataFrame.

In [None]:
import pandas as pd 
import matplotlib.pyplot as plt

In [None]:
csv_path = "locust_report_stats_history.csv"
report = pd.read_csv(csv_path)
report.tail()

Next, we define a few utility functions to tidy up the DataFrame a bit.

In [None]:
to_be_updated_columns = [
    {
        "old": "Total Average Response Time",
        "new": "Total Average Response Time (ms)",
    },
    {
        "old": "Total Max Response Time",
        "new": "Total Max Response Time (ms)",
    },
]

In [None]:
def update_column_names(report, to_be_updated_columns):
    # Copy each "old" column under its "new" name (with units); the originals are kept.
    for column in to_be_updated_columns:
        report[column["new"]] = report[column["old"]]

    return report


def set_timestamp_offsets(report):
    # Express each timestamp as an offset from the first recorded row.
    report["Timestamp"] = report["Timestamp"] - report["Timestamp"].iloc[0]
    return report

In [None]:
report = update_column_names(report, to_be_updated_columns)
report = set_timestamp_offsets(report)

Now, we can plot the metrics of interest side by side in a single figure.

In [None]:
def draw_graph(report, axis, title, x_axis="Timestamp"):
    y_axis = title
    axis.set_title(title)

    axis.plot(report[x_axis], report[y_axis])
    axis.set_xlabel(x_axis)
    return axis

In [None]:
figure, axis = plt.subplots(3, 2, figsize=(30,30), dpi=80)

axis[0, 0] = draw_graph(report, axis[0, 0], 'Total Request Count')
axis[0, 1] = draw_graph(report, axis[0, 1], 'User Count')
axis[1, 0] = draw_graph(report, axis[1, 0], 'Requests/s')
axis[1, 1] = draw_graph(report, axis[1, 1], 'Total Failure Count')
axis[2, 0] = draw_graph(report, axis[2, 0], 'Total Average Response Time (ms)')
axis[2, 1] = draw_graph(report, axis[2, 1], 'Total Max Response Time (ms)')

plt.show()

From the above charts, we can see that the average response time of the endpoint decreased up to a certain point and then plateaued.

This method of visualization is often helpful when you have multiple CSV files (from multiple load tests of different configurations) to compare. 
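
As a minimal sketch of such a comparison, the snippet below reduces each stats-history file to one summary row and stacks the rows into a table. The function `summarize_run`, the run labels, and the numbers are all made up for illustration: the two in-memory CSVs are synthetic stand-ins that only reuse the column names seen above. In practice you would pass the paths of real `*_stats_history.csv` files instead.

```python
import io

import pandas as pd


def summarize_run(csv_source, label):
    """Reduce one stats-history CSV to a single summary row."""
    history = pd.read_csv(csv_source)
    last_row = history.iloc[-1]  # cumulative totals at the end of the run
    return {
        "run": label,
        "peak_users": int(history["User Count"].max()),
        "final_request_count": int(last_row["Total Request Count"]),
        "final_avg_response_ms": float(last_row["Total Average Response Time"]),
    }


# Synthetic data (not real measurements), standing in for two load tests.
header = "Timestamp,User Count,Total Request Count,Total Average Response Time\n"
run_a = io.StringIO(header + "0,5,10,900.0\n1,10,25,850.0\n")
run_b = io.StringIO(header + "0,10,18,1200.0\n1,20,40,1100.0\n")

comparison = pd.DataFrame(
    [summarize_run(run_a, "10 users"), summarize_run(run_b, "20 users")]
).set_index("run")
print(comparison)
```

The resulting table has one row per load test, which makes it easy to see how latency moves as the user count changes.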

## Concluding Notes

In this notebook, we learned how to conduct a simple load test for our text recognizer endpoint. When developing endpoints for production use, you'd want to generate a more realistic traffic load. This is where distributed load testing comes in handy. You can consult [this resource](https://docs.locust.io/en/stable/running-cloud-integration.html) if you want to know more.