<img src="https://fsdl.me/logo-720-dark-horizontal">

# Load-Testing with Locust

This notebook shows you how to load-test a Text Recognizer endpoint deployed via AWS Lambda.

Load-testing is a form of testing that checks
whether a web service can handle a particular
_load_ of web traffic --
quantity, timing, and content.

We'll use an open-source Python framework called [Locust](https://docs.locust.io/en/stable/what-is-locust.html) for this purpose. 

From [their documentation](https://docs.locust.io/):

> Locust is an easy to use, scriptable and scalable performance testing tool.

For more on using Locust, check out [this video](https://www.youtube.com/watch?t=163&v=Ok4x2LIbEEY&feature=emb_imp_woyt). 

### Setup

Locust is available as a Python package,
so it's easy to install into our development environment.

In [None]:
!pip install -q locust

# Writing a load test with `locust`

We'll use Locust to simulate our users --
creating a swarm of simple agents,
like locusts.

We define each user type as a class
inside a `.py` file:

In [None]:
load_test_file = "locust_http_user.py"

In [None]:
%%writefile {load_test_file}

import json

from locust import HttpUser, constant, task

IMAGE_URI = "https://fsdl-public-assets.s3-us-west-2.amazonaws.com/paragraphs/a01-077.png"


class TextRecognizerUser(HttpUser):
    wait_time = constant(1)
    headers = {"Content-type": "application/json"}
    payload = json.dumps({"image_url": IMAGE_URI})

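    @task
    def predict(self):
        # POST the image URL to the endpoint root and parse the prediction,
        # so a malformed response raises an error inside the task.
        response = self.client.post("/", data=self.payload, headers=self.headers)
        pred = response.json()["pred"]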

We have only one type of user,
the `TextRecognizerUser`,
which inherits from the `HttpUser` class.

From the [documentation](https://docs.locust.io/en/stable/writing-a-locustfile.html#writing-a-locustfile):

> `HttpUser` ... gives each user a `client` attribute, which is an instance of `HttpSession`, that can be used to make HTTP requests to the target system that we want to load test. When a test starts, `locust` will create an instance of this class for every user that it simulates ...

Inside `TextRecognizerUser`, we define the task (decorated with `@task`) that we want to load test. In our case, this corresponds to calling our AWS Lambda endpoint and retrieving the predictions.

We have also defined a `wait_time` inside `TextRecognizerUser` that simulates a delay (in seconds) between requests. Locust also supports more sophisticated delay configurations.

For more information, consult [this resource](https://docs.locust.io/en/stable/writing-a-locustfile.html#wait-time). 

# Running the load test

We run our load test with the `locust` command line tool.

Run this cell, which takes about two minutes to complete,
and continue reading below.

In [None]:
!locust --locustfile=locust_http_user.py \
  --headless \
  --users=10 \
  --spawn-rate=1 \
  --run-time=2m \
  --host=https://3akxma777p53w57mmdika3sflu0fvazm.lambda-url.us-west-1.on.aws \
  --html=locust_report.html \
  --csv=locust_report

Below are brief descriptions of each of the arguments we use above:

* `locustfile`: The Python file that will be used for conducting the load test.
* `headless`: When this flag is passed, Locust runs the load test without the web UI. Without it, Locust starts a web server on port 8089.
* `users`: Peak number of concurrent simulated users.
* `spawn-rate`: Number of users to spawn per second. 
* `run-time`: Total duration for running the load test. After the time has elapsed, Locust will stop. 
* `host`: The target for load testing. 
* `html`: Path to an HTML file that will contain aggregate information after the load test is complete.
* `csv`: Prefix of CSV files to store current request stats.

Locust provides many more options.

It's generally better to store this information in a configuration file,
so that results are more easily reproducible.

You can read about configuring Locust [here](https://docs.locust.io/en/stable/configuration.html).
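
A minimal sketch of such a configuration file, assuming Locust's `key = value` format in which each key mirrors the long form of a command-line flag. The file name `locust.conf` is a choice made here, and the host is a placeholder to replace with your own endpoint URL.

```python
from pathlib import Path

# Each key mirrors the long form of a command-line flag used above.
# The host below is a placeholder, not a real endpoint.
config_text = """\
locustfile = locust_http_user.py
headless = true
users = 10
spawn-rate = 1
run-time = 2m
host = https://your-endpoint.example.com
html = locust_report.html
csv = locust_report
"""

config_path = Path("locust.conf")
config_path.write_text(config_text)
print(config_path.read_text())
```

With the file in place, the same test could be started with `locust --config locust.conf`, and the file can be checked into version control alongside the load test.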


# Viewing the results

After the load test is complete,
a number of report files will be generated:

In [None]:
!ls -lh locust_report*

We can view the HTML report directly in the notebook
(or open it in the browser).

In [None]:
import IPython.display

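# Pass the path via `filename` so the file's contents are rendered,
# rather than the literal string "locust_report.html".
IPython.display.HTML(filename="locust_report.html")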

First, we see some summary statistics on the user requests and our endpoint's responses.

Here's what we saw:

![](https://i.ibb.co/997KF6R/aggregate-stats.png)

In the case above, the endpoint was load-tested with a total of 554 requests.

This view provides other useful aggregate information like:

* Requests per second (`RPS`)
* Average latency (`Average (ms)`)

In a majority of real production scenarios, you want to minimize the latency
and maximize the RPS.

We also want to watch the maximum response time,
which can be a leading indicator of issues in our system
or reveal issues that only a small minority of users encounter.

Scrolling down, we find charts that show request and response data over time, like so:

![](https://i.ibb.co/MZfy6VR/charts.png)

These charts help us dig into failures and performance issues by showing them in greater detail.

In our case, there were no request failures. However, request failures can happen when an endpoint is met with heavy traffic.

We can also see the median and 95th percentile response times changing as requests come in.

This is due to the auto-scaling behavior of AWS Lambdas
(and other serverless cloud function systems).

When the first few requests come in,
the response times are very high.

That's because in the absence of traffic,
our system scales down to 0 --
no machines are running.

Getting started again when traffic arrives
requires a bit of setup work for each machine running our Lambda.

This setup work is visible in our data as a high median response time
at the start that falls quickly
(once half or more of the machines have been spun up)
and an equally high 95th percentile response time
that falls more slowly
(once at least 95% of the machines have been spun up).
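
To make that reasoning concrete, here is a toy calculation that uses only the standard library. The latencies (5000 ms cold, 300 ms warm) and the `percentile` helper are assumptions invented for illustration, not measurements from our endpoint. It suggests why the median drops once fewer than half of the requests in a window hit a cold start, while the 95th percentile stays high until only a few percent do.

```python
import math

COLD_MS = 5000  # assumed latency of a request that triggers a cold start
WARM_MS = 300   # assumed latency of a request served by a warm instance


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def window_stats(cold_count, total=100):
    """Median and 95th percentile for a window with `cold_count` cold requests."""
    latencies = [COLD_MS] * cold_count + [WARM_MS] * (total - cold_count)
    return percentile(latencies, 50), percentile(latencies, 95)


# Shrink the share of cold requests, as happens while instances warm up.
for cold_count in (60, 40, 10, 4):
    median, p95 = window_stats(cold_count)
    print(f"{cold_count:>2}% cold: median={median} ms, p95={p95} ms")
```

In this toy setup, the median is already at the warm latency with 40% cold requests, but the 95th percentile only reaches it once the cold share falls to 4%.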

# Analyzing load test data programmatically

Charts are nice for quick discovery,
but they're hard to incorporate into automated workflows.

We can also programmatically handle and analyze the data `locust` collected for us.

We just need to read in the generated CSV files as `pandas.DataFrame`s.

For the purpose of this notebook, we'll only use `locust_report_stats_history.csv`,
and we'll just generate some charts to show that the data is present.

Some of the code in the sections below has been adapted from
[this notebook](https://github.com/sayakpaul/deploy-hf-tf-vision-models/blob/main/locust/load_test_results_vit_gke.ipynb),
on testing a Vision Transformer deployed on Google Kubernetes Engine.

In [None]:
import pandas as pd 


csv_path = "locust_report_stats_history.csv"
results = pd.read_csv(csv_path)
results["Timestamp"] = pd.to_datetime(results["Timestamp"], unit="s")
results.tail()

Now, we have total control over the plotted metrics:

In [None]:
request_columns = ["Total Request Count", "Total Failure Count", "User Count"]
results.plot(x="Timestamp", y=request_columns, subplots=True, sharey=True);

In [None]:
response_columns = ["Total Average Response Time", "Total Max Response Time"]
results.plot(x="Timestamp", y=response_columns);

And we have total control over the calculation of statistics:

In [None]:
results.groupby("Total Median Response Time").describe()

Working with the raw data yourself
is often helpful when you have multiple CSV files
(from multiple load tests of different configurations)
to compare. 
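
As a sketch of such a comparison, the hypothetical helper below reads the stats history file from each run and collects the final values into one table. The function name `summarize_runs` is an assumption, as is the idea that the last row of each file holds that run's final cumulative totals. The column names are the ones used in the cells above.

```python
import pandas as pd


def summarize_runs(csv_paths):
    """Build one summary row per load test from its stats history CSV.

    `csv_paths` maps a run label to the path of that run's
    `*_stats_history.csv` file.
    """
    rows = []
    for label, path in csv_paths.items():
        history = pd.read_csv(path)
        # Assumption: the last row holds the run's final cumulative totals.
        final = history.iloc[-1]
        rows.append(
            {
                "run": label,
                "peak_users": history["User Count"].max(),
                "total_requests": final["Total Request Count"],
                "total_failures": final["Total Failure Count"],
                "avg_response_ms": final["Total Average Response Time"],
                "max_response_ms": final["Total Max Response Time"],
            }
        )
    return pd.DataFrame(rows).set_index("run")
```

For the run above, this could be called as `summarize_runs({"10 users": "locust_report_stats_history.csv"})`, adding one entry per extra run, each with its own `--csv` prefix.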

# Testing and analyzing large-scale loads

In this notebook,
we learned how to conduct a simple load test for our text recognizer endpoint.

When developing endpoints for large-scale production use,
you'd want to generate traffic that is more realistic
(e.g. comes from multiple hosts, not just one)
and higher in volume.

This is where distributed load testing comes in handy.

You can consult [this resource](https://docs.locust.io/en/stable/running-cloud-integration.html) in case you want to know more.