Benchmarking utilities #225
Conversation
I'll explain one group of changes here since they function as a group. Our current benchmarking approach relies on Clipper connecting to model-containers that don't explicitly register with it. For this to happen, a model-container must try to connect to Clipper on the default port and at the IP address where Clipper lives. This is a sort of hacky solution to a very specific problem, so I understand if we'd want to either change the solution itself or not include one in this PR.
bench/setup_sklearn_bench.sh (outdated)

```diff
 export CLIPPER_MODEL_NAME="bench_sklearn_cifar"
 export CLIPPER_MODEL_VERSION="1"
 export CLIPPER_MODEL_PATH="model/"

-python ../containers/python/sklearn_cifar_container.py
+python containers/python/sklearn_cifar_container.py
```
In `end_to_end_bench`, we prompt the user to run this script (`bench/setup_sklearn_bench.sh`) from the root directory. This change lets it work from the root directory (before, it would work only if run from within `bench/`).
Nice!
```diff
@@ -0,0 +1,66 @@
+from __future__ import print_function
```
This container is even more noop-y than the noop container -- I am using it for throughput benchmarking to minimize the time spent actually generating predictions, letting container latency better approximate RPC overhead. This isn't particularly hard to re-implement when needed; it might just be muddying the codebase to have it here. Happy to remove if you think that's best.
Let's rename this to `noop_container` and remove the current version of `noop_container`. Afaik, we don't have any scripts that depend on the nature of the output of the current `noop_container` (if we do, we shouldn't), and this implementation seems to be a more accurate no-op.
src/benchmarks/src/bench_utils.cpp (outdated)

```cpp
const std::string MODEL_VERSION_PROMPT =
    "Enter the version of the model you want queried: ";

std::string _get_prompt(std::string var) {
```
There must be a much more elegant way to implement this. I noticed that switch statements don't work with strings, and creating a static map didn't seem a lot better. Thoughts?
Unfortunately, it doesn't seem like there is a more elegant way to do string matching in C++. However, I think that it would be beneficial to define prompts for a specific benchmark in its own implementation file. This would reduce clutter.

Towards this end, we should wrap these utility methods in a class with a constructor that accepts a mapping of `desired_vars` to prompt variables. `_get_prompt` can then reference this map.
src/benchmarks/src/bench_utils.cpp (outdated)

```cpp
namespace bench_utils {

const std::string CIFAR_DATA_PATH_PROMPT =
```
Only a handful of these (and their counterpart variables here: https://github.com/ucbrise/clipper/pull/225/files#diff-5ac4406c77f36f86be1d3ed9ab6795fbR8) are used by `end_to_end_bench`. The rest are used by the throughput benchmarking script that isn't included. Leaving these strings around might reduce busywork in the future, but it also could just add noise. Happy to remove if that seems right.
```diff
@@ -0,0 +1,47 @@
+#ifndef CLIPPER_APP_METRICS_HPP
```
This has been moved to its own file because it can be used to gather metrics for benchmarking (as is done here https://github.com/nishadsingh1/clipper/blob/benchmark/src/benchmarks/src/noop_bench.cpp#L192).
I think we should include this within `metrics.hpp` and the `metrics` namespace instead.
```cpp
void send_feedback(PredictTask task);

VersionedModelId model_;
int container_id_;
int replica_id_;
InputType input_type_;
clipper::metrics::Histogram latency_hist_;
```
This histogram (which receives datapoints here: https://github.com/ucbrise/clipper/pull/225/files#diff-628067215a2c6ded52652fff0b43720bR46) is not used now, but I could see it being involved in some selection policies / task discarding in the future. Happy to remove if this isn't the right time to include it.
This seems useful! Let's keep it. See previous comment about separating it from the `update_throughput` method.
```diff
@@ -291,6 +291,7 @@ class Histogram : public Metric {
 void insert(const int64_t value);
 const HistogramStats compute_stats();
 static double percentile(std::vector<int64_t> snapshot, double rank);
+double percentile(double rank);
```
Just calls `percentile(snapshot, rank)` with the same snapshot used in `compute_stats()`. It's included because it might be convenient -- here's an example of its usage: https://github.com/nishadsingh1/clipper/blob/benchmark/src/benchmarks/src/noop_bench.cpp#L130.
Looks good! Can you add documentation indicating that a snapshot is obtained from the histogram's sampler and then used to calculate the percentile?
Test PASSed.
bench/bench_init.py (outdated)

```diff
@@ -6,7 +6,6 @@
 sys.path.append(os.path.abspath("%s/../examples" % cur_dir))
 # sys.path.insert(0, os.path.abspath('%s/../containers/python/' % cur_dir))
```
Nice - while we're cleaning up imports, can this commented import be removed as well?
Yep
bench/README.md (outdated)

```diff
 ## Optional Configuration Files
 The following benchmark attributes can be loaded via a JSON configuration file:
-- **cifar_data_path**: The path to a **specific binary data file** within the CIFAR10 binary dataset with a name of the form `data_batch_<n>.bin`. (`data_batch_1.bin`, for example)
+- **cifar_data_path**: The path to a **specific binary data file** within the CIFAR10 binary dataset with a name of the form `data_batch_<n>.bin`. (`/Users/.../cifar-100-binary/data_batch_1.bin`, for example)
```
Let's make the path here more generic. Maybe `<path_to_unzipped_cifar_directory>/data_batch_1.bin`.
Agreed
bench/setup_noop_bench_docker.sh (outdated)

```sh
export CLIPPER_MODEL_PATH="model/"

# Sets CLIPPER_IP to AWS instance's IP
export CLIPPER_IP=$(curl http://169.254.169.254/latest/meta-data/local-ipv4)
```
From the documentation, it seems like `169.254.169.254` is a local URI that, if queried within instance X, provides metadata about instance X. Therefore, this should only work if the Docker container corresponding to `noop_container` is running on the same host as the `query_frontend` container, right? If so, we should make note of this limitation.
Yeah, that's exactly right -- thanks for looking into it; I agree we should clarify that.
bin/build_bench_docker_images.sh (outdated)

```diff
@@ -0,0 +1,20 @@
+#!/usr/bin/env bash
```
Any objection to moving this from `/bin` to `/bench`?
Sounds good to me
```cpp
*/
const std::string CIFAR_DATA_PATH = "cifar_data_path";
const std::string NUM_THREADS = "num_threads";
const std::string NUM_BATCHES = "num_batches";
```
Again, I think it makes sense to define benchmark-specific keys within individual implementation files.
Yeah, I agree -- especially if they don't share prompt text (because the prompt support won't exist).
```diff
@@ -31,13 +32,14 @@ class ModelContainer {

 size_t get_batch_size(Deadline deadline);
 double get_average_throughput_per_millisecond();
-void update_throughput(size_t batch_size, long total_latency);
+void update_container_stats(size_t batch_size, long total_latency);
```
I'd rather we separate the critical task of updating container throughput for proper batching from the helpful but non-critical metrics update. Let's call `processing_container->latency_hist_.insert(total_latency)` after `processing_container->update_throughput` within the task executor.
After talking to Dan, the plan is to revert the extraction of AppMetrics and make a separate BenchMetrics (or something similar) in bench_utils. For now, the two measure seemingly identical things, but that can easily change with what we choose to (or not to) benchmark (say, prediction accuracy, which isn't captured in AppMetrics).

The change to `update_throughput` sounds good to me.
This new update removes the old Two questions before I request a review:
I'll make the corresponding changes and request a review once they're up.
Test PASSed.
Alright, it's ready for another review. Thanks a bunch to both of you for giving it a thorough pass once already. Pretty sure CI is going to yell at me for formatting, but my local formatting script says all is fine -- would one of you mind pushing a formatting update?
src/benchmarks/src/generic_bench.cpp (outdated)

```cpp
  // Modify it to be epoch and thread-specific
  query_vec[0] += (j * request_batch_size + i) / num_datapoints;
  query_vec[1] += thread_id;
}
```
I'm having some trouble here. Empirically, this actually does seem to prevent cache hits, but because all of the input datapoints are shared (through the variable `data`), it seems like there could be a data race issue. We could make a copy of `data` for each thread, but that feels wasteful. We could put a lock on all of the data, or even on each datapoint, but that feels gratuitous. Modifying the data after the memcpy in `std::make_shared` (by retrieving `input->get_data()` and modifying that vector directly) doesn't seem to actually change the submitted `input` variable. Thoughts on how to do this correctly?
src/benchmarks/src/generic_bench.cpp (outdated)

```cpp
query_vec = data[data_index];
label = labels[data_index];

if (prevent_cache_hits) {
```
I'm having some trouble here. In order to prevent us from hitting Clipper's internal cache, we need every submitted datapoint to be unique. Remember that we can cycle over the input data multiple times. The solution here attempts to modify each datapoint by adding info about the thread and epoch it belongs to.

Empirically, this approach actually does seem to prevent cache hits, but there are still at least two problems.

The first is that this solution, which replaces the first and second entries of each datapoint, assumes that no two distinct CIFAR datapoints differ only in their first and/or second entries. If any do, this change won't actually prevent all cache hits. I think this is a pretty minor problem -- I haven't actually encountered any two CIFAR datapoints with this property.

The second: because all of the input datapoints are shared (through the variable `data`), it seems like there could be a data race issue. We could make a copy of `data` for each thread, but that feels wasteful. We could make a copy of each datapoint within the inner for loop before modifying it, but that feels expensive. We could put a lock on all of the data, or even on each datapoint, but that feels gratuitous. We could make every vector two entries longer and modify those extra entries instead, but then the datapoints wouldn't fit models trained on the original datapoints, which are two entries shorter. Modifying the data after the memcpy in `std::make_shared` (by retrieving `input->get_data()` and modifying that vector directly) doesn't seem to actually change the submitted `input` variable. Thoughts on how to do this correctly?
TL;DR: Make a copy of the dataset for each thread to avoid the data race and continue to modify the query_vec as you're doing right now (by modifying the first and second elements).
Responses to some of your comments:
So my first comment is don't worry too much about optimizing the benchmark code for anything besides speed. That means extra copies of data aren't a big deal unless you are making the copy in an inner loop where the speed of the memcpy itself is an issue, and we should definitely avoid any locks.
There is definitely a data race here, it seems like the easiest thing to do is just make a copy of the dataset for each thread.
Your assumption about there being no CIFAR data points that differ only in the first and second dimension seems fine. Remember, if there are one or two cache hits it really won't affect anything.
You can't modify the inputs after creating them because `get_data()` returns a `const` vector.
Test PASSed.
SumBenchDockerfile (outdated)

```dockerfile
COPY containers/python/sum_container.py /containers/python/
COPY bench/setup_sum_bench_docker.sh /bench/
COPY bench/set_bench_env_vars.sh /bench/
COPY containers/python/rpc.py /containers/python/
```
Without copying rpc.py, I get an ImportError from the import statement in sum_container.py; the same thing happens for NoopBenchDockerfile with noop_container.py. I was having a hard time debugging it -- I'll keep looking into it and push a fix. Figured I'd push up this set of changes rather than let this one problem block reviews.
Ahh, I see the problem. RPCDockerfile copies `rpc.py` to `/containers`, not `/containers/python` (https://github.com/ucbrise/clipper/blob/develop/RPCDockerfile#L13).

If you fix the import in `sum_container.py`, you shouldn't need to copy `rpc.py`.
Sounds good! Thanks for the catch.
Test PASSed.
Mostly there. Just some formatting and cleanup.
bench/README.md (outdated)

```markdown
- [`bench/setup_noop_bench.sh`](https://github.com/ucbrise/clipper/tree/develop/bench/setup_noop_bench.sh) runs the [`noop_container`](https://github.com/ucbrise/clipper/blob/develop/containers/python/noop_container.py). The default `model_name` is `"bench_noop"`.

      ./bench/setup_noop_bench.sh [[<model_name> <model_version> [<clipper_ip>]]]
```
Enclose these in triple backticks so they are formatted as code
bench/README.md (outdated)

```markdown
- [`bench/setup_sum_bench.sh`](https://github.com/ucbrise/clipper/tree/develop/bench/setup_sum_bench.sh) runs the [`sum_container`](https://github.com/ucbrise/clipper/blob/develop/containers/python/sum_container.py). The default `model_name` is `"bench_sum"`.

      ./bench/setup_sum_bench.sh [[<model_name> <model_version> [<clipper_ip>]]]
```
Format as code
bench/README.md (outdated)

```markdown
- [`bench/setup_sklearn_bench.sh`](https://github.com/ucbrise/clipper/tree/develop/bench/setup_sklearn_bench.sh) runs the [`sklearn_cifar_container`](https://github.com/ucbrise/clipper/blob/develop/containers/python/sklearn_cifar_container). If you wish to use this option, remember to download the CIFAR10 python dataset first. The default `model_name` is `"bench_sklearn_cifar"`.

      ./bench/setup_sklearn_bench.sh <path_to_cifar_python_dataset> [[<model_name> <model_version> [<clipper_ip>]]]
```
Format as code
bench/README.md (outdated)

```markdown
- If you want to deploy the model in a Docker container, you'll need to build the model container Docker image (`model_version` is `1` by default):
  - Create the Docker images for the [`noop_container`](https://github.com/ucbrise/clipper/blob/develop/containers/python/noop_container.py) and [`sum_container`](https://github.com/ucbrise/clipper/blob/develop/containers/python/sum_container.py). The default names for each model are the same as the ones listed above.

        ./bench/build_bench_docker_images.sh
```
Format as code
bench/README.md (outdated)

```markdown
- Run the container on the same host that the benchmarking tool will be run from:

      docker run [-e MODEL_NAME=<nondefault_model_name>] [-e MODEL_VERSION=<nondefault_model_version>] [-e IP=<non-aws-clipper-ip>] <image_id>
```
Format as code
bench/README.md (outdated)

```markdown
Under `histograms` (each of these entries lists the unit of measurement, number of datapoints, min, max, mean, standard deviation, and p50, p95, and p99 values):

- `bench:cifar_bench:prediction_latency`: Statistics on latency measured from when the benchmarking script sends a request to when it receives a response.
- `model:<model_name>:<model_version>:prediction_latency`: Statistics on latency measured from when Clipper sends datapoints to your model-container to when it receives a response. The value will always be lower than that of the former metric.
```
Remove the last sentence under this bullet. The model prediction latency can definitely be higher than prediction latency, because predictions will return early if the SLO is set to be lower than the model latency.
Good point; that didn't occur to me.
src/benchmarks/CMakeLists.txt (outdated)

```diff
@@ -3,7 +3,7 @@ cmake_minimum_required(VERSION 3.2 FATAL_ERROR)
 ##################################
 # Benchmark executable

-add_executable(end_to_end_bench src/end_to_end_bench.cpp)
+add_executable(end_to_end_bench src/end_to_end_bench.cpp src/bench_utils.cpp src/include/bench_utils.hpp)
```
Put the `bench_utils.hpp` file in `src/`. In this case, nothing outside of `src/benchmarks/src` should need to include `bench_utils`, so it's cleaner to just keep everything in one directory.
```cpp
delay_micros =
    draw_from_poisson ? distribution(generator) : batch_delay_micros;
std::this_thread::sleep_for(std::chrono::microseconds(delay_micros));
```
This is fine, but one thing to note: there is a minimum amount of time a thread can sleep for, even if you set the sleep time to 0. Just an educational aside.
That is good to know -- we can just skip the sleep code if `delay_micros` is <= 0.
Test PASSed.
Requested changes were addressed
This PR includes some utilities that could be useful for future benchmarking.
Some of the changes here are less obviously extensible to other benchmarking use cases. I'm going to comment on them directly to give context for why they exist. I'm happy to scrap or rework the ones that are too case-specific or won't be useful for future benchmarking efforts.
This PR currently does not include the throughput benchmarking script (https://github.com/nishadsingh1/clipper/blob/benchmark/src/benchmarks/src/noop_bench.cpp). Given that we have a benchmarking example in the source, `end_to_end_bench`, and documentation accompanying it at `bench/README.md`, I wonder if it would be good to include a clean version of the throughput benchmarking script too. It could demonstrate usage of some of the benchmarking utilities not used by `end_to_end_bench`, which we could also refer to in the documentation. Alternatively, we could move everything under `bench/` and `src/benchmarks/src/` (including the throughput benchmarking script) to some other branch with updated documentation at `bench/README.md`.

Resolves #222.