From c5a2c1562ffbd1d316141034399238fff007e975 Mon Sep 17 00:00:00 2001
From: Naman Lalit
Date: Sat, 27 Sep 2025 21:47:48 -0700
Subject: [PATCH 1/2] Added documentation for continuous benchmarking and
 profiling

Signed-off-by: Naman Lalit
---
 docs/contributing/benchmarks.md | 24 ++++++++++++++++++++++++
 docs/contributing/profiling.md  | 16 ++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/docs/contributing/benchmarks.md b/docs/contributing/benchmarks.md
index a97d1fa6a3a5..cf14770c01a6 100644
--- a/docs/contributing/benchmarks.md
+++ b/docs/contributing/benchmarks.md
@@ -823,6 +823,30 @@ The latest performance results are hosted on the public [vLLM Performance Dashbo
 
 More information on the performance benchmarks and their parameters can be found in [Benchmark README](https://github.com/intel-ai-tce/vllm/blob/more_cpu_models/.buildkite/nightly-benchmarks/README.md) and [performance benchmark description](gh-file:.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md).
 
+### Continuous Benchmarking
+
+Continuous benchmarking provides automated performance monitoring for vLLM across different models and GPU devices. It tracks vLLM's performance characteristics over time and helps identify performance regressions or improvements.
+
+#### How It Works
+
+Continuous benchmarking is triggered by a [GitHub Actions workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-benchmark.yml) in the PyTorch infrastructure repository, which runs automatically every 4 hours. The workflow executes three types of performance tests:
+
+- **Serving tests**: Measure request handling and API performance
+- **Throughput tests**: Evaluate token generation rates
+- **Latency tests**: Assess response time characteristics
+
+#### Benchmark Configuration
+
+Benchmarking currently runs on a predefined set of models configured in the [vllm-benchmarks directory](https://github.com/pytorch/pytorch-integration-testing/tree/main/vllm-benchmarks/benchmarks). To add new models for benchmarking:
+
+1. Navigate to the appropriate GPU directory in the benchmarks configuration.
+2. Add your model specifications to the corresponding configuration files.
+3. The new models will be included in the next scheduled benchmark run.
+
+#### Viewing Results
+
+All continuous benchmarking results are automatically published to the public [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm).
+
 [](){ #nightly-benchmarks }
 
 ## Nightly Benchmarks
diff --git a/docs/contributing/profiling.md b/docs/contributing/profiling.md
index a1b7927a95d1..22400da308de 100644
--- a/docs/contributing/profiling.md
+++ b/docs/contributing/profiling.md
@@ -160,6 +160,22 @@ GUI example:
 
 Screenshot 2025-03-05 at 11 48 42 AM
 
+## vLLM Continuous Profiling
+
+There is a [GitHub CI workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-profiling.yml) in the PyTorch infrastructure repository that provides continuous profiling for different models on vLLM. This automated profiling helps track performance characteristics over time and across different model configurations.
+
+### How It Works
+
+The workflow currently runs weekly profiling sessions for selected models, generating detailed performance traces that can be analyzed with different tools to identify performance regressions or optimization opportunities. It can also be triggered manually from the GitHub Actions UI.
+
+### Adding New Models
+
+To extend continuous profiling to additional models, modify the [profiling-tests.json](https://github.com/pytorch/pytorch-integration-testing/blob/main/vllm-profiling/cuda/profiling-tests.json) configuration file in the PyTorch integration testing repository. Simply add your model specifications to this file to include them in the automated profiling runs.
+
+### Viewing Profiling Results
+
+The profiling traces generated by the continuous profiling workflow are publicly available on the [vLLM Performance Dashboard](https://hud.pytorch.org/benchmark/llms?repoName=vllm-project%2Fvllm). Look for the **Profiling traces** table to access and download the traces for different models and runs.
+
 ## Profiling vLLM Python Code
 
 The Python standard library includes

From 838f7ff122950544848dfc85d5c2ba337fbcce94 Mon Sep 17 00:00:00 2001
From: Naman Lalit
Date: Sat, 27 Sep 2025 22:16:04 -0700
Subject: [PATCH 2/2] minor changes

Signed-off-by: Naman Lalit
---
 docs/contributing/profiling.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/contributing/profiling.md b/docs/contributing/profiling.md
index 22400da308de..b62560a58748 100644
--- a/docs/contributing/profiling.md
+++ b/docs/contributing/profiling.md
@@ -160,7 +160,7 @@ GUI example:
 
 Screenshot 2025-03-05 at 11 48 42 AM
 
-## vLLM Continuous Profiling
+## Continuous Profiling
 
 There is a [GitHub CI workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-profiling.yml) in the PyTorch infrastructure repository that provides continuous profiling for different models on vLLM. This automated profiling helps track performance characteristics over time and across different model configurations.
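
Editor's note on the "add your model specifications" steps in both patches: the actual schema of `profiling-tests.json` is defined in the pytorch-integration-testing repository, but a hypothetical entry, and a quick sanity check that the file still parses before opening a PR, might look roughly like this (the field names and model name below are illustrative assumptions, not the real schema):

```python
import json

# Hypothetical sketch of a profiling-tests.json entry. The real schema lives in
# pytorch/pytorch-integration-testing (vllm-profiling/cuda/profiling-tests.json);
# the "model" and "extra_args" field names here are illustrative only.
example = """
[
  {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "extra_args": ["--max-model-len", "8192"]
  }
]
"""

# Verify the edited file is valid JSON before submitting the change.
entries = json.loads(example)
for entry in entries:
    print(entry["model"])
```

Running a parse check like this locally catches malformed JSON (a trailing comma, an unquoted key) before the CI workflow picks up the configuration.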