[CI] Add Decode Context Parallelism (DCP) test to CI #24487
Conversation
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: Ming Yang <minos.future@gmail.com>
Code Review
This pull request adds CI tests for Decode Context Parallelism (DCP) on Hopper and Blackwell architectures. The changes in the CI configuration and the test file are mostly correct and achieve this goal. I've found one potential issue in the test file that could make the test framework fragile for future changes and have suggested a fix.
```diff
 [
     params for model_id, settings in CP_TEXT_GENERATION_MODELS.items()
-    for params in settings.iter_params(model_id)
+    for setting in settings for params in setting.iter_params(model_id)
```
The list comprehension here assumes that the values in `CP_TEXT_GENERATION_MODELS` are always lists of `CPTestSettings`. This could be fragile if a new model is added with a single `CPTestSettings` object instead of a list, which would cause test parameter generation to fail. To make this more robust, you could handle both cases by ensuring you always iterate over a list.
Suggested change:
```diff
-    for setting in settings for params in setting.iter_params(model_id)
+    for setting in (settings if isinstance(settings, list) else [settings]) for params in setting.iter_params(model_id)
```
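To make the suggestion concrete, here is a self-contained sketch of the normalized comprehension. `CPTestSettings` and `CP_TEXT_GENERATION_MODELS` below are simplified stand-ins for illustration, not the real fixtures from `tests/distributed/test_context_parallel.py`.

```python
# Simplified stand-ins: the real CPTestSettings expands into full pytest params.
from dataclasses import dataclass


@dataclass
class CPTestSettings:
    tp_size: int
    dcp_size: int

    def iter_params(self, model_id: str):
        # Yield one parameter tuple per configuration for this model.
        yield (model_id, self.tp_size, self.dcp_size)


CP_TEXT_GENERATION_MODELS = {
    "model-a": [CPTestSettings(2, 1), CPTestSettings(2, 2)],  # list of settings
    "model-b": CPTestSettings(2, 2),                          # single settings object
}

# Normalize single objects to one-element lists so both shapes are handled.
params = [
    p
    for model_id, settings in CP_TEXT_GENERATION_MODELS.items()
    for setting in (settings if isinstance(settings, list) else [settings])
    for p in setting.iter_params(model_id)
]
print(params)
# [('model-a', 2, 1), ('model-a', 2, 2), ('model-b', 2, 2)]
```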
```yaml
##### H200 test #####
- label: Qwen MoE EP Test  # optional
  gpu: h200
  optional: true
  num_gpus: 2
  commands:
    - CUDA_VISIBLE_DEVICES=1,2 VLLM_ALL2ALL_BACKEND=deepep_high_throughput VLLM_USE_DEEP_GEMM=1 VLLM_LOGGING_LEVEL=DEBUG python3 /vllm-workspace/examples/offline_inference/data_parallel.py --model Qwen/Qwen1.5-MoE-A2.7B --tp-size=1 --dp-size=2 --max-model-len 2048
```
```yaml
- label: Hopper Decode Context Parallelism Test  # optional
  gpu: h200
  optional: true
  num_gpus: 2
  commands:
    - pytest -v -s tests/distributed/test_context_parallel.py
```
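For context, the sketch below shows roughly the kind of behavior this CI step guards: generating with decode context parallelism enabled and comparing against a plain tensor-parallel baseline. The `decode_context_parallel_size` kwarg name, the model choice, and the exact-match assertion are assumptions for illustration; the real test in `tests/distributed/test_context_parallel.py` may structure the comparison differently (e.g. comparing logprobs within a tolerance).

```python
# Conceptual sketch only, not the actual CI test. Assumes a 2-GPU node and that
# the engine accepts a decode_context_parallel_size argument (assumption).
from vllm import LLM, SamplingParams

prompts = ["The capital of France is", "Context parallelism splits the KV cache"]
greedy = SamplingParams(temperature=0.0, max_tokens=32)

# Baseline: tensor parallelism only.
baseline = LLM(model="deepseek-ai/DeepSeek-V2-Lite-Chat",
               tensor_parallel_size=2, trust_remote_code=True)
ref = [o.outputs[0].text for o in baseline.generate(prompts, greedy)]
del baseline

# Same model with decode context parallelism enabled (kwarg name assumed).
dcp = LLM(model="deepseek-ai/DeepSeek-V2-Lite-Chat",
          tensor_parallel_size=2, decode_context_parallel_size=2,
          trust_remote_code=True)
out = [o.outputs[0].text for o in dcp.generate(prompts, greedy)]

# Real tests typically compare logprobs within a tolerance rather than exact text.
assert out == ref, "DCP output should match the TP-only baseline under greedy decoding"
```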
Merge this into one Hopper distributed tests step?
done
Signed-off-by: Ming Yang <minos.future@gmail.com>
Can you check that they actually run in CI by unblocking them?
Signed-off-by: Ming Yang <minos.future@gmail.com>
Head branch was pushed to by a user without write access
Signed-off-by: Ming Yang <minos.future@gmail.com>
Unblocked the added CI tests and they've passed. But the existing Qwen MoE test has been failing, and I can reproduce that locally. I wonder if this Qwen MoE test was ever executed as part of CI.
Sounds good, let's merge it first.
Purpose
Guard both Hopper and Blackwell support for DCP.
Test Plan
Local test:
CUDA_VISIBLE_DEVICES=0,1 pytest -v -s tests/distributed/test_context_parallel.py
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.