[Build/CI] Fixing 'docker run' to re-enable AMD CI tests. #4642
Conversation
```diff
@@ -26,7 +26,7 @@ steps:
   - label: "AMD: {{ step.label }}"
     agents:
       queue: amd
-    command: bash .buildkite/run-amd-test.sh "'cd {{ (step.working_dir or default_working_dir) | safe }} && {{ step.command or (step.commands | join(' && ')) | safe }}'"
+    command: bash .buildkite/run-amd-test.sh "cd {{ (step.working_dir or default_working_dir) | safe }} ; {{ step.command or (step.commands | join(" ; ")) | safe }}"
```
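For reference, after this change the template renders each AMD step into a single shell string, roughly like the sketch below. The working directory and pytest invocations are made-up placeholders, not the actual CI configuration:

```bash
# Hypothetical rendering of the new template line for one step.
# "/vllm-workspace/tests" and the pytest commands are illustrative placeholders.
bash .buildkite/run-amd-test.sh "cd /vllm-workspace/tests ; pytest -v -s test_a.py ; pytest -v -s test_b.py"
```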
This might not fail the whole command when the bash command inside fails. If a test fails, the overall command may still not exit with 1.
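For illustration, a minimal, hypothetical bash example of the concern: with commands joined by `;`, an intermediate failure does not affect the final exit status, whereas `&&` stops at the first failure.

```bash
# Joined with ';': the failing first command is masked by the succeeding last one.
bash -c "false ; echo done"   # prints "done"
echo $?                       # 0

# Joined with '&&': execution stops at the first failure.
bash -c "false && echo done"  # prints nothing
echo $?                       # 1
```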
I've checked (https://buildkite.com/vllm/ci/builds/6722): it does fail on a failing test inside.
Even on the partially failing tests, it still fails. See e.g. "AMD: Speculative decoding tests", "AMD: Models Test", or "AMD: Engine Test" in the above build.
Looks like the AMD CI is broken after this PR? I saw the same error message in many CI runs for AMD tests:
No, this error has been shown to be unrelated to the present PR; we have definitely seen this issue before this PR.
I see. Also, I'm working on #4535, which changes AMD kernels a bit, but I keep seeing compilation errors that I didn't see in the NVIDIA build. So I tried to find an existing successful build for reference. If you have any idea about that error (https://buildkite.com/vllm/ci/builds/6803#018f550a-e708-4a0a-a48c-97a5a4d85a40/1106-1856), please let me know.
The error you're referring to appears to be a CMake error during the container build. It is apparently persistent across multiple attempts in different AMD tests in the referenced build. It is definitely not related to PR #4642, though, as ROCm containers were getting built before it. CMake must have complained about something at some point above the final error message. To isolate and analyze the cause of the error during this CI build, you'll need to make a fresh clone of the repo and then build a standard ROCm docker container:
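A minimal sketch of such a build, assuming the repo's Dockerfile.rocm and an arbitrary image tag:

```bash
# From the root of a fresh clone; "vllm-rocm" is just an example tag.
docker build -t vllm-rocm -f Dockerfile.rocm .
```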
That's how it gets built in the CI anyway (see vllm/.buildkite/run-amd-test.sh, line 20 at 8344f77).
The stdout dump will give you plenty of information about your issue.
This PR achieves the following goals:
Corrects the 'docker run' interface so that containers are launched properly (see the sketch below);
Trims the number of AMD tests.
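As a rough illustration of the first point, the wrapper script is expected to hand the quoted command string to the container in a pattern like the one below. The device flags, image tag, and test command in this sketch are assumptions, not the actual contents of run-amd-test.sh:

```bash
# Hypothetical sketch of how the test command might be passed to 'docker run';
# flags, image tag, and the inner command are illustrative assumptions.
docker run --rm \
  --device /dev/kfd --device /dev/dri \
  vllm-rocm \
  /bin/bash -c "cd /vllm-workspace/tests ; pytest -v -s test_a.py"
```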