[CI] Add Buildkite #2355
Conversation
note to self: The one remaining blocker is the memory requirement for the HuggingFace model. We can run models on 2xL4 in vLLM, but HuggingFace doesn't have a TP strategy that's simple to use. Currently I'm trying accelerate to do offload (rough sketch below); if that doesn't work we might go with A100.
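For reference, a minimal sketch of the accelerate-style offload idea (the model name and offload path are placeholders, and this is one possible approach, not necessarily what ends up in the test code):

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch only: let accelerate shard the model across visible GPUs and spill
# whatever doesn't fit to CPU/disk. "facebook/opt-125m" and "./offload" are
# placeholder values.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    torch_dtype=torch.bfloat16,
    device_map="auto",           # requires `accelerate` to be installed
    offload_folder="./offload",  # destination for weights that don't fit on GPU
)
```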
todos: migrate lint, docs, and wheels to CPU-only machines
I soft-failed the kernels and models tests. The models ran successfully with bfloat16, but some outputs don't match :(. The kernel is too difficult to tune.
Thanks for the hard work Simon! Left some small comments.
if max_tries == 0:
    raise RuntimeError("Server did not start") from err
Suggested change:
if max_tries == 0:
    raise RuntimeError("Server did not start") from err
max_tries -= 1
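For context, a minimal sketch of the retry loop this suggestion assumes (the function name, URL, and timing values are hypothetical, not the actual test utility):

```python
import time
import requests

def wait_for_server(url: str, max_tries: int = 60, delay: float = 1.0) -> None:
    """Poll `url` until the server answers, or give up after `max_tries` attempts."""
    while True:
        try:
            requests.get(url, timeout=1)
            return
        except requests.exceptions.ConnectionError as err:
            if max_tries == 0:
                raise RuntimeError("Server did not start") from err
            # Without this decrement the loop would retry forever.
            max_tries -= 1
            time.sleep(delay)
```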
import pytest
import torch
import ray
Just curious, why change to ray?
In the code below, I added a comment saying ray gives much better logs for debugging purposes (I couldn't figure out the failures from multiprocessing).
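For illustration, a minimal sketch of the ray-based pattern (the worker function and its body are hypothetical; it assumes ray is installed and a GPU is visible):

```python
import ray

@ray.remote(num_gpus=1)
def run_hf_model(prompts):
    # Any exception raised here is re-raised by ray.get() on the driver with
    # the full remote traceback, which is easier to debug than a
    # multiprocessing worker that dies silently.
    return [f"output for: {p}" for p in prompts]  # stand-in for real generation

ray.init(ignore_reinit_error=True)
outputs = ray.get(run_hf_model.remote(["Hello, my name is"]))
ray.shutdown()
```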
gotcha! makes a lot of sense.
tests/models/test_models.py (Outdated)
@@ -21,7 +21,8 @@
 @pytest.mark.parametrize("model", MODELS)
-@pytest.mark.parametrize("dtype", ["float"])
+# half is required to get this working on CI's L4 GPU
+@pytest.mark.parametrize("dtype", ["half"])
Let's keep this as float, since otherwise the test will fail on A100s.
We can implement a simpler test for L4 in another PR.
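One possible shape for that, just to illustrate the idea (the test name and memory threshold are made up): keep the float32 run but skip it automatically on small GPUs, rather than switching the whole suite to half precision.

```python
import pytest
import torch

# Hypothetical sketch: skip the full-precision run on GPUs that are too
# small for it (e.g. a 24 GB L4) instead of changing the default dtype.
_small_gpu = (
    not torch.cuda.is_available()
    or torch.cuda.get_device_properties(0).total_memory < 40 * 1024**3
)

@pytest.mark.skipif(_small_gpu, reason="float32 run needs a 40GB+ GPU (e.g. A100)")
@pytest.mark.parametrize("dtype", ["float"])
def test_model_outputs_float32(dtype):
    ...  # run the model comparison at full precision
```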
@simon-mo Let’s expedite this PR so that we can have CI working for all other PRs?
This PR adds the basic setup for a GPU CI environment. It should enable us to run our tests on L4 GPUs. As a developer, you can add new tests to .buildkite/test-pipeline.yaml. Currently, I have all tests enabled, in addition to benchmarks. However, I don't want to block the merge of this PR on debugging model output (test_models.py) and tuning memory (test_attention.py), so I marked those tests as "soft fail" for now.

Please reference the latest build as an example: https://buildkite.com/vllm/ci/builds/182. The end-to-end build time is about 1 hour in the worst case (fresh docker build, slow machine start) and about 15 minutes in the best case (docker cached, machine available). Of course, if too many PRs are submitted at the same time, builds might need to wait and queue up a bit; we are capped at 10 GPU machines for budget reasons. The full infrastructure setup is described and maintained in a separate repo: https://github.com/vllm-project/buildkite-ci.

Code changes in this PR are mostly in the .buildkite directory and the associated Dockerfile and setup.py. Everything else is done to make the existing tests pass.

Future work includes: