
Add distributed model executor abstraction #3191

Merged: 19 commits, Mar 11, 2024
Conversation

zhuohan123 (Collaborator) commented Mar 5, 2024

This PR pulls the distributed worker management part of LLMEngine out into a new set of classes, namely ModelExecutors. This separates the code for different hardware backends and enables single-node distributed execution without Ray. Specifically, this PR implements 4 types of model executors:

  1. SingleGPUModelExecutor: The previous code path when not using Ray.
  2. SingleGPUModelExecutorAsync: SingleGPUModelExecutor plus several async function calls.
  3. RayDistributedModelExecutor: The previous distributed implementation with Ray.
  4. RayDistributedModelExecutorAsync: RayDistributedModelExecutor plus several async function calls.
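
The sync/async split described above can be sketched roughly as follows. The class names follow the PR description, but the method signatures and bodies are illustrative placeholders, not the actual vLLM code:

```python
import asyncio
from typing import Dict, List


class SingleGPUModelExecutor:
    """Runs the model in-process on a single GPU (no Ray)."""

    def execute_model(self, seq_group_metadata_list: List[Dict]) -> List[str]:
        # In vLLM this would invoke the worker's model runner; here we
        # return a placeholder so the sketch is runnable.
        return [f"output-for-{m['request_id']}" for m in seq_group_metadata_list]


class SingleGPUModelExecutorAsync(SingleGPUModelExecutor):
    """The same executor plus async entry points for the async engine."""

    async def execute_model_async(self, seq_group_metadata_list: List[Dict]) -> List[str]:
        # Offload the blocking call so the event loop stays responsive.
        return await asyncio.to_thread(self.execute_model, seq_group_metadata_list)


if __name__ == "__main__":
    executor = SingleGPUModelExecutorAsync()
    print(asyncio.run(executor.execute_model_async([{"request_id": "r1"}])))
```

The Ray-backed pair follows the same pattern, with the sync class owning the Ray workers and the async subclass wrapping its calls.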

TODOs after this PR

njhill (Collaborator) commented Mar 5, 2024

Thanks @zhuohan123, this is the kind of thing I had in mind in this comment #2898 (comment)! If you like I can rework #2898 to plug into your abstraction.

zhuohan123 (Collaborator, Author) replied:

> Thanks @zhuohan123, this is the kind of thing I had in mind in this comment #2898 (comment)! If you like I can rework #2898 to plug into your abstraction.

Yes, #2898 is the exact next PR I'm thinking about after this PR. This PR is still WIP and I might change things here and there. Let me ping you once this PR is finalized :)

njhill (Collaborator) commented Mar 6, 2024

@zhuohan123 sounds great... yeah I meant once you were finished with this, no rush at all!

@zhuohan123 zhuohan123 changed the title [WIP] Add distributed model executor abstraction Add distributed model executor abstraction Mar 6, 2024
USE_RAY_COMPILED_DAG = bool(os.getenv("VLLM_USE_RAY_COMPILED_DAG", 0))


class RayDistributedModelExecutor:
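
One thing worth noting about the line above: `bool()` over the raw environment string is truthy for any non-empty value, so even `VLLM_USE_RAY_COMPILED_DAG=0` would enable the flag. A small runnable illustration, with a stricter `env_flag` helper that is purely hypothetical (not part of this PR):

```python
import os

# bool() on the raw string is truthy for ANY non-empty value, including "0":
os.environ["VLLM_USE_RAY_COMPILED_DAG"] = "0"
assert bool(os.getenv("VLLM_USE_RAY_COMPILED_DAG", 0)) is True


def env_flag(name: str, default: bool = False) -> bool:
    """Illustrative stricter parse: only common truthy spellings count."""
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


assert env_flag("VLLM_USE_RAY_COMPILED_DAG") is False  # "0" now reads as off
```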
Yard1 (Collaborator) commented Mar 6, 2024
I suggest adding a common abstract class for the different model executors, so that they all implement the same public API.
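
A minimal sketch of what such a common abstract class could look like; `ExecutorBase` and its method names are assumptions for illustration, not the exact API that ended up in vLLM:

```python
from abc import ABC, abstractmethod
from typing import List


class ExecutorBase(ABC):
    """Shared public API that every model executor must implement."""

    @abstractmethod
    def execute_model(self, requests: List[str]) -> List[str]:
        ...

    @abstractmethod
    def check_health(self) -> bool:
        ...


class RayDistributedModelExecutor(ExecutorBase):
    def execute_model(self, requests: List[str]) -> List[str]:
        return [f"ray:{r}" for r in requests]

    def check_health(self) -> bool:
        return True


# A subclass that forgets part of the API fails at construction time:
class Incomplete(ExecutorBase):
    def execute_model(self, requests: List[str]) -> List[str]:
        return requests


try:
    Incomplete()
except TypeError as err:
    print("rejected:", err)
```

The benefit is that the engine can be written against `ExecutorBase` alone, and missing methods surface immediately rather than at call time.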

zhuohan123 (Collaborator, Author) replied:

@Yard1 Just FYI, one change that I made in this PR is that I moved PlacementGroup into the ParallelConfig. Please let me know if this is not a good idea.

Yard1 (Collaborator) commented Mar 7, 2024

I think it should be fine

WoosukKwon (Collaborator) left a comment

@zhuohan123 Awesome! Thanks for the great work! This refactor substantially cleans up the current system architecture while providing better extensibility 😄.

Overall, I'm happy with the PR and only have small concerns:

  1. I don't feel RayDistributedExecutor and SingleGPUModelRunner are good names. I'd propose RayGPURunner (or RayGPUExecutor) and GPURunner instead. WDYT?
  2. As a result of the refactoring, there is some duplicated code between RayDistributedExecutor and SingleGPUModelRunner. Can we reduce the duplication?

Please check out my review for more details.

Note: GPTQ and Marlin do not have bitwise correctness.
As a result, in this test, we just confirm that the top selected tokens of the
A collaborator commented:

Just wondering: Is our formatter not able to catch this kind of trailing whitespaces?

zhuohan123 (Collaborator, Author) replied:

Yeah seems like this is the case. cc @simon-mo
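
For reference, ruff does ship a trailing-whitespace rule (the pycodestyle warning W291, plus W293 for whitespace-only lines), but these warnings have to be selected explicitly. A possible `pyproject.toml` fragment, assuming a reasonably recent ruff (older versions put the rule selection directly under `[tool.ruff]`):

```toml
[tool.ruff.lint]
# W291: trailing whitespace; W293: whitespace on otherwise-blank lines
extend-select = ["W291", "W293"]
```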

Resolved review threads (outdated) on: vllm/engine/llm_engine.py, vllm/executor/single_gpu_executor_async.py, vllm/executor/utils.py, vllm/executor/ray_distributed_executor.py
njhill (Collaborator) commented Mar 9, 2024

@zhuohan123 this looks great thanks! Related to @WoosukKwon's comment above though, it feels like there's a fair amount of duplication of logic between the implementations which would need to be updated in multiple places any time it changes.

Especially when also thinking about how to rework the multiprocessing abstraction from #2898.

I think that could be addressed with another abstraction layer beneath your ModelExecutor one covering a subset of the implementations - specifically all of the current GPU-process based ones, but not neuron. WDYT? I'd be happy to show what I mean in another branch.

I agree with @Yard1 that it would be good to include an abstract base class.
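
The intermediate layer described here might look something like the sketch below. All names (`GPUExecutorBase`, `_create_worker`, the subclasses) are hypothetical, and closures stand in for real worker processes:

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class GPUExecutorBase(ABC):
    """Shared logic for executors that drive GPU worker processes."""

    def __init__(self, num_workers: int):
        self.workers: List[Callable[[str], str]] = [
            self._create_worker(rank) for rank in range(num_workers)
        ]

    @abstractmethod
    def _create_worker(self, rank: int) -> Callable[[str], str]:
        """Only the worker-launch mechanism differs per backend."""

    def execute_model(self, batch: str) -> List[str]:
        # The common scatter/gather logic lives here exactly once.
        return [worker(batch) for worker in self.workers]


class InProcessGPUExecutor(GPUExecutorBase):
    def _create_worker(self, rank: int) -> Callable[[str], str]:
        return lambda batch: f"rank{rank}:{batch}"


class MultiprocGPUExecutor(GPUExecutorBase):
    def _create_worker(self, rank: int) -> Callable[[str], str]:
        # A real implementation would spawn a process (cf. #2898);
        # a closure stands in for it in this sketch.
        return lambda batch: f"proc{rank}:{batch}"


print(InProcessGPUExecutor(num_workers=2).execute_model("b"))
# ['rank0:b', 'rank1:b']
```

With this shape, a new launch mechanism only overrides `_create_worker`, so the duplicated scatter/gather logic never forks across implementations.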

zhuohan123 (Collaborator, Author) commented:
@njhill @WoosukKwon Regarding the duplicated code: I think I have tried my best to pull shared code out of the two executors. How about we merge this PR first, and then see whether we can further reduce the duplicated logic?

zhuohan123 (Collaborator, Author) commented:
@WoosukKwon This PR is ready for review.

WoosukKwon (Collaborator) left a comment
LGTM! Thanks for addressing my review!

@zhuohan123 zhuohan123 merged commit 4c92270 into main Mar 11, 2024
24 checks passed
starmpcc pushed a commit to starmpcc/vllm that referenced this pull request Mar 14, 2024
binarycrayon commented:
Hi, I don't have the context of this PR, but how can we enable the SingleGPUModelRunner path?

robertgshaw2-neuralmagic added a commit to neuralmagic/nm-vllm that referenced this pull request Mar 15, 2024
SUMMARY:
* upstream merge (sync) up to `54be8a0`

## NOTES

- The updated ruff config has line-length limits, so I had to clean up a lot of files manually. I think `./format.sh` runs yapf and ruff only on the `nm-vllm/vllm` directory, whereas our automation runs on everything in `nm-vllm`, so it was a bit tricky to catch why the automation was failing. cc @varun-sundar-rabindranath, please review the benchmark directory in detail.

### Primary upstream changes:

#### Kernels
- [`batched_rotary_embedding`
](vllm-project@7e9bd08)
- [`gelu_tanh_and_mul`]()

#### Core
- [`LLMEngine` refactor](vllm-project#3191)
<<< adds new layer of abstraction to vLLM. **All should look at this**

TEST PLAN:
- nightly automation

zhuohan123 (Collaborator, Author) replied:

> Hi I don't have the context of this pr, but how can we enable SingleGPUModelRunner path?

By default, if you use a single GPU you will go down that path.
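
The dispatch implied by this answer can be sketched as follows; the config fields and selection logic here are illustrative assumptions, not the exact vLLM code:

```python
from dataclasses import dataclass


@dataclass
class ParallelConfig:
    """Hypothetical stand-in for vLLM's parallel configuration."""
    world_size: int = 1
    worker_use_ray: bool = False


def select_executor(config: ParallelConfig) -> str:
    # Multiple GPUs (or an explicit Ray request) route to the Ray executor;
    # a single GPU without Ray takes the in-process path.
    if config.worker_use_ray or config.world_size > 1:
        return "RayDistributedModelExecutor"
    return "SingleGPUModelExecutor"


assert select_executor(ParallelConfig()) == "SingleGPUModelExecutor"
assert select_executor(ParallelConfig(world_size=4, worker_use_ray=True)) == "RayDistributedModelExecutor"
```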

dtransposed pushed a commit to afeldman-nm/vllm that referenced this pull request Mar 26, 2024
@zhuohan123 zhuohan123 deleted the add-executor-abstraction branch April 26, 2024 00:27
5 participants