
[Model] Add support for DBRX #3660

Merged: 9 commits merged into vllm-project:main on Mar 27, 2024

Conversation

@megha95 (Contributor) commented on Mar 27, 2024:

This PR adds support for DBRX. DBRX is a Mixture-of-Experts (MoE) model trained by Databricks with 132B total parameters and 36B active parameters. More details about the model can be found in the DBRX Technical Blog.

Model weights can be found in the HF repo:

| Model | Link | Description |
| --- | --- | --- |
| DBRX Base | HF | Pre-trained base model |
| DBRX Instruct | HF | Fine-tuned model for instruction following |

This PR is currently based on an older commit because the latest main has some issues, so there are some minor merge conflicts that will be resolved soon.
Meanwhile, this PR can be used to run DBRX with vLLM. It has been tested on NVIDIA A100 and H100 systems.

Note: Given that the model has 132B total parameters, it is recommended to use a minimum of 4x 80GB GPUs for 16-bit inference. Try increasing gpu_memory_utilization if you are running on 4 GPUs.
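
As a rough sketch (not part of this PR; the model name and flag values below are assumptions for a single 4x 80GB node, using the OpenAI-compatible server entrypoint mentioned later in this thread), a launch command might look like:

```bash
# Hypothetical launch for DBRX Instruct on 4x 80GB GPUs with 16-bit weights.
# --gpu-memory-utilization is raised above the 0.9 default to leave headroom
# for the 132B-parameter checkpoint; tune these values for your own hardware.
python -m vllm.entrypoints.openai.api_server \
    --model databricks/dbrx-instruct \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.95
```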

@WoosukKwon (Collaborator) left a comment:

@megha95 Thanks for submitting the PR! Very excited about the new release. Left minor comments on some stylistic issues. PTAL.

Review comments were left on:

- vllm/transformers_utils/configs/__init__.py
- vllm/transformers_utils/config.py
- vllm/model_executor/models/dbrx.py
- vllm/transformers_utils/configs/dbrx.py
@WoosukKwon WoosukKwon linked an issue Mar 27, 2024 that may be closed by this pull request
@WoosukKwon WoosukKwon removed their assignment Mar 27, 2024
@simon-mo simon-mo added the release-blocker This PR/issue blocks the next release, therefore deserves highest priority label Mar 27, 2024
@megha95 megha95 requested a review from WoosukKwon March 27, 2024 18:55
@WoosukKwon WoosukKwon self-assigned this Mar 27, 2024
@WoosukKwon (Collaborator) left a comment:

LGTM! Thanks for submitting the PR! Very excited to see what people will build on top of DBRX.

@WoosukKwon WoosukKwon enabled auto-merge (squash) March 27, 2024 19:48
@WoosukKwon WoosukKwon changed the title from "Support for DBRX" to "[Model] Add support for DBRX" on Mar 27, 2024
@WoosukKwon WoosukKwon merged commit e24336b into vllm-project:main Mar 27, 2024
29 of 33 checks passed
@RonanKMcGovern (Contributor) commented:

Doesn't this need tiktoken to be installed? I guess that hasn't been pushed to the Docker image yet; could the Docker image be rebuilt?

@Calvinnncy97 commented:

Hi, is there an example command that can be used to deploy an API server for DBRX-Base with vLLM? I have been trying `CUDA_VISIBLE_DEVICES=1,2,3,4 python -m vllm.entrypoints.openai.api_server --model databricks/dbrx-base --tensor-parallel-size 4 --host localhost --port 12345 --gpu-memory-utilization 0.9`, but it seems to be asking for 1 EB of memory, failing with `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate more than 1EB memory.`

@RonanKMcGovern (Contributor) commented Apr 22, 2024 via email.

@Calvinnncy97 commented:

I see. That makes sense if you set `--max-model-len` to 4096; I was using a 32k context length. On the topic of precision, vLLM automatically uses FP16 for an FP16 model and BF16 for a BF16 model, if I am not wrong.
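
For illustration, a minimal sketch of a launch with the reduced context window and an explicitly pinned dtype (the model path and values here are assumptions; `--max-model-len` and `--dtype` are standard vLLM engine flags):

```bash
# Hypothetical example: cap the context at 4096 tokens so the KV cache fits,
# and pin the weight dtype explicitly rather than relying on the checkpoint.
python -m vllm.entrypoints.openai.api_server \
    --model databricks/dbrx-base \
    --tensor-parallel-size 4 \
    --max-model-len 4096 \
    --dtype bfloat16
```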

@Calvinnncy97 commented Apr 23, 2024:

Hi, update: I found out what the issue is. It is due to this warning:

WARNING 04-23 04:08:51 custom_all_reduce.py:52] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.

Because custom all-reduce is disabled, the model did not manage to load. The solution is to add `--disable-custom-all-reduce` to the command.
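
Putting the fixes from this thread together, a hedged version of the earlier command might look like this (host, port, and memory settings are carried over from the original attempt; the 4096 context length is one possible choice, not a requirement):

```bash
# Hypothetical combined command: reduced context length plus custom all-reduce
# disabled, as suggested for setups with more than two PCIe-only GPUs.
CUDA_VISIBLE_DEVICES=1,2,3,4 python -m vllm.entrypoints.openai.api_server \
    --model databricks/dbrx-base \
    --tensor-parallel-size 4 \
    --host localhost --port 12345 \
    --gpu-memory-utilization 0.9 \
    --max-model-len 4096 \
    --disable-custom-all-reduce
```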

@RonanKMcGovern (Contributor) commented:

OK, wow, thanks, that's pretty specific.

sighingnow pushed a commit to sighingnow/vllm that referenced this pull request Apr 28, 2024
Labels: release-blocker (This PR/issue blocks the next release, therefore deserves highest priority)

Successfully merging this pull request may close these issues:

- [New Model]: Supporting DBRX from Databricks

5 participants