
[Model] Add support for DBRX #3660

Merged: 9 commits merged into vllm-project:main on Mar 27, 2024

Conversation

@megha95 (Contributor) commented on Mar 27, 2024:

This PR adds support for DBRX. DBRX is a Mixture-of-Experts (MoE) model trained by Databricks with 132B total parameters and 36B active parameters. More details about the model can be found in the DBRX Technical Blog.

Model weights can be found in the HF repo:

| Model | Link | Description |
| --- | --- | --- |
| DBRX Base | HF | Pre-trained base model |
| DBRX Instruct | HF | Fine-tuned model for instruction following |

This PR is currently based on an older commit because the latest main has some issues, so there are some minor merge conflicts that will be resolved soon.
Meanwhile, this PR can be used to run DBRX with vLLM. It has been tested on NVIDIA A100 and H100 systems.

Note: Given that the model has 132B total parameters, it is recommended to use a minimum of 4x 80GB GPUs for 16-bit inference. Try increasing gpu_memory_utilization if you are running on 4 GPUs.
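
As a rough sketch (not part of this PR; the model name and flag values below are assumptions for a single 4x 80GB node, using the OpenAI-compatible server entrypoint mentioned later in this thread), a launch command might look like:

```bash
# Hypothetical launch for DBRX Instruct on 4x 80GB GPUs with 16-bit weights.
# --gpu-memory-utilization is raised above the 0.9 default to leave headroom
# for the 132B-parameter checkpoint; tune these values for your own hardware.
python -m vllm.entrypoints.openai.api_server \
    --model databricks/dbrx-instruct \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.95
```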

@WoosukKwon (Collaborator) left a comment:

@megha95 Thanks for submitting the PR! Very excited about the new release. Left minor comments on some stylistic issues. PTAL.

Review comments were left on:

- vllm/transformers_utils/configs/__init__.py
- vllm/transformers_utils/config.py
- vllm/model_executor/models/dbrx.py
- vllm/transformers_utils/configs/dbrx.py
@WoosukKwon WoosukKwon linked an issue Mar 27, 2024 that may be closed by this pull request
@WoosukKwon WoosukKwon removed their assignment Mar 27, 2024
@simon-mo simon-mo added the release-blocker This PR/issue blocks the next release, therefore deserves highest priority label Mar 27, 2024
@megha95 megha95 requested a review from WoosukKwon March 27, 2024 18:55
@WoosukKwon WoosukKwon self-assigned this Mar 27, 2024
@WoosukKwon (Collaborator) left a comment:

LGTM! Thanks for submitting the PR! Very excited to see what people will build on top of DBRX.

@WoosukKwon WoosukKwon enabled auto-merge (squash) March 27, 2024 19:48
@WoosukKwon WoosukKwon changed the title from "Support for DBRX" to "[Model] Add support for DBRX" on Mar 27, 2024
@WoosukKwon WoosukKwon merged commit e24336b into vllm-project:main Mar 27, 2024
29 of 33 checks passed
@RonanKMcGovern (Contributor) commented:

Doesn't this need tiktoken to be installed? I guess that hasn't been pushed to the Docker image yet; could the Docker image be rebuilt?

@Calvinnncy97 commented:

Hi, is there an example command that can be used to deploy an API server for DBRX-Base with vLLM? I have been trying `CUDA_VISIBLE_DEVICES=1,2,3,4 python -m vllm.entrypoints.openai.api_server --model databricks/dbrx-base --tensor-parallel-size 4 --host localhost --port 12345 --gpu-memory-utilization 0.9`, but it seems to be asking for 1 EB of memory, failing with `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate more than 1EB memory.`

@RonanKMcGovern (Contributor) commented Apr 22, 2024 via email.

@Calvinnncy97 commented:

I see. That makes sense if you set `--max-model-len` to 4096; I was using a 32k context length. On the topic of precision, vLLM automatically uses FP16 for an FP16 model and BF16 for a BF16 model, if I am not wrong.
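
For illustration, a minimal sketch of a launch with the reduced context window and an explicitly pinned dtype (the model path and values here are assumptions; `--max-model-len` and `--dtype` are standard vLLM engine flags):

```bash
# Hypothetical example: cap the context at 4096 tokens so the KV cache fits,
# and pin the weight dtype explicitly rather than relying on the checkpoint.
python -m vllm.entrypoints.openai.api_server \
    --model databricks/dbrx-base \
    --tensor-parallel-size 4 \
    --max-model-len 4096 \
    --dtype bfloat16
```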

@Calvinnncy97 commented Apr 23, 2024:

Hi, update: I found out what the issue is. It is due to this warning:

WARNING 04-23 04:08:51 custom_all_reduce.py:52] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly.

Because custom all-reduce is disabled, the model did not manage to load. The solution is to add `--disable-custom-all-reduce` to the command.
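
Putting the fixes from this thread together, a hedged version of the earlier command might look like this (host, port, and memory settings are carried over from the original attempt; the 4096 context length is one possible choice, not a requirement):

```bash
# Hypothetical combined command: reduced context length plus custom all-reduce
# disabled, as suggested for setups with more than two PCIe-only GPUs.
CUDA_VISIBLE_DEVICES=1,2,3,4 python -m vllm.entrypoints.openai.api_server \
    --model databricks/dbrx-base \
    --tensor-parallel-size 4 \
    --host localhost --port 12345 \
    --gpu-memory-utilization 0.9 \
    --max-model-len 4096 \
    --disable-custom-all-reduce
```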

@RonanKMcGovern (Contributor) commented:

OK, wow, thanks, that's pretty specific.

sighingnow pushed a commit to sighingnow/vllm that referenced this pull request Apr 28, 2024
Labels: release-blocker (This PR/issue blocks the next release, therefore deserves highest priority)

Successfully merging this pull request may close these issues:

- [New Model]: Supporting DBRX from Databricks

5 participants