Closed
Description
Currently there is no way to use large models: there is no support for 8-bit quantization and, more importantly, no support for device mapping across multiple GPUs.
As you can see, the first GPU is filled while the second GPU is left unallocated.
Here is the error message:
OutOfMemoryError: CUDA out of memory. Tried to allocate 270.00 MiB (GPU 0; 23.70 GiB total capacity; 22.40 GiB already allocated; 247.50 MiB free; 22.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
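
For reference, a minimal sketch of what is being asked for, using the Hugging Face transformers + accelerate + bitsandbytes pattern as an example (the model name is a placeholder and this is not this repository's API):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-13b"  # placeholder for any large checkpoint

# device_map="auto" lets accelerate shard layers across all visible GPUs
# (spilling to CPU/disk if needed); load_in_8bit=True quantizes weights
# to int8 via bitsandbytes, roughly halving GPU memory versus fp16.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

With support like this, the 24 GiB card in the error above would not need to hold the whole model: layers would be spread over both GPUs instead of exhausting GPU 0.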