
Loading the model on multiple GPUs #46

Open
aamir-gmail opened this issue Apr 19, 2023 · 17 comments

Comments

@aamir-gmail

I have two 24GB 4090s. If possible, please provide an extra argument to demo.py to load the model either on the CPU or on two or more GPUs, and another argument to run in 16-bit to take advantage of the extra GPU RAM, instead of requiring edits to the config files.

@CyberTimon

I would also like to know how to do this.
I have 2x 3060 12GB cards, so I could load the 13B model, but it doesn't seem to be implemented.

@taomanwai

I have the same request.

@wJc-cn

wJc-cn commented May 6, 2023

I have the same request too.

@thcheung

thcheung commented Jun 6, 2023

1. Set the parameter device_map='auto' when loading the model with LlamaForCausalLM.from_pretrained().

2. Replace the line in demo.py with: chat = Chat(model, vis_processor, device='cuda')

It runs on two RTX 2080 Ti cards on my machine.
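A minimal sketch of step 1, assuming the LLaMA load in minigpt4/models/mini_gpt4.py looks roughly like the stock Hugging Face call below (the checkpoint path is a placeholder, not the repo's actual value):

import torch
from transformers import LlamaForCausalLM

# Sketch: device_map='auto' lets Accelerate shard the LLaMA weights across
# every visible GPU (spilling to CPU if needed) instead of pinning one device.
llama_model = LlamaForCausalLM.from_pretrained(
    "path/to/vicuna-13b",        # placeholder; use the checkpoint the repo config points to
    torch_dtype=torch.float16,
    device_map='auto',
)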

@sinsauzero

sinsauzero commented Jun 7, 2023

> Set the parameter device_map='auto' when loading the model with LlamaForCausalLM.from_pretrained().
> Replace the line in demo.py with: chat = Chat(model, vis_processor, device='cuda')
> It runs on two RTX 2080 Ti cards on my machine.

It seems the model is split across the two devices, but during inference tensors end up on both devices and it throws a device-mismatch error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

@thcheung

thcheung commented Jun 7, 2023

> Set the parameter device_map='auto' when loading the model with LlamaForCausalLM.from_pretrained().
> Replace the line in demo.py with: chat = Chat(model, vis_processor, device='cuda')
> It runs on two RTX 2080 Ti cards on my machine.
>
> It seems the model is split across the two devices, but during inference tensors end up on both devices and it throws a device-mismatch error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

(1) Load the LLaMA model with device_map set to 'auto'. Change:

device_map={'': device_8bit}

to:

device_map = 'auto'

(2) Modify the line below, replacing 'cuda:{}'.format(args.gpu_id) with 'cuda'; tensors will then automatically be assigned to device 0 or device 1 if you have two devices. Change:

chat = Chat(model, vis_processor, device='cuda:{}'.format(args.gpu_id))

to:

chat = Chat(model, vis_processor, device='cuda')

(3) The .to(device) call can be removed from the line below, because LLaMA has already been loaded onto the GPUs automatically. Change:

model = model_cls.from_config(model_config).to('cuda:{}'.format(args.gpu_id))

to:

model = model_cls.from_config(model_config)

(4) When encoding the image, run the encoder on the CPU and then move the image embedding to the GPU. Change:

image_emb, _ = self.model.encode_img(image)
img_list.append(image_emb)

to:

image_emb, _ = self.model.encode_img(image.to('cpu'))
img_list.append(image_emb.to('cuda'))

The model should now work if you have multiple GPUs, each with limited memory.
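As a quick sanity check that the sharding actually happened, the placement Accelerate chose can be printed after the model is built in demo.py. hf_device_map is set by transformers whenever device_map is used; reaching it via model.llama_model follows the attribute name MiniGPT-4 uses for its language model, which is an assumption about the exact path:

import torch

# Sketch: confirm the LLaMA layers ended up spread over both GPUs.
# Expect entries like 'model.layers.0': 0 alongside 'model.layers.30': 1.
for module_name, device in model.llama_model.hf_device_map.items():
    print(f"{module_name} -> {device}")

# Rough per-GPU memory footprint, in bytes.
print(torch.cuda.memory_allocated(0), torch.cuda.memory_allocated(1))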

[image attachment]

@JainitBITW

I did all of these steps, but I still get:

Traceback (most recent call last):
  File "/home2/jainit/MiniGPT-4/demo.py", line 61, in <module>
    model = model_cls.from_config(model_config)
  File "/home2/jainit/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 243, in from_config
    model = cls(
  File "/home2/jainit/MiniGPT-4/minigpt4/models/mini_gpt4.py", line 90, in __init__
    self.llama_model = LlamaForCausalLM.from_pretrained(
  File "/home2/jainit/torchy/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2722, in from_pretrained
    max_memory = get_balanced_memory(
  File "/home2/jainit/torchy/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 731, in get_balanced_memory
    max_memory = get_max_memory(max_memory)
  File "/home2/jainit/torchy/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 624, in get_max_memory
    _ = torch.tensor([0], device=i)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
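For what it's worth, this OOM is raised while Accelerate probes every visible GPU (get_max_memory allocates a tiny tensor on each device), so a card that is already full or stuck trips it even though the model never touched it. Restarting the GPUs fixed it below; an alternative sketch, if a card needs to be capped or mostly avoided, is to hand from_pretrained an explicit max_memory budget (the sizes here are made-up examples):

import torch
from transformers import LlamaForCausalLM

# Sketch: cap what Accelerate may place on each GPU and allow CPU offload
# for whatever does not fit (budgets below are illustrative only).
llama_model = LlamaForCausalLM.from_pretrained(
    "path/to/vicuna-13b",                      # placeholder path
    torch_dtype=torch.float16,
    device_map='auto',
    max_memory={0: "10GiB", 1: "10GiB", "cpu": "30GiB"},
)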

@sushilkhadkaanon

@JainitBITW Is it working now for you?

@JainitBITW

Yes, I just restarted CUDA.

@sushilkhadkaanon

@JainitBITW Did you do anything apart from @thcheung's instructions?
Thanks anyway!

@JainitBITW

Nope, exactly the same.

@JainitBITW

What error are you getting?

@sushilkhadkaanon

I'm trying to run the 13B model on multiple GPUs. The authors have written that they currently don't support multi-GPU inference, so I want to be sure that inference on multiple GPUs is possible before provisioning the EC2 instance.

@JainitBITW

I think you can go ahead.

@sushilkhadkaanon

@JainitBITW @thcheung Thanks, it worked for me (8-bit). Any idea how to do it in 16-bit (low_resource = False)?
It is throwing this error:
RuntimeError: Input type (float) and bias type (c10::Half) should be the same

@daniellandau

> RuntimeError: Input type (float) and bias type (c10::Half) should be the same

I got past this error by setting vit_precision: "fp32" in minigpt_v2.yaml, but I didn't figure out what would be needed to make the input fp16 (half precision) as well, instead of making everything fp32.
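The mismatch means fp32 image tensors are hitting fp16 ViT weights. An untested alternative sketch that keeps the ViT in fp16 is to cast the image to the encoder's dtype before encoding; visual_encoder is the attribute name MiniGPT-4's model class appears to use, but that (and the exact call site inside Chat) is assumed here:

# Sketch, inside Chat's image-upload path: match the image dtype to the ViT weights
# instead of forcing vit_precision to "fp32".
vit_dtype = next(self.model.visual_encoder.parameters()).dtype   # e.g. torch.float16
image_emb, _ = self.model.encode_img(image.to(dtype=vit_dtype))
img_list.append(image_emb)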

@uiyo

uiyo commented Nov 13, 2023

My solution is:
CUDA_VISIBLE_DEVICES=1 python demo_v2.py --cfg-path eval_configs/minigptv2_eval.yaml --gpu-id 0
