Description
Search before asking
- I have searched the Multimodal Maestro issues and found no similar bug report.
Bug
Hello,
I was testing the zero-shot object detection Colab notebook in my AWS environment and noticed that the Qwen model was being loaded across multiple GPUs by the code below, rather than onto the single device I requested, even after setting the CUDA device:
from maestro.trainer.models.qwen_2_5_vl.checkpoints import load_model, OptimizationStrategy

MODEL_ID_OR_PATH = "Qwen/Qwen2.5-VL-7B-Instruct"
MIN_PIXELS = 512 * 28 * 28
MAX_PIXELS = 2048 * 28 * 28

processor, model = load_model(
    model_id_or_path=MODEL_ID_OR_PATH,
    device="cuda:0",
    optimization_strategy=OptimizationStrategy.NONE,
    min_pixels=MIN_PIXELS,
    max_pixels=MAX_PIXELS,
)
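For reference, this is the quick check I used to confirm the sharding (a minimal sketch; it assumes the returned model is a regular transformers model, and hf_device_map is only populated when accelerate dispatches/shards the weights):

# Quick check of where the weights actually ended up after load_model.
# If the model was sharded by device_map="auto", the parameter devices below
# contain more than one entry instead of just cuda:0.
print(getattr(model, "hf_device_map", None))
print({p.device for p in model.parameters()})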
I browsed through the checkpoints file for the Qwen model and this appears to be the cause: regardless of the device argument passed to the function, the model's device_map is hard-coded to 'auto'.
# maestro/trainer/models/qwen_2_5_vl/checkpoints.py (excerpt)
        model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
            model_id_or_path,
            revision=revision,
            trust_remote_code=True,
            device_map="auto",
            quantization_config=bnb_config,
            torch_dtype=torch.bfloat16,
            cache_dir=cache_dir,
        )
        model = get_peft_model(model, lora_config)
        model.print_trainable_parameters()
    else:
        model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
            model_id_or_path,
            revision=revision,
            trust_remote_code=True,
            device_map="auto",
            torch_dtype=torch.bfloat16,
            cache_dir=cache_dir,
        )
        model.to(device)
Because device_map='auto' is always used, it appears to override whatever device is passed to load_model, and the model is already sharded across all visible GPUs before model.to(device) is ever called.
I am fairly confident this is a bug, but let me know if it is an issue on my side.
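If it helps, one possible direction for a fix (just a sketch from my side, assuming the device argument reaches this code as something from_pretrained accepts for device_map, e.g. 'cuda:0') would be to stop hard-coding 'auto' in the non-quantized branch:

# Sketch only, not tested against the maestro codebase.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id_or_path,
    revision=revision,
    trust_remote_code=True,
    device_map=device,  # respect the caller-supplied device instead of "auto"
    torch_dtype=torch.bfloat16,
    cache_dir=cache_dir,
)
# model.to(device) then becomes a no-op for this branch.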
Environment
- maestro[qwen_2_5_vl]==1.1.0rc2
- AWS environment with 4 NVIDIA A10 GPUs
- Without setting os.environ["CUDA_VISIBLE_DEVICES"] = "0" (via import os), I was unable to prevent the model from loading onto all 4 GPUs; see the workaround snippet below.
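The workaround I am using for now (it has to run before torch / maestro are imported, otherwise the other GPUs are already visible to the process):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # must be set before importing torch / maestro

from maestro.trainer.models.qwen_2_5_vl.checkpoints import load_model, OptimizationStrategy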
Minimal Reproducible Example
No response
Additional
No response
Are you willing to submit a PR?
- Yes I'd like to help by submitting a PR!