
Device map feature for maestro models -qwen_2.5, florence_2 & paligemma_2 #179

Open · wants to merge 2 commits into develop

Conversation


@AmazingK2k3 commented Mar 1, 2025

Description

As discussed in this issue https://github.com/roboflow/maestro/issues/176, this PR implements the device map feature for loading all 3 models. No change in dependencies is required.

The 'device' parameter was replaced by 'device_map' to maintain consistency with Hugging Face and avoid confusion. For Florence 2, it was also ensured that device_map does not accept a dict input, e.g. {"": "cuda:0"}, and that 'auto' assigns the model to an available device using the existing parse_device_spec() function.

For Qwen 2.5 and PaliGemma 2, device_map is passed directly to from_pretrained() when loading the models, with the default set to 'auto'.

The docstring of the load_model() function was updated for all three model checkpoints to reflect these changes.
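
As a rough sketch of the behaviour described above (illustrative only; the actual maestro implementation and the parse_device_spec() helper may differ):

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

def load_model(model_id_or_path, revision="main", device_map="auto", cache_dir=None):
    # Florence 2 sketch: dict-style device maps such as {"": "cuda:0"} are rejected,
    # and "auto"/None is resolved to a single available device (a simplified
    # stand-in for the existing parse_device_spec() helper).
    if isinstance(device_map, dict):
        raise ValueError("Florence 2 does not accept a dict device_map.")
    if device_map in (None, "auto"):
        device = "cuda:0" if torch.cuda.is_available() else "cpu"
    else:
        device = device_map
    processor = AutoProcessor.from_pretrained(
        model_id_or_path, revision=revision, trust_remote_code=True, cache_dir=cache_dir
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id_or_path, revision=revision, trust_remote_code=True,
        torch_dtype=torch.bfloat16, cache_dir=cache_dir
    ).to(device)
    return processor, model

For Qwen 2.5 and PaliGemma 2, device_map is instead forwarded as-is to from_pretrained(), as shown in the snippet later in this thread.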

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Testing

Tested loading each model with device_map set to different modes ('auto', 'cuda', 'cpu'); all cases pass in a cloud environment.
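
For reference, a minimal sketch of such a smoke test (the import path and pytest usage are assumptions for illustration, not part of this PR):

import pytest

# Hypothetical import path; adjust to the actual maestro module.
from maestro.trainer.models.qwen_2_5_vl.checkpoints import load_model

@pytest.mark.parametrize("device_map", ["auto", "cuda", "cpu"])
def test_load_model_with_device_map(device_map):
    processor, model = load_model(
        model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
        device_map=device_map,
    )
    assert processor is not None
    assert model is not None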

I have read the CLA Document and I sign the CLA.

@CLAassistant

CLAassistant commented Mar 1, 2025

CLA assistant check
All committers have signed the CLA.

@AmazingK2k3 changed the title Commit - Device map feature for maestro models -qwen_2.5, florence_2 … Device map feature for maestro models -qwen_2.5, florence_2 & paligemma_2 Mar 3, 2025
@SkalskiP
Collaborator

SkalskiP commented Mar 6, 2025

Hi @AmazingK2k3 👋🏻 thank you so much for your PR. Could you please explain why you decided to drop the device argument? I'm looking at issue #176 and, if I remember correctly, we wanted to keep the device argument and add device_map, allowing for:

  • Load on CPU
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="cpu"
)
  • Load on MPS
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="mps"
)
  • Load on single GPU machine
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="cuda:0"
)
  • Load model on all GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto"
)
  • Load model on specific subset of GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map={"": "cuda:0"}
)

I think device_map alone won't give us the same level of flexibility.

@AmazingK2k3
Author

AmazingK2k3 commented Mar 6, 2025

Hey @SkalskiP, the main reason I dropped the device argument completely is that having two arguments dealing with devices (device and device_map) might confuse the user loading the model. For example, if we kept both, a user loading the Qwen model could set device='cpu' but leave device_map as None or 'auto':

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id_or_path,
    revision=revision,
    trust_remote_code=True,
    device_map=device_map if device_map else "auto",
    torch_dtype=torch.bfloat16,
    cache_dir=cache_dir,
)
model.to(device)

This will ultimately load the model across GPUs even if a specific device is requested, as stated in issue #176.

I felt it would be much simpler to have a single argument dealing with devices. device_map is commonly used in the transformers library as well, and it can directly take the same values device would ('cpu', 'mps', 'cuda:0') and load the models accordingly. If it is left as None, the models are loaded with device_map set to 'auto'.

  • Load on CPU
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="cpu"
)
  • Load on MPS
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="mps"
)
  • Load on single GPU machine
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="cuda:0"
)
  • Load model on all GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto"
)
  • Load model on a specific subset of GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map={"": "cuda:0"}  # Not applicable to Florence 2
)

Just one argument device_map for all cases!

Let me know if this is okay or if there is a better way to go about it.
