
Community contribution: enabling device_map="auto" support for more vision and multimodal models #29786

Closed · opened by @amyeroberts
Feature request

transformers models can be easily loaded across multiple devices using `device_map="auto"`. This automatically allocates weights across the available devices (e.g. GPUs) and offloads any remaining weights to CPU, then to disk, as necessary. This is useful when doing inference with large models.
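For example, loading a model this way looks like the following (a minimal sketch; the checkpoint name is illustrative and not one named in this issue, and `accelerate` must be installed):

```python
from transformers import AutoModelForCausalLM

# device_map="auto" spreads the weights over the available GPUs,
# then offloads the remainder to CPU RAM and finally to disk.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative checkpoint
    device_map="auto",
)
```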

To enable this, `_no_split_modules` has to be defined on the model's `PreTrainedModel` subclass, e.g. as is done for Llama. It lists the modules which should not be split across devices, and should contain as few modules as possible.
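Concretely, the definition is a one-line class attribute. The sketch below shows the pattern; the class and module names are hypothetical, not taken from a real model:

```python
from transformers import PreTrainedModel

class MyModelPreTrainedModel(PreTrainedModel):  # hypothetical model class
    # Modules listed here are always kept whole on a single device.
    # Start with an empty list and only add module class names (e.g. a
    # decoder layer whose residual connection must stay on one device)
    # if the offload/parallelism tests fail.
    _no_split_modules = ["MyModelDecoderLayer"]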

Steps to add

  • Pick a model to work on and open a PR. Comment on this issue to say which model you're working on
  • Define `_no_split_modules` in the `PreTrainedModel` subclass, following the pattern sketched above. Try `_no_split_modules = []` first
  • Enable testing
    • Ensure the following tests are not skipped for the model (see the sketch after this list): `test_disk_offload_bin`, `test_disk_offload_safetensors`, `test_cpu_offload`, `test_model_parallelism`, `test_model_parallel_beam_search`
    • Run the tests in a multi-GPU environment: `pytest tests/models/{MODEL_NAME}/test_modeling_{MODEL_NAME}.py -vv -k "offload or parallelism"`
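In many test files these tests are disabled with explicit skip overrides, which is what "not skipped" means removing. The pattern typically looks like the sketch below; the test class name and skip reasons are hypothetical and vary per model:

```python
import unittest

class MyModelModelTest(unittest.TestCase):  # hypothetical test class
    # Overrides like these shadow the shared tests; delete them so the
    # common offload/parallelism tests run again for the model.
    @unittest.skip(reason="MyModel does not support device_map='auto' yet")
    def test_cpu_offload(self):
        pass

    @unittest.skip(reason="MyModel does not support device_map='auto' yet")
    def test_model_parallelism(self):
        pass
```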

Models

Motivation

Enable a powerful HF feature for all of our vision models

Your contribution

Ping me for review 🤗

Labels: Good Second Issue (issues that are more difficult to do than "Good First" issues - give it a try if you want!)