
Bounding boxes #138

Open
@abrichr


Search before asking

  • I have searched the Multimodal Maestro issues and found no similar feature requests.

Description

As far as I know, Qwen2.5-VL is the first open-source multimodal model that can output bounding boxes.

e.g. from https://github.com/QwenLM/Qwen2.5-VL/blob/main/cookbooks/spatial_understanding.ipynb:

Image

It would be great to support bounding box output here, in a way that lets other models support it as well.
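
To make the request more concrete, here is a minimal sketch of what model-agnostic bounding box parsing could look like. It assumes the model returns a JSON list of objects with a `bbox_2d` field (`[x1, y1, x2, y2]`) and a `label` field, optionally wrapped in a Markdown code fence, which is my understanding of the format used in the linked cookbook. The names `Box` and `parse_boxes` are hypothetical and are not part of any existing API.

```python
import json
import re
from dataclasses import dataclass

# Built programmatically so this snippet contains no literal code fence.
FENCE = "`" * 3


@dataclass
class Box:
    """A labeled axis-aligned box; coordinates are whatever the model emits."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def parse_boxes(text: str) -> list[Box]:
    """Parse boxes from raw model text (assumed format, see above)."""
    # Strip an optional Markdown code fence around the JSON payload.
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    payload = match.group(1) if match else text

    boxes = []
    for item in json.loads(payload):
        x1, y1, x2, y2 = item["bbox_2d"]  # assumed key name
        boxes.append(Box(item.get("label", ""), x1, y1, x2, y2))
    return boxes


# Hypothetical model output, for illustration only.
sample = '[{"bbox_2d": [10, 20, 110, 220], "label": "button"}]'
print(parse_boxes(sample))
```

A shared structure like `Box` would let each model adapter convert its own output format into one common representation.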

Use case

We would use this for generative process automation in https://github.com/OpenAdaptAI/OpenAdapt

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!

Labels

  • enhancement: New feature or request
  • model: Request to add / extend support for the model.
