Add support for audios in apply_chat_template #36770

Open
wants to merge 4 commits into
base: main

@junnei junnei commented Mar 17, 2025

What does this PR do?

Add Feature #36769

support for audios in apply_chat_template

  • add support for audios in apply_chat_template
  • add AudioInput in audio_utils
  • add load_audio in audio_utils
  • add soundfile available in import_utils
  • fix typo in _process_messages_for_chat_template
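
As a rough illustration of the soundfile availability check and load_audio items above, here is a minimal sketch. It is not the code in this PR: the names is_package_available and load_audio_sketch are hypothetical, and the only library behavior assumed is that soundfile.read accepts a path or a file-like object and returns the samples together with the sampling rate.

```python
import importlib.util
import io
import urllib.request


def is_package_available(name: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def load_audio_sketch(path_or_url: str):
    """Hypothetical loader: returns (samples, sampling_rate) via soundfile."""
    if not is_package_available("soundfile"):
        raise ImportError("Loading audio requires soundfile: pip install soundfile")
    import soundfile as sf  # imported lazily so this module loads without it

    if path_or_url.startswith(("http://", "https://")):
        # Download into memory; soundfile can read from file-like objects.
        with urllib.request.urlopen(path_or_url) as response:
            return sf.read(io.BytesIO(response.read()))
    # Otherwise treat the argument as a local file path.
    return sf.read(path_or_url)
```

Keeping the import inside the function means the optional dependency is only required when audio is actually loaded.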

Before

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Follow the instruction in the audio with this image."}
        ]
    }
]

After

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "audio", "audio": "https://huggingface.co/microsoft/Phi-4-multimodal-instruct/resolve/main/examples/what_is_shown_in_this_image.wav"},
            {"type": "text", "text": "Follow the instruction in the audio with this image."}
        ]
    }
]
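
To make the difference concrete, the sketch below walks a messages list like the one above and collects image and audio references by their "type" key. It only illustrates the message format; collect_media is a hypothetical helper, not the logic of _process_messages_for_chat_template.

```python
def collect_media(messages: list[dict]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: gather image and audio references from chat messages."""
    images, audios = [], []
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            continue  # plain-text message, no media parts
        for part in content:
            if part.get("type") == "image":
                images.append(part["image"])
            elif part.get("type") == "audio":
                audios.append(part["audio"])
    return images, audios


messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "audio", "audio": "https://huggingface.co/microsoft/Phi-4-multimodal-instruct/resolve/main/examples/what_is_shown_in_this_image.wav"},
            {"type": "text", "text": "Follow the instruction in the audio with this image."},
        ],
    },
]

images, audios = collect_media(messages)
```

With the "Before" messages, the audio list would come back empty, which is the gap this PR is meant to close.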

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@Rocketknight1 @eustlb
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@github-actions github-actions bot marked this pull request as draft March 17, 2025 17:10

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@junnei junnei marked this pull request as ready for review March 17, 2025 17:14

junnei commented Mar 17, 2025

I noticed that the CI check ci/circleci: check_code_quality is failing. This appears to be related to formatting issues in the code. I tried to stay consistent with the existing code structure for image processing in import_utils.py, but that approach seems to be causing the automated checks to fail.

If there are specific style guidelines or formatting requirements that I need to follow, I would appreciate your guidance on how to modify the code to pass these quality checks while still maintaining consistency with the existing codebase. I'm happy to make any necessary adjustments to ensure the code meets the project's standards.

@zucchini-nlp (Member)

Thanks for the PR @junnei !

Adding a chat template for audio models is indeed needed. Unfortunately, it requires a few more changes to follow the internal roadmap for the standard processor API. I already opened a PR for that a while ago (#34601), which fits better imo 🤗
