Phi-4 audio task prompts don't work at all (Apple, onnx, dotnet) #1455

Open
Savvkin opened this issue May 6, 2025 · 0 comments

Describe the bug
With Phi-4-multimodal-onnx, the audio task prompt <|user|><|audio_1|>Transcribe the audio clip into text.<|end|><|assistant|> returns this response instead of the transcription:

I'm sorry, but I can't watch or listen to an attached audio file. If you can provide the spoken content in text form, I can help transcribe it for you.

But the audio query prompt <|user|><|audio_1|><|end|><|assistant|>, which has no text instruction, responds as expected:

As an AI, I don't have real-time capabilities to check current weather conditions. I would recommend checking a reliable weather source such as weather.com or your preferred weather application for the most accurate and current information.
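The two prompts above differ only in the text placed after the audio tag. A minimal sketch of how they are assembled, useful for checking that the dotnet and Python apps send byte-identical strings. The helper name `build_audio_prompt` is hypothetical and is not part of onnxruntime-genai; only the tag layout is taken from this report.

```python
def build_audio_prompt(instruction: str = "") -> str:
    """Build a single-clip audio prompt in the layout quoted in this issue.

    Hypothetical helper for illustration only. An empty instruction yields
    the "audio query" prompt; a non-empty one yields the "audio task" prompt.
    """
    return f"<|user|><|audio_1|>{instruction}<|end|><|assistant|>"


# Audio task prompt (the failing case in this report).
task_prompt = build_audio_prompt("Transcribe the audio clip into text.")

# Audio query prompt (the working case in this report).
query_prompt = build_audio_prompt()

print(task_prompt)
print(query_prompt)
```

Printing both strings next to the prompt the dotnet sample actually passes to the model makes it easy to rule out a difference in whitespace or tag order.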

To Reproduce
Steps to reproduce the behaviour:

  1. Build onnx model for cpu as per guide
  2. dotnet run the sample app

Expected behaviour
The audio task prompt <|user|><|audio_1|>Transcribe the audio clip into text.<|end|><|assistant|> should return the same transcription as the Python sample app:

What's the weather like in San Francisco right now?

Desktop (please complete the following information):
Apple M1 Max, 64 GB, macOS Sequoia 15.4.1

Additional context (building model)

  1. Building onnxruntime-genai:
     git clone https://github.com/microsoft/onnxruntime-genai
     cd onnxruntime-genai
     python build.py --config Release
     dotnet build --configuration Release
  2. Installing dependencies:
     pip install backoff numpy==1.26.4 torch==2.6.0 torchaudio==2.6.0 torchvision onnx onnxscript peft requests scipy soundfile transformers
     pip install ../onnxruntime-genai/build/macOS/Release/wheel/*.whl
     pip uninstall onnxruntime
     pip install --pre --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ onnxruntime
  3. Modifying builder.py:
     - onnxruntime.quantization.matmul_4bits_quantizer
     + onnxruntime.quantization.matmul_nbits_quantizer
  4. Converting:
     python3 builder.py --input ./ --output ./cpu --precision fp32 --execution_provider cpu
  5. Copying genai_config.json, speech_processor.json and vision_processor.json files from gpu/gpu-int4-rtn-block-32.
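
The builder.py change above swaps one quantizer module path for another. Rather than editing the import by hand, a script could try both paths and use whichever one the installed onnxruntime provides. This is only a sketch: `first_importable` is a hypothetical helper, and the two module paths are the ones quoted in the diff above.

```python
import importlib
from types import ModuleType
from typing import Sequence


def first_importable(candidates: Sequence[str]) -> ModuleType:
    """Return the first module in `candidates` that imports successfully.

    Hypothetical helper for illustration; raises ImportError if none load.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; move on to the next candidate.
            continue
    raise ImportError(f"none of these modules could be imported: {list(candidates)}")


# Module paths taken from the builder.py diff in this report, new name first.
QUANTIZER_MODULES = (
    "onnxruntime.quantization.matmul_nbits_quantizer",
    "onnxruntime.quantization.matmul_4bits_quantizer",
)

# Usage (requires onnxruntime to be installed):
# quantizer_module = first_importable(QUANTIZER_MODULES)
```

Class names inside the two modules may also differ between onnxruntime versions, so the code that uses the returned module would need the same kind of check.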
Savvkin changed the title from "Audio task prompts don't work at all (Apple, onnx, dotnet)" to "Phi-4 audio task prompts don't work at all (Apple, onnx, dotnet)" on May 6, 2025