Audio-language assistant with Qwen2Audio and OpenVINO

Qwen2-Audio is the new series of Qwen large audio-language models. Qwen2-Audio can accept various audio signal inputs and perform audio analysis or respond directly in text to spoken instructions. The model supports more than 8 languages and dialects, e.g., Chinese, English, Cantonese, French, Italian, Spanish, German, and Japanese, and can work in two distinct audio interaction modes:

  • voice chat: users can freely engage in voice interactions with Qwen2-Audio without text input;
  • audio analysis: users can provide audio together with text instructions for analysis during the interaction.

More details about the model can be found in the model card, blog, original repository, and technical report.

In this tutorial we consider how to convert and optimize the Qwen2-Audio model for creating a multimodal chatbot. Additionally, we demonstrate how to apply a stateful transformation to the LLM part and model optimization techniques such as weight compression using NNCF.
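To give a concrete picture of the optimization step, below is a minimal sketch of weight compression with NNCF. It assumes the language model part of Qwen2-Audio has already been converted to OpenVINO IR; the file paths and compression parameters are illustrative and may differ from the actual notebook.

```python
# Hedged sketch: compress the weights of the exported LLM part with NNCF.
# Paths and parameters are illustrative, not the notebook's exact values.
import openvino as ov
import nncf

core = ov.Core()
lm_model = core.read_model("qwen2_audio_ov/language_model.xml")

# Compress LLM weights to 4-bit integers to reduce memory footprint and
# speed up generation; ratio and group_size are typical example values.
compressed_model = nncf.compress_weights(
    lm_model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    ratio=0.8,
    group_size=128,
)
ov.save_model(compressed_model, "qwen2_audio_ov/language_model_int4.xml")
```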

Notebook contents

The tutorial consists of the following steps:

  • Install requirements
  • Convert and Optimize model
  • Run OpenVINO model inference
  • Launch Interactive demo

In this demonstration, you'll create an interactive chatbot that accepts voice instructions and answers questions about the content of the provided audio.
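As a rough illustration of the audio-analysis flow, the sketch below follows the reference Hugging Face transformers API for Qwen2-Audio (Qwen2AudioForConditionalGeneration and AutoProcessor). In the notebook, the PyTorch model is replaced with the converted OpenVINO model, so the exact wrapper class and argument names may differ; the audio file name and prompt here are placeholders.

```python
# Hedged sketch of an audio-analysis turn using the reference transformers API;
# the notebook swaps the PyTorch model for the OpenVINO-converted one.
import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model_id = "Qwen/Qwen2-Audio-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_id)

# Build a chat turn containing both an audio clip and a text instruction.
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "sample.wav"},  # placeholder local file
        {"type": "text", "text": "What is the speaker talking about?"},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audio, _ = librosa.load("sample.wav", sr=processor.feature_extractor.sampling_rate)

inputs = processor(text=text, audios=[audio], return_tensors="pt", padding=True)
output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
response = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(response)
```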

Installation instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to the Installation Guide.
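For reference, installing dependencies from inside the notebook typically looks like the snippet below; the package list here is only an assumption, and the Install requirements cell in the notebook is the authoritative source.

```python
# Illustrative only: the notebook's Install requirements cell defines the exact packages.
%pip install -q openvino nncf transformers librosa gradio
```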