Qwen2-Audio is the new series of Qwen large audio-language models. Qwen2-Audio is capable of accepting various audio signal inputs and performing audio analysis or responding directly in text to speech instructions. The model supports more than eight languages and dialects, e.g., Chinese, English, Cantonese, French, Italian, Spanish, German, and Japanese, and can work in two distinct audio interaction modes:
- voice chat: users can freely engage in voice interactions with Qwen2-Audio without text input;
- audio analysis: users can provide both audio and text instructions for analysis during the interaction.
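Following the Qwen2-Audio model card, the two modes map to two conversation formats. The sketch below illustrates the difference; the audio file names are placeholders:

```python
# Conversation formats for the two interaction modes
# (following the Qwen2-Audio model card; file names are placeholders).

# Voice chat: the user turn contains only audio -- the spoken audio
# itself carries the instruction, no text input is needed.
voice_chat_conversation = [
    {
        "role": "user",
        "content": [{"type": "audio", "audio_url": "spoken_question.wav"}],
    }
]

# Audio analysis: the user turn pairs audio with a text instruction about it.
audio_analysis_conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio_url": "speech_sample.wav"},
            {"type": "text", "text": "Transcribe this recording."},
        ],
    }
]
```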
More details about the model can be found in the model card, blog, original repository, and technical report.
In this tutorial we consider how to convert and optimize the Qwen2-Audio model for creating a multimodal chatbot. Additionally, we demonstrate how to apply a stateful transformation to the LLM part and model optimization techniques such as weight compression using NNCF.
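As a rough illustration of the weight-compression step, the sketch below applies NNCF 4-bit weight compression to an already converted OpenVINO IR. The file paths and compression parameters are illustrative assumptions, not the notebook's exact settings:

```python
import openvino as ov
import nncf

core = ov.Core()
# Assumes the language-model part has already been converted to
# OpenVINO IR; the path below is a placeholder.
ov_model = core.read_model("qwen2audio/openvino_language_model.xml")

# 4-bit asymmetric weight compression; group_size and ratio here are
# illustrative values, not tuned settings.
compressed_model = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    group_size=128,
    ratio=0.8,
)
ov.save_model(compressed_model, "qwen2audio/openvino_language_model_int4.xml")
```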
The tutorial consists of the following steps:
- Install requirements
- Convert and Optimize model
- Run OpenVINO model inference
- Launch Interactive demo
In this demonstration, you'll create an interactive chatbot that accepts voice instructions and answers questions about the content of the provided audio.
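To give a feel for that interaction, here is a minimal audio-analysis sketch based on the Qwen2-Audio model card. It uses the reference PyTorch model; the notebook replaces it with the OpenVINO-converted model, which exposes the same `generate()` interface. The file `sample.wav` is a placeholder:

```python
import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model_id = "Qwen/Qwen2-Audio-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
# Reference PyTorch model; in the notebook this is swapped for the
# converted OpenVINO model with the same generate() interface.
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_id)

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio_url": "sample.wav"},  # placeholder file
            {"type": "text", "text": "What is being said in this audio?"},
        ],
    }
]
prompt = processor.apply_chat_template(
    conversation, add_generation_prompt=True, tokenize=False
)
audio, _ = librosa.load("sample.wav", sr=processor.feature_extractor.sampling_rate)
inputs = processor(text=prompt, audios=[audio], return_tensors="pt", padding=True)

output_ids = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens.
response = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(response)
```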
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the Installation Guide.