Qwen2-VL is the latest addition to the Qwen-VL series of multimodal large language models.
Key Enhancements of Qwen2-VL:
- SoTA understanding of images of various resolutions & aspect ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
- Agent that can operate your mobile devices, robots, etc.: with its complex reasoning and decision-making abilities, Qwen2-VL can be integrated with devices like mobile phones and robots for automatic operation based on the visual environment and text instructions.
- Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
Model Architecture Details:
- Naive Dynamic Resolution: Qwen2-VL can handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens, offering a more human-like visual processing experience.
- Multimodal Rotary Position Embedding (M-ROPE): Decomposes positional embedding into parts to capture 1D textual, 2D visual, and 3D video positional information, enhancing its multimodal processing capabilities.
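To make the dynamic-resolution idea concrete, the sketch below estimates how many visual tokens an image of a given size produces. It assumes 14-pixel ViT patches merged 2×2 (so each final token covers a 28×28 pixel area), as described in the Qwen2-VL report; the model's actual image processor additionally clamps images to configurable min/max pixel budgets, which this sketch omits.

```python
import math

def visual_token_count(height: int, width: int, patch: int = 14, merge: int = 2) -> int:
    """Rough estimate of the number of visual tokens Qwen2-VL produces
    for an image. Illustrative only: assumes 14-pixel patches merged 2x2;
    the real processor also resizes within min/max pixel limits."""
    unit = patch * merge  # each final token covers a unit x unit pixel area
    # round each side up to a whole number of units
    h_units = math.ceil(height / unit)
    w_units = math.ceil(width / unit)
    return h_units * w_units

# a 448x448 image -> 16 x 16 units -> 256 tokens under these assumptions
print(visual_token_count(448, 448))
```

Larger images thus simply yield more tokens instead of being squashed to a fixed resolution.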
More details about the model can be found in the model card, blog, and original repo.
In this tutorial we consider how to convert and optimize the Qwen2-VL model for creating a multimodal chatbot using Optimum Intel. Additionally, we demonstrate how to apply model optimization techniques like weight compression using NNCF.
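The conversion and compression step can be sketched with the `optimum-cli` export command. The model id, output directory, and compression format below are illustrative assumptions; Qwen2-VL export requires a recent Optimum Intel release, and the notebook's own conversion cell is the authoritative version.

```shell
# Export Qwen2-VL to OpenVINO IR and compress weights to INT4 via NNCF.
# Model id and output directory are placeholders for illustration.
optimum-cli export openvino \
    --model Qwen/Qwen2-VL-2B-Instruct \
    --weight-format int4 \
    qwen2-vl-2b-instruct-ov
```

Other `--weight-format` values (e.g. `int8` or `fp16`) trade accuracy against model size and memory footprint.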
The tutorial consists of the following steps:
- Install requirements
- Convert and Optimize model
- Prepare OpenVINO GenAI Inference Pipeline
- Run OpenVINO GenAI model inference
- Launch Interactive demo
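The inference steps above can be sketched with OpenVINO GenAI's `VLMPipeline`. The model directory, device, and image path below are placeholders, and the exact call signature may vary across OpenVINO GenAI releases, so treat this as an outline rather than the notebook's definitive code.

```python
import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

# Placeholder paths/device; point model_dir at the directory produced by
# the conversion step and pick the device matching your hardware.
model_dir = "qwen2-vl-2b-instruct-ov"
pipe = ov_genai.VLMPipeline(model_dir, "CPU")

# Load the image as an ov.Tensor with uint8 HWC layout.
image = Image.open("example.jpg").convert("RGB")
image_tensor = ov.Tensor(np.array(image))

answer = pipe.generate(
    "Describe this image.",
    image=image_tensor,
    max_new_tokens=128,
)
print(answer)
```

The same pipeline object can then back the interactive demo, with each user question passed to `generate` alongside the uploaded image.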
In this demonstration, you'll create an interactive chatbot that can answer questions about the content of a provided image.
The image below illustrates an example of an input prompt and the model's answer.
This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to Installation Guide.