Screen Parsing with OmniParser-v2.0 and OpenVINO

Recent breakthroughs in visual language processing and large language models have brought significant progress in understanding and interacting with the world through text and images. However, accurately parsing and understanding complex graphical user interfaces (GUIs) remains a significant challenge. OmniParser is a comprehensive method for parsing user interface screenshots into structured, easy-to-understand elements. This enables more accurate and efficient interaction with GUIs, empowering AI agents to perform tasks across various platforms and applications.

More details about the model can be found in the Microsoft blog post, the paper, the original repository, the OmniParser V2 blog post, and the model card.

This tutorial shows how to run OmniParser using OpenVINO.

Notebook contents

The tutorial consists of the following steps:

  • Install requirements
  • Convert model
  • Run OpenVINO model inference
  • Launch Interactive demo

In this demonstration, you will run OmniParser to recognize UI elements in screenshots.
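To make "structured elements" concrete, here is a minimal sketch of how parsed UI elements might be represented and consumed by an agent. The `UIElement` schema, its field names, and the helper functions are hypothetical and chosen only for illustration; they are not OmniParser's actual output format, which is defined in the notebook and the original repository.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class UIElement:
    """One parsed screen element (hypothetical schema, for illustration only)."""
    kind: str                                # assumed categories, e.g. "icon" or "text"
    bbox: Tuple[float, float, float, float]  # assumed normalized (x_min, y_min, x_max, y_max)
    description: str                         # short caption or recognized text
    interactable: bool                       # whether an agent could click it


def to_pixels(element: UIElement, width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized bounding box to pixel coordinates."""
    x0, y0, x1, y1 = element.bbox
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


def click_point(element: UIElement, width: int, height: int) -> Tuple[int, int]:
    """Return the pixel center of the element, a natural click target."""
    x0, y0, x1, y1 = to_pixels(element, width, height)
    return ((x0 + x1) // 2, (y0 + y1) // 2)


# Made-up example data standing in for a parsed screenshot.
elements = [
    UIElement("text", (0.10, 0.05, 0.40, 0.10), "Settings", False),
    UIElement("icon", (0.80, 0.05, 0.85, 0.10), "Close window button", True),
]

# Keep only the elements an agent could act on.
clickable = [e for e in elements if e.interactable]
```

An agent could then pick an element by its description and call `click_point(element, screen_width, screen_height)` to obtain the coordinates to act on.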

Installation instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to the Installation Guide.