Example

This is a full-scale example of how to use vitrivr-engine to index a collection of images and videos, and it serves as a starting point for advanced users of vitrivr-engine. Prior knowledge of multimedia retrieval and vitrivr-engine is beneficial. However, this example aims to let even novices use vitrivr-engine with this tutorial alone.

Goals

This tutorial shows how to use vitrivr-engine from a user's perspective, e.g. for people who have a multimedia collection and want to index it. It has three goals:

A quick reference for vitrivr-engine ingestion and retrieval
Considerations and design choices for the schema, ingestion, and retrieval
A real-world example, in contrast to the rest of this wiki's documentation, which is more abstract

Why vitrivr-engine

Having a multimedia collection (videos and images, for the sake of this tutorial) is great. However, the means to explore and search (large) collections are still rather limited. vitrivr-engine is a general-purpose, content-based multimedia retrieval engine. Its ingestion (i.e. analysing the content and storing this information for efficient use) and retrieval (i.e. using the previously gathered information to find items in the collection) may improve the understanding and usability of the collection.

Prerequisites

Although not required, reading and following the Getting Started guide is beneficial, as is reading the introduction of the Documentation wiki page.

Technical requirements are as follows:

JDK 21 or higher, e.g. OpenJDK
CottontailDB v0.16.5 or higher
The example collection, consisting of CC0-licensed videos and images. It is arguably small; a real-world multimedia collection would be significantly larger.
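
Before continuing, the first two requirements can be checked with a small script. The following bash sketch is a hypothetical helper, not part of vitrivr-engine. It assumes `java` is on the `PATH` and that CottontailDB listens on 127.0.0.1 at the default port 1865. It only checks that the port accepts connections, not which CottontailDB version is running.

```bash
#!/usr/bin/env bash
# Hypothetical prerequisite check (not part of vitrivr-engine).
# Assumes bash, `java` on the PATH and CottontailDB on 127.0.0.1:1865.

# Extract the major version from a Java version string,
# e.g. "21.0.2" -> 21 and the legacy "1.8.0_392" -> 8.
java_major() {
  local v="$1"
  if [[ "$v" == 1.* ]]; then
    v="${v#1.}"
  fi
  echo "${v%%[.+_-]*}"
}

# Check that the installed JDK is version 21 or higher.
check_java() {
  if ! command -v java >/dev/null 2>&1; then
    echo "java: not found"
    return 1
  fi
  # `java -version` writes to stderr, e.g.: openjdk version "21.0.2" 2024-01-16
  local raw major
  raw=$(java -version 2>&1 | sed -nE 's/.*version "([^"]+)".*/\1/p' | head -n 1)
  major=$(java_major "$raw")
  if [[ ! "$major" =~ ^[0-9]+$ ]]; then
    echo "java: could not parse version '$raw'"
    return 1
  elif (( major < 21 )); then
    echo "java: version $raw is older than 21"
    return 1
  fi
  echo "java: OK ($raw)"
}

# Check that something accepts TCP connections on host:port.
# Relies on bash's /dev/tcp redirection; it does not verify the CottontailDB version.
check_port() {
  local host="$1" port="$2"
  if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
    echo "port $host:$port: reachable"
  else
    echo "port $host:$port: nothing listening"
    return 1
  fi
}

status=0
check_java || status=1
check_port 127.0.0.1 1865 || status=1
if (( status == 0 )); then
  echo "All checked prerequisites are met."
else
  echo "Some prerequisites are missing."
fi
```

The example collection itself still has to be downloaded manually. The script does not cover it.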

Setup

If no release is available, vitrivr-engine has to be built from source:

Start CottontailDB on the default port 1865
Build vitrivr-engine (from the root of the repository):

Unix:

./gradlew distZip

Windows:

.\gradlew.bat distZip

Unzip the distribution, e.g. unzip -d ../instance/ vitrivr-engine-module-server/build/distributions/vitrivr-engine-server-0.0.1-SNAPSHOT.zip
Copy the media data into a folder called example/media
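
The build, unzip and folder steps can be combined into one script, run from the root of the vitrivr-engine repository. This bash version is a sketch with hypothetical function names. The archive path assumes Gradle's default `build/distributions` output folder and the 0.0.1-SNAPSHOT version used in this guide. Starting CottontailDB and copying the media files are left to you.

```bash
#!/usr/bin/env bash
# Sketch of the setup steps; run from the root of the vitrivr-engine repository.
# Assumption: the archive lies in Gradle's default build/distributions folder;
# override ZIP if your module name or version differs.
set -euo pipefail

ZIP="${ZIP:-vitrivr-engine-module-server/build/distributions/vitrivr-engine-server-0.0.1-SNAPSHOT.zip}"

# Build the distribution archive (Unix; use .\gradlew.bat distZip on Windows).
build_distribution() {
  ./gradlew distZip
}

# Unpack the archive into the given instance folder, overwriting older files.
unpack_distribution() {
  local zip="$1" dest="$2"
  mkdir -p "$dest"
  unzip -o -d "$dest" "$zip"
}

# Create the folders that receive the example images and videos.
prepare_media_dirs() {
  local root="$1"
  mkdir -p "$root/example/media/images" "$root/example/media/videos"
}

# Only run the steps when executed directly from the repository root.
if [[ "${BASH_SOURCE[0]:-}" == "$0" && -x ./gradlew ]]; then
  build_distribution
  unpack_distribution "$ZIP" ../instance
  prepare_media_dirs ..
  echo "Now copy the images and videos into ../example/media/"
fi
```

Keeping each step in its own function makes it easy to rerun only the unzip step after a rebuild, for example.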

By now, you should have the following folder structure:

+ vitrivr-engine/
|
+ instance/
  |
  + vitrivr-engine-server-0.0.1-SNAPSHOT/
    |
    + bin/
    |
    + lib/
+ example/
  |
  + media/
    |
    + images/
    |
    + videos/
    |
    - README.md
|
+ cottontaildb/

The cottontaildb folder is optional and might contain either the DBMS or its repository. We will not delve deeper into the CottontailDB setup.
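
To verify this layout, a small check like the following may help. It is a bash sketch around a hypothetical `check_layout` helper. It assumes the instance folder is named after the 0.0.1-SNAPSHOT build and that the repository folder is called vitrivr-engine. Because cottontaildb/ is optional, it is not checked.

```bash
#!/usr/bin/env bash
# Hypothetical layout check for the folder structure of this guide.
# Prints every missing directory and returns 1 if any is missing.
check_layout() {
  local root="$1" missing=0 dir
  local expected=(
    "vitrivr-engine"
    "instance/vitrivr-engine-server-0.0.1-SNAPSHOT/bin"
    "instance/vitrivr-engine-server-0.0.1-SNAPSHOT/lib"
    "example/media/images"
    "example/media/videos"
  )
  for dir in "${expected[@]}"; do
    if [[ ! -d "$root/$dir" ]]; then
      echo "missing: $dir"
      missing=1
    fi
  done
  return "$missing"
}

# Example: run from inside the vitrivr-engine repository, so ".." is the common parent.
if check_layout ..; then
  echo "layout OK"
fi
```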

The Schema

Since our images and videos span a rather diverse set of styles, we aim to extract as much content-based information as possible and set up the schema accordingly.

The schema fields in detail:

Field	Type	Description
`averagecolor`	Vector, length: 3	The most basic feature, included for the sake of completeness
`clip`	Vector, length: 512	CLIP-based dense embedding; enables textual and concept search
`file`	Structural	Metadata for the file
`whisper`	Textual	ASR: transcription of spoken audio with OpenAI's deep-learning-based Whisper model
`ocr`	Textual	OCR: text recognition for both images and videos (for videos, only on key frames)
`dino`	Vector, length: 384	DINO-based dense embedding, predominantly for query-by-example

The Ingestion

Retrieval

⚠️ This wiki is work-in-progress and targets the dev branch / Release Candidate 1, to be released by the end of August 2024 ⚠️

Disclaimer: Please keep in mind that vitrivr and vitrivr-engine are predominantly research prototypes.
