Documentation

Loris Sauter edited this page Jun 26, 2024 · 34 revisions

Here, we document the inner workings of vitrivr-engine, introduce the concepts it employs, and aim to provide a good overview of its components.

Terminology

This chapter introduces common terminology.

Introduction

In content-based multimedia retrieval, the aim is to search within multimedia collections (e.g. video, image, audio, 3D objects) on a content, and hence semantic, level. This is a non-trivial problem due to the so-called semantic gap: the stark difference in the semantic understanding of content between humans and machines. Recent developments in foundation models have narrowed this gap, yet various techniques are still needed to search efficiently within large collections of multimedia data.

Ingestion / Offline Phase

In (multimedia) retrieval, a common distinction is made between two phases. The first is the ingestion phase (also known as the offline phase), during which the multimedia content is analysed and representations of the content are stored efficiently for later use.

Retrieval / Online Phase

The retrieval phase (also known as the online phase) describes the actions performed after ingestion: (user) queries to the system are analysed in the same manner as the multimedia data was, and query and content are compared on the basis of those representations. The outcome is usually a list of results, each with an accompanying similarity score that indicates how similar the result is to the query. Commonly, a similarity score of 1 represents identity, while a similarity score of 0 indicates the greatest dissimilarity.

Feature

In multimedia retrieval, a feature is a means of representing multimedia content.

Toy Example

A very primitive feature is the average colour: Given an image (either a still image or a frame from a video), one calculates the average colour by averaging the individual pixels' RGB values. While this is not very expressive on its own, it demonstrates how features work.
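The averaging step can be sketched in a few lines of Python. This is only a minimal illustration, not vitrivr-engine's implementation: the function name `average_colour` and the representation of an image as a flat list of (R, G, B) tuples are assumptions made for this example.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]


def average_colour(pixels: Iterable[Pixel]) -> Tuple[float, float, float]:
    """Return the per-channel mean (R, G, B) of the given pixels."""
    total_r = total_g = total_b = 0
    count = 0
    for r, g, b in pixels:
        total_r += r
        total_g += g
        total_b += b
        count += 1
    if count == 0:
        raise ValueError("cannot average an empty image")
    return (total_r / count, total_g / count, total_b / count)


# Hypothetical 2x2 toy image: two red pixels and two blue pixels.
image = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]
feature = average_colour(image)
```

For this toy image, the red and blue channels each average to 127.5 and the green channel to 0, which shows why the feature is compact but not very expressive: many different images share the same average colour.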

During ingestion, the average colour is calculated for all the input data (again, this could be for example a bunch of images or a couple of representative frames from a video) and stored in the database as three-element vectors (R,G,B).

At retrieval time, the query is a single three-element vector (R,G,B), and a Nearest Neighbour Search (NNS) is performed over the stored average colour vectors. For each item in the database, the resulting distance is then converted to a similarity score $$s \in [0,1]$$.
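A brute-force version of this search can be sketched as follows. The text does not specify the distance measure or how the distance is converted to a score, so this sketch assumes Euclidean distance and a linear normalisation by the largest possible distance in RGB space; the names `similarity`, `nearest_neighbours` and the sample data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, float, float]

# Assumption: the largest possible Euclidean distance between two RGB
# vectors (black vs. white) is used to normalise distances.
MAX_DISTANCE = math.sqrt(3 * 255 ** 2)


def similarity(query: Vector, candidate: Vector) -> float:
    """Map the Euclidean distance linearly to a score in [0, 1]."""
    distance = math.dist(query, candidate)
    return 1.0 - distance / MAX_DISTANCE


def nearest_neighbours(
    query: Vector, database: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Score every stored vector and return the k most similar items."""
    scored = [(item_id, similarity(query, vec)) for item_id, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical average colour vectors stored during ingestion.
database = {
    "sunset": (200.0, 80.0, 40.0),
    "ocean": (20.0, 60.0, 180.0),
    "forest": (30.0, 120.0, 40.0),
}
results = nearest_neighbours((210.0, 90.0, 50.0), database, k=2)
```

With this normalisation, an identical vector scores 1 and the black/white pair scores 0, matching the convention described for similarity scores. A real system would typically delegate the search to the database rather than scan all vectors in application code.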

Further Reading

  • Basics: Wikipedia
  • Research: vitrivr
  • Book: Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval, ACM Press Books, 1999 (1st edition), 2011 (2nd edition)

Many (research) publications cover (multimedia) retrieval in great detail.

Data Model vitrivr-engine

vitrivr-engine's data model is based on almost a decade of research in multimedia retrieval. Influenced by its predecessor, the retrieval engine Cineast, the data model aims to be as flexible as possible while still providing foundational guidelines for consumers of vitrivr-engine.

Retrievable

In vitrivr-engine, a retrievable is the unit of retrieval and the logical representation of multimedia data. Depending on the type of multimedia, one (e.g. image) or more (e.g. video) retrievables exist.

For an image file, a single retrievable of the type SOURCE:IMAGE is created. For a video file, a single retrievable of the type SOURCE:VIDEO is created, along with a number of retrievables of the type SEGMENT, depending on the segmentation strategy. For example, a 30s video with a fixed-length segmentation of 1s results in 31 retrievables: one per second plus one for the file. The one-second-segment retrievables have a partOf relationship towards the source retrievable.
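The arithmetic of the video example can be reproduced with a small sketch. The `Retrievable` class, its `part_of` attribute and the `segment_video` function are hypothetical simplifications for illustration only and do not mirror vitrivr-engine's actual classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Retrievable:
    """Hypothetical, simplified stand-in for a retrievable."""
    id: str
    type: str
    part_of: Optional[str] = None  # id of the source retrievable, if any


def segment_video(source_id: str, duration_s: int, segment_s: int) -> List[Retrievable]:
    """Create one source retrievable plus one retrievable per fixed-length segment."""
    source = Retrievable(id=source_id, type="SOURCE:VIDEO")
    # Ceiling division, so a trailing partial segment still gets a retrievable.
    n_segments = -(-duration_s // segment_s)
    segments = [
        Retrievable(id=f"{source_id}-seg{i}", type="SEGMENT", part_of=source_id)
        for i in range(n_segments)
    ]
    return [source] + segments


retrievables = segment_video("video-1", duration_s=30, segment_s=1)
```

Here `len(retrievables)` is 31: the single source retrievable followed by 30 segment retrievables, each pointing back to the source through `part_of`.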

Descriptor

The descriptor describes a retrievable in vitrivr-engine. The fundamental concept is that a retrievable's content is represented by descriptors, which are based on features.

For an image file and the average colour example: The source retrievable is described by one average colour descriptor. For a 30s video file and the average colour example: Each of the 30 one-second-segment retrievables is described by one average colour descriptor, while the source retrievable itself is not described.

Overview of Descriptors

In vitrivr-engine, there are four distinct high-level types of descriptors:

  • Vector descriptors have a type (e.g. float) and a length. Ideal for NNS.
  • Struct descriptors have pre-defined sub-fields of various types.
  • Scalar descriptors consist of a single typed value.
  • Tensor descriptors represent a mathematical tensor. Not yet implemented [June, 2024]
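As a rough mental model, the three implemented descriptor types could be pictured as the following Python data classes. This is a sketch under assumptions: the class and attribute names are invented for illustration and are not vitrivr-engine's API, and tensor descriptors are omitted because they are listed as not yet implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Value = Union[int, float, str, bool]


@dataclass
class ScalarDescriptor:
    """A single typed value."""
    value: Value


@dataclass
class VectorDescriptor:
    """A fixed-type list of numbers, suitable for nearest neighbour search."""
    values: List[float]

    @property
    def length(self) -> int:
        return len(self.values)


@dataclass
class StructDescriptor:
    """Named sub-fields of various types."""
    fields: Dict[str, Value]


# Hypothetical instances: an average colour vector and some file metadata.
colour_descriptor = VectorDescriptor(values=[127.5, 0.0, 127.5])
file_descriptor = StructDescriptor(fields={"path": "image.png", "size": 2048})
```

The average colour from the toy example fits the vector type: a float vector of length 3.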

Schema

vitrivr-engine operates on the notion of a named schema, similar to a database or a collection, which provides, among other things, a namespace.

{
  "name":"my-schema"
}

Database Connection

Each schema has to have a database connection, which describes where the schema is persisted (and read from). The database supported by vitrivr-engine is CottontailDB.

{
  "database": "CottontailConnectionProvider",
  "parameters": {
    "Host": "127.0.0.1",
    "port": "1865"
  }
}

Field

In vitrivr-engine, the term field represents features which are to be used. In particular, each field is uniquely named and might be parameterised.

Note: In technical terms, each field has to be backed by an Analyser, whose output is a descriptor. During ingestion, the analyser produces the descriptor that represents a retrievable. During retrieval, the analysis step involves executing a query using the derived descriptor.

{
  "name": "uniqueName",
  "factory": "FactoryClass",
  "parameters":{
    "key": "value"
  }
}

A note about fields in vitrivr-engine: vitrivr-engine has a highly modular architecture and ships with a handful of features that can be used as fields, among them AverageColor, which corresponds to the toy example. Depending on the use case, custom features can be added.

See analyser / field overview.

Exporter

In contrast to an analyser / a field, an exporter in vitrivr-engine produces new, derived data.

{
    "name": "uniqueName",
    "factory": "FactoryClass",
    "resolverName": "resolverName",
    "parameters": {
        "key": "value"
    }
}

Resolver

Schema Configuration

Ingestion

Retrieval
