A set of example pipelines for Datahub-Factory
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.



This repository contains several example pipeline configurations for the Datahub::Factory application. You could use them as a boilerplate to create your own pipelines


A pipeline is a connection between two system used to exchange records. A pipeline has three functions:

  • Fetch data from a source
  • Transform the data. This entails manipulating the structure of the data as well as the format in which the data will be delivered to a destination.
  • Push transformed data to a destination.

Pipeline configuration

The definition of a pipeline is contained in a configuration file. These configuration files are based on the Config::Simple syntax. A configuration file roughly defines these three 'plugin' types and their associated configuration in distinct blocks:

  • Importer. Defines the source of the data and how to fetch it.
  • Fixer. Defines the Catmandu::Fix logic that will transform the data
  • Exporter. Defines how the destination for the data and how it can be accessed it.

Note that each plugin instance needs to be referenced explicitely in a global plugin block. Each instance gets it's own dedicated plugin definition.

A very minimal configuration would look like this:

plugin = OAI

endpoint = https://example/oai
metadata_prefix = oai_lido
handler = My::Handler

plugin = Fix

file_name = '/Users/matthiasvandermaesen/Workspace/Datahub-Fixes/msk_oai_adlib.fix'
id_path = 'lidoRecID.0._'

plugin = Datahub

datahub_url = http://datahub.box
datahub_format = LIDO
oauth_client_id = slightlylesssecretpublicid
oauth_client_secret = supersecretsecretphrase
oauth_username = admin
oauth_password = datahub


The pipeline also accept Fixer "conditions". Since a collection can contain multiple sets of records that require different transformation processing, "conditions" allow you to determine which Fix will be applied to a given record based on a conditional statement.

A condition would look like this:

plugin = Fix

condition = "_metadata.institution\\.name.value"
fixers = Foo, Bar
id_path = 'lidoRecID.0._'

condition = 'Museum of Foo'
file_name = '/home/foobar/fixes/foo.fix'

condition = 'Museum of Bar'
file_name = '/home/foobar/fixes/bar.fix'


Datahub::Factory is an application based on Catmandu. As such, pipeline configurations are based on Catmandu concepts and terminology.


Pieter De Praetere pieter@packed.be Matthias Vandermaesen matthias.vandermaesen@vlaamsekunstcollectie.be


Copyright 2016 - PACKED vzw, Vlaamse Kunstcollectie vzw


This library is free software; you can redistribute it and/or modify it under the terms of the GPLv3.