This repository contains several example pipeline configurations for the Datahub::Factory application. You can use them as boilerplate for your own pipelines.
A pipeline is a connection between two systems that is used to exchange records. A pipeline has three functions:
- Fetch data from a source
- Transform the data. This entails manipulating the structure of the data as well as the format in which the data will be delivered to a destination.
- Push transformed data to a destination.
The definition of a pipeline is contained in a configuration file. These configuration files are based on the Config::Simple syntax. A configuration file defines three 'plugin' types, each with its associated configuration in a distinct block:
- Importer. Defines the source of the data and how to fetch it.
- Fixer. Defines the Catmandu::Fix logic that will transform the data.
- Exporter. Defines the destination for the data and how to access it.
Note that each plugin instance needs to be referenced explicitly in a global plugin block. Each instance gets its own dedicated plugin definition.
A very minimal configuration would look like this:

    [General]
    id_path = 'id'

    [Importer]
    plugin = JSON

    [plugin_importer_JSON]
    file_name = './data/bar.json'

    [Fixer]
    plugin = Fix

    [plugin_fixer_Fix]
    file_name = './fixes/empty.fix'

    [Exporter]
    plugin = YAML

    [plugin_exporter_YAML]

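The rule that every referenced plugin needs its own dedicated block can be checked mechanically. The following is a minimal sketch in Python, not part of Datahub::Factory: it assumes that Python's `configparser` reads this subset of the Config::Simple syntax well enough for a sanity check, and that dedicated blocks are named `plugin_<type>_<Name>` as in the minimal configuration. The function name `missing_plugin_blocks` is hypothetical.

```python
import configparser

# The minimal configuration from this README, inlined so the sketch is self-contained.
MINIMAL_CONFIG = """
[General]
id_path = 'id'

[Importer]
plugin = JSON

[plugin_importer_JSON]
file_name = './data/bar.json'

[Fixer]
plugin = Fix

[plugin_fixer_Fix]
file_name = './fixes/empty.fix'

[Exporter]
plugin = YAML

[plugin_exporter_YAML]
"""


def missing_plugin_blocks(config_text):
    """Return the names of blocks that are expected but not defined.

    Hypothetical helper: the naming pattern plugin_<type>_<Name> is an
    assumption inferred from the example configuration above.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    missing = []
    for plugin_type in ("Importer", "Fixer", "Exporter"):
        # Returns None when the global block or its 'plugin' key is absent.
        plugin_name = parser.get(plugin_type, "plugin", fallback=None)
        if plugin_name is None:
            missing.append(plugin_type)
            continue
        expected = f"plugin_{plugin_type.lower()}_{plugin_name}"
        if not parser.has_section(expected):
            missing.append(expected)
    return missing


if __name__ == "__main__":
    print(missing_plugin_blocks(MINIMAL_CONFIG))
```

An empty result means every global block names a plugin and every named plugin has a matching dedicated block. Datahub::Factory itself may validate configurations differently.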
The pipeline also accepts Fixer "conditions". Since a collection can contain multiple sets of records that require different transformation processing, "conditions" allow you to determine which Fix will be applied to a given record based on a conditional statement.
A condition would look like this:

    [Fixer]
    plugin = Fix

    [plugin_fixer_Fix]
    condition = "_metadata.institution\\.name.value"
    fixers = Foo, Bar

    [plugin_fixer_Foo]
    condition_path = 'Museum of Foo'
    file_name = '/home/foobar/fixes/foo.fix'

    [plugin_fixer_Bar]
    condition_path = 'Museum of Bar'
    file_name = '/home/foobar/fixes/bar.fix'

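To make the idea concrete, here is a rough Python sketch of conditional dispatch. It is an illustration only and not how Datahub::Factory implements conditions: it assumes that the path is a dot-separated list of keys in which `\.` stands for a literal dot inside a key, and that the value found at that path is compared for exact equality with the value configured for each fixer. All function names and the sample record are hypothetical.

```python
import re


def split_path(path):
    """Split a dotted path on unescaped dots; '\\.' is kept as a literal dot."""
    parts = re.split(r"(?<!\\)\.", path)
    return [part.replace("\\.", ".") for part in parts]


def lookup(record, path):
    """Walk nested dictionaries along the path; return None if a key is missing."""
    current = record
    for key in split_path(path):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def select_fix_file(record, path, fix_files_by_value):
    """Return the fix file whose configured value matches the record, or None."""
    value = lookup(record, path)
    if not isinstance(value, str):
        return None
    return fix_files_by_value.get(value)


# Values and file names taken from the example configuration above.
FIX_FILES = {
    "Museum of Foo": "/home/foobar/fixes/foo.fix",
    "Museum of Bar": "/home/foobar/fixes/bar.fix",
}
CONDITION_PATH = r"_metadata.institution\.name.value"

if __name__ == "__main__":
    # Hypothetical record shape, chosen to match the path above.
    record = {"_metadata": {"institution.name": {"value": "Museum of Bar"}}}
    print(select_fix_file(record, CONDITION_PATH, FIX_FILES))
```

Under these assumptions, a record from the Museum of Bar is routed to `bar.fix`, and a record whose value matches no configured fixer yields no fix file.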
Datahub::Factory is an application based on Catmandu. As such, pipeline configurations are based on Catmandu concepts and terminology.
- Pieter De Praetere firstname.lastname@example.org
- Matthias Vandermaesen email@example.com
Copyright 2016, 2019 - PACKED vzw, Vlaamse Kunstcollectie vzw
This library is free software; you can redistribute it and/or modify it under the terms of the GPLv3.