This repository contains several example pipeline configurations for the Datahub::Factory application. You could use them as a boilerplate to create your own pipelines
A pipeline is a connection between two system used to exchange records. A pipeline has three functions:
- Fetch data from a source
- Transform the data. This entails manipulating the structure of the data as well as the format in which the data will be delivered to a destination.
- Push transformed data to a destination.
The definition of a pipeline is contained in a configuration file. These configuration files are based on the Config::Simple syntax. A configuration file roughly defines these three 'plugin' types and their associated configuration in distinct blocks:
- Importer. Defines the source of the data and how to fetch it.
- Fixer. Defines the Catmandu::Fix logic that will transform the data
- Exporter. Defines how the destination for the data and how it can be accessed it.
Note that each plugin instance needs to be referenced explicitely in a global plugin block. Each instance gets it's own dedicated plugin definition.
A very minimal configuration would look like this:
[Importer] plugin = OAI [plugin_importer_OAI] endpoint = https://example/oai metadata_prefix = oai_lido handler = My::Handler [Fixer] plugin = Fix [plugin_fixer_Fix] file_name = '/Users/matthiasvandermaesen/Workspace/Datahub-Fixes/msk_oai_adlib.fix' id_path = 'lidoRecID.0._' [Exporter] plugin = Datahub [plugin_exporter_Datahub] datahub_url = http://datahub.box datahub_format = LIDO oauth_client_id = slightlylesssecretpublicid oauth_client_secret = supersecretsecretphrase oauth_username = admin oauth_password = datahub
The pipeline also accept Fixer "conditions". Since a collection can contain multiple sets of records that require different transformation processing, "conditions" allow you to determine which Fix will be applied to a given record based on a conditional statement.
A condition would look like this:
[Fixer] plugin = Fix [plugin_fixer_Fix] condition = "_metadata.institution\\.name.value" fixers = Foo, Bar id_path = 'lidoRecID.0._' [plugin_fixer_Foo] condition = 'Museum of Foo' file_name = '/home/foobar/fixes/foo.fix' [plugin_fixer_Bar] condition = 'Museum of Bar' file_name = '/home/foobar/fixes/bar.fix'
Datahub::Factory is an application based on Catmandu. As such, pipeline configurations are based on Catmandu concepts and terminology.
Copyright 2016 - PACKED vzw, Vlaamse Kunstcollectie vzw
This library is free software; you can redistribute it and/or modify it under the terms of the GPLv3.