vladdu edited this page Nov 10, 2010 · 5 revisions


See the description of Hartmann pipelines for details about them. In short, they extend the usual Unix pipes so that processing units can be connected into graphs, making it possible to implement a full data-flow environment.

Each pipeline is built from pipe elements (pipes, for short). Pipes are processes, so the items sent between them are Erlang terms (as opposed to bytes, as in regular OS pipes). Pipes provide flow-control and crash-recovery facilities, which can be configured in various ways. A major requirement is that the liveness of the system be preserved as much as possible.


Pipeline building

Pipes are assembled by connecting the output of one to the input of another. The API is in the `pipes` module.

In the future, a special syntax might become available, but for the time being the connections are specified with ordinary function calls.

It will be possible to handle a pipe segment comprising several pipes as if it were a single pipe, but the current implementation doesn’t support this yet.
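As a sketch of how such assembly with function calls might look. Note that the function names `pipes:pipe/1` and `pipes:connect/4` are assumptions made for illustration only; consult the `pipes` module for the actual API:

```erlang
%% Hypothetical sketch: the exact functions exported by the `pipes`
%% module may differ. Create three pipes from transformation funs...
Src    = pipes:pipe(fun source/2),
Filter = pipes:pipe(fun transform/2),
Sink   = pipes:pipe(fun sink/2),

%% ...then connect the `default` output of each pipe to the `default`
%% input of the next, forming the chain Src -> Filter -> Sink.
pipes:connect(Src, default, Filter, default),
pipes:connect(Filter, default, Sink, default).
```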


A pipe can have multiple inputs and multiple outputs, each identified by an Erlang term. The name `default` is reserved and may be omitted. Connections
between outputs and inputs are one-to-one.

A pipe is defined by the data transformation it performs. Conceptually, this transformation is a function with the following signature:

```erlang
fun([{InputId, [Data]}, ...], State) ->
    {[{OutputId, [Result]}, ...], NewState}
```

where the data and result tuples are keyed by the pipe’s input or output identifier, respectively. The returned `NewState` is passed to the function at the next processing step.

This function is called whenever the pipe performs a processing step. If there are any `Result`s, they are sent out on the respective outputs. `Data` and `Result` being lists means that several items may be batched together.
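For example, a minimal transformation function of this shape (a hypothetical `double` pipe, shown only to illustrate the signature) could double every item arriving on the default input:

```erlang
%% Hypothetical pipe body: reads a batch from the `default` input,
%% doubles each item, and emits the batch on the `default` output.
%% The state is passed through unchanged.
Double = fun(Inputs, State) ->
    {default, Items} = lists:keyfind(default, 1, Inputs),
    {[{default, [X * 2 || X <- Items]}], State}
end.
```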

A pipe becomes active (i.e. starts processing input) as soon as all outputs are connected.
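Putting the pieces together, a hypothetical pipe with one input and two outputs (the output names `evens` and `odds` are arbitrary Erlang terms chosen for illustration) could route items by parity; under the rule above, it would start processing only once both outputs are connected:

```erlang
%% Hypothetical splitter: consumes a batch from the `default` input and
%% routes each item to the `evens` or `odds` output by its parity.
Split = fun(Inputs, State) ->
    {default, Items} = lists:keyfind(default, 1, Inputs),
    {Evens, Odds} = lists:partition(fun(X) -> X rem 2 =:= 0 end, Items),
    {[{evens, Evens}, {odds, Odds}], State}
end.
```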

More information