> Get out the crane
> Construction time again
> What is it this time?
> We’re laying a pipeline
>
> — Depeche Mode, “Pipeline” (1983)
# putki

putki allows you to specify and run pipelines that process streaming data.
> **Pre-alpha software**
>
> putki is currently experimental software that may change unexpectedly and without notice. Documentation may lag behind or describe future plans for the library. Use at your own risk!
## Goals

- Leverage a data-driven approach to pipeline management, so that pipelines can be configured and modified using composition and core data functions.
- Support complex pipelines with branching, joining, and feedback cycles.
- Allow modifying pipelines while they are running.
- Provide tools for debugging pipelines and tracing the data flow.
- Provide a default runtime using a JVM thread pool, while allowing extension to other runtimes.
- Provide alternative runtimes for the JVM, e.g., based on core.async.
- Define and run pipelines on Node.js using ClojureScript.
- Provide alternative schedulers for orchestrating jobs.
- Add performance tests for different configurations.
- Provide implementations for configuring pipelines with automatic retry and crash recovery, e.g., a pipe implementation that persists intermediate data.
## Usage

Pipelines can be specified in a hiccup-like syntax using nested vectors. Call `putki.core/start!` to start the pipeline, and emit a value with `putki.core/emit` to feed it through the pipeline.
```clojure
(def graph [inc
            [#(/ % 3)
             [#(Math/ceil %)
              [println]]]])

(def pipeline (putki.core/start! graph))

(putki.core/emit pipeline 7)
;; prints: 3.0
```
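Because a graph is just nested Clojure data, fragments can be built and combined with core data functions, as the goals above suggest. A minimal sketch reusing the syntax from the previous example (the fragment name is invented for illustration):

```clojure
;; A reusable tail fragment: round up, then print.
(def round-and-print [#(Math/ceil %)
                      [println]])

;; `conj` attaches the fragment as the downstream step of the
;; division job, yielding the same graph as the inline version.
(def graph [inc
            (conj [#(/ % 3)] round-and-print)])

(def pipeline (putki.core/start! graph))
(putki.core/emit pipeline 7)
```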
Internally, `putki.core/start!` walks the pipeline graph and transforms it into a workflow map. The map is fed into a workflow runner. The same actions can be split into explicit, separate steps as below.
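The exact shape of the workflow map is internal to putki and not documented here. As a purely hypothetical sketch, initializing the graph above might produce something along these lines (all keys and job ids are invented for illustration):

```clojure
;; Hypothetical illustration only; putki's real workflow map
;; uses internal keys and identifiers that may differ.
{:jobs  {:job-0 inc
         :job-1 #(/ % 3)
         :job-2 #(Math/ceil %)
         :job-3 println}
 :edges {:job-0 [:job-1]
         :job-1 [:job-2]
         :job-2 [:job-3]}}
```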
```clojure
(def graph [inc                                 ; (1)
            [#(/ % 3)                           ; (2)
             [#(Math/ceil %)
              [println]]]])

(def workflow (putki.core/init graph))          ; (3)

(def pipeline (putki.core/run!                  ; (4)
               (putki.core/local-thread-runner) ; (5)
               workflow))
```
1. A pipeline graph can be defined with a nested vector.
2. Any function taking a single argument can be used directly as a job.
3. `putki.core/init` turns the graph into a map specifying jobs and their connections.
4. `putki.core/run!` takes a runner and starts the workflow according to the map.
5. The default workflow runner uses a local thread pool to run and orchestrate jobs.
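Taken together, the two forms suggest (though the text above does not state it outright) that `putki.core/start!` is shorthand for initializing the graph and running the resulting workflow on the default runner:

```clojure
;; Presumed equivalent of (putki.core/start! graph),
;; per the explicit steps described above.
(putki.core/run! (putki.core/local-thread-runner)
                 (putki.core/init graph))
```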
> **There is no stopping**
>
> Stopping and resetting a running pipeline is currently not possible.
## Examples

Examples of how to use putki in common and educational scenarios are planned:
- [ ] Streaming a JSON document
- [ ] Implementing an ARMA filter
- [ ] Processing nested data with a feedback loop
- [ ] A simple web server
- [ ] Managing a pipeline as an Integrant component
- [ ] Using Integrant components as part of a pipeline
## Related projects

- Streamz — Python data streaming
- manifold — Clojure library for asynchronous programming with a stream abstraction
- Datasplash — Clojure API for Google Cloud Dataflow
- Apache Spark Streaming — high-throughput distributed stream processing
## License

putki is licensed under the Eclipse Public License v2.0.