Using ModularPipelines for Data Transformation #528
-
Hello! I'm trying to work out whether this project fits what I'm trying to accomplish. It looks really promising, but there's one aspect I wasn't sure of. I want to take a set of data and transform it with a pipeline: iterate over each piece of data asynchronously from outside the pipeline context, send it down a defined pipeline, and ultimately ship the fully transformed piece off to a sink somewhere. Reading through the documentation, I didn't see a way to execute a pipeline multiple times; the only example I've come across defines the pipeline at host-building time and executes it just once. Is there a recommended way of doing this? Thank you for your time!
-
Heya. Not sure I completely understand your use case, but the way things are built currently, a pipeline can only be executed once. If you need to execute it multiple times, you'd need to recreate the pipeline host builder and execute a new instance of it each time.
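For example, something like this (a rough sketch, not verbatim library code: `WorkItem`, `TransformModule` and `SinkModule` are made-up names, and the builder calls follow the `PipelineHostBuilder.Create()...ExecutePipelineAsync()` shape from the README, so double-check against the current docs):

```csharp
using Microsoft.Extensions.DependencyInjection;
using ModularPipelines.Host;

// Hypothetical input: the raw items to push through the pipeline one at a time.
var items = new[] { "alpha", "beta", "gamma" };

foreach (var item in items)
{
    // A pipeline can only execute once, so build a fresh host per item,
    // handing the current item to the modules via DI.
    await PipelineHostBuilder.Create()
        .ConfigureServices((context, services) =>
            services.AddSingleton(new WorkItem(item)))
        .AddModule<TransformModule>() // transformation step (sketched further down)
        .AddModule<SinkModule>()      // ships the result off (sketched further down)
        .ExecutePipelineAsync();
}

// Hypothetical wrapper type so the current item can be resolved from DI.
public record WorkItem(string Value);
```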
-
This library was definitely designed as more of a CI/CD library, but really it should be generic enough to perform any sort of pipeline. In essence, it's just an orchestrator for your jobs, handling the concurrency and the dependencies on other modules for you.
You define a module for each action you want your code to perform, and then tell it whether that module relies on any of your other modules, so that it will wait for them before starting.
The module is an abstract class, so you define whatever code/action you want to perform, and full dependency injection is supported if needed. So the execute method of a module simply does that data transformation and then returns it. What you return i…
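A rough sketch of what those modules might look like, continuing the made-up names from the loop above (the `Module<T>` base class, `ExecuteAsync` override and `DependsOn<T>` attribute follow the README as I remember it, so treat this as an assumption rather than gospel):

```csharp
using System.Threading;
using System.Threading.Tasks;
using ModularPipelines.Attributes;
using ModularPipelines.Context;
using ModularPipelines.Modules;

// Transforms the injected item. Whatever you return becomes this module's
// result, which downstream modules can await.
public class TransformModule : Module<string>
{
    private readonly WorkItem _item; // resolved from the DI registration above

    public TransformModule(WorkItem item) => _item = item;

    protected override async Task<string?> ExecuteAsync(
        IPipelineContext context, CancellationToken cancellationToken)
    {
        await Task.Yield(); // stand-in for real async transformation work
        return _item.Value.ToUpperInvariant();
    }
}

// Declares a dependency, so this only starts once TransformModule has finished.
[DependsOn<TransformModule>]
public class SinkModule : Module<string>
{
    protected override async Task<string?> ExecuteAsync(
        IPipelineContext context, CancellationToken cancellationToken)
    {
        // Awaiting another module waits for its completion and exposes its
        // result (the exact result-access shape here is from memory).
        var transformed = await GetModule<TransformModule>();

        // Send transformed.Value off to your sink of choice here.
        return transformed.Value;
    }
}
```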