Using ModularPipelines for Data Transformation #528
-
Hello! I'm trying to work out whether this project fits what I'm trying to accomplish. It looks really promising, but there's one aspect I wasn't sure of. I want to take a set of data and transform it with a pipeline: iterate over each piece of data asynchronously from outside the pipeline context, send it down a defined pipeline, and ultimately ship the fully transformed piece off to a sink somewhere. Reading through the documentation, I didn't see a way to execute a pipeline multiple times; the only example I've come across defines the pipeline at host-building time and executes it just once. Is there a recommended way of doing this? Thank you for your time!
-
Heya. Not sure I completely understand your use case, but the way things are built currently, a pipeline can only be executed once. If you need to execute it multiple times, you'd need to recreate the pipeline host builder and execute a new instance of it each time.
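For example, something like this (a rough sketch, not verbatim library code: `WorkItem`, `TransformModule` and `SinkModule` are made-up names, and the builder calls follow the `PipelineHostBuilder.Create()...ExecutePipelineAsync()` shape from the README, so double-check against the current docs):

```csharp
using Microsoft.Extensions.DependencyInjection;
using ModularPipelines.Host;

// Hypothetical input: the raw items to push through the pipeline one at a time.
var items = new[] { "alpha", "beta", "gamma" };

foreach (var item in items)
{
    // A pipeline can only execute once, so build a fresh host per item,
    // handing the current item to the modules via DI.
    await PipelineHostBuilder.Create()
        .ConfigureServices((context, services) =>
            services.AddSingleton(new WorkItem(item)))
        .AddModule<TransformModule>() // transformation step (sketched further down)
        .AddModule<SinkModule>()      // ships the result off (sketched further down)
        .ExecutePipelineAsync();
}

// Hypothetical wrapper type so the current item can be resolved from DI.
public record WorkItem(string Value);
```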
-
This library was definitely designed as more of a CI/CD library, but really it should be generic enough to perform any sort of pipeline. In essence, it's just an orchestrator for your jobs, handling the concurrency and the dependencies on other modules for you.
You define a module for each action you want your code to perform, and then tell it whether that module relies on any of your other modules, so that it will wait for them before starting.
The module is an abstract class, so you define whatever code/action you want to perform, and full dependency injection is supported if needed. So the execute method of a module simply does that data transformation and then returns it. What you return i…
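A rough sketch of what those modules might look like, continuing the made-up names from the loop above (the `Module<T>` base class, `ExecuteAsync` override and `DependsOn<T>` attribute follow the README as I remember it, so treat this as an assumption rather than gospel):

```csharp
using System.Threading;
using System.Threading.Tasks;
using ModularPipelines.Attributes;
using ModularPipelines.Context;
using ModularPipelines.Modules;

// Transforms the injected item. Whatever you return becomes this module's
// result, which downstream modules can await.
public class TransformModule : Module<string>
{
    private readonly WorkItem _item; // resolved from the DI registration above

    public TransformModule(WorkItem item) => _item = item;

    protected override async Task<string?> ExecuteAsync(
        IPipelineContext context, CancellationToken cancellationToken)
    {
        await Task.Yield(); // stand-in for real async transformation work
        return _item.Value.ToUpperInvariant();
    }
}

// Declares a dependency, so this only starts once TransformModule has finished.
[DependsOn<TransformModule>]
public class SinkModule : Module<string>
{
    protected override async Task<string?> ExecuteAsync(
        IPipelineContext context, CancellationToken cancellationToken)
    {
        // Awaiting another module waits for its completion and exposes its
        // result (the exact result-access shape here is from memory).
        var transformed = await GetModule<TransformModule>();

        // Send transformed.Value off to your sink of choice here.
        return transformed.Value;
    }
}
```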