Support parallel tasks #64

Open
sjthespian opened this issue Jan 19, 2022 · 2 comments

Comments

@sjthespian
Contributor

I'm working on a solution to synchronize images to a set of remote registries, and having the ability to run tasks in parallel would be a huge help. Right now I'm building n different configs and running m instances of dregsy. It would be much cleaner to have a single config with all of my tasks defined and have dregsy manage the parallelism.

Something along the lines of:

relay: skopeo

skopeo:
  binary: skopeo
  mode: copy
  # Number of tasks to run in parallel
  parallel_tasks: 4
@xelalexv
Owner

Running tasks in parallel can be advantageous when the system running dregsy has a significantly faster network connection than any of the involved source and/or target registries. In the opposite case, we may not gain much of a speedup, since the parallel tasks would compete for the slow network connection. The same applies if there's just one slow source and one slow target. At any rate, having the option to add parallelism to tasks is definitely a good idea.
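
To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in Go. It assumes a deliberately simple model that is not part of dregsy: every task talks to a different registry, each registry link has the same bandwidth, and the only other limit is the local uplink. The function name `estimateSpeedup` and the numbers are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// estimateSpeedup returns the best-case speedup over running tasks one at a
// time. Assumed model: each task is limited by its own registry link
// (perRegistry), and all tasks together are limited by the local uplink.
func estimateSpeedup(tasks int, uplink, perRegistry float64) float64 {
	if tasks < 1 || uplink <= 0 || perRegistry <= 0 {
		return 0
	}
	aggregate := math.Min(float64(tasks)*perRegistry, uplink)
	return aggregate / perRegistry
}

func main() {
	// Uplink ten times faster than each registry link (illustrative units).
	fmt.Println(estimateSpeedup(4, 10, 1))  // 4
	fmt.Println(estimateSpeedup(16, 10, 1)) // 10
	// Uplink no faster than the registry link: parallelism gains nothing.
	fmt.Println(estimateSpeedup(4, 1, 1)) // 1
}
```

Under this model the gain is capped at the ratio between uplink and registry bandwidth, which matches the point above that a slow local connection leaves little to win.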

Implementation thoughts:

  • There are a number of global entities, such as authentication tokens and lister caches. Access to those needs to be properly locked when we introduce parallelism, so that for example identical auth refreshes are not done in parallel.
  • It may be necessary to validate tasks to make sure there are no duplicates or target overlaps.
  • ...
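
The first point could be sketched roughly as follows. This is a minimal Go illustration, not dregsy's actual code: `tokenCache`, `runTasks`, and the registry name are hypothetical, and the limit of 4 mirrors the `parallel_tasks` value proposed above. A buffered channel caps the number of tasks in flight, and a mutex around the shared cache ensures an identical auth refresh happens only once.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenCache is a hypothetical stand-in for a shared auth token store.
type tokenCache struct {
	mu        sync.Mutex
	tokens    map[string]string
	refreshes int // number of real refreshes, kept for illustration
}

// get returns the token for a registry and refreshes it at most once, even
// when several tasks ask at the same time. Holding the lock during the
// refresh is the simplest approach; it also blocks lookups for other
// registries while a refresh is running.
func (c *tokenCache) get(registry string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if tok, ok := c.tokens[registry]; ok {
		return tok
	}
	time.Sleep(10 * time.Millisecond) // pretend to call the auth endpoint
	c.refreshes++
	tok := "token-for-" + registry
	c.tokens[registry] = tok
	return tok
}

// runTasks calls fn for every task, with at most `parallel` running at once.
func runTasks(tasks []string, parallel int, fn func(task string)) {
	if parallel < 1 {
		parallel = 1
	}
	sem := make(chan struct{}, parallel)
	var wg sync.WaitGroup
	for _, t := range tasks {
		wg.Add(1)
		sem <- struct{}{} // blocks while `parallel` tasks are in flight
		go func(task string) {
			defer wg.Done()
			defer func() { <-sem }()
			fn(task)
		}(t)
	}
	wg.Wait()
}

func main() {
	cache := &tokenCache{tokens: map[string]string{}}
	tasks := []string{"task-a", "task-b", "task-c", "task-d", "task-e"}
	runTasks(tasks, 4, func(task string) {
		_ = cache.get("registry.example.com") // all tasks share one target
	})
	fmt.Println("refreshes:", cache.refreshes) // refreshes: 1
}
```

A per-registry lock would avoid serializing refreshes for unrelated registries, at the cost of a little more bookkeeping.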

@xelalexv changed the title from "Feature request: run parallel tasks" to "Support parallel tasks" on Jan 19, 2022
@sjthespian
Contributor Author

That would be exactly why I'm doing it -- I'm syncing to registries that are only available via satellite links. My uplink bandwidth is roughly 10x the bandwidth of each individual registry.

I have a poor man's working implementation of this in the meantime: I'm running this in k8s, so I just spin up n pods, each with a single task. We don't add new registries often, so I'm not creating too much tech debt.

Validation could be tricky. While it isn't my use case, I could see someone wanting to have parallel syncs running to the same registry, with each sync being in a separate namespace.
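
One way to allow that case would be to check for overlap at the namespace path level rather than per registry. The Go sketch below is only an illustration under that assumption: the `target` type, its fields, and `findConflicts` are hypothetical and do not reflect dregsy's real task model. Two targets conflict only when they are on the same registry and one path equals or contains the other.

```go
package main

import (
	"fmt"
	"strings"
)

// target is a hypothetical, simplified view of where a task writes.
type target struct {
	Task     string
	Registry string
	Path     string // namespace or repository path, e.g. "team-a/nginx"
}

// overlaps reports whether one path equals the other or is a parent of it.
func overlaps(a, b string) bool {
	a, b = strings.Trim(a, "/"), strings.Trim(b, "/")
	if a == "" || b == "" {
		return true // an empty path covers the whole registry
	}
	return a == b || strings.HasPrefix(a, b+"/") || strings.HasPrefix(b, a+"/")
}

// findConflicts compares every pair of targets and describes each overlap.
func findConflicts(targets []target) []string {
	var conflicts []string
	for i := 0; i < len(targets); i++ {
		for j := i + 1; j < len(targets); j++ {
			a, b := targets[i], targets[j]
			if a.Registry == b.Registry && overlaps(a.Path, b.Path) {
				conflicts = append(conflicts,
					fmt.Sprintf("%s and %s overlap on %s", a.Task, b.Task, a.Registry))
			}
		}
	}
	return conflicts
}

func main() {
	targets := []target{
		{"sync-a", "sat1.example.com", "team-a"},
		{"sync-b", "sat1.example.com", "team-b"},
		{"sync-c", "sat1.example.com", "team-a/nginx"},
	}
	for _, c := range findConflicts(targets) {
		fmt.Println(c)
	}
}
```

Here `sync-a` and `sync-b` pass because they use separate namespaces on the same registry, while `sync-a` and `sync-c` are reported because `team-a/nginx` lies inside `team-a`.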


2 participants