Webhook support #25

clook · 2020-09-24T09:45:00Z

Hi @xelalexv, thanks for providing this tool, which has a lot of helpers for ECR compared to "raw Skopeo" (mainly updating auth tokens and creating a repo if it does not exist).

At Padoa, we are using it to keep repositories in sync between 2 managed registry instances from distinct cloud providers: from Azure Container Registry to AWS ECR.
The best way we have found so far is to send a webhook at each manifest push on the origin so that it triggers a sync of the pushed tag. We therefore added support for incoming webhooks as well as support for multiple parallel syncs.

It is still a draft (we also need to update the docs) and work is in progress, but tell us if it is something you would like to include in the original dregsy and, if so, how we can contribute.

The fork is available here: https://github.com/padoa/dregsy/

Thanks in advance for any answer :)

xelalexv · 2020-09-24T11:14:18Z

Thanks for your interest & for extending dregsy!

So far I've only skimmed the changes in your fork, so these are just a few initial, high level thoughts & comments:

I can see three major areas in which you have made changes:

1. logging
2. concurrent syncing
3. webhooks

I think all of these topics are of interest for inclusion into dregsy. It would be great though if we could do that via dedicated issues + PRs, to have cohesive change sets with separated concerns.

Now for the topics:

1. Logging: Yep, was on my todo list for a long time, and logrus would have been my candidate as well.

2. Concurrent syncing: When we offer concurrent syncing, we would probably do that by task, meaning that any two tasks can potentially be synced concurrently. We would therefore need to make sure that that cannot lead to any conflict, or leave that to the responsibility of the user defining the tasks?

   Another thought: What is your experience with parallelizing pulls/pushes/syncs? How much improvement do you see? For me, it mostly did not show much of an improvement, since a single push/pull would normally already max out the available network bandwidth. But maybe my network connection is to blame here ;-)

3. Webhooks: If I understand this feature correctly, the use case is to actively trigger a sync task, rather than have the task poll periodically. If that's correct, I'd approach this by extending task, so that in addition to one off and poll, it would have a third mode trigger. A trigger then could have various sources, web hook being one. From what I can see, there is quite some overlap between tasks and web hooks, and we could maybe avoid that with this approach.

   A completely different approach would be to define what you want your web hooks to do as one-off tasks, then have an HTTP server run dregsy with the according one-off task each time it receives a request. But that would of course be completely outside of dregsy.

clook · 2020-09-24T21:53:19Z

Thanks for having a look so fast!

You're indeed right about those 3 areas, and we will of course split the work into 3 PRs for inclusion. We had to iterate fast, and that's also why we duplicated some parts of the code on purpose: to avoid breaking existing behavior while keeping enough flexibility to add features. There will be a refactoring effort before we submit the code as PRs.

In the environment where we are going to host it (currently K8s, maybe a managed container service later), and with the SLO we are targeting, we really need to monitor errors. That's why we wanted to make analytics easier with structured logs in JSON. I had already used logrus in several projects and was comfortable using it here. There is room for improvement, mainly around what I call the "sync id", which lets us track a sync order (currently named threadId, but subject to change): it is currently updated through a side effect on a global structure, which is not best practice and is also not concurrency-safe.
The primary goal of concurrent sync was to use asynchronous tasks as an easy way to keep the HTTP server listener available while a synchronization task is running. A better way to do it would have been to push each order to a channel and to pull the orders out sequentially in a single goroutine.
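To illustrate the channel idea, here is a minimal sketch and not the code from the fork. The names SyncOrder, Queue, and the /hook path are made up for the example, and the handler function stands in for the real sync. The HTTP handler only enqueues, and a single goroutine drains the channel in arrival order:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// SyncOrder is a hypothetical description of one sync request.
type SyncOrder struct {
	Repository string `json:"repository"`
	Tag        string `json:"tag"`
}

// Queue buffers orders so that the HTTP handler never blocks on a sync.
type Queue struct {
	orders chan SyncOrder
	done   sync.WaitGroup
}

// NewQueue starts a single worker goroutine that handles orders one at a
// time, in arrival order, by calling handle.
func NewQueue(size int, handle func(SyncOrder)) *Queue {
	q := &Queue{orders: make(chan SyncOrder, size)}
	q.done.Add(1)
	go func() {
		defer q.done.Done()
		for o := range q.orders {
			handle(o)
		}
	}()
	return q
}

// Enqueue reports false when the buffer is full instead of blocking.
func (q *Queue) Enqueue(o SyncOrder) bool {
	select {
	case q.orders <- o:
		return true
	default:
		return false
	}
}

// Close stops accepting orders and waits for the worker to drain the queue.
func (q *Queue) Close() {
	close(q.orders)
	q.done.Wait()
}

// Handler decodes an order from the request body and queues it.
func (q *Queue) Handler(w http.ResponseWriter, r *http.Request) {
	var o SyncOrder
	if err := json.NewDecoder(r.Body).Decode(&o); err != nil {
		http.Error(w, "invalid payload", http.StatusBadRequest)
		return
	}
	if !q.Enqueue(o) {
		http.Error(w, "queue full", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	q := NewQueue(16, func(o SyncOrder) {
		// Placeholder for the real sync, e.g. a Skopeo copy.
		fmt.Printf("syncing %s:%s\n", o.Repository, o.Tag)
	})
	q.Enqueue(SyncOrder{Repository: "app", Tag: "v1"})
	q.Enqueue(SyncOrder{Repository: "app", Tag: "v2"})
	q.Close()
	// A server would instead register the handler and keep running:
	// http.HandleFunc("/hook", q.Handler); http.ListenAndServe(":8080", nil)
}
```

With this shape the listener answers immediately with 202 (or 503 when the buffer is full), and no two syncs ever run at the same time.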

You will note there are 2 kinds of webhook tasks. syncDregsyEvent iterates sequentially over a mapping whose format is very similar to the original one, but which is provided by the hook itself (in JSON) instead of a static configuration file. syncAzureEvent takes an Azure Container Registry webhook payload as input, which refers to a single push (no need to iterate).
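As a rough sketch of the second kind (not the actual syncAzureEvent code), the pushed image can be derived from the notification like this. The field names follow my reading of the ACR webhook reference and should be checked against the current documentation; acrEvent, SourceRef, and the sample values are made up for the example:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// acrEvent holds the subset of an Azure Container Registry push
// notification that is needed to locate the pushed image.
// Field names are an assumption based on the ACR webhook reference.
type acrEvent struct {
	Action string `json:"action"`
	Target struct {
		Repository string `json:"repository"`
		Tag        string `json:"tag"`
		Digest     string `json:"digest"`
	} `json:"target"`
	Request struct {
		Host string `json:"host"`
	} `json:"request"`
}

// SourceRef turns a push notification into an image reference of the form
// host/repository:tag. Events with any other action are rejected.
func SourceRef(payload []byte) (string, error) {
	var ev acrEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return "", fmt.Errorf("decoding event: %w", err)
	}
	if ev.Action != "push" {
		return "", errors.New("not a push event: " + ev.Action)
	}
	if ev.Request.Host == "" || ev.Target.Repository == "" || ev.Target.Tag == "" {
		return "", errors.New("event lacks host, repository or tag")
	}
	return ev.Request.Host + "/" + ev.Target.Repository + ":" + ev.Target.Tag, nil
}

func main() {
	// Sample payload with made-up values.
	payload := []byte(`{
		"action": "push",
		"target": {"repository": "hello-world", "tag": "v1", "digest": "sha256:0123abcd"},
		"request": {"host": "myregistry.azurecr.io"}
	}`)
	ref, err := SourceRef(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("source image:", ref)
}
```

Since one notification describes exactly one push, the result is a single source reference and no iteration over a mapping is needed.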

We did not run any performance tests and cannot say how it differs from a sequential model: running on an AKS cluster makes a single push very fast, which makes it difficult to compare with multiple simultaneous pushes.

About concurrent use of the registry: we did not notice any issues with it, and I think that even writing the exact same layer from 2 different processes is an atomic operation and should not cause an error or unexpected behavior. There is also room for optimization: when we receive 3 pushes of the exact same image with 3 distinct tags at almost the same time, the layers are written 3 times. We could use a dedicated queue per manifest (keyed by its sha256).
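One simple way to approximate a per-manifest queue is a lock keyed by digest, so that syncs of the same manifest run one after another while different manifests still run in parallel. This is only a sketch: KeyedLock is a made-up helper, the digest is a placeholder, and it relies on the assumption that a later sync of the same digest finds the layers already present in the target registry.

```go
package main

import (
	"fmt"
	"sync"
)

// KeyedLock serializes work that shares a key (here: a manifest digest)
// while letting work on different keys run in parallel.
type KeyedLock struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

// NewKeyedLock returns an empty KeyedLock.
func NewKeyedLock() *KeyedLock {
	return &KeyedLock{locks: make(map[string]*sync.Mutex)}
}

// Do runs fn while holding the lock for key. Entries are never removed,
// which keeps the sketch short but would need cleanup in a long-running
// service.
func (k *KeyedLock) Do(key string, fn func()) {
	k.mu.Lock()
	l, ok := k.locks[key]
	if !ok {
		l = &sync.Mutex{}
		k.locks[key] = l
	}
	k.mu.Unlock()

	l.Lock()
	defer l.Unlock()
	fn()
}

func main() {
	locks := NewKeyedLock()
	digest := "sha256:0123abcd" // made-up digest for illustration
	var wg sync.WaitGroup
	for _, tag := range []string{"v1", "latest", "stable"} {
		wg.Add(1)
		go func(tag string) {
			defer wg.Done()
			locks.Do(digest, func() {
				// Placeholder for the real sync of this tag.
				fmt.Println("syncing tag", tag)
			})
		}(tag)
	}
	wg.Wait()
}
```

The order in which the three tags are handled is not defined here; the only guarantee is that they never overlap.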

You understood well. That was the primary case for us: in our pipelines, images pushed to ACR are often consumed shortly afterwards on the ECR side, and we want to avoid pushing to ECR. With our volume of images, batch synchronization was not an acceptable option, and we had to implement a kind of "stream behavior" so that the sync is accurate almost all the time. You're completely right about rationalizing and reusing the existing structures. We will also refactor this part to match this idea. For now, everything has been built quickly so that it could be used as soon as possible: we did not find any other tool doing what we wanted (and using Skopeo, which seems the right way of handling it), and dregsy seemed the closest one. The first objective was to have something very quick (and even dirty) that we could use, but iterable, so that we can easily improve it and refactor the parts that need it.

I understand your last point. It seems difficult to me to integrate it as another binary, mostly because the configuration would have to be (re)written to a file for each hook. Forking the code seemed to us the most natural way to achieve it in a reasonable amount of time and with pretty good reliability.

We will come back to you as soon as we have news and are ready for some reviews. We very recently added retries (because we are seeing about 1 failure in 1000, which can easily be reduced with a retry) and are working on bearer token auth.
I really like the fact that you are open to those changes and challenge them (or the way of doing them) as well.
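For the retry mentioned above, the general shape is something like the following. It is a minimal sketch with doubling waits, not the code from the fork; Retry and errTransient are names invented for the example, and the attempt count and delay are arbitrary.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a made-up error used to simulate a flaky sync.
var errTransient = errors.New("transient failure")

// Retry calls op up to attempts times (at least once), doubling the wait
// after each failure. It returns nil on the first success, otherwise the
// last error wrapped with the number of attempts.
func Retry(attempts int, wait time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := Retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // simulated failure on the first two calls
		}
		return nil
	})
	fmt.Println("calls:", calls, "error:", err)
}
```

If failures are independent and happen about once in 1000 syncs, even a single extra attempt should make a final failure far less likely.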

Thanks!

xelalexv · 2020-10-02T06:58:23Z

Regarding the approach with your current fork, that's no problem at all, I understand your situation. It's important that you get the problem at hand solved quickly. If dregsy could help you a little bit with that, great! As you gain more experience with your implementation, we can spin out the different aspects one by one, discuss, and merge upstream. So we can take this step by step, no pressure. I myself don't have that much bandwidth at the moment to invest in dregsy, it's mostly a side project for me. So taking the slow route is totally fine 😉

> I really like the fact that you are open to those changes and challenge them (or the way of doing them) as well.

Likewise, thanks a lot again for taking the time to think about enhancements and implementing them! That's why I like Open Source 👍

xelalexv added the enhancement label Oct 2, 2020
