
Need an mechanism in awssqssource to control parallelism #1343

Open
mikhno-s opened this issue Mar 1, 2023 · 10 comments
Labels
feature request Request for a new feature

Comments

@mikhno-s

mikhno-s commented Mar 1, 2023

We have long-running jobs (10–15 min) in knative-serving that are triggered by messages from SQS.
I see a problem here:

  • awssqssource reads all messages from SQS and removes them, but I want the ability to read one message at a time and remove it only after a successful HTTP response code from the ksvc, even when I use a memory broker with buffer=1.

I need the ability to control the parallelism of the processing mechanism. I don't need to process all messages quickly; I want to consume one message, send it to the ksvc (or broker), wait until it finishes, only then remove the message from SQS, and repeat.
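The serialized flow described above could be sketched roughly like this. This is an illustrative in-memory model, not TriggerMesh code: a deque stands in for the SQS queue and a plain callable stands in for the ksvc endpoint, and the `max_attempts` retry cap is a hypothetical addition.

```python
from collections import deque

def process_serially(messages, send_to_sink, max_attempts=3):
    """Consume one message at a time; remove it from the queue only after
    the sink returns a 2xx status, otherwise make it visible again."""
    processed = []
    attempts = {}
    while messages:
        msg = messages[0]                         # peek: message is now "in flight"
        status = send_to_sink(msg)                # synchronous call, waits until finished
        if 200 <= status < 300:
            processed.append(messages.popleft())  # ACK: delete from the queue
        else:
            attempts[msg] = attempts.get(msg, 0) + 1
            if attempts[msg] >= max_attempts:
                messages.popleft()                # give up (a dead-letter queue in real SQS)
            else:
                messages.rotate(-1)               # NACK: message becomes visible again later
    return processed
```

With real SQS, the "becomes visible again" step happens automatically when the message's visibility timeout expires; nothing has to re-enqueue it explicitly.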

Is it possible to do this with TriggerMesh?

I found a lot of issues describing problems with this kind of scheme, so it looks like people need a mechanism for long-running jobs triggered over HTTP.

@odacremolbap
Member

Thanks for reaching out @mikhno-s

Currently our SQS Source works like this:

  • The number of concurrent receivers is determined by the CPUs available; this is not configurable at the moment.
  • Each receiver reads a batch from SQS, with a maximum of 10 items per batch.
  • For each element in the batch we generate a CloudEvent to the Broker/Sink, and only delete it from SQS once we have the ACK.
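The three steps above can be sketched as a single receiver loop. This is an in-memory illustration of the described semantics, not the actual source code; `fetch_batch`, `to_cloudevent`, `send`, and `delete` are hypothetical stand-ins for the SQS and CloudEvents plumbing.

```python
MAX_BATCH = 10  # SQS ReceiveMessage returns at most 10 messages per call

def receiver_loop(fetch_batch, to_cloudevent, send, delete):
    """One concurrent receiver: read a batch, emit one CloudEvent per item,
    and delete an item from SQS only after the Broker/Sink ACKs it."""
    while True:
        batch = fetch_batch(MAX_BATCH)
        if not batch:
            return                  # queue drained (a real receiver keeps polling)
        for item in batch:
            event = to_cloudevent(item)
            if send(event):         # ACK received from the Broker/Sink
                delete(item)        # only now removed from SQS
            # on NACK the message stays in SQS and becomes visible
            # again after its visibility timeout
```

Note that within one receiver the batch items are delivered sequentially here, but several receivers running this loop concurrently is what produces the parallelism the issue is about.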

You seem to be asking for serialization control, so that there is a maximum of N items in flight, with N = 1 in your case, is that right?

Can you add some details about your scenario? It sounds like you have a processor that should only process one element at a time, which is uncommon in eventing scenarios.

@mikhno-s
Author

mikhno-s commented Mar 1, 2023

Thanks for the quick response.

The scheme is simple:
I have a worker that can process one request at a time, and processing takes around 10 minutes.

I have an SQS queue as the input channel, and there are around 1k messages to process.

I want to control exactly how many messages are in flight, because otherwise I hit the activator's (a knative-serving component) timeout when multiple messages are sent to the sink at once.

It's also a typical flow for ML-like jobs.
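One generic way to express the in-flight cap being asked for here is a counting semaphore around the delivery call, with a cap of 1 giving fully serialized processing. This is just a sketch of the desired semantics, not a TriggerMesh feature; `dispatch_with_limit` and `max_in_flight` are hypothetical names.

```python
import threading

def dispatch_with_limit(messages, send, max_in_flight=1):
    """Deliver messages concurrently, but never allow more than
    max_in_flight simultaneous sends; 1 serializes processing."""
    sem = threading.Semaphore(max_in_flight)
    lock = threading.Lock()
    results = []

    def worker(msg):
        with sem:                   # blocks while the in-flight cap is reached
            status = send(msg)      # long-running call (e.g. a 10-minute job)
        with lock:
            results.append((msg, status))

    threads = [threading.Thread(target=worker, args=(m,)) for m in messages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In a queue-backed source the same effect would also need the message to stay invisible (not deleted) in SQS until its send completes, so a crash mid-processing redelivers it.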

@jmcx jmcx added the feature request Request for a new feature label Mar 1, 2023
@jmcx
Contributor

jmcx commented Mar 6, 2023

Hi @mikhno-s thanks for the details on this requirement. Any chance you can jump on a call sometime this week to discuss it with someone from the engineering team (probably Pablo) and myself? I can set up a zoom anytime that suits you. Cheers !

@mikhno-s
Author

mikhno-s commented Mar 6, 2023

Sure, what is the preferred way to communicate with you?
Usually I am free for a call from 7 AM to 3 PM (UTC), but we need to discuss the details.

@jmcx
Contributor

jmcx commented Mar 6, 2023

Great! We can host a Zoom meeting if that works for you. Some timeslots that could work: Tuesday 2pm UTC, Wednesday 2pm UTC, Thursday 10am or 2pm UTC.

@mikhno-s
Author

mikhno-s commented Mar 6, 2023

Tuesday 2pm UTC works for me.

@jmcx
Contributor

jmcx commented Mar 6, 2023

Great, is there a particular email I should use? Feel free to contact me at jonathan@triggermesh.com

@jmcx
Contributor

jmcx commented Mar 7, 2023

Hi @mikhno-s , I've scheduled a meeting for today and I can just share the Zoom link here in case that works for you: https://triggermesh.zoom.us/j/81980530997?pwd=OWFqMUppc2lxSHRtZUxqcTEvN3lLQT09

Cheers

@jmcx
Contributor

jmcx commented Mar 10, 2023

After talking with the team, I logged this issue to add concurrency control on broker triggers: triggermesh/brokers#127. Feedback welcome.

@jmcx jmcx changed the title Need an mechanism in awssqssource to control parallelism Mar 30, 2023
@shabemdadi

> Thanks for reaching out @mikhno-s
>
> Currently our SQS Source works like this:
>
>   • The number of concurrent receivers is determined by the CPUs available; this is not configurable at the moment.
>   • Each receiver reads a batch from SQS, with a maximum of 10 items per batch.
>   • For each element in the batch we generate a CloudEvent to the Broker/Sink, and only delete it from SQS once we have the ACK.

How does this flow change when using a Kafka broker? More specifically, when do we delete the event from the SQS queue: upon publishing to the Kafka topic, or upon that record receiving a successful response from the sink?

For context, I am interested in the concurrency control outlined in triggermesh/brokers#127, but I saw some activity suggesting that using a Kafka Broker could mimic this control in lieu of that feature being implemented. I am trying to think through the failure scenarios when using a Kafka Broker with an SQS source (e.g., could the Kafka broker go down after messages have been deleted from the SQS queue but before they are processed by the sink?).
