Allow manual offsets commit for a consumer #304
Comments
What is your use case – why are the automatic commits not sufficient? |
I have multiple. For example, we batch data from Kafka into files, which are closed and moved for upload to S3 every defined period of time; offsets are committed only once the files have been moved for upload. |
I would recommend trying to align the files with fetched batches and using the batch API, e.g.

```ruby
consumer.each_batch do |batch|
  filename = "files/#{batch.topic}-#{batch.partition}-#{batch.first_offset}"

  File.open(filename, "w") do |file|
    batch.messages.each do |message|
      file << message.value
    end
  end

  # Once this block succeeds, the entire batch is marked as consumed.
  move_to_s3(filename)
end
```

You can specify |
I can't really align with batch size, since we are talking about intervals of minutes, and during this time |
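The deferred-commit pattern described above (process for minutes, commit only after the upload succeeds) can be sketched independently of Kafka: track the highest processed offset per topic/partition and commit only on an explicit flush. This is a minimal self-contained illustration, not ruby-kafka's API; the commit callback stands in for whatever actually persists offsets.

```ruby
# Tracks the highest processed offset per topic/partition and defers
# committing until an explicit flush (e.g. after files reach S3).
class DeferredOffsetTracker
  def initialize(&commit_callback)
    @pending = {}                 # { [topic, partition] => highest offset }
    @commit_callback = commit_callback
  end

  # Record that a message has been processed, without committing yet.
  def mark_processed(topic, partition, offset)
    key = [topic, partition]
    @pending[key] = offset if @pending.fetch(key, -1) < offset
  end

  # Commit everything processed so far; call only after the upload succeeds.
  def flush!
    @pending.each do |(topic, partition), offset|
      @commit_callback.call(topic, partition, offset)
    end
    @pending.clear
  end
end
```

Usage: mark every message as it is written to the file, then call `flush!` from the timer that closes and uploads the files, so a crash before the flush replays the uncommitted messages.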
I'm simply trying to keep non-core features out – maintaining features is a lot more work than adding them. What kind of API are you proposing? |
I would say that at least NOT committing offsets automatically is a core feature. At least the Java Kafka client allows you to do that. Here there is an `ensure` block

```ruby
ensure
  # In order to quickly have the consumer group re-balance itself, it's
  # important that members explicitly tell Kafka when they're leaving.
  @offset_manager.commit_offsets rescue nil
  ...
end
```

committing offsets, and there seems to be no way of overriding that without some dirty hacks. And the use case is consuming the `__consumer_offsets` topic itself. You always want to consume it in full from the very beginning, and you do not really need to track your progress, so there is no point in committing at all. Should I raise it as a separate issue? |
@dimas committing offsets is a requirement for the distributed consumer groups to be able to function – otherwise any hiccup in the group would cause partition processing to start over. Do you need to distribute your workload? If not, you can simply use the non-distributed consumer API:

```ruby
kafka.each_message(topic: "__consumer_offsets") do |message|
  puts message.offset, message.key, message.value
end
```
|
what if I need to commit offsets only after some processing has happened – the processing of a big data chunk, a chunk that cannot fit into a single |
@piavka do you have an API proposal for ruby-kafka for dealing with manual offset commits? |
@dasch simply add |
Hmm. I'll think a bit about it. |
@dasch, oh. You are right, that is probably what I should use. My bad. I think I have kind of weird (or lets say special) use case - trying to do something similar to KafkaOffsetMonitor but in Ruby so my client needs some bits that normal clients should not care about. So I raised another one about that #311, hope you do not mind. |
We have Samza tasks which read messages from a Kafka output stream, but if there is a retryable failure while processing a message, I would want my Samza task to read the same message again and reprocess it, and only acknowledge it for checkpointing after it has been processed successfully, instead of relying on auto-commit. Is there a way to manually control the checkpoint, just like the "Manual Offset Control" the Kafka consumer provides by setting enable.auto.commit to false (https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html)? I came across this doc, https://samza.apache.org/learn/documentation/0.13/jobs/reprocessing.html, which talks about reprocessing previously processed data, but it does not offer any acknowledgement-based checkpoint control. |
@sidbits I think you're in the wrong repo – this has nothing to do with Samza :) |
Duplicate of #126. |
Currently the library's consumer offset commits happen automatically, according to configuration or when the `each_batch` block is processed. We need to be able to manually commit offsets, but `@offset_manager` is not exposed by the `Consumer` class, nor is there an option to disable the automatic offset commits. I can work on a PR if such a feature would be accepted.
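The feature requested in this issue (disable automatic commits, expose a manual commit) can be illustrated with a toy consumer. All names here (`SketchConsumer`, `auto_commit:`, `commit_offset`) are hypothetical and chosen for illustration; they are not the ruby-kafka API.

```ruby
# Toy sketch of a consumer whose batch loop can disable auto-commit,
# leaving the caller responsible for committing offsets explicitly.
class SketchConsumer
  attr_reader :committed_offset

  def initialize(messages)
    @messages = messages          # array of { offset: Integer, ... } hashes
    @committed_offset = -1        # -1 means nothing committed yet
  end

  # Yields fixed-size batches; commits after each batch only if auto_commit.
  def each_batch(auto_commit: true)
    @messages.each_slice(2) do |batch|
      yield batch
      @committed_offset = batch.last[:offset] if auto_commit
    end
  end

  # Manual commit, for use when auto_commit is disabled.
  def commit_offset(offset)
    @committed_offset = offset
  end
end
```

With `auto_commit: false`, nothing is committed until the caller decides its unit of work (a file, a time window) is complete and calls `commit_offset` itself, which is exactly the control the issue asks for.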