Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the functionality to fetch topic offsets by timestamp #604

Merged
merged 1 commit into from
Aug 15, 2020

Conversation

liukun
Copy link

@liukun liukun commented Dec 19, 2019

Fixed #329, similar to #525 but with different naming and tests.

Supports consumer.subscribe({ topic: 'test-topic', fromTimestamp: 1576731939876 }).

Edited: reverted consumer.subscribe changes and limits the functionality within admin.

src/admin/__tests__/fetchTopicOffsets.spec.js Outdated Show resolved Hide resolved
src/admin/index.js Outdated Show resolved Hide resolved
src/cluster/__tests__/fetchTopicsOffset.spec.js Outdated Show resolved Hide resolved
src/cluster/__tests__/fetchTopicsOffset.spec.js Outdated Show resolved Hide resolved
src/cluster/index.js Outdated Show resolved Hide resolved
src/consumer/index.js Outdated Show resolved Hide resolved
@JaapRood
Copy link
Collaborator

JaapRood commented Dec 19, 2019

Interesting the proposal! Adding fromTimestamp to the admin client is without a doubt a great addition. It's something the Kafka protocol supports and has proven to be important in a lot of ways.

Combining it with consumer.subscribe, however, I'd like to hear a motivation for. Although I've got some ideas as somebody familiar with the implementation of the client, as a user I'm really confused how it's supposed to work.

I think it's obvious that it implies consumption from a message with a certain timestamp, knowing Kafka it would try to resolve the offset of the first message with that timestamp. But the question is: when does it do that? Is it after it 'subscribed', once, at the beginning? But then what about when a rebalance happens? Does it resolve back to that offset then as well, or continues where it left off with assignments it gets re-assigned, or does it only apply to new assignments only, etc.?

Or perhaps it works like a fallback, for when the offset we're trying to consume is out of range, basically like fromBeginning. While that side steps all the behavioural questions above, I wonder how useful that is to define like that. With subscriptions made only once and processes that can run for days, weeks or months, surely that timestamp is a moving target?

Rather the adding fromTimestamp to consumer.subscribe, I think it would make a lot more sense to add it to consumer.seek. A seek stands as a single operation in a specific point in time, and therefore doesn't have any of the behavioural issues as described above. You already seek to an offset, so just seeking to a logical offset ("the first offset after this timestamp") instead of an absolute one totally makes sense.

@liukun
Copy link
Author

liukun commented Dec 20, 2019

@JaapRood In consumer.subscribe, fromTimestamp works the same way as fromBeginning that only affects a new consumer group. If the client's consumer group had subscribed before, it would continue reading from the remembered offsets. If not, it would try to start reading from the specified timestamp.

The use-case is that when the running consumer group gives wrong results since a date, we need to start a new group parallelly, but skip some old data before that date. If this new group works correctly, we can stop the old one. Set a timestamp to begin with, would eliminate some duplicate works.

Edited: It seems my old comments were wrong. Better to just run a script with admin to set the consumer group's offsets first, then start the consumers to continue the real work.

@Nevon
Copy link
Collaborator

Nevon commented Dec 21, 2019

I think it's worth considering how this fits in with #580. We want to deprecate fromBeginning to allow for more complex behavior, so then if we want to add a timestamp-based default offset for subscribe, how would we combine that with fromTimestamp? I would be hesitant to introduce a new parameter now if we anyway are going to deprecate it in a month. I would want to make sure that whatever we introduce now won't require a breaking change later.

I would suggest that we split the discussion into two separate parts. Everyone is in complete agreement on adding this functionality to the admin interface, so let's get that change merged first in the scope of this PR. Then we can discuss if and how to allow for that use case within the consumer. I think that discussion can be had in #580.

Sound good?

@liukun
Copy link
Author

liukun commented Dec 21, 2019

@Nevon Sounds fair, I'll edit the PR later.

@liukun liukun changed the title Can specify timestamp when consumer subscribe a topic Add admin.resetOffsetsByTimestamp to set consumer group's offsets by timestamp Dec 22, 2019
@liukun liukun requested a review from Nevon December 22, 2019 14:03
@liukun liukun force-pushed the dev/topic-offset-timestamp branch 2 times, most recently from 2830638 to 3e46e63 Compare December 22, 2019 14:34
@liukun
Copy link
Author

liukun commented Dec 22, 2019

Hmm, some unrelated tests were not stable.

src/admin/index.js Outdated Show resolved Hide resolved
@Nevon
Copy link
Collaborator

Nevon commented Dec 23, 2019

It's the holiday season over here, so I don't think I'll have the opportunity to properly review this until at least the 27th. Sorry for the inconvenience.

@liukun liukun changed the title Add admin.resetOffsetsByTimestamp to set consumer group's offsets by timestamp Add the functionality to set consumer group's offsets by timestamp Dec 24, 2019
@liukun liukun requested a review from Nevon December 24, 2019 09:13
@liukun
Copy link
Author

liukun commented Dec 25, 2019

As I'm more familiar with Kafka now, this use-case can be achieved by command line script provided by Kafka:

./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group consumer.group.id --topic topic.name --reset-offsets --to-datetime '2019-12-25T00:00:00.000' --execute

@dfilkovi
Copy link
Contributor

Hello, will this be finished as I'm also interested in this functionality?

@ShanonJackson
Copy link

Whats going on guys? Really need this functionality as building graphs dashboard and need to stream in 20minutes historical data + stream now .
@Nevon

@liukun
Copy link
Author

liukun commented Jul 29, 2020

The functionality should be ready to use. Please tell me if anything more needs to change.

docs/Admin.md Outdated Show resolved Hide resolved
src/admin/index.js Outdated Show resolved Hide resolved
@mayank13
Copy link

Hi @liukun, Is this feature available in any pre-release? I installed the latest beta but this feature is not there. Can you please let me know. Thanks.

@liukun
Copy link
Author

liukun commented Aug 10, 2020

Hi @liukun, Is this feature available in any pre-release? I installed the latest beta but this feature is not there. Can you please let me know. Thanks.

No, this is not accepted yet. You can try the commit 1121467 if you like.

@ShanonJackson
Copy link

@liukun Great work hopefully @Nevon can get this approved asap. Caching 2 hrs of data in my application layer atm to avoid crawling from beginning of topic in meantime until this is merged so will be a godsend when its done.

docs/Admin.md Show resolved Hide resolved
docs/Admin.md Show resolved Hide resolved
types/index.d.ts Outdated Show resolved Hide resolved
Copy link
Collaborator

@Nevon Nevon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

@Nevon Nevon changed the title Add the functionality to set consumer group's offsets by timestamp Add the functionality to fetch topic offsets by timestamp Aug 15, 2020
@Nevon
Copy link
Collaborator

Nevon commented Aug 15, 2020

Cool, let's get this merged. Thanks for the contribution, and sorry it took so many back and forth's.

I see one thing I wanna tighten up in the type definitions, but I can do that separately instead.

@Nevon Nevon merged commit c6cc35d into tulios:master Aug 15, 2020
marfedd pushed a commit to cengler/kafkavue that referenced this pull request May 19, 2021
… sola instancia de kafka. Idea de filtro de mensajes por rango de timestamps usando datepicker segun este pr tulios/kafkajs#604
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add timestamp parameter to fetchTopicOffsets
6 participants