Skip to content

query: Fix counter resets for chain deduplication algorithm #8176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

lachruzam
Copy link
Contributor

@lachruzam lachruzam commented Mar 26, 2025

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

It's a fix to a not yet released feature.

Changes

Fix counter resets handling for chain deduplication algorithm by not applying resets on series replicas.
The feature issue: #7807

Verification

Added a new e2e test.

Signed-off-by: Michał Mazur <mmazur.box@gmail.com>
@lachruzam
Copy link
Contributor Author

hey @yeya24 , @MichaHoffmann
Could you look at it?
Sorry for tagging directly but I hope to get this fix into 0.38 and just yesterday found the issue on our clusters.
The feature is labelled as experimental, however it's rather useless in the current state.

aggrs []storepb.Aggr
mint, maxt int64
aggrs []storepb.Aggr
deduplicationFunc string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call it dedupAlgo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @MichaHoffmann
Thanks for looking at it.

deduplicationFunc is already used in a few places since #7808
Originally, to stay in sync with compactor's --deduplication.func.

At the time I didn't want to propose any bigger refactoring or code duplication for an experimental feature. Still, I don't like how that configuration spread out. Maybe, it can be refactored later on, e.g. with the removal of the experimental label. Till then I'd prefer to stay with the same name in all places, WDYT?

yeya24
yeya24 previously approved these changes Mar 27, 2025
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM

saswatamcode
saswatamcode previously approved these changes Mar 27, 2025
@yeya24
Copy link
Contributor

yeya24 commented Mar 27, 2025

How does this work for downsampled counter aggr chunk? Counter aggr chunk needs to use ApplyCounterReset iterator.

@lachruzam
Copy link
Contributor Author

How does this work for downsampled counter aggr chunk? Counter aggr chunk needs to use ApplyCounterReset iterator.

A good question, it does not work in all cases.

Counter aggr chunks come only from stores after the downsampling.
If all chunks come from receivers then there are no problem.
It appears on the intersection of chunks from store and receivers.
Here, I'm assuming that for stores chain (one-to-one) deduplication has been already done by compactor.

So, there may be a need for overlaps removal for chunks coming from both receivers, stores.
In practice, I didn't notice the problem on the clusters, yet.

Let me know, if you seen any error in my assumptions.

Honestly, I missed that part in the first PR🤦. And as 0.38 is closing, I'd like to fix the issue for receivers at least.
As a next step, I plan to build tests for other chunk types but I'd need some time for it.

@MichaHoffmann
Copy link
Contributor

Since this flag is experimental i dont mind if we merge this and followup with fixes later

@yeya24
Copy link
Contributor

yeya24 commented Mar 27, 2025

I think deduplication is not an issue but chunkSeriesIterator cannot deal with aggr counter chunk well as it has some special encoding mechanism.

// Counter aggregation chunks must have the first and last values from their
// original raw series: the first raw value should be the first value encoded
// in the chunk, and the last raw value is encoded by the duplication of the
// previous sample's timestamp.

I wonder how chunkSeriesIterator deals with samples with same timestamp but different values. I don't think there is any special handle logic there. But maybe it is fine to deal with it at the engine level.

@yeya24
Copy link
Contributor

yeya24 commented Mar 27, 2025

If all chunks come from receivers then there are no problem.

If chain deduplication is assumed to only work with receivers but not other types of chunks I am wondering if it is really usable. At least, we should document it so that users are aware of the issue and should only use it to query for receiver only data

@lachruzam
Copy link
Contributor Author

If all chunks come from receivers then there are no problem.

If chain deduplication is assumed to only work with receivers but not other types of chunks I am wondering if it is really usable. At least, we should document it so that users are aware of the issue and should only use it to query for receiver only data

I expect it to work in all cases where are no overlap between raw and counter chunks. I'm working on a test that would verify it. If it's the case then still you're right that it should be documented.

@lachruzam lachruzam dismissed stale reviews from saswatamcode and yeya24 via 869a476 April 7, 2025 09:33
@lachruzam lachruzam marked this pull request as draft April 7, 2025 09:34
@lachruzam
Copy link
Contributor Author

@yeya24 You have been right on counters, adjusting is required even between aggregated chunks.

The last commit contains a new approach - chain dedup on a chunk level.
Here, the code path for penalty and chain dedup is separated in querier. Only raw blocks are chained together while aggregated are randomly dropped. Adjustment for counters is done but on already merged data.

I'm changing this PR to draft as the code is not yet cleaned up, necessary tests are missing and the change has been verified only on synthetic data.

Still, I'd appreciate any comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants