
Copy chunk bytes in TSDB store before sending to client #6203

Merged
merged 2 commits into thanos-io:release-0.31 on Mar 13, 2023

Conversation

fpetkovski
Contributor

@fpetkovski commented Mar 10, 2023

During head compaction, mmapped memory gets released while gRPC is still marshaling bytes from that same memory region. This leads to a fatal segfault and crashes the receiver. The segfault happens specifically when marshaling chunks.

This commit modifies the TSDB store server to copy chunk bytes before sending them to the client. I tried running this for a while and saw no significant increase in memory usage.

Fixes #6196.
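For illustration, here is a minimal Go sketch of the idea (the helper name is hypothetical and this is not the exact Thanos change): instead of handing gRPC a slice that still points into the TSDB head's mmapped region, the bytes are copied into GC-managed memory first.

```go
package store

import "github.com/prometheus/prometheus/tsdb/chunkenc"

// copyChunkBytes is a hypothetical helper: chk.Bytes() may point into an
// mmapped region that head compaction can unmap at any time, so the data
// is copied into GC-managed memory before gRPC marshals it.
func copyChunkBytes(chk chunkenc.Chunk) []byte {
	b := chk.Bytes()
	out := make([]byte, len(b))
	copy(out, b)
	return out
}
```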

  • I added a CHANGELOG entry for this change.
  • Change is not relevant to the end user.


Member

@onprem left a comment


Nice catch, LGTM! As you noted, this shouldn't affect memory much, since a chunk always contains a small (and finite) number of bytes and the copy gets allocated on the stack, so there's no GC pressure. Did I get this right? :)

@fpetkovski
Contributor Author

I am not sure if the copy happens on the stack, but from running this in a staging environment I can confirm that there's negligible (if any) memory overhead.
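For what it's worth, a quick way to check is the Go compiler's escape analysis; a minimal sketch (the file and function below are illustrative, not the actual Thanos code):

```go
// Build with `go build -gcflags=-m` to print the escape-analysis report.
// A copy whose size is only known at runtime and which is returned to the
// caller is heap-allocated, so the negligible overhead likely comes from
// chunks being small rather than from stack allocation.
package main

func copyBytes(b []byte) []byte {
	out := make([]byte, len(b)) // typically reported as "escapes to heap"
	copy(out, b)
	return out
}

func main() {}
```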

@fpetkovski merged commit 9992c4c into thanos-io:release-0.31 on Mar 13, 2023
9 checks passed
@douglascamata
Contributor

When will this be moved to main? Ideally, I think we want fixes like this to go to main first and then be cherry-picked into the release branch?

@fpetkovski
Contributor Author

I don't think cherry-picking back and forth is sustainable. I am releasing 0.31.0 and will merge the release branch into main. This way we will converge all changes, including the changelog entries for RC releases.

@douglascamata
Contributor

douglascamata commented Mar 23, 2023

@fpetkovski this creates a confusing situation for people using the container image tags: in terms of git history, this fix will land between two "old" commits in the main tree.

Something like the following will happen: from oldest to newest commits, we will have A -> **this fix** -> B. So one might think that pulling the image tag for commit B will include the fix, because from their point of view B was supposedly built after "this fix". But it doesn't. Only the image tags from the merge commit onwards will include it. 😅 🤔

@fpetkovski
Contributor Author

Yeah, I see how this can be confusing. I can try a rebase next time to see if it will create a linear history.

@douglascamata
Contributor

@fpetkovski we can also talk to other maintainers to see what they think about it. I personally like the linearity, particularly because it avoids this confusion with the container image builds. If we didn't have those, I wouldn't mind it at all. Would love to hear what others have to say.
