Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#59] Include row level deletion in tombstone count #60

Merged
merged 1 commit into from
Jul 19, 2017
Merged

Conversation

tolbertam
Copy link
Owner

@tolbertam tolbertam commented Jul 18, 2017

Per #59, it appears that row level deletions are not considered in tombstone count. This fixes this issue.

@tolbertam
Copy link
Owner Author

@jwiencek3 would you be willing to give this a try before I merge and make a release? I'm hoping this will fix the issue you reported and just wanted to make sure that was the case before doing a release.

Link to snapshot jar

@jwiencek3
Copy link

testing right now.

an you explain this section of the output: Is the "Count" how many tombstones were removed when a minor compaction occurred?

Estimated tombstone drop times:
Drop Time | Count (%) Histogram
1500430503 (07/19/2017 02:15:03) | 599 ( 0) ▊
1500430736 (07/19/2017 02:18:56) | 2 ( 0)
1500430955 (07/19/2017 02:22:35) | 1743 ( 0) ▉▉▍
1500431183 (07/19/2017 02:26:23) | 17456 ( 1) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▍
1500431545 (07/19/2017 02:32:25) | 17780 ( 1) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉
1500431889 (07/19/2017 02:38:09) | 16219 ( 0) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▊
1500432203 (07/19/2017 02:43:23) | 14919 ( 0) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉
1500432502 (07/19/2017 02:48:22) | 14807 ( 0) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▊
1500432803 (07/19/2017 02:53:23) | 14782 ( 0) ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▊

@tolbertam
Copy link
Owner Author

tolbertam commented Jul 19, 2017

That's correct. That's printing a histogram that is the estimated distribution of tombstones would be dropped by compactions happening after that time.

EDIT: Need to look into that more to verify that, I had always thought that was the case, but i may be wrong.

@tolbertam
Copy link
Owner Author

tolbertam commented Jul 19, 2017

It looks like there was a bug in cassandra (CASSANDRA-12208) where the calculation for estimated droppable tombstones could be incorrect:

To be clear, the "Estimated droppable tombstones" calculation counts tombstones that have not yet passed gc_grace_seconds as droppable tombstones, which is unexpected, since such tombstones aren't droppable.

further..

Reason is that we don't really know gc_grace_seconds when calling sstablemetadata since this is stored in the schema. We want to support calling sstablemetadata on a stand-alone sstable - it would be painful if you had to have access to the schema.
And since it is an estimate, I don't think it matters much?

This is why I was confused. It appears the times in the histogram don't account for gc_grace_seconds which is why it looks like the time at which data was deleted, not when it would be cleaned up. The issue was fixed in 3.0.9+ and 3.10+, but i'm not sure how it factors into the histogram of the distribution of data since this was mentioning the "Estimated droppable tombstones" metric. Since this histogram is coming from C* itself it's not something I think we can easily control in sstable-tools. I can look into this later (tonight) as a C* issue itself.

@tolbertam
Copy link
Owner Author

Thinking about it more, we could sort of cheat and add gc_grace_seconds to the timestamps shown in the histogram, then the values would look right. I see that's basically what the patch for that issue does (adds a --gc_grace_seconds option for adding gc_grace_seconds onto the calculation). I think we could take a similar approach as that ticket and allow the user to pass in gc_grace_seconds (since it's not available in the sstable data currently).

@tolbertam tolbertam merged commit 32475fc into master Jul 19, 2017
@tolbertam tolbertam deleted the issue/59 branch July 19, 2017 22:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants