[TTL] Historical data may never get the opportunity to get garbage collected #5438

Closed
luyade opened this issue Mar 25, 2023 · 5 comments
Labels
type/enhancement Type: make the code neat or more efficient

Comments

@luyade
Contributor

luyade commented Mar 25, 2023


Describe the bug (required)

By default, custom_filter_interval_secs is set to 24 * 3600, which means that within any 24-hour window there is only one chance to run a custom minor compaction. Every other minor compaction falls back to the default minor compaction.

[Screenshot (WeCom): 107a2bbb-a440-4aa5-9634-7def95070245]

For historical data, much of the data resides in the bottommost level. Expired data in the bottommost level has few chances to be garbage collected other than periodic compaction. However, in daily operation, the single custom-compaction chance will most likely be consumed by an upper-level compaction, such as level0 => level1. So the default 30-day periodic compaction will run as a default minor compaction, without going through the custom compaction filter, and the expired data will stay there indefinitely.
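To make the starvation concrete, here is a minimal C++ sketch of the gating as I understand it from the description above. It is a toy model, not the actual NebulaGraph code: `FilterGate`, `CompactionEvent`, `makeSchedule` and `countBottommostCustom` are hypothetical names, and the workload (one upper-level compaction per hour, one periodic bottommost compaction after 30 days) is an assumption chosen only for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the gating described in this issue.
// None of these names come from the NebulaGraph source.
enum class FilterKind { kCustom, kDefault };

struct CompactionEvent {
  int64_t timeSecs;  // start time of the compaction, in seconds
  bool bottommost;   // true if the compaction rewrites bottommost-level files
};

// One shared timer decides whether a minor compaction gets the custom (TTL) filter.
class FilterGate {
 public:
  explicit FilterGate(int64_t intervalSecs) : intervalSecs_(intervalSecs) {}

  FilterKind choose(int64_t nowSecs) {
    if (nowSecs - lastCustomSecs_ > intervalSecs_) {
      lastCustomSecs_ = nowSecs;  // the chance is consumed by whoever asks first
      return FilterKind::kCustom;
    }
    return FilterKind::kDefault;
  }

 private:
  int64_t intervalSecs_;
  int64_t lastCustomSecs_ = 0;
};

// Assumed workload: one upper-level compaction per hour, then a single
// periodic bottommost compaction half an hour after the last day ends.
std::vector<CompactionEvent> makeSchedule(int days) {
  std::vector<CompactionEvent> events;
  for (int64_t h = 1; h <= static_cast<int64_t>(days) * 24; ++h) {
    events.push_back({h * 3600, false});
  }
  events.push_back({static_cast<int64_t>(days) * 86400 + 1800, true});
  return events;
}

// Replays the events in order and counts the bottommost compactions
// that were handed the custom filter.
int countBottommostCustom(FilterGate& gate,
                          const std::vector<CompactionEvent>& events) {
  int count = 0;
  for (const auto& e : events) {
    FilterKind kind = gate.choose(e.timeSecs);
    if (e.bottommost && kind == FilterKind::kCustom) {
      ++count;
    }
  }
  return count;
}
```

Under these assumptions, replaying `makeSchedule(30)` against `FilterGate(24 * 3600)` leaves the bottommost compaction with the default filter, because an upper-level compaction has already used the chance within the preceding 24 hours. The same bottommost event on an otherwise idle gate would get the custom filter.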

Here is some log output I printed from StorageIterator.h:

[Screenshot (WeCom): 12cc53a8-4ba2-473a-8f10-f4438c027c9d]

As you can see, a large number of expired edges are read during edge traversal.

After I fixed the compaction logic, performance improved dramatically.

[Screenshot (WeCom): 13ab7c92-681c-4d23-aa92-e242a316a553]


@luyade luyade added the type/bug Type: something is unexpected label Mar 25, 2023
@github-actions github-actions bot added affects/none PR/issue: this bug affects none version. severity/none Severity of bug labels Mar 25, 2023
@pengweisong
Contributor

In the current design, it does indeed depend on the user to trigger a compact job to GC the garbage in the bottommost level.

@luyade
Contributor Author

luyade commented Mar 28, 2023

In the current design, it does indeed depend on the user to trigger a compact job to GC the garbage in the bottommost level.

As far as I can see, in the current design the only way a user can trigger compaction of the bottommost level is to submit a compact job, which triggers a full compaction. In practice, a full compaction is almost unacceptable in a production environment.

@pengweisong
Contributor

If you have any ideas for improving this, contributions are welcome.

@luyade
Contributor Author

luyade commented Mar 28, 2023

If you have any ideas for improving this, contributions are welcome.

My idea is simple. See #5447.

@xtcyclist
Contributor

This is not a bug. Removed the bug label.

@xtcyclist xtcyclist added type/enhancement Type: make the code neat or more efficient and removed type/bug Type: something is unexpected severity/none Severity of bug affects/none PR/issue: this bug affects none version. labels Mar 28, 2023
@luyade luyade closed this as completed Apr 4, 2023