[Docs][DocDB] Statistics-based full compactions documentation #17667

Merged
merged 4 commits into from Jun 14, 2023

jmeehan16
Contributor

Adds statistics-based full compaction documentation to the preview section. This includes:

  • Addition of a new section (Statistics-based full compactions to improve read performance) to the yb-tserver page, near the other compaction documentation.
  • Documentation of the following gflags:
    -- auto_compact_check_interval_sec
    -- auto_compact_stat_window_seconds
    -- auto_compact_percent_obsolete
    -- auto_compact_min_obsolete_keys_found
    -- auto_compact_min_wait_between_seconds

Also adds the Scheduled Full Compactions documentation to the preview documentation (it already exists in the stable documentation).

Addresses #15188

@jmeehan16 jmeehan16 requested review from ddorian and ddhodge June 2, 2023 20:07
@netlify
netlify bot commented Jun 2, 2023

Deploy Preview for infallible-bardeen-164bc9 ready!

🔨 Latest commit c348ba7
🔍 Latest deploy log https://app.netlify.com/sites/infallible-bardeen-164bc9/deploys/648a3bb9e7990700087bf919
😎 Deploy Preview https://deploy-preview-17667--infallible-bardeen-164bc9.netlify.app/preview/architecture/concepts/yb-tserver

@jmeehan16 jmeehan16 added the area/documentation Documentation needed label Jun 2, 2023
@jmeehan16 jmeehan16 self-assigned this Jun 2, 2023
@jmeehan16 jmeehan16 added this to In progress in Documentation via automation Jun 2, 2023
@@ -47,6 +47,16 @@ In addition to throttling controls for compactions, YugabyteDB does a variety of

YugabyteDB allows compactions to be externally triggered on a table using the [`compact_table`](../../../admin/yb-admin/#compact-table) command in the [yb-admin utility](../../../admin/yb-admin/). This is useful when new data is no longer coming into the system for a table and you might want to reclaim disk space due to overwrites or deletes that have already happened, or due to TTL expiry.

### Statistics-based full compactions to improve read performance

YugabyteDB tracks the number of key/value pairs that are read at the DocDB level over a sliding period of time (dectated by the [`auto_compact_stat_window_seconds](../../reference/configuration/yb-tserver.md#auto_compact_stat_window_seconds) gflag). If it is detected that an overwhelming amount of the DocDB reads in a tablet is spent skipping over tombstoned and obsolete keys, then a full compaction will be triggered to remove the unnecessary keys.

Suggested change
YugabyteDB tracks the number of key/value pairs that are read at the DocDB level over a sliding period of time (dectated by the [`auto_compact_stat_window_seconds](../../reference/configuration/yb-tserver.md#auto_compact_stat_window_seconds) gflag). If it is detected that an overwhelming amount of the DocDB reads in a tablet is spent skipping over tombstoned and obsolete keys, then a full compaction will be triggered to remove the unnecessary keys.
YugabyteDB tracks the number of key-value pairs that are read at the DocDB level over a sliding period of time (dictated by the [auto_compact_stat_window_seconds](../../../reference/configuration/yb-tserver/#auto_compact_stat_window_seconds) YB-TServer flag). If YugabyteDB detects that an overwhelming proportion of the DocDB reads in a tablet are skipping over tombstoned and obsolete keys, a full compaction is triggered to remove the unnecessary keys.
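A minimal Python sketch of the sliding-window bookkeeping described in this suggestion. It assumes, hypothetically, that one `(obsolete_keys, total_keys)` sample is recorded per `auto_compact_check_interval_sec` interval, so the window holds `auto_compact_stat_window_seconds // auto_compact_check_interval_sec` samples. The class name `ReadStatsWindow` is illustrative and is not DocDB's actual implementation.

```python
from collections import deque


class ReadStatsWindow:
    """Illustrative sliding window of DocDB key-read counts for one tablet.

    Assumption: one (obsolete_keys, total_keys) sample is recorded per check
    interval, so the window holds window_seconds // check_interval_sec samples.
    """

    def __init__(self, window_seconds: int, check_interval_sec: int) -> None:
        # Number of samples that fit in the window; keep at least one.
        self.max_samples = max(1, window_seconds // check_interval_sec)
        # A deque with maxlen drops the oldest sample when a new one is appended.
        self.samples = deque(maxlen=self.max_samples)

    def record(self, obsolete_keys: int, total_keys: int) -> None:
        self.samples.append((obsolete_keys, total_keys))

    def totals(self) -> tuple:
        # Sum obsolete and total key reads over the samples still in the window.
        obsolete = sum(o for o, _ in self.samples)
        total = sum(t for _, t in self.samples)
        return obsolete, total
```

Using a `deque` with `maxlen` lets samples older than the window fall out automatically as new ones arrive, so the totals always reflect only the most recent window.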


Once all of the following conditions are met, an full compaction will automatically be triggered on the tablet. Those conditions include: the ratio of obsolete (e.g. deleted or removed due to TTL) versus active keys read reaches a threshold [`auto_compact_percent_obsolete`](../../reference/configuration/yb-tserver.md#auto_compact_percent_obsolete), and enough keys have been read within the window(`[auto_compact_min_obsolete_keys_found`](../../reference/configuration/yb-tserver.md#auto_compact_min_obsolete_keys_found)). This feature is compatible with tables with TTL, but will not schedule compactions on tables with TTL if the [TTL file expiration](../../develop/learn/ttl-data-expiration-ycql/#efficient-data-expiration-for-ttl) feature is active.

Suggested change
Once all of the following conditions are met, an full compaction will automatically be triggered on the tablet. Those conditions include: the ratio of obsolete (e.g. deleted or removed due to TTL) versus active keys read reaches a threshold [`auto_compact_percent_obsolete`](../../reference/configuration/yb-tserver.md#auto_compact_percent_obsolete), and enough keys have been read within the window(`[auto_compact_min_obsolete_keys_found`](../../reference/configuration/yb-tserver.md#auto_compact_min_obsolete_keys_found)). This feature is compatible with tables with TTL, but will not schedule compactions on tables with TTL if the [TTL file expiration](../../develop/learn/ttl-data-expiration-ycql/#efficient-data-expiration-for-ttl) feature is active.
Once all of the following conditions are met, a full compaction is automatically triggered on the tablet:

- The ratio of obsolete (for example, deleted or removed due to TTL) versus active keys read reaches the threshold [auto_compact_percent_obsolete](../../../reference/configuration/yb-tserver/#auto_compact_percent_obsolete).
- Enough keys have been read within the window to reach the threshold [auto_compact_min_obsolete_keys_found](../../../reference/configuration/yb-tserver/#auto_compact_min_obsolete_keys_found).

While this feature is compatible with tables with TTL, YugabyteDB won't schedule compactions on tables with TTL if the [TTL file expiration](../../../develop/learn/ttl-data-expiration-ycql/#efficient-data-expiration-for-ttl) feature is active.
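The two trigger conditions and the TTL file expiration exception can be combined into a single predicate. This is a hypothetical sketch: the function and parameter names are illustrative, and it assumes the percentage is computed over all keys read in the window (obsolete plus active), which may differ from DocDB's exact formula.

```python
def should_trigger_full_compaction(
    obsolete_keys: int,
    total_keys: int,
    percent_obsolete_threshold: float,
    min_keys_found: int,
    ttl_file_expiration_active: bool = False,
) -> bool:
    """Hypothetical check mirroring the statistics-based trigger conditions."""
    # Tables using TTL file expiration are never scheduled by this feature.
    if ttl_file_expiration_active:
        return False
    # Enough keys must have been read within the window (and at least one).
    if total_keys == 0 or total_keys < min_keys_found:
        return False
    # Assumption: obsolete share is measured against all keys read.
    percent = 100.0 * obsolete_keys / total_keys
    return percent >= percent_obsolete_threshold
```

For example, with a threshold of `99` and a minimum of `10000` keys, reading 9,900 obsolete keys out of 10,000 triggers a compaction, while 990 obsolete keys out of 1,000 does not, because too few keys were read.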


### Scheduled full compactions

YugabyteDB allows full compactions over all data in a tablet to be scheduled automatically using the [`scheduled_full_compaction_frequency_hours`](../../reference/configuration/yb-tserver.md#scheduled_full_compaction_frequency_hours) and [`scheduled_full_compaction_jitter_factor_percentage`](../../reference/configuration/yb-tserver.md#scheduled_full_compaction_jitter_factor_percentage) gflags. This can be useful for performance and disk space reclamation for workloads with a large number of overwrites or deletes on a regular basis. Can be used with tables with TTL as well, but is not compatible with the [TTL file expiration](../../develop/learn/ttl-data-expiration-ycql/#efficient-data-expiration-for-ttl) feature.

Suggested change
YugabyteDB allows full compactions over all data in a tablet to be scheduled automatically using the [`scheduled_full_compaction_frequency_hours`](../../reference/configuration/yb-tserver.md#scheduled_full_compaction_frequency_hours) and [`scheduled_full_compaction_jitter_factor_percentage`](../../reference/configuration/yb-tserver.md#scheduled_full_compaction_jitter_factor_percentage) gflags. This can be useful for performance and disk space reclamation for workloads with a large number of overwrites or deletes on a regular basis. Can be used with tables with TTL as well, but is not compatible with the [TTL file expiration](../../develop/learn/ttl-data-expiration-ycql/#efficient-data-expiration-for-ttl) feature.
YugabyteDB allows full compactions over all data in a tablet to be scheduled automatically using the [scheduled_full_compaction_frequency_hours](../../../reference/configuration/yb-tserver/#scheduled_full_compaction_frequency_hours) and [scheduled_full_compaction_jitter_factor_percentage](../../../reference/configuration/yb-tserver/#scheduled_full_compaction_jitter_factor_percentage) YB-TServer flags. This can be useful for performance and disk space reclamation for workloads with a large number of overwrites or deletes on a regular basis. This can be used with tables with TTL as well, but is not compatible with the [TTL file expiration](../../../develop/learn/ttl-data-expiration-ycql/#efficient-data-expiration-for-ttl) feature.


##### --scheduled_full_compaction_frequency_hours

The frequency with which full compactions should be scheduled on tablets. `0` indicates that the feature is disabled. Recommended value: `720` hours or greater (i.e. 30 days).

Suggested change
The frequency with which full compactions should be scheduled on tablets. `0` indicates that the feature is disabled. Recommended value: `720` hours or greater (i.e. 30 days).
The frequency with which full compactions should be scheduled on tablets. `0` indicates that the feature is disabled. Recommended value: `720` hours or greater (that is, 30 days).


Percentage of `scheduled_full_compaction_frequency_hours` to be used as jitter when determining full compaction schedule per tablet. Must be a value between `0` and `100`. Jitter is introduced to prevent many tablets from being scheduled for full compactions at the same time.

Jitter will be deterministically computed when scheduling a compaction, between 0 and (frequency * jitter factor) hours. Once computed, the jitter will be subtracted from the intended compaction frequency to determined the tablet's next compaction time.

Suggested change
Jitter will be deterministically computed when scheduling a compaction, between 0 and (frequency * jitter factor) hours. Once computed, the jitter will be subtracted from the intended compaction frequency to determined the tablet's next compaction time.
Jitter is deterministically computed when scheduling a compaction, between 0 and (frequency * jitter factor) hours. Once computed, the jitter is subtracted from the intended compaction frequency to determine the tablet's next compaction time.
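A sketch of the jitter computation, assuming hypothetically that the deterministic jitter is derived from a SHA-256 hash of the tablet ID. The text only states that the jitter is deterministic and lies between 0 and (frequency * jitter factor) hours; the hash choice, the function names, and the use of plain float hours are illustrative. `hashlib` is used rather than Python's built-in `hash()` because string hashing in Python is randomized per process.

```python
import hashlib
from typing import Optional


def jitter_hours(tablet_id: str, frequency_hours: float, jitter_factor_percentage: float) -> float:
    """Deterministic jitter in [0, frequency * jitter factor] for one tablet (hypothetical)."""
    if not 0 <= jitter_factor_percentage <= 100:
        raise ValueError("jitter factor must be between 0 and 100")
    max_jitter = frequency_hours * jitter_factor_percentage / 100.0
    # Map the first 8 bytes of the tablet ID's digest to a fraction in [0, 1].
    digest = hashlib.sha256(tablet_id.encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / float(2**64 - 1)
    return fraction * max_jitter


def next_compaction_hour(
    last_compaction_hour: float,
    tablet_id: str,
    frequency_hours: float,
    jitter_factor_percentage: float,
) -> Optional[float]:
    """Return when the tablet's next scheduled full compaction is due, or None if disabled."""
    if frequency_hours == 0:
        return None  # 0 disables scheduled full compactions
    jitter = jitter_hours(tablet_id, frequency_hours, jitter_factor_percentage)
    # The jitter is subtracted from the intended compaction frequency.
    return last_compaction_hour + frequency_hours - jitter
```

Because the jitter depends only on the tablet ID and the two flag values, recomputing it gives the same schedule every time, while different tablets spread their compactions across the jitter range.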


The maximum number of post-split compaction tasks that can be queued simultaneously (compactions that remove irrelevant data from new tablets after splits).
Example: If `scheduled_full_compaction_frequency_hours` is `720` hours (i.e. 30 days), and `scheduled_full_compaction_jitter_factor_percentage` is `33` percent, each tablet will be scheduled for compaction every `482` hours to `720` hours.

Suggested change
Example: If `scheduled_full_compaction_frequency_hours` is `720` hours (i.e. 30 days), and `scheduled_full_compaction_jitter_factor_percentage` is `33` percent, each tablet will be scheduled for compaction every `482` hours to `720` hours.
Example: If `scheduled_full_compaction_frequency_hours` is `720` hours (that is, 30 days), and `scheduled_full_compaction_jitter_factor_percentage` is `33` percent, each tablet will be scheduled for compaction every `482` hours to `720` hours.
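The bounds in this example follow from subtracting the maximum jitter, (frequency * jitter factor), from the frequency:

```latex
\[
j_{\max} = 720\ \text{h} \times 0.33 = 237.6\ \text{h}, \qquad
720\ \text{h} - 237.6\ \text{h} = 482.4\ \text{h} \approx 482\ \text{h}
\]
```

With zero jitter the interval stays at the full 720 hours, so each tablet's interval falls between roughly 482 and 720 hours.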

@jmeehan16 jmeehan16 merged commit 399340e into master Jun 14, 2023
2 checks passed
Documentation automation moved this from In progress to Done Jun 14, 2023
@jmeehan16 jmeehan16 deleted the auto-compact-stats-doc branch June 14, 2023 22:36
2 participants