
HTTP Command: Manual Compaction


1. Command Overview

curl 'http://somehost:port/db/dbname?html=0&compact=default'

The HTTP command invokes the CompactRange function, so the parameters of the HTTP command are mainly CompactRange parameters.

2. HTTP Parameters

The compact parameter is required. In practice, the only other parameters commonly used are max_compaction_bytes and max_subcompactions.

| name | type | default value | description |
|------|------|---------------|-------------|
| compact | string | (none) | Specifies the ColumnFamily to compact (required). |
| max_compaction_bytes | int64 | 0 | When setting up compaction input files, the max_compaction_bytes limit is ignored when pulling in input files that are entirely within the output key range. |
| max_subcompactions | int | 1 | If > 0, it replaces the option in the DBOptions for this compaction. |
| exclusive_manual_compaction | bool | false | If true, no other compaction will run at the same time as this manual compaction. |
| allow_write_stall | bool | false | If true, the compaction executes immediately even if doing so would cause the DB to enter write stall mode; otherwise it sleeps until the load is low enough. |
| change_level | bool | false | If true, compacted files are moved to the minimum level capable of holding the data, or to the given level (a non-negative target_level). |
| target_level | int | -1 | If change_level is true and target_level is non-negative, compacted files are moved to target_level. |
| target_path_id | int | 0 | Compaction outputs are placed in options.db_paths[target_path_id]. Behavior is undefined if target_path_id is out of range. |
| bottommost_level_compaction | enum | kIfHaveCompactionFilter | By default, level-based compaction only compacts the bottommost level if there is a compaction filter. |
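For reference, these HTTP parameters correspond to fields of RocksDB's CompactRangeOptions. The sketch below (function name and structure are illustrative, not the actual server implementation) shows roughly how a handler could fill the options with the defaults from the table and invoke CompactRange:

#include <rocksdb/db.h>
#include <rocksdb/options.h>

// Illustrative only: fill CompactRangeOptions with the defaults listed
// above and run a full-range manual compaction on one column family.
rocksdb::Status ManualCompact(rocksdb::DB* db,
                              rocksdb::ColumnFamilyHandle* cfh) {
  rocksdb::CompactRangeOptions cro;
  cro.max_subcompactions = 1;               // HTTP param max_subcompactions
  cro.exclusive_manual_compaction = false;  // HTTP param exclusive_manual_compaction
  cro.allow_write_stall = false;            // HTTP param allow_write_stall
  cro.change_level = false;                 // HTTP param change_level
  cro.target_level = -1;                    // HTTP param target_level
  cro.target_path_id = 0;                   // HTTP param target_path_id
  cro.bottommost_level_compaction =
      rocksdb::BottommostLevelCompaction::kIfHaveCompactionFilter;
  // nullptr begin/end means the whole key range of the column family.
  return db->CompactRange(cro, cfh, nullptr, nullptr);
}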

3. Parameter Details

3.1. max_subcompactions

CompactRange is carried out as multiple compaction tasks that run serially, one after another, not concurrently, so it is very slow.

We have proposed a feature request for concurrent execution to upstream RocksDB, but it has not been implemented yet.

Although the compaction tasks themselves cannot run concurrently, each task can be divided into multiple subtasks. This parameter specifies the number of subtasks per compaction task.

In distributed compaction, all subtasks of a single task are scheduled onto one compaction compute node, so this parameter cannot be used to parallelize across nodes.

3.2. max_compaction_bytes

This parameter is not a CompactRange parameter but a member of ColumnFamilyOptions. It is modified via db.SetOptions(cf, optionMap), so other compactions may also be affected.

If the parameter is not 0, SetOptions is called to modify it, and the original value is restored after CompactRange finishes; if it is 0, SetOptions is not called.
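As a sketch of that sequence (the 10 GiB value and the function name are assumptions for illustration, not the actual implementation):

#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <string>

// Illustrative only: temporarily enlarge max_compaction_bytes for this
// column family, run the manual compaction, then restore the old value.
void CompactWithTempMaxCompactionBytes(rocksdb::DB* db,
                                       rocksdb::ColumnFamilyHandle* cfh) {
  std::string old_value =
      std::to_string(db->GetOptions(cfh).max_compaction_bytes);
  db->SetOptions(cfh, {{"max_compaction_bytes", "10737418240"}});  // 10G
  rocksdb::CompactRangeOptions cro;
  db->CompactRange(cro, cfh, nullptr, nullptr);
  // Restore the original value after CompactRange finishes.
  db->SetOptions(cfh, {{"max_compaction_bytes", old_value}});
}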

3.3. bottommost_level_compaction

This parameter is rarely used; its enum definition is listed here:

enum class BottommostLevelCompaction {
  // Skip bottommost level compaction
  kSkip,
  // Only compact bottommost level if there is a compaction filter
  // This is the default option
  kIfHaveCompactionFilter,
  // Always compact bottommost level
  kForce,
  // Always compact bottommost level but in bottommost level avoid
  // double-compacting files created in the same compaction
  kForceOptimized
};
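As a small example of how the enum is used (generic RocksDB usage, not the HTTP handler itself; the function name is illustrative), forcing the bottommost level to be recompacted looks like this:

#include <rocksdb/db.h>
#include <rocksdb/options.h>

// Illustrative only: always recompact the bottommost level, avoiding
// double-compaction of files produced by this same compaction.
rocksdb::Status ForceBottommostCompaction(rocksdb::DB* db,
                                          rocksdb::ColumnFamilyHandle* cfh) {
  rocksdb::CompactRangeOptions cro;
  cro.bottommost_level_compaction =
      rocksdb::BottommostLevelCompaction::kForceOptimized;
  return db->CompactRange(cro, cfh, nullptr, nullptr);
}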

4. Problems in Practice

In practice, to perform compaction quickly, it is best to set both max_subcompactions and max_compaction_bytes, for example:

curl 'http://hostname:port/db/dbname?html=0&compact=default&max_compaction_bytes=10G&max_subcompactions=23'

If max_subcompactions is increased without also setting max_compaction_bytes, two problems occur:

  • A large number of small SST files are produced
  • The time spent creating tasks and installing compaction results grows to 20% or more of the total

Setting max_compaction_bytes too large can cause two other problems:

  • Memory and external storage usage: an old SST file can be deleted only after the new SSTs are installed into the LSM and the old SST is no longer referenced. During this period the old and new SSTs coexist, occupying double the memory and SSD space
  • If too many SST files are installed at once, the online DB service jitters heavily. During the short period while new SSTs are being installed and warmed up, the system load fluctuates violently due to CPU consumption and memory being swapped in and out, affecting the online service

5. Alternative Solution

Manual compaction cannot run concurrently, but automatic compaction can. Taking advantage of this, we can actively trigger automatic compaction:

curl -d '{"cfo":{"cfname":"max_bytes_for_level_base=1K"}}' 'http://somehost:port/db/dbname?html=0&indent=2'

Setting max_bytes_for_level_base to a very small value triggers automatic compaction, which can run with concurrency up to the max_background_compactions setting. After compaction completes, manually change this parameter back to its old value.
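A rough programmatic sketch of this trick (the function name and the simplified wait loop are assumptions for illustration; the property names are standard RocksDB properties):

#include <rocksdb/db.h>
#include <chrono>
#include <cstdint>
#include <string>
#include <thread>

// Illustrative only: shrink max_bytes_for_level_base so automatic
// compaction kicks in concurrently, wait until compactions drain,
// then restore the original value.
void TriggerAutoCompact(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cfh) {
  std::string old_value =
      std::to_string(db->GetOptions(cfh).max_bytes_for_level_base);
  db->SetOptions(cfh, {{"max_bytes_for_level_base", "1024"}});  // 1K

  // Poll until no compaction is pending or running (simplified).
  uint64_t pending = 1, running = 1;
  while (pending > 0 || running > 0) {
    std::this_thread::sleep_for(std::chrono::seconds(1));
    db->GetIntProperty(cfh, "rocksdb.compaction-pending", &pending);
    db->GetIntProperty(cfh, "rocksdb.num-running-compactions", &running);
  }

  // Change the parameter back to the old value after compaction completes.
  db->SetOptions(cfh, {{"max_bytes_for_level_base", old_value}});
}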

With distributed compaction, these tasks can be scheduled onto multiple compute nodes to achieve a full compaction.

Note: although this method improves concurrency, it increases the total computing load. Manual compaction proceeds from the top level down to the bottom level, whereas with this method all levels are compacted at the same time, so some lower-level compactions happen before the upper-level compactions above them. Compared with manual compaction, it therefore does more work.