
SST File Online Benchmark

ZengJingtao edited this page Mar 13, 2023 · 1 revision

SST file online performance test

db_bench can run many kinds of tests against a DB, but its source data is randomly generated.

Ordinary performance tests, on the other hand, are end-to-end tests at the level of the whole DB and cannot drill down into a single SST.

Previously, we added a viewer for the internal structure of a single SST file to ToplingDB's WebView; it supports Topling's various SSTs as well as BlockBasedTable.

Recently (2023-03-03) we added an online Benchmark function to the SST file page of the WebView, which can run several performance tests on that SST.


| scan | rev scan | scan value | seek | seek value | rand value

The benchmark creates an Iterator from the SST's TableReader. To run it, add the following parameters to the URL:

| parameter name | type | default value | description |
|---|---|---|---|
| bench | enum | null | Allowed values: {scan, seek} |
| repeat | int | 1 | Number of repetitions of the entire benchmark, not of a single operation |
| reverse | bool | 0 | Forward (0) or reverse (1) direction; valid for both scan and seek |
| rand | bool | 0 | Only valid for seek: first load all keys of the SST into a StrVec, randomly shuffle the StrVec, then traverse it sequentially and seek to each key |
| point (Note 1) | bool | 1 | Special parameter, only valid when bench=seek, rand=1 and fetch_value <= 1; only meaningful for ToplingZipTable |
| fetch_value | int | 0 | When bench=scan, treated as a bool: whether to read the value during the scan. When bench=seek (Note 2) and rand=1, the number of KVs read sequentially after seeking to a position |

Note 1: the point parameter. ToplingZipTable compresses the value offsets with a patented PForDelta variant (the length of a value is the difference between adjacent offsets), and the Iterator decompresses PForDelta one block at a time. For a random Seek, only two entries of the whole decompressed block are actually needed. The PForDelta variant can efficiently decompress just those two entries, which greatly improves lookup performance. When point=1, iter->PointGet is called instead of iter->Seek to take advantage of this. The default implementation of PointGet falls back to Seek, so it also works for SSTs other than ToplingZipTable.
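The role of the two adjacent offsets can be illustrated with a minimal sketch. It uses a plain Python list in place of the PForDelta-compressed offset array and does not implement PForDelta itself; the names `value_range` and `point_get` are hypothetical and chosen only for this illustration.

```python
def value_range(offsets, i):
    """Return (start, length) of value i from its two adjacent offsets."""
    start, end = offsets[i], offsets[i + 1]
    return start, end - start


def point_get(blob, offsets, i):
    """Fetch value i while touching only offsets[i] and offsets[i + 1]."""
    start, length = value_range(offsets, i)
    return blob[start:start + length]


# Build a toy value store: all values concatenated, plus n + 1 offsets.
values = [b"alpha", b"be", b"gamma-ray"]
blob = b"".join(values)
offsets = [0]
for v in values:
    offsets.append(offsets[-1] + len(v))

print(point_get(blob, offsets, 1))  # b'be'
```

A point lookup therefore needs exactly two offsets, which is why decoding only those two entries, rather than the whole block, pays off for random access.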

Note 2: When fetch_value > 1, the number of seeks = entries/fetch_value, where entries is the total number of KVs in the SST file. After each seek, the iterator moves forward (Next) or backward (Prev) fetch_value-1 times according to the reverse parameter, accessing fetch_value KVs in total.
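The seek loop described above can be modelled roughly as follows. This is a toy sketch over an in-memory sorted key list, not the actual benchmark code: `bisect_left` stands in for iter->Seek, the function name `seek_bench` is hypothetical, and using the first entries/fetch_value keys of the shuffled list as seek targets is an assumption.

```python
import bisect
import random


def seek_bench(sorted_keys, fetch_value=1, reverse=False, rand=False, seed=0):
    """Toy model of bench=seek; returns (seek_count, kv_accessed)."""
    entries = len(sorted_keys)
    step = max(fetch_value, 1)          # fetch_value of 0 or 1: one KV per seek
    n_seeks = entries // step           # seeks = entries / fetch_value

    targets = list(sorted_keys)         # load all keys (the StrVec in the text)
    if rand:
        random.Random(seed).shuffle(targets)
    # Assumption: the first n_seeks keys of the list serve as seek targets.
    targets = targets[:n_seeks]

    accessed = 0
    for key in targets:
        pos = bisect.bisect_left(sorted_keys, key)  # stands in for iter->Seek
        for _ in range(step):           # the seek hit plus step - 1 moves
            if not 0 <= pos < entries:
                break                   # iterator ran off either end
            accessed += 1
            pos += -1 if reverse else 1  # Prev or Next
    return n_seeks, accessed


keys = [f"key{i:04d}" for i in range(100)]
n_seeks, accessed = seek_bench(keys, fetch_value=10, rand=True)
print(n_seeks, accessed)
```

With 100 keys and fetch_value=10 the model performs 10 seeks. The number of KVs accessed can fall short of 100 when a seek lands near the end of the key range, because the iterator runs out of entries before completing its fetch_value-1 moves.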

When bench=seek and rand=1, the result is output as a table; otherwise it is output as plain text.

Shortcuts

The individual link buttons are shortcuts to some combination of the above url parameters:

| link | parameter combination | description |
|---|---|---|
| scan | bench=scan | sequential forward scan, do not read value |
| rev scan | bench=scan&reverse=1 | sequential backward scan, do not read value |
| scan value | bench=scan&fetch_value=1 | sequential forward scan, read value |
| seek | bench=seek | seek, do not read value |
| seek value | bench=seek&fetch_value=1 | seek, read value |
| rand value | bench=seek&fetch_value=1&rand=1 | random point get |

To test seeking to a random point and then scanning several steps forward or backward, you need to enter the URL parameters yourself (you can use the rand value link as a template and modify it).
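As a rough illustration of composing such a URL by hand, the sketch below builds query strings from the shortcut combinations listed above. The page address `http://example.invalid/sst-page` is a placeholder, not the real WebView URL, and the helper `bench_url` is hypothetical.

```python
from urllib.parse import urlencode

# Shortcut links from the table above, expressed as parameter dicts.
SHORTCUTS = {
    "scan": {"bench": "scan"},
    "rev scan": {"bench": "scan", "reverse": 1},
    "scan value": {"bench": "scan", "fetch_value": 1},
    "seek": {"bench": "seek"},
    "seek value": {"bench": "seek", "fetch_value": 1},
    "rand value": {"bench": "seek", "fetch_value": 1, "rand": 1},
}


def bench_url(sst_page_url, **params):
    """Append benchmark parameters to the URL of an SST page."""
    sep = "&" if "?" in sst_page_url else "?"
    return sst_page_url + sep + urlencode(params)


# Placeholder address; substitute the real URL of the SST page.
base = "http://example.invalid/sst-page"

# Start from "rand value", then read 10 KVs backward after each random seek.
custom = dict(SHORTCUTS["rand value"], fetch_value=10, reverse=1)
print(bench_url(base, **custom))
# http://example.invalid/sst-page?bench=seek&fetch_value=10&rand=1&reverse=1
```

Since fetch_value is greater than 1 here, the point parameter no longer applies and each seek is followed by nine Prev steps.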

BlockBasedTable and ToplingZipTable

In this performance test, fetch_value is implemented through iter->PrepareValue().

The PrepareValue of BlockBasedTable works at Block granularity: when the iterator is at the first entry of a Block, it reads and decompresses the entire Block at once, and reading the other values of that Block afterwards requires no further work. As a result, the fetch_value stage appears to take close to zero time.
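The block-at-a-time behaviour can be mimicked with a toy model. The sketch below is not RocksDB code: the class `BlockIter`, its block layout (length-prefixed values compressed with zlib), and the `decompressions` counter are all assumptions made for illustration.

```python
import zlib


class BlockIter:
    """Toy iterator whose values are grouped into zlib-compressed blocks."""

    def __init__(self, values, block_size=4):
        self.blocks = []
        for i in range(0, len(values), block_size):
            chunk = values[i:i + block_size]
            # Length-prefix each value so the block can be split again.
            raw = b"".join(len(v).to_bytes(4, "little") + v for v in chunk)
            self.blocks.append(zlib.compress(raw))
        self.block_size = block_size
        self.cached_block = None      # index of the block currently decoded
        self.cached_values = []
        self.decompressions = 0

    def prepare_value(self, i):
        """Return value i, decompressing its block only on first touch."""
        b = i // self.block_size
        if b != self.cached_block:
            raw = zlib.decompress(self.blocks[b])
            self.decompressions += 1
            vals, p = [], 0
            while p < len(raw):
                n = int.from_bytes(raw[p:p + 4], "little")
                vals.append(raw[p + 4:p + 4 + n])
                p += 4 + n
            self.cached_block, self.cached_values = b, vals
        return self.cached_values[i % self.block_size]


values = [b"v%d" % i for i in range(10)]
it = BlockIter(values)
fetched = [it.prepare_value(i) for i in range(10)]
print(it.decompressions)  # 3
```

Ten values in blocks of four need only three decompressions during a sequential scan; every other prepare_value call is served from the already decoded block, which is why the per-value cost measured in this stage is close to zero.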

PrepareValue of ToplingZipTable faithfully reads each value:

  • If the value is not compressed, Zero Copy is used: the memory range of the value within the mmap of the SST file is returned directly
  • If the value is compressed, it is decompressed on the spot; decompression throughput is generally above 1 GB per second
    • This is feasible because on-the-spot decompression is fast enough. For example, decompressing a single 500-byte value takes only about 500 nanoseconds
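The 500-nanosecond figure follows directly from the stated throughput. A minimal check, assuming exactly 1 GB/s (10^9 bytes per second) and using a hypothetical helper name:

```python
def decompress_time_ns(size_bytes, throughput_bytes_per_s=10**9):
    """Time to decompress size_bytes at the given throughput, in nanoseconds."""
    return size_bytes * 10**9 / throughput_bytes_per_s


print(decompress_time_ns(500))  # 500.0
```

At higher throughput the time per value shrinks proportionally, for example 250 ns for the same value at 2 GB/s.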

Topling's other SSTs are not compressed, and their PrepareValue is Zero Copy.