This repository contains the source code for SSDBM'22 paper: Bi-directional Log-Structured Merge Tree
Thanks to Dr. Qizhong Mao for developing lsm-tree's main structure for this simulator.
The source code for the simulator is in ./lsm
folder.
Run ./read_only.py
program to test the read-only workload.
Tree configuration:
- do_load: boolean. Set
do_load = True
to generate lsm-tree first. The constructed lsm-tree can be reused. - key_len: key length, in # of bytes.
- data_len: value length, in # of bytes.
- component_size: # of records in a component. In this simulator, we let memory component and disk components have the same size.
- num_records: # of loaded records.
- size_ratio: size ratio of lsm-tree.
- l_0: # of component in level 0.
- M: # of sentinel components.
- cache_size: # of pages in cache.
- IO_unit: unit to compute IO cost.
- do_point_query: Set
do_point_query = False
to test range query. - fragment_threshold: # of fragment in a disk component.
- policy config: go to
./read_only.py: line31-35
, choose the policy to use.
Run ./mix_read_write.py
program to test the mixed workload.
Tree configuration (besides common parameters mentioned above):
- num_write_component: # of load components.
- num_rewrite_component: # of newly inserted components.
We prepare four tested datasets in ./data
folder.
Each line is a record: "key,value". For example, "513219,513219", key = 513219, value = 513219.
File name: "order""# of record""update frequency".txt
For example: "random_data_1000000_with_anti_heavy.txt": randomly inserted 1000000 records, contains heavy updated record.
We prepare three tested query workloads in ./data
folder.
Each line is an operation. There are two types of operations:
- insertion operation, with symbol
0
. For example, "1:319890", insert a record key = 319890 and value = 319890. - range scan operation, with symbol
1
. For example, "0,40-91-130", a range query.
Range query format: "query length"-"min key"-"max-key".
For example: "0,40-91-130": a range query: [91, 130], the query length is 40.
We provide program to generate more datasets and query workloads. The code is in ./workload_generator
folder.
This code repository is open-sourced and may be used for research purpose only. No express warranty of any kind and use at your own risk.
Contact Xin Zhang (xzhan261@ucr.edu) for any questions.