# CacheKV: Redesigning High-Performance Key-Value Stores with Persistent CPU Caches (Supplemental Material)

Yijie Zhong Xiamen University yijiezhong12@gmail.com

Zixiang Yu Xiamen University yuzixiang23@foxmail.com

# 1 SUMMARY OF IMPLEMENTATIONS

We implement CacheKV atop LevelDB [?] in C++ and use Intel CAT [?] to allocate space in the persistent CPU caches. Below we list the software dependencies, the representative KV stores used for comparison, and the testing tools.

**Software dependencies:** The implementation of CacheKV relies on a suite of software, including Intel CAT, ndctl, ipmctl, Intel PMWatch, and PMDK, which are available at the following URLs.

```
$ git clone https://github.com/intel/intel-cmt-cat
$ git clone https://github.com/pmem/ndctl
$ git clone https://github.com/intel/ipmctl
$ git clone https://github.com/intel/intel-pmwatch
$ git clone https://github.com/pmem/pmdk
```
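Of these dependencies, Intel CAT is the one used to reserve cache space. The source does not say how CacheKV invokes it, so the following is only a minimal sketch using the `pqos` command-line tool that ships with intel-cmt-cat. The class-of-service id (1), the core list (0), and the helper names `way_mask`, `run`, and `reserve_llc_ways` are illustrative assumptions, and the exact `pqos` option syntax should be checked against the installed version. The script only prints the commands unless `DRY_RUN=0` is set.

```bash
#!/bin/bash
# Sketch: reserve a number of LLC ways for one class of service (COS) via pqos.
# Assumptions (not from the paper): COS id 1, core 0, use of the pqos CLI.

# Build a contiguous capacity bitmask covering the lowest N ways, e.g. 4 -> 0xf.
way_mask() {
  local ways=$1
  printf '0x%x\n' $(( (1 << ways) - 1 ))
}

# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Define the way mask of a COS, then associate the given cores with it.
reserve_llc_ways() {
  local cos=$1 ways=$2 cores=$3
  local mask
  mask=$(way_mask "$ways")
  run pqos -e "llc:${cos}=${mask}"
  run pqos -a "llc:${cos}=${cores}"
}

reserve_llc_ways 1 4 0
```

Keeping the mask contiguous matters because CAT capacity bitmasks must consist of consecutive set bits.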

**Representative KV stores:** We compare CacheKV against two representative KV stores, NoveLSM [?] and SLM-DB [?], which can be downloaded from the following URLs.

```
$ git clone https://github.com/sudarsunkannan/lsm_nvm
$ git clone https://github.com/WangEP/SLM-DB
```

**Testing tools:** We employ db\_bench and YCSB-C in our evaluation; db\_bench is released with LevelDB [?], and YCSB-C [?] is a C++ version of YCSB. They are available at the following URLs.

```
$ git clone https://github.com/google/leveldb
$ git clone https://github.com/basicthinker/YCSB-C
```

# 2 EVALUATION DETAILS

**Hardware configurations:** We conduct our experiments on a single machine equipped with two 2.10 GHz Intel Xeon Gold 5318Y CPUs (with 24 cores in total), 128 GB of DRAM, and four Optane PMem 200 series DIMMs (128 GB per DIMM, 512 GB in total). The Optane PMem DIMMs are attached to one processor and configured in interleaved App Direct Mode. Figure 1 shows the major hardware configurations of our testbed, and Figure 2 shows detailed information about the Optane PMem DIMMs used in our evaluation.
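For readers reproducing this setup, the sketch below shows one common way to provision interleaved App Direct PMem and expose it as a DAX file system with `ipmctl` and `ndctl`. It is not taken from the paper: the region name (`region0`), the device (`/dev/pmem0`), and the choice of ext4 are assumptions, and only the mount point `/mnt/pmem0dir-node0` appears in our scripts. The commands are printed rather than executed unless `DRY_RUN=0` is set.

```bash
#!/bin/bash
# Sketch: provision interleaved App Direct PMem and mount it with DAX.
# Assumptions (not from the paper): region0, /dev/pmem0, ext4.

REGION=${REGION:-region0}
DEVICE=${DEVICE:-/dev/pmem0}
MOUNT_POINT=${MOUNT_POINT:-/mnt/pmem0dir-node0}

# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: request App Direct mode (interleaved by default); needs a reboot.
provision_goal() {
  run ipmctl create -goal PersistentMemoryType=AppDirect
}

# Step 2 (after reboot): create an fsdax namespace, format, and mount with DAX.
mount_pmem() {
  run ndctl create-namespace --mode=fsdax --region="$REGION"
  run mkfs.ext4 "$DEVICE"
  run mkdir -p "$MOUNT_POINT"
  run mount -o dax "$DEVICE" "$MOUNT_POINT"
}

provision_goal
echo "# reboot here so that the goal takes effect"
mount_pmem
```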

We perform extensive testbed experiments, including (i) experiments to understand the properties of CacheKV; and (ii) experiments to understand the sensitivity of CacheKV.

**Experiments with db\_bench:** We measure the access performance of CacheKV, NoveLSM, and SLM-DB using db\_bench. The following script runs db\_bench to evaluate NoveLSM:

```
#!/bin/bash
./db_bench --benchmarks=fillrandom,readrandom \
  --threads=1 --num=1000000 --value_size=64 \
  --num_levels=2 --nvm_buffer_size=16
```

Zhirong Shen Xiamen University shenzr@xmu.edu.cn

Jiwu Shu Tsinghua University, Xiamen University shujw@tsinghua.edu.cn

```
CPU op-mode(s):       32-bit, 64-bit
Byte Order:           Little Endian
Address sizes:        46 bits physical, 57 bits virtual
On-line CPU(s) list:  0-47
Core(s) per socket:   24
Vendor ID:            GenuineIntel
Model name:           Intel(R) Xeon(R) Gold 5318Y CPU @ 2.10GHz
CPU MHz:              800.779
CPU max MHz:          3400.0000
CPU min MHz:          800.0000
BogoMIPS:             4200.00
Virtualization:       VT-x
L1d cache:            2.3 MiB
L1i cache:            1.5 MiB
L2 cache:             60 MiB
L3 cache:             72 MiB
```

Figure 1: Configurations of our testbed.

| DimmID | Capacity    | LockState        | HealthState | FWVersion     |
|--------|-------------|------------------|-------------|---------------|
| 0x0010 | 126.742 GiB | Disabled, Frozen | Healthy     | 02.02.00.1516 |
| 0x0210 | 126.742 GiB | Disabled, Frozen | Healthy     | 02.02.00.1516 |
| 0x0110 | 126.742 GiB | Disabled, Frozen | Healthy     | 02.02.00.1516 |
| 0x0310 | 126.742 GiB | Disabled, Frozen | Healthy     | 02.02.00.1516 |

Figure 2: The configurations of the Intel Optane PMem DIMMs of 200 series used in the evaluation.

SLM-DB provides its own db\_bench tool, so we run the following script to evaluate SLM-DB:

```
#!/bin/bash
./db_bench --benchmarks=fillrandom,readrandom \
  --threads=1 --num=1000000 --value_size=64 \
  --nvm_dir=/mnt/pmem0dir-node0/dbbench
```
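Upstream LevelDB's db\_bench prints one result line per benchmark of the form `name : X micros/op; ...`. Assuming the db\_bench variants of NoveLSM, SLM-DB, and CacheKV keep that format (the source does not confirm this), a small filter such as the following can turn the reported latency into throughput. The function name `summarize_db_bench` and the sample numbers are made up for illustration and are not measurements.

```bash
#!/bin/bash
# Sketch: convert db_bench "micros/op" lines into operations per second.
# Assumption: output lines follow upstream LevelDB's "name : X micros/op;" format.

summarize_db_bench() {
  # Fields: $1 = benchmark name, $2 = ":", $3 = latency, $4 = "micros/op;"
  awk '$2 == ":" && $4 == "micros/op;" {
    printf "%s %.3f micros/op %.0f ops/s\n", $1, $3, 1e6 / $3
  }'
}

# Invented sample input; header lines without the pattern are skipped.
summarize_db_bench <<'EOF'
Keys:       16 bytes each
fillrandom   :       2.500 micros/op;   30.5 MB/s
readrandom   :       4.000 micros/op;
EOF
```

In practice the filter would be fed from a pipe, for example `./db_bench ... | summarize_db_bench`.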

We extend db\_bench with the configuration options required by CacheKV. The following script is used to evaluate CacheKV:

```
#!/bin/bash
./db_bench --threads=1 --num=1000000 \
  --benchmarks=fillrandom --value_size=64 \
  --num_read_threads=1 --dlock_way=4 --dlock_size=12582912 \
  --subImm_thread=1 --skiplistSync_threshold=65536 \
  --compactImm_threshold=10 --subImm_partition=0
```
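Several of the CacheKV flags are raw byte counts, which are easier to sanity-check in binary units. The sketch below converts them and, as one possible reading that the source does not confirm, checks whether `--dlock_size` matches the capacity of `--dlock_way` cache ways. That check assumes 36 MiB of L3 per socket (half of the 72 MiB in Figure 1, spread over two CPUs) and a 12-way L3. Both figures and the helper names `human_bytes` and `llc_ways_bytes` are assumptions made for illustration.

```bash
#!/bin/bash
# Sketch: express byte-valued flags in binary units and compare
# --dlock_size with the capacity of --dlock_way LLC ways.
# Assumptions (not from the paper): 36 MiB L3 per socket, 12 ways.

# Convert a byte count into the largest exact binary unit (up to GiB).
human_bytes() {
  local n=$1 unit
  for unit in B KiB MiB GiB; do
    if [ "$n" -lt 1024 ] || [ $(( n % 1024 )) -ne 0 ] || [ "$unit" = "GiB" ]; then
      echo "${n} ${unit}"
      return
    fi
    n=$(( n / 1024 ))
  done
}

# Bytes covered by WAYS ways of an LLC of LLC_MIB MiB with TOTAL_WAYS ways.
llc_ways_bytes() {
  local llc_mib=$1 total_ways=$2 ways=$3
  echo $(( llc_mib * 1024 * 1024 / total_ways * ways ))
}

human_bytes 12582912      # --dlock_size: 12 MiB
human_bytes 65536         # --skiplistSync_threshold: 64 KiB
llc_ways_bytes 36 12 4    # 12582912, equal to --dlock_size under the assumptions
```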

**Experiments with YCSB:** We also run YCSB workloads with YCSB-C and use workload A as an example to show how we assess the performance of the KV stores; the experiments with the other YCSB workloads are similar. SLM-DB is tested with its own db\_bench tool, which provides an interface for YCSB tests, so we generate its workload file and run the test with the following script:

```
#!/bin/bash
./ycsb run basic -P workloada > trace_a.csv
./db_bench --csv=1 --trace_dir=../trace \
  --benchmarks=workloada --value_size=64 \
  --nvm_dir=/mnt/pmem0dir-node0/pool \
  --nvm_size=20480 --threads=1 \
  --db=/mnt/pmem0dir-node0/dbbench \
  --write_buffer_size=4294967296
```

For NoveLSM and CacheKV, we can generate the workload file. The following script evaluates NoveLSM using the workload YCSB-A. The scripts that evaluate CacheKV and the other YCSB workloads are obtained by modifying this script accordingly.

```
#!/bin/bash
./ycsbc -db novelsm -path /mnt/pmem0dir-node0/ \
  -threads 1 -P workloads/workloada.spec
```
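Since the other workloads differ only in the workload file, the runs can be driven by a loop. The sketch below assumes that the six core workload files `workloada.spec` to `workloadf.spec` are present under `workloads/`; which of them are actually used in the evaluation is not stated here. The database name passed for CacheKV is not given in the source and is therefore left as a parameter. The helper names are illustrative, and the commands are only printed unless `DRY_RUN=0` is set.

```bash
#!/bin/bash
# Sketch: run ycsbc over several YCSB workloads for one KV store.
# Assumption: workloads/workloada.spec ... workloads/workloadf.spec exist.

# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1: value for -db (e.g. novelsm), $2: data path on PMem.
run_ycsb() {
  local db=$1 path=$2 w
  for w in a b c d e f; do
    run ./ycsbc -db "$db" -path "$path" \
      -threads 1 -P "workloads/workload${w}.spec"
  done
}

run_ycsb novelsm /mnt/pmem0dir-node0/
```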