GitHub - utsaslab/SplitFS: SplitFS: persistent-memory file system that reduces software overhead (SOSP 2019)

SplitFS

SplitFS is a file system for Persistent Memory (PM) that aims to reduce the software overhead of applications accessing PM. SplitFS presents a novel split of responsibilities between a user-space library file system and an existing kernel PM file system. The user-space library file system handles data operations by intercepting POSIX calls, memory-mapping the underlying file, and serving reads and overwrites using processor loads and stores. Metadata operations are handled by the kernel file system (ext4 DAX).

SplitFS introduces a new primitive termed relink to efficiently support file appends and atomic data operations. SplitFS provides three consistency modes, which different applications can choose from without interfering with each other.

SplitFS is built on top of Quill by NVSL. We re-use the implementation of Quill to track the glibc calls requested by an application and provide our implementation for the calls. We then run the applications using LD_PRELOAD to intercept the calls during runtime and forward them to SplitFS.

Please cite the following paper if you use SplitFS:

SplitFS: Reducing Software Overhead in File Systems for Persistent Memory. Rohan Kadekodi, Se Kwon Lee, Sanidhya Kashyap, Taesoo Kim, Aasheesh Kolli, Vijay Chidambaram. Proceedings of the 27th ACM Symposium on Operating Systems Principles (SOSP '19). Paper PDF. Bibtex. Talk Video

@InProceedings{KadekodiEtAl19-SplitFS,
  title =        "{SplitFS: Reducing Software Overhead in File Systems for Persistent Memory}",
  author =       "Rohan Kadekodi and Se Kwon Lee and Sanidhya Kashyap and Taesoo Kim and Aasheesh Kolli and Vijay Chidambaram",
  booktitle =    "Proceedings of the 27th ACM Symposium on Operating
                  Systems Principles (SOSP '19)",
  month =        "October",
  year =         "2019",
  address =      "Ontario, Canada",
}

Getting Started with SplitFS

This tutorial walks you through compiling SplitFS, setting up ext4-DAX, and compiling and running an application on both ext4-DAX and SplitFS, using a simple microbenchmark. The microbenchmark appends 128MB of data to an empty file in 4KB chunks and calls fsync() at the end. Note: The PM partition must be at least 2GiB for the microbenchmark. The partition size can be set in step 2; please confirm it with df -h after step 4.
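The workload described above can be sketched in a few lines of C. This is a minimal illustration of the access pattern (fixed-size appends followed by a single fsync()), not the contents of micro/rw_experiment.c; the function name append_in_chunks and its return convention are assumptions made for this sketch.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append `total` bytes to `path` in `chunk`-byte
 * writes, then fsync once at the end.
 * Returns the number of appends issued, or -1 on error. */
long append_in_chunks(const char *path, size_t total, size_t chunk)
{
    if (chunk == 0 || total % chunk != 0)
        return -1;

    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1;
    memset(buf, 'a', chunk);

    /* Start from an empty file; O_APPEND makes every write an append. */
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY | O_APPEND, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long appends = 0;
    for (size_t done = 0; done < total; done += chunk) {
        size_t off = 0;
        while (off < chunk) {           /* tolerate short writes */
            ssize_t n = write(fd, buf + off, chunk - off);
            if (n <= 0) {
                close(fd);
                free(buf);
                return -1;
            }
            off += (size_t)n;
        }
        appends++;
    }

    int rc = fsync(fd);                 /* single fsync at the end */
    close(fd);
    free(buf);
    return rc == 0 ? appends : -1;
}
```

Taking 128MB as 128 x 2^20 bytes, the tutorial's workload would correspond to a call with total = 134217728 and chunk = 4096 on a file under /mnt/pmem_emul/, which issues 32768 appends.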

Installing Dependencies
Set up kernel
Set up SplitFS

$ export LEDGER_YCSB=1
$ cd splitfs; make clean; make; cd .. # Compile SplitFS
$ export LD_LIBRARY_PATH=./splitfs
$ export NVP_TREE_FILE=./splitfs/bin/nvp_nvp.tree

Set up ext4-DAX

$ sudo mkfs.ext4 -b 4096 /dev/pmem0
$ sudo mount -o dax /dev/pmem0 /mnt/pmem_emul
$ sudo chown -R $USER:$USER /mnt/pmem_emul

Set up microbenchmark

$ cd micro
$ gcc rw_experiment.c -o rw_expt -O3
$ cd ..

Run microbenchmark with ext4-DAX

$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null # Requires superuser privileges
$ ./micro/rw_expt write seq 4096
$ rm -rf /mnt/pmem_emul/*

Run microbenchmark with SplitFS

$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null # Requires superuser privileges
$ LD_PRELOAD=./splitfs/libnvp.so micro/rw_expt write seq 4096
$ rm -rf /mnt/pmem_emul/*

Results. The results show the throughput of appends on ext4 DAX and SplitFS. Appends are 5.8x faster on SplitFS.
- ext4-DAX: 0.33M appends/sec
- SplitFS: 1.92M appends/sec
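As a quick check of the figures above, the reported speedup follows directly from the two measured rates:

```latex
\[
\text{speedup} = \frac{1.92\ \text{M appends/sec}}{0.33\ \text{M appends/sec}} \approx 5.8
\]
```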

Features

Low software overhead. SplitFS tries to obtain performance that is close to the maximum provided by persistent-memory hardware. The overhead due to SplitFS software is significantly lower (by 4-12x) than state-of-the-art file systems such as NOVA or ext4 DAX. As a result, performance on some applications is increased by as much as 2x.
Flexible guarantees. SplitFS is the only persistent-memory file system that allows simultaneously running applications to receive different guarantees from the file system. SplitFS offers three modes: POSIX, Sync, and Strict. Application A may run in Strict mode, obtaining atomic, synchronous operations from SplitFS, while Application B may simultaneously run in POSIX mode and obtain higher performance. This is possible due to the novel split architecture used in SplitFS.
Portability and Stability. SplitFS uses ext4 DAX as its kernel component, so it works with any kernel where ext4 DAX is supported. ext4 DAX is a mature, robust code base that is actively being maintained and developed; as ext4 DAX performance increases over time, SplitFS performance increases as well. This is in contrast to research file systems for persistent memory, which do not see development at the same rate as ext4 DAX.

Contents

splitfs/ contains the source code for SplitFS-POSIX
dependencies/ contains packages and scripts to resolve dependencies
kernel/ contains the Linux 4.13.0 kernel
micro/ contains the microbenchmark
leveldb/ contains LevelDB source code
rsync/ contains the rsync source code
scripts/ contains scripts to compile and run workloads and kernel
splitfs-so/ contains the SplitFS-strict shared libraries for running different workloads
sqlite3-trace/ contains SQLite3 source code
tpcc-sqlite/ contains TPCC source code
ycsb/ contains YCSB source code
tar/ contains tar source code
lmdb/ contains LMDB source code
filebench/ contains Filebench source code
fio/ contains FIO source code

The Experiments page has a list of experiments evaluating SplitFS (strict, sync and POSIX) vs ext4 DAX, NOVA-strict, NOVA-relaxed and PMFS. The summary is that SplitFS outperforms the other file systems on data-intensive workloads, while incurring a modest overhead on metadata-heavy workloads. Please see the paper for more details.

The kernel patch implementing the relink() system call for Linux v4.13 is here

System Requirements

Ubuntu 16.04 / 18.04
At least 32 GB DRAM
At least 4 cores
Baremetal machine (Not a VM)
Intel processor supporting the clflush instruction (comes with SSE2) or the clflushopt instruction (introduced in the Intel Broadwell processor family). This can be verified with lscpu | grep clflush and lscpu | grep clflushopt, respectively.
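One detail worth noting about the check above: grep clflush matches by substring, so it also matches a line that contains only clflushopt. The sketch below shows a whole-word check over a CPU flags string such as the one printed by lscpu or found in /proc/cpuinfo. The function name has_cpu_flag is an assumption for illustration and is not part of SplitFS.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if `flag` appears in `flags` as a whole
 * whitespace-separated word, so "clflush" does not match "clflushopt". */
bool has_cpu_flag(const char *flags, const char *flag)
{
    size_t len = strlen(flag);
    if (len == 0)
        return false;

    const char *p = flags;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must begin at the start of the string or after whitespace. */
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* The match must end at the end of the string or before whitespace. */
        bool ends = p[len] == '\0' || p[len] == ' ' ||
                    p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return true;
        p += 1;  /* keep scanning after this partial match */
    }
    return false;
}
```

For example, has_cpu_flag("fpu clflushopt avx", "clflush") is false, while the same call with "clflushopt" is true.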

Dependencies

kernel: Installing the Linux kernel 4.13.0 requires bc, libelf-dev and libncurses5-dev. For Ubuntu, please run the script cd dependencies; ./kernel_deps.sh; cd ..
SplitFS: Compiling SplitFS requires Boost. For Ubuntu, please run the script cd dependencies; ./splitfs_deps.sh; cd ..

Limitations

SplitFS is under active development.

The current implementation of SplitFS handles the following system calls: open, openat, close, read, pread64, write, pwrite64, fsync, unlink, ftruncate, fallocate, stat, fstat, lstat, dup, dup2, execve and clone. The rest of the calls are passed through to the kernel.
The current implementation of SplitFS works correctly for the following applications: LevelDB running YCSB, SQLite running TPCC, tar, git, rsync. This limitation is purely due to the state of the implementation, and we aim to increase the coverage of applications by supporting more system calls in the future.
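To make the handled/pass-through split concrete, the sketch below encodes the list of handled calls from above as a lookup table. It only restates that list; it is not how SplitFS dispatches calls internally, and the name is_handled_call is an assumption for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Calls the README lists as handled by SplitFS; all others pass through
 * to the kernel. */
static const char *const handled_calls[] = {
    "open", "openat", "close", "read", "pread64", "write", "pwrite64",
    "fsync", "unlink", "ftruncate", "fallocate", "stat", "fstat", "lstat",
    "dup", "dup2", "execve", "clone",
};

/* Hypothetical helper: true if `name` is in the handled list above. */
bool is_handled_call(const char *name)
{
    size_t n = sizeof handled_calls / sizeof handled_calls[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(handled_calls[i], name) == 0)
            return true;
    }
    return false;
}
```

Under this list, a call such as rename or mmap would return false, meaning it goes to the kernel unchanged.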

Applications currently supported

LevelDB (with YCSB)
SQLite (running TPCC)
Redis
git
tar
rsync
Filebench
LMDB
FIO

Testing

The PJD POSIX Test Suite, which primarily tests metadata operations, was run successfully on SplitFS. SplitFS passes all tests.

Running the Test Suite
Before running the tests, make sure you have set up ext4-DAX.

To run tests in all modes:

$ make test

To run tests in a specific mode:

$ make -C tests pjd.<mode>

where <mode> is one of posix, sync or strict. Example: make -C tests pjd.posix

Tip: Redirect stderr for less verbose output, e.g., make test 2>/dev/null

Implementation Notes

SplitFS handles only regular files, block special files, and directories (the last only for consistency guarantees); other file types are delegated to POSIX.
SplitFS handles only files in the persistent-memory mount (/mnt/pmem_emul/); the rest are delegated to POSIX.
Currently, this check examines only absolute paths; we aim to support relative paths soon.
We aim to make the persistent-memory mount point configurable via a runtime environment variable soon.
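The path-based routing described in these notes can be sketched as a simple prefix test. This is an illustration under stated assumptions, not SplitFS's actual check: the function name is_pm_path, the DEFAULT_PM_MOUNT macro, and the optional mount argument are all hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Mount point named in the notes above; the trailing slash keeps sibling
 * directories such as /mnt/pmem_emul_other/ from matching. */
#define DEFAULT_PM_MOUNT "/mnt/pmem_emul/"

/* Hypothetical helper: true if `path` is an absolute path under `mount`.
 * Pass NULL for `mount` to use DEFAULT_PM_MOUNT. Relative paths return
 * false, mirroring the absolute-path-only behaviour described above. */
bool is_pm_path(const char *path, const char *mount)
{
    if (path == NULL || path[0] != '/')
        return false;
    if (mount == NULL)
        mount = DEFAULT_PM_MOUNT;

    size_t len = strlen(mount);
    return len > 0 && strncmp(path, mount, len) == 0;
}
```

If the mount point were made configurable at runtime, the second argument could be supplied from getenv() on a variable name of your choosing; no such variable is defined by SplitFS at present.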

License

Copyright for SplitFS is held by the University of Texas at Austin. Please contact us if you would like to obtain a license to use SplitFS in your commercial product.

Contributors

Rohan Kadekodi, UT Austin
Rui Wang, Beijing University of Posts and Telecommunications
Om Saran

Acknowledgements

We thank the National Science Foundation, VMware, Google, and Facebook for partially funding this project. We thank Intel and ETRI IITP/KEIT[2014-3-00035] for providing access to Optane DC Persistent Memory to perform our experiments.

Contact

Please contact us at rak@cs.utexas.edu or vijayc@utexas.edu with any questions.
