This repository has been archived by the owner on Feb 17, 2023. It is now read-only.

SKYHOOK-203: [Doc] Fix the benchmarking.md guide
YashJipkate authored and JayjeetAtGithub committed Jul 24, 2021
1 parent b111e7a commit e2a5e2a
Showing 1 changed file with 19 additions and 6 deletions.
25 changes: 19 additions & 6 deletions cpp/src/arrow/adapters/arrow-rados-cls/docs/benchmarking.md
@@ -17,7 +17,7 @@
under the License.
-->

-# Setting up and benchmarking SkyhookDM
+# Benchmarking SkyhookDM

1. Download the required scripts and make them executable.

@@ -30,7 +30,7 @@ wget https://raw.githubusercontent.com/uccross/skyhookdm-arrow/arrow-master/cpp/
2. Execute deploy_ceph script to deploy a Ceph cluster on a set of nodes and to mount CephFS on the client/admin node. On the client node, execute:

```bash
-./deploy_ceph.sh mon1,mon2,mon3 osd1,osd2,osd3 mds1 mgr1
+./deploy_ceph.sh mon1,mon2,mon3 osd1,osd2,osd3 mds1 mgr1 /dev/sdb 3
```
where mon1, mon2, osd1, etc. are the internal hostnames of the nodes.

@@ -49,20 +49,33 @@ apt install git-lfs
git clone https://github.com/jayjeetc/datasets
cd datasets/
git lfs pull
+cd ..
```

5. Create and write a sample dataset to the CephFS mount by replicating the 128MB Parquet file downloaded in the previous step:

```bash
-./deploy_data.sh datasets/128MB.parquet /mnt/cephfs/dataset 100 134217728
+./deploy_data.sh [source file] [destination dir] [no. of copies] [stripe unit]
```

-This will write 100 of ~128MB Parquet files to /mnt/cephfs/dataset using a CephFS stripe size of 128MB.
+For example,
+
+```bash
+./deploy_data.sh datasets/128MB.parquet /mnt/cephfs/dataset 240 134217728
+```
+
+This will write 240 of ~128MB Parquet files to /mnt/cephfs/dataset using a CephFS stripe size of 128MB.
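
The stripe unit in the example is 128 MiB written out in bytes. As a quick sanity check, the sketch below derives that value and estimates the total dataset size, assuming each copy is roughly one stripe unit in size (the guide only says "~128MB"). The variable names are illustrative and not part of `deploy_data.sh`.

```bash
# Stripe unit passed to deploy_data.sh: 128 MiB expressed in bytes.
stripe_unit=$((128 * 1024 * 1024))

# Number of copies used in the example.
copies=240

# Approximate dataset size, assuming each copy is about one stripe unit.
total_bytes=$((copies * stripe_unit))
total_gib=$((total_bytes / 1024 / 1024 / 1024))

echo "stripe unit: ${stripe_unit} bytes"
echo "approx. dataset size: ${total_gib} GiB"
```

With these numbers the CephFS mount needs room for about 30 GiB of data, before any replication Ceph applies underneath.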

6. Optionally, you can also deploy Prometheus and Grafana for monitoring the cluster by following [this](https://github.com/JayjeetAtGithub/prometheus-on-baremetal#readme) guide.

-7. The benchmark script ([bench.py](../scripts/bench.py)) can be used to generate benchmarks in the following syntax:
+7. Run [this](../scripts/bench.py) benchmark script to get some initial benchmarks for SkyhookDM performance while using different row selectivities.

```bash
+wget https://raw.githubusercontent.com/uccross/skyhookdm-arrow/arrow-master/cpp/src/arrow/adapters/arrow-rados-cls/scripts/bench.py
+python3 bench.py [format(pq/rpq)] [iterations] [file:///path/to/dataset] [workers] [result file]
+```
+
+For example,
+```bash
-./bench.py <format(pq/rpq)> <iterations> <dataset> <workers> <file>
+python3 bench.py rpq 10 file:///mnt/cephfs/dataset 16 result.json
```
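
Since the script accepts either `pq` or `rpq` as its format argument, one way to compare the two is to run it once per format with otherwise identical arguments. The sketch below only builds and prints the two command lines (a dry run) using the syntax shown above; the `result_<format>.json` file names are hypothetical, and the other values are copied from the example.

```bash
# Values taken from the example; the result file names are hypothetical.
iterations=10
dataset="file:///mnt/cephfs/dataset"
workers=16

# Build one bench.py command line per format (pq and rpq).
commands=""
for format in pq rpq; do
  commands="${commands}python3 bench.py ${format} ${iterations} ${dataset} ${workers} result_${format}.json
"
done

# Dry run: print the commands instead of executing them.
printf '%s' "$commands"
```

Piping the printed lines to `bash` from the directory that contains `bench.py` would run the two benchmarks one after the other.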
