Effort-less way to run C++ clients #1873

Open
qiranq99 opened this issue Apr 19, 2024 · 7 comments

Comments

@qiranq99

Hi,

I found it quite troublesome to get a C++ client working: it requires compiling the entire project and trying to understand the raw C++ APIs in the docs. I'm wondering whether there is an easier way to access the C++ client, e.g., via partial compilation.

BTW, have you ever profiled the performance gap between a Python client and a C++ client regarding the client-server IPC overhead (inside one machine) when retrieving data via data = client.get(oid)? According to our experiments, each retrieval takes several hundred microseconds (using an AMD EPYC 7763), which is unacceptable for some latency-sensitive workloads. This leads to two questions:

  1. would a C++ client outperform a Python one in terms of IPC overhead?
  2. would v6d be optimized for low latency scenarios in the future?
@sighingnow
Member

Hi @qiranq99,

Thanks for reaching out!

I found it quite troublesome to get a C++ client working: it requires compiling the entire project and trying to understand the raw C++ APIs in the docs. I'm wondering whether there is an easier way to access the C++ client, e.g., via partial compilation.

Yes, we do have a set of CMake options to control which components to enable: https://github.com/v6d-io/v6d/blob/main/CMakeLists.txt#L56-L65

If you only need the C++ client to access metadata and blobs, you could just enable the client and disable all other components.
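
For illustration only, a client-only program might look like the following sketch. The header path, the VINEYARD_CHECK_OK macro, and the ObjectIDFromString helper are assumptions based on the current sources and may differ between versions:

// Hypothetical sketch: connect to a local vineyardd over its IPC socket
// and inspect an object's metadata, using only the C++ client library.
#include <iostream>
#include <string>

#include "client/client.h"  // assumed header path for vineyard::Client

int main(int argc, char** argv) {
  // argv[1]: the IPC socket vineyardd was started with, e.g. /var/run/vineyard.sock
  // argv[2]: the id of an existing object, printed by the producer side
  std::string ipc_socket = argv[1];
  vineyard::ObjectID id = vineyard::ObjectIDFromString(argv[2]);

  vineyard::Client client;
  VINEYARD_CHECK_OK(client.Connect(ipc_socket));

  vineyard::ObjectMeta meta;
  VINEYARD_CHECK_OK(client.GetMetaData(id, meta));
  std::cout << "typename: " << meta.GetTypeName() << std::endl;

  client.Disconnect();
  return 0;
}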

BTW, have you ever profiled the performance gap between a Python client and a C++ client regarding the client-server IPC overhead (inside one machine) when retrieving data via data = client.get(oid)?

Since it uses shared memory and avoids any potential data copies, the cost should be a very small constant and won't scale with the size of your data.

According to our experiments, each retrieval takes several hundred microseconds (using an AMD EPYC 7763)

Could you share more about the test case (perhaps a code snippet that I can use to reproduce the performance gap)? We will investigate to check if there is any regression.

would a C++ client outperform a Python one in terms of IPC overhead?

There should be no performance difference whether Python is used or not. Let us know if you have encountered such problems.
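
If it helps to cross-check, a rough C++ counterpart of the retrieval loop could look like the sketch below, so the pure IPC overhead can be timed without the Python layer. The GetObject call is an assumption based on the current headers, and the object id is assumed to refer to data already stored in vineyard:

// Rough sketch: time repeated GetObject() calls against a local vineyardd
// to estimate the per-retrieval IPC overhead from C++.
#include <chrono>
#include <iostream>
#include <memory>

#include "client/client.h"  // assumed header path for vineyard::Client

void benchmark_get(vineyard::Client& client, vineyard::ObjectID id,
                   int iterations = 1000) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    // The payload stays in shared memory, so each call should only pay for
    // the metadata round-trip over the IPC socket, not for copying data.
    std::shared_ptr<vineyard::Object> object = client.GetObject(id);
  }
  auto end = std::chrono::steady_clock::now();
  auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
  std::cout << "avg get latency: " << us.count() / iterations << " us" << std::endl;
}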

would v6d be optimized for low latency scenarios in the future?

v6d is mainly optimized for sharing big data objects (e.g., tensors, tables, dataframes) between processes.

@sighingnow sighingnow added the question and performance labels Apr 19, 2024
@qiranq99
Author

qiranq99 commented Apr 19, 2024

Hi @sighingnow,

import vineyard
import numpy as np

client = vineyard.connect()

data = np.arange(100000)
oid = client.put(data)

# timed with the Jupyter %%timeit magic:
retrieved_data = client.get(oid)

Basically, we used the above example as a benchmark, and the measured latency is several hundred microseconds, while some low-latency-oriented object stores can deliver several tens of nanoseconds. Though v6d is not aiming for low latency, a retrieval latency under 50us from the server is what we would expect.

@qiranq99
Author

BTW, I successfully compiled the C++ client in isolation thanks to your hint.

However, the C++ API reference seems to be generated directly by the mkdoc tooling, and there is no clear guidance on how to use the C++ library.

@sighingnow
Member

Basically, we used the above example as a benchmark, and the measured latency is several hundred microseconds, while some low-latency-oriented object stores can deliver several tens of nanoseconds.

Will investigate. In my queue now.

However, the C++ API reference seems to be generated directly by the mkdoc tooling, and there is no clear guidance on how to use the C++ library.

More user-friendly tutorials about the C++ APIs are on our roadmap. For now, you may refer to our unittests as examples for usage: https://github.com/v6d-io/v6d/tree/main/test

Sorry for the inconvenience.
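
Until such tutorials land, a bare-bones put/get round trip, adapted from the pattern used in those unittests, might look like the sketch below. The CreateBlob and BlobWriter::Seal signatures in particular are assumptions and may differ between releases; the unittests remain the authoritative reference:

// Bare-bones sketch of a put/get round trip through a local vineyardd.
#include <cstring>
#include <iostream>
#include <memory>
#include <string>

#include "client/client.h"  // assumed header path for vineyard::Client

int main(int argc, char** argv) {
  vineyard::Client client;
  VINEYARD_CHECK_OK(client.Connect(std::string(argv[1])));

  // "put": allocate a shared-memory blob, fill it, and seal it.
  std::string payload = "hello vineyard";
  std::unique_ptr<vineyard::BlobWriter> writer;
  VINEYARD_CHECK_OK(client.CreateBlob(payload.size(), writer));
  std::memcpy(writer->data(), payload.data(), payload.size());
  std::shared_ptr<vineyard::Object> sealed = writer->Seal(client);

  // "get": fetch the object back and read it directly from shared memory.
  auto blob = std::dynamic_pointer_cast<vineyard::Blob>(
      client.GetObject(sealed->id()));
  std::cout << std::string(blob->data(), blob->size()) << std::endl;

  client.Disconnect();
  return 0;
}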

@qiranq99
Author

qiranq99 commented Apr 23, 2024

Benchmark update:

On getting a bytearray(10) from local object stores:

  • v6d: ~500 us
  • Ray Plasma: ~100 us
  • Redis: ~20 us (Redis is fast only for small objects)

Machine: Intel Xeon 4316 @ 2.30GHz

Contributor

/cc @sighingnow, this issue/PR has had no activity for a long time; could you folks help to review the status?
To suppress further notifications,

  • for issues,
    • if it is waiting for further response from the reporter/author, please help to add the label requires-further-info,
    • if you have already started working on it, please add the label work-in-progress to the issue,
    • if this issue requires further design discussion and is not in the current plan, or won't be fixed, please add the label requires-further-discussion or wontfix to the issue,
  • for pull requests,
    • if you are still working on it and it is not ready for reviewing, please convert this pull request to a draft PR,
    • if you have decided to put this development on hold, please add the requires-further-discussion label to the pull request.
      Thanks!

@github-actions github-actions bot removed the stale label Jun 12, 2024
@github-actions github-actions bot added the stale label Jul 12, 2024