CUPTI PC sampling (see #294) can only be done from within the program that itself executes the CUDA kernels.
This means that implementing CUPTI support in lo2s is only possible by creating a separate CUPTI sampling support library and using LD_PRELOAD to inject it into the application being measured.
This of course requires some mechanism for the injected library to communicate with lo2s itself, most likely a ring buffer in shared memory.
As such a foreign interface might be useful beyond CUPTI, I think this inter-process interface warrants its own discussion.
There are two direct questions:
What should the technical solution look like? shm_open + mmap + our own ring buffer implementation, or is there already a turnkey solution for it?
How much genericity should we bake into the design?
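To make the first option concrete, here is a minimal sketch of a single-producer/single-consumer byte ring buffer placed in a POSIX shared-memory object via shm_open + mmap. It is only an illustration under stated assumptions, not a proposal for the final lo2s design: every name in it (`RingHeader`, `ring_open`, `ring_write`, `ring_read`, `Sample`, the object name `/lo2s-ring-demo`) is hypothetical, the capacity is fixed at compile time, and there is no blocking or wake-up mechanism. It assumes C++17 and a platform where 64-bit atomics are lock-free, which is what makes them usable across processes.

```cpp
// Sketch only: all names here are hypothetical and not part of lo2s or CUPTI.
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

constexpr std::size_t kCapacity = 4096; // payload bytes in the ring

// Lock-free atomics are required so both processes can share them safely.
static_assert(std::atomic<std::uint64_t>::is_always_lock_free,
              "64-bit atomics must be lock-free for cross-process use");

struct RingHeader
{
    std::atomic<std::uint64_t> head; // total bytes written, updated by the producer only
    std::atomic<std::uint64_t> tail; // total bytes read, updated by the consumer only
    unsigned char data[kCapacity];
};

// Map the shared-memory object `name`. The creating side also sizes and initialises it.
RingHeader* ring_open(const char* name, bool create)
{
    int fd = shm_open(name, create ? (O_CREAT | O_EXCL | O_RDWR) : O_RDWR, 0600);
    if (fd < 0)
        return nullptr;
    if (create && ftruncate(fd, sizeof(RingHeader)) != 0)
    {
        close(fd);
        shm_unlink(name);
        return nullptr;
    }
    void* mem = mmap(nullptr, sizeof(RingHeader), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (mem == MAP_FAILED)
        return nullptr;
    if (!create)
        return static_cast<RingHeader*>(mem);
    RingHeader* ring = new (mem) RingHeader;
    ring->head.store(0);
    ring->tail.store(0);
    return ring;
}

// Producer side: copy `len` bytes in, or return false if there is not enough free space.
bool ring_write(RingHeader* r, const void* src, std::size_t len)
{
    const std::uint64_t head = r->head.load(std::memory_order_relaxed);
    const std::uint64_t tail = r->tail.load(std::memory_order_acquire);
    if (len > kCapacity - (head - tail))
        return false;
    const std::size_t off = head % kCapacity;
    const std::size_t first = std::min(len, kCapacity - off); // bytes before wrap-around
    std::memcpy(r->data + off, src, first);
    std::memcpy(r->data, static_cast<const unsigned char*>(src) + first, len - first);
    r->head.store(head + len, std::memory_order_release); // publish the data
    return true;
}

// Consumer side: copy `len` bytes out, or return false if fewer are available.
bool ring_read(RingHeader* r, void* dst, std::size_t len)
{
    const std::uint64_t tail = r->tail.load(std::memory_order_relaxed);
    const std::uint64_t head = r->head.load(std::memory_order_acquire);
    if (head - tail < len)
        return false;
    const std::size_t off = tail % kCapacity;
    const std::size_t first = std::min(len, kCapacity - off);
    std::memcpy(dst, r->data + off, first);
    std::memcpy(static_cast<unsigned char*>(dst) + first, r->data, len - first);
    r->tail.store(tail + len, std::memory_order_release); // free the space
    return true;
}

// Hypothetical record type; a real CUPTI sample would carry different fields.
struct Sample
{
    std::uint64_t pc;
    std::uint32_t count;
};

// Simulates both sides in one process by mapping the same object twice.
bool demo_roundtrip()
{
    const char* name = "/lo2s-ring-demo";
    shm_unlink(name); // drop any stale object from an earlier run
    RingHeader* producer = ring_open(name, true);  // would live in the injected library
    RingHeader* consumer = ring_open(name, false); // would live in lo2s
    bool ok = false;
    if (producer && consumer)
    {
        Sample in{ 0x1234, 7 };
        Sample out{};
        ok = ring_write(producer, &in, sizeof in) && ring_read(consumer, &out, sizeof out) &&
             out.pc == in.pc && out.count == in.count;
    }
    if (producer)
        munmap(producer, sizeof(RingHeader));
    if (consumer)
        munmap(consumer, sizeof(RingHeader));
    shm_unlink(name);
    return ok;
}
```

Calling `demo_roundtrip()` from a `main` exercises one write and one read through two independent mappings of the same object. On older glibc versions, linking needs `-lrt` for `shm_open`. A real design would still have to settle how the name of the shared-memory object reaches the injected library, how the reader is notified of new data, and what happens when the buffer is full.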