UCP/UCT/RCACHE: API to return rcache usage information #126
yosefe merged 1 commit into integration3
@brminich @alinask @evgeny-leksikov can you please take a look?
src/ucp/api/ucp.h
Outdated
| * a callback then data was allocated, so if UCS_INPROGRESS is | ||
| * returned from the callback, the data parameter will persist |
src/ucp/api/ucp.h
Outdated
| * Messages with a specific id. This callback is called whenever an Active | ||
| * Message that was sent from the remote peer by @ref ucp_am_send_nb is |
src/ucp/api/ucp.h
Outdated
| * received on this worker. | ||
| * | ||
| * @param [in] worker UCP worker on which to set the Active Message | ||
| * @param [in] worker UCP worker on which to set the Active Message |
src/ucp/api/ucp.h
Outdated
| * @param [in] count Number of elements to send. | ||
| * @param [in] datatype Datatype descriptor for the elements in the buffer. | ||
| * @param [in] cb Callback that is invoked upon completion of the | ||
| * @param [in] cb Callback that is invoked upon completion of the |
| UCS_CPU_SET(c, dst); | ||
| } | ||
| } | ||
| memcpy(dst, src, sizeof(*dst)); |
does it improve something?
yes: before the fix, we iterated over every set bit and set it in 'dst' one at a time, and this code took >90% of the time when calling ucp_context_query in a loop. after the fix, we just bulk-copy the whole set, which reduces the overhead of ucp_context_query.
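The difference described above can be sketched as follows. This is only an illustration: `sketch_cpu_set_t` and the `sketch_*` helpers are hypothetical stand-ins for the real CPU set type and the `UCS_CPU_SET` macro, and the sketch assumes source and destination share the same layout, which is what makes the bulk copy valid.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size CPU mask, standing in for the real CPU set type. */
#define SKETCH_CPU_MAX   1024
#define SKETCH_WORD_BITS 64

typedef struct {
    uint64_t bits[SKETCH_CPU_MAX / SKETCH_WORD_BITS];
} sketch_cpu_set_t;

/* Set the bit for one CPU index. */
static void sketch_cpu_set(unsigned cpu, sketch_cpu_set_t *set)
{
    set->bits[cpu / SKETCH_WORD_BITS] |= (uint64_t)1 << (cpu % SKETCH_WORD_BITS);
}

/* Return non-zero if the bit for one CPU index is set. */
static int sketch_cpu_isset(unsigned cpu, const sketch_cpu_set_t *set)
{
    return (int)((set->bits[cpu / SKETCH_WORD_BITS] >> (cpu % SKETCH_WORD_BITS)) & 1);
}

/* Old approach: test every CPU index and set it in dst one bit at a time. */
static void sketch_copy_per_bit(sketch_cpu_set_t *dst, const sketch_cpu_set_t *src)
{
    unsigned c;

    memset(dst, 0, sizeof(*dst));
    for (c = 0; c < SKETCH_CPU_MAX; ++c) {
        if (sketch_cpu_isset(c, src)) {
            sketch_cpu_set(c, dst);
        }
    }
}

/* New approach: both sets share one layout, so a single bulk copy suffices. */
static void sketch_copy_bulk(sketch_cpu_set_t *dst, const sketch_cpu_set_t *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```

Both functions produce the same destination set; the bulk version replaces up to `SKETCH_CPU_MAX` test-and-set iterations with one fixed-size copy.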
| UCP_ATTR_FIELD_REQUEST_SIZE = UCS_BIT(0), /**< UCP request size */ | ||
| UCP_ATTR_FIELD_THREAD_MODE = UCS_BIT(1) /**< UCP context thread flag */ | ||
| UCP_ATTR_FIELD_THREAD_MODE = UCS_BIT(1), /**< UCP context thread flag */ | ||
| UCP_ATTR_FIELD_NUM_PINNED_REGIONS = UCS_BIT(2), /**< Current pinned regions count */ |
| for (md_index = 0; md_index < context->num_mds; ++md_index) { | ||
| uct_md_query(context->tl_mds[md_index].md, &md_attr); | ||
| UCP_SET_ATTR_FIELD(attr, num_pinned_regions, | ||
| UCP_ATTR_FIELD_NUM_PINNED_REGIONS, | ||
| += md_attr.rcache_attr.num_regions); | ||
| UCP_SET_ATTR_FIELD(attr, num_pinned_bytes, | ||
| UCP_ATTR_FIELD_NUM_PINNED_BYTES, | ||
| += md_attr.rcache_attr.total_size); | ||
| UCP_SET_ATTR_FIELD(attr, num_pinned_evictions, | ||
| UCP_ATTR_FIELD_NUM_PINNED_EVICTIONS, | ||
| += md_attr.rcache_attr.num_evictions); |
maybe break up the loop into 3 separate loops to minimize the number of branches? on the other hand, the current approach has better memory locality
IMO that would make the code more complicated, since we would need to save all the md_query results in an array; if we set each field individually inside one loop, we can reuse the UCP_SET_ATTR_FIELD macro
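A minimal sketch of the single-loop approach discussed here, for illustration only. All `sketch_*` and `SKETCH_*` names are hypothetical; in particular `SKETCH_SET_ATTR_FIELD` only imitates the shape of the `UCP_SET_ATTR_FIELD` calls in the diff and is not the actual macro definition. Each memory domain is queried once, and every requested counter is accumulated from that one result, so no array of query results is needed.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical field-mask bits, mirroring the UCP_ATTR_FIELD_* style. */
#define SKETCH_FIELD_NUM_REGIONS   (1u << 0)
#define SKETCH_FIELD_NUM_BYTES     (1u << 1)
#define SKETCH_FIELD_NUM_EVICTIONS (1u << 2)

/* Hypothetical per-memory-domain rcache counters. */
typedef struct {
    unsigned long num_regions;
    size_t        total_size;
    unsigned long num_evictions;
} sketch_rcache_attr_t;

/* Hypothetical context attributes; field_mask selects what the caller wants. */
typedef struct {
    uint64_t      field_mask;
    unsigned long num_pinned_regions;
    size_t        num_pinned_bytes;
    unsigned long num_pinned_evictions;
} sketch_context_attr_t;

/* Apply "<field> <expr>" only if the caller requested that field. */
#define SKETCH_SET_ATTR_FIELD(_attr, _field, _flag, _expr) \
    do { \
        if ((_attr)->field_mask & (_flag)) { \
            (_attr)->_field _expr; \
        } \
    } while (0)

/* Sum the requested counters over all memory domains in a single pass. */
static void sketch_context_query(const sketch_rcache_attr_t *mds, size_t num_mds,
                                 sketch_context_attr_t *attr)
{
    size_t i;

    SKETCH_SET_ATTR_FIELD(attr, num_pinned_regions, SKETCH_FIELD_NUM_REGIONS, = 0);
    SKETCH_SET_ATTR_FIELD(attr, num_pinned_bytes, SKETCH_FIELD_NUM_BYTES, = 0);
    SKETCH_SET_ATTR_FIELD(attr, num_pinned_evictions, SKETCH_FIELD_NUM_EVICTIONS, = 0);

    for (i = 0; i < num_mds; ++i) {
        SKETCH_SET_ATTR_FIELD(attr, num_pinned_regions, SKETCH_FIELD_NUM_REGIONS,
                              += mds[i].num_regions);
        SKETCH_SET_ATTR_FIELD(attr, num_pinned_bytes, SKETCH_FIELD_NUM_BYTES,
                              += mds[i].total_size);
        SKETCH_SET_ATTR_FIELD(attr, num_pinned_evictions, SKETCH_FIELD_NUM_EVICTIONS,
                              += mds[i].num_evictions);
    }
}
```

Fields whose bit is not set in `field_mask` are never written, which is the usual reason for routing every assignment through one macro.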
| rcache_attr->num_regions = rcache->num_regions; | ||
| rcache_attr->total_size = rcache->total_size; | ||
| rcache_attr->num_evictions = rcache->num_evictions; |
maybe define ucs_rcache_attr_t inside rcache for counters and do memcpy here?
i'd prefer a word-by-word copy so that each field read is atomic (we can read the rcache fields without a lock and still get a self-consistent value for each one)
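The word-by-word idea can be sketched as below. This is an assumption-laden illustration, not the code in the PR: the diff uses plain assignments of word-sized fields, while this sketch swaps in C11 `<stdatomic.h>` relaxed loads to make the lock-free read well defined in portable C. The `sketch_*` types are hypothetical. Note that each field is individually consistent, but the three values are not guaranteed to come from the same instant.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical rcache counters, updated elsewhere by the cache itself. */
typedef struct {
    atomic_ulong  num_regions;
    atomic_size_t total_size;
    atomic_ulong  num_evictions;
} sketch_rcache_t;

/* Hypothetical plain snapshot handed back to the caller. */
typedef struct {
    unsigned long num_regions;
    size_t        total_size;
    unsigned long num_evictions;
} sketch_rcache_usage_t;

/*
 * Copy the counters one word at a time without taking the rcache lock.
 * Each load is atomic, so no field can be observed half-updated; a memcpy
 * of the whole struct would give no such per-field guarantee.
 */
static void sketch_rcache_query(sketch_rcache_t *rcache,
                                sketch_rcache_usage_t *usage)
{
    usage->num_regions   = atomic_load_explicit(&rcache->num_regions,
                                                memory_order_relaxed);
    usage->total_size    = atomic_load_explicit(&rcache->total_size,
                                                memory_order_relaxed);
    usage->num_evictions = atomic_load_explicit(&rcache->num_evictions,
                                                memory_order_relaxed);
}
```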
334ec29 to 17a2255
17a2255 to 7c395d6
Why
API to return the rcache eviction rate per context, to allow the application to adjust its buffer usage
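One way an application might act on such a counter is sketched below. Everything here is hypothetical: the `sketch_tuner_t` type, the threshold, and the halving policy are illustrative assumptions and not part of this PR. The sketch only shows how a cumulative eviction count, sampled periodically, can be turned into a per-interval rate and used to shrink a buffer budget.

```c
#include <stddef.h>

/* Hypothetical application-side state for adapting buffer usage. */
typedef struct {
    unsigned long prev_evictions; /* eviction counter at the previous sample */
    size_t        buffer_limit;   /* current budget of distinct buffers */
    size_t        min_limit;      /* never shrink below this budget */
} sketch_tuner_t;

/*
 * Feed in the latest cumulative eviction count. Returns the number of
 * evictions since the previous sample, and halves the buffer budget when
 * that number exceeds the threshold and the result stays above min_limit.
 */
static unsigned long sketch_tuner_update(sketch_tuner_t *tuner,
                                         unsigned long num_evictions,
                                         unsigned long threshold)
{
    unsigned long delta = num_evictions - tuner->prev_evictions;

    tuner->prev_evictions = num_evictions;
    if ((delta > threshold) && (tuner->buffer_limit / 2 >= tuner->min_limit)) {
        tuner->buffer_limit /= 2;
    }
    return delta;
}
```

Sampling at a fixed interval makes the returned delta proportional to the eviction rate; a real application would likely also grow the budget back when evictions stay low.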
How