Skip to content

UCP/UCT/RCACHE: API to return rcache usage information#126

Merged
yosefe merged 1 commit intointegration3from
topic/rcache-info-api
Jan 20, 2021
Merged

UCP/UCT/RCACHE: API to return rcache usage information#126
yosefe merged 1 commit intointegration3from
topic/rcache-info-api

Conversation

@yosefe
Copy link
Copy Markdown
Owner

@yosefe yosefe commented Jan 19, 2021

Why

API to return rcache eviction rate per context, to allow application adjust its buffer usage

How

  • Return the information from rcache to UCT md
  • UCP context collects (sum) counters from all UCT memory domains
  • Optimize uct_ib_md_query() so it could be called periodically without a performance impact

@yosefe
Copy link
Copy Markdown
Owner Author

yosefe commented Jan 19, 2021

@brminich @alinask @evgeny-leksikov can you pls take a look?

Comment on lines +463 to +464
* a callback then data was allocated, so if UCS_INPROGRESS is
* returned from the callback, the data parameter will persist
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: unrelated

Comment on lines +2416 to +2417
* Messages with a specific id. This callback is called whenever an Active
* Message that was sent from the remote peer by @ref ucp_am_send_nb is
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: unrelated

* received on this worker.
*
* @param [in] worker UCP worker on which to set the Active Message
* @param [in] worker UCP worker on which to set the Active Message
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: unrelated

* @param [in] count Number of elements to send.
* @param [in] datatype Datatype descriptor for the elements in the buffer.
* @param [in] cb Callback that is invoked upon completion of the
* @param [in] cb Callback that is invoked upon completion of the
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: unrelated

UCS_CPU_SET(c, dst);
}
}
memcpy(dst, src, sizeof(*dst));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it improve something?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes: before the fix, we went over every set bit, and set it in 'dst', and this code took >90% of time when calling ucp_context_query in a loop. after the fix, we just bulk-copy, and the overhead of ucp_context_query is reduced.

Comment on lines 328 to +330
UCP_ATTR_FIELD_REQUEST_SIZE = UCS_BIT(0), /**< UCP request size */
UCP_ATTR_FIELD_THREAD_MODE = UCS_BIT(1) /**< UCP context thread flag */
UCP_ATTR_FIELD_THREAD_MODE = UCS_BIT(1), /**< UCP context thread flag */
UCP_ATTR_FIELD_NUM_PINNED_REGIONS = UCS_BIT(2), /**< Current pinned regions count */
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: alignment

Comment on lines +1658 to +1668
for (md_index = 0; md_index < context->num_mds; ++md_index) {
uct_md_query(context->tl_mds[md_index].md, &md_attr);
UCP_SET_ATTR_FIELD(attr, num_pinned_regions,
UCP_ATTR_FIELD_NUM_PINNED_REGIONS,
+= md_attr.rcache_attr.num_regions);
UCP_SET_ATTR_FIELD(attr, num_pinned_bytes,
UCP_ATTR_FIELD_NUM_PINNED_BYTES,
+= md_attr.rcache_attr.total_size);
UCP_SET_ATTR_FIELD(attr, num_pinned_evictions,
UCP_ATTR_FIELD_NUM_PINNED_EVICTIONS,
+= md_attr.rcache_attr.num_evictions);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe break up the loop to 3 ones to minimize number of branches? on other hand, this approach implements better memory locality

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it can be more complicated code since need to save all md_query results in an array, but if we set each field individually we can reuse the UCP_SET_ATTR_FIELD macro

Comment on lines +752 to +754
rcache_attr->num_regions = rcache->num_regions;
rcache_attr->total_size = rcache->total_size;
rcache_attr->num_evictions = rcache->num_evictions;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe define ucs_rcache_attr_t inside rcache for counters and do memcpy here?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer word-by-word copy so it will be atomic (to read rcache fields without a lock and get a self-consistent value)

@yosefe yosefe force-pushed the topic/rcache-info-api branch from 334ec29 to 17a2255 Compare January 20, 2021 12:46
@yosefe yosefe force-pushed the topic/rcache-info-api branch from 17a2255 to 7c395d6 Compare January 20, 2021 13:15
@yosefe yosefe merged commit 311cdd0 into integration3 Jan 20, 2021
@yosefe yosefe deleted the topic/rcache-info-api branch January 20, 2021 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants