Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: StreamExecutor C API #257

Merged
merged 65 commits into from
Sep 29, 2020
Merged

Conversation

annarev
Copy link
Contributor

@annarev annarev commented Jun 12, 2020

The feedback phase will be open for two weeks until Tuesday June 30, 2020.

Status Accepted
RFC # 257
Author(s) Anna Revinskaya (annarev@google.com), Penporn Koanantakool (penporn@google.com), Yi Situ (yisitu@google.com), Russell Power (power@google.com)
Sponsor Gunhan Gulsoy (gunan@google.com)
Updated 2020-09-08

Objective

Provide basic device management C API to allow new devices to modularly connect
to the current TensorFlow runtime.

yisitu and others added 4 commits June 26, 2020 10:04
Defined a portable bool type TF_BOOL.
Added structure size definitions and updated usage examples.
Fixed typos where member functions are supposed to be member function pointers.
Fixed missing extern "C".
@annarev
Copy link
Contributor Author

annarev commented Sep 9, 2020

I am planning to add the following SP_DeviceDescription struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such as pci_bus_id or if some optional fields should be required).

typedef struct SP_DeviceDescription {
  size_t struct_size;
  
  // Device hardware name. Used for printing.
  // Must be null-terminated.
  char* name;
  
  // Device vendor name. Used for printing.
  // Must be null-terminated.
  char* device_vendor;

  // Returns the PCI bus identifier for this device, of the form
  // [domain]:[bus]:[device].[function]
  // Used for printing.
  // Must be null-terminated.
  char* pci_bus_id;

  TF_Bool has_numa_node;  // True if `numa_node` is set.
  // Returns the NUMA node associated with this device, for use in
  // determining socket locality.
  int numa_node;

  TF_Bool has_memory_bandwidth;  // True if `memory_bandwidth` is set.
  // Device's memory bandwidth in bytes/sec.  (This is for reads/writes to/from
  // the device's own memory, not for transfers between the host and device.)
  int64_t memory_bandwidth;

  TF_Bool has_gflops;  // True if `gflops` field is set.
  // Estimate of floating point operations per second for this device * 10e-9.
  double gflops;
} SP_DeviceDescription;

penpornk and others added 10 commits September 9, 2020 10:20
* Added the Implementation Conventions and Usage Overview section.
* Replaced the Stability / User Impact with the Versioning Strategy and Stability section. Also moved the section higher.
* Updated the API design according to the real implementation.
* Updated code examples.
…gableDevice RFC (PR tensorflow#262).

* Referenced semver and TensorFlow's Version Compatibility page.
* Added the Updating Guidelines, Convention, and Detecting Incompatibility sections.
* Merged the current code examples with @yisitu's newer ones on PR tensorflow#262 and simplified them a bit.
Also some more minor edits.
Co-authored-by: Situ Yi <60493024+yisitu@users.noreply.github.com>
Co-authored-by: Situ Yi <60493024+yisitu@users.noreply.github.com>
Changed "shall" to "must" to make sure people would not misinterpret "shall".
Updates based on recent changes and discussion on RFC PR tensorflow#262
@kulinseth
Copy link
Contributor

I am planning to add the following SP_DeviceDescription struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such as pci_bus_id or if some optional fields should be required).

Thanks!. This looks good to me. Just to confirm, all the versioning information is subsumed in struct_size, and we don't need any extra fields in DeviceDescription for version compatibility. Is that correct?

@annarev
Copy link
Contributor Author

annarev commented Sep 11, 2020

I am planning to add the following SP_DeviceDescription struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such as pci_bus_id or if some optional fields should be required).

Thanks!. This looks good to me. Just to confirm, all the versioning information is subsumed in struct_size, and we don't need any extra fields in DeviceDescription for version compatibility. Is that correct?

Yes, we rely on struct_size and version value in SE_PlatformRegistrationParams.

@annarev
Copy link
Contributor Author

annarev commented Sep 17, 2020

I updated the doc to add a way to use a custom allocator:
21be6e9
Addition of a custom allocator was requested in Pluggable Device RFC:
#262

@penpornk
Copy link
Member

This RFC has been up for a while. The implementation has even been checked in over a month ago. I think we should wrap this up (i.e., merge this PR).

Changes to StreamExecutor C API since the last combined design review meeting with PluggableDevice on 8/13/20:

  • Added more explanation of the overall functionality.
  • Added more clarifications on versioning strategy.
  • Corrected typos. Minor functionality additions such as block_host_until_done, unified_memory_allocate/deallocate, etc.

If anyone has anything else that needs to be discussed/resolved before this RFC can be merged (i.e., fundamental issues), please explicitly say so here by this Thursday, 9/24 at 11:59pm PT.

@kulinseth
Copy link
Contributor

If anyone has anything else that needs to be discussed/resolved before this RFC can be merged (i.e., fundamental issues), please explicitly say so here by this Thursday, 9/24 at 11:59pm PT.

Thanks Penporn, for reaching out. Nothing from our side.

@penpornk
Copy link
Member

@kulinseth Thank you for the quick confirmation! :)

@ematejska @alextp I believe there is no outstanding issue now. Could we move forward?

@alextp
Copy link
Contributor

alextp commented Sep 25, 2020 via email

Copy link
Contributor

@theadactyl theadactyl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@theadactyl theadactyl merged commit 71bb96f into tensorflow:master Sep 29, 2020
RFC management automation moved this from In Revision to Accepted RFCs Sep 29, 2020
@theadactyl theadactyl added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Sep 29, 2020
@wdfnst
Copy link

wdfnst commented Sep 30, 2022

@penpornk Hi, is there any PluggableDevice template for beginner? TKS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes RFC: Accepted RFC Design Document: Accepted by Review
Projects
RFC management
  
Accepted RFCs
Development

Successfully merging this pull request may close these issues.

None yet