[REF] Introduce multiple streams execution in TensorFlow. #61185

buptzyb · 2023-07-06T02:00:28Z

Multi-Stream TensorFlow is built on top of the official TensorFlow. It uses the multiple-stream capability of modern GPUs to accelerate deep learning training and inference. This Multi-Stream implementation has already helped several customers migrate their RecSys TF models to the GPU and deploy them in production.

For more details, see README_MultiStream.md.

This PR serves as a reference only and will not be merged into master. @changhuilin

gbaned · 2023-12-15T05:28:29Z

Hi @buptzyb This PR is in draft, any update on this? Please. Thank you!

gbaned · 2023-12-29T06:21:28Z

Hi @buptzyb This PR is in draft, any update on this? Please. Thank you!

gbaned · 2024-01-19T07:09:08Z

Hi @buptzyb This PR is in draft, any update on this? Please. Thank you!

Imported from GitHub PR #61632

This PR is one part of the Multi-Stream feature in TF, which is proposed in #61185.

It allows the host_to_device, device_to_host, and device_to_device data copy streams to be merged into the compute stream within one stream group. This reduces the overhead caused by GPU stream synchronization, especially when data transfers are frequent. Another benefit is that, for host_to_device copies, merging streams lets subsequent ops be scheduled early instead of waiting until the data copy has actually finished.

As part of the multi-stream feature, stream-merging helps multi-stream reach a much higher throughput. Taking our proto models as an example, inference throughput is **1524** samples/second for the original model, **2229** samples/second with multi-stream, and **2471** samples/second with stream-merging added on top. Stream-merging can also be used on its own: enabling it alone raised inference throughput from **1028** samples/second to **1187** samples/second. Please refer to the 'Performance' part of our [document](https://docs.google.com/document/d/1yL3lWk_iFKqLTyekkuaiKXZ78I0lPmD5kM1fghHRs4Y/edit?usp=sharing) for more detailed experiment results.

Copybara import of the project:

-- 9e51f38 by Robin Zhang <robinz@nvidia.com>: Allow merging compute-copy streams

-- a45967f by Robin Zhang <robinz@nvidia.com>: Improve coding style

-- ccae79b by Robin Zhang <robinz@nvidia.com>: Rename stream_merge_options_

-- 332e1fe by Robin Zhang <robinz@nvidia.com>: Put stream checking out of callback

-- 4a0c789 by Robin Zhang <robinz@nvidia.com>: Move StreamMergeOptions to Experimental

-- efe56d7 by Robin Zhang <robinz@nvidia.com>: add some comments

Merging this change closes #61632

FUTURE_COPYBARA_INTEGRATE_REVIEW=#61632 from buptzyb:multistream-streammerge 5aabb58

PiperOrigin-RevId: 628618396

Imported from GitHub PR #61632, with the same description and commit list as the import above.

Merging this change closes #61632

Reverts changelist 525613555

FUTURE_COPYBARA_INTEGRATE_REVIEW=#61632 from buptzyb:multistream-streammerge 5aabb58

PiperOrigin-RevId: 628618396
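To put the throughput figures quoted in the commit message in relative terms, the following is a small arithmetic sketch. It uses only the four measurements stated above; the function name `relative_gains` and the dictionary keys are hypothetical names chosen for illustration, and nothing here calls into TensorFlow.

```python
# Throughput figures (samples/second) as quoted in the commit message.
ORIGINAL = 1524              # original model, no multi-stream
MULTI_STREAM = 2229          # multi-stream enabled
MULTI_STREAM_MERGED = 2471   # multi-stream plus stream-merging

STANDALONE_BEFORE = 1028     # stream-merging used on its own: before
STANDALONE_AFTER = 1187      # stream-merging used on its own: after


def relative_gains() -> dict[str, float]:
    """Return throughput ratios (new / old) for each quoted comparison.

    The key names are illustrative only; they do not come from the PR.
    """
    return {
        "multi_stream_vs_original": MULTI_STREAM / ORIGINAL,
        "merged_vs_original": MULTI_STREAM_MERGED / ORIGINAL,
        "merged_vs_multi_stream": MULTI_STREAM_MERGED / MULTI_STREAM,
        "standalone_merge": STANDALONE_AFTER / STANDALONE_BEFORE,
    }


if __name__ == "__main__":
    for name, ratio in relative_gains().items():
        # Print each ratio and the equivalent percentage gain.
        print(f"{name}: {ratio:.2f}x ({(ratio - 1) * 100:.1f}% gain)")
```

Read this way, multi-stream alone is roughly a 1.46x improvement over the original, stream-merging adds about another 1.11x on top of that (about 1.62x overall), and stream-merging used by itself gives about 1.15x in the separate experiment.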

google-ml-butler bot added the size:XL (CL Change Size: Extra Large) label Jul 6, 2023

google-ml-butler bot assigned gbaned Jul 6, 2023

gbaned added this to Assigned Reviewer in PR Queue via automation Jul 6, 2023

buptzyb force-pushed the multistream-release branch from 0afcd2d to b27a1cd July 7, 2023 14:30

init commit

e77676e

buptzyb force-pushed the multistream-release branch from b27a1cd to e77676e July 9, 2023 14:19

buptzyb added 3 commits July 11, 2023 01:26

multiple stream groups for PJRT

a9d19a9

Remove "stream_id == 0" restriction

05b5df9

Cache TRT engine in stream level

2821549

gbaned requested a review from d0k July 13, 2023 03:34

buptzyb added 5 commits July 14, 2023 08:30

optimize rm cache key style

2a4df31

revert ordinal decoding

5d01ba3

rename gpu_stream_merge_options_

a278cf6

fix bug in multi-context trt

44a113e

add gpu_memory_pool_mode option

d9db68d

buptzyb mentioned this pull request Aug 18, 2023

Allow merging compute-copy streams #61632

Merged

buptzyb added 3 commits August 24, 2023 10:28

Merge commit '2539ebc' into multistream-release

8f5ba87

Delete device ordinal encoding

aa361b7

Add MinHeapPolicy

e74add9

buptzyb added 2 commits August 24, 2023 13:46

Select stream group with DeviceSelector

c777003

Merge commit '67463d0' into multistream-release

6b177ae

buptzyb closed this Jan 19, 2024

PR Queue automation moved this from Assigned Reviewer to Closed/Rejected Jan 19, 2024

copybara-service bot mentioned this pull request Apr 27, 2024

PR #61632: Allow merging compute-copy streams #66555

Merged
