<h1 style="color:#65AE11;">Introduction to CUDA Streams</h1>

In this section you will get a high-level introduction to concurrent CUDA streams, their behavior, and where they can be used in CUDA applications.

<h2 style="color:#65AE11;">Objectives</h2>

By the time you complete this section you will:

* Know what a CUDA Stream is
* Know the rules that govern stream behavior
* Know the behavior of the special default stream
* Understand that CUDA Streams can be used for memory transfers and kernel launches

<h2 style="color:#65AE11;">Instructor Presentation</h2>

Please give your attention to the instructor while they present the slides.

Run the following cell to load the slide deck for this section. If you wish, you can click on "Start Slide Show" once the slides appear to view them full-screen.

```python
from IPython.display import IFrame
IFrame("https://view.officeapps.live.com/op/view.aspx?src=https://developer.download.nvidia.com/training/courses/C-AC-04-V1/task1/05_cuda_streams-04.pptx", 900, 640)
```

<h2 style="color:#65AE11;">Check for Understanding</h2>

Please answer the following to confirm you've learned the main objectives of this section. You can display the answers for each question by clicking on the "..." cells below the questions.

---

**Which best describes a CUDA stream?**

1. A data buffer that can be read from many parallel threads
2. A way to run any operation on the GPU concurrently
3. The CUDA mechanism utilized to coordinate instructions from multiple CPUs
4. A series of operations executed in issue-order

**Answer: 4**

---

**What are the two rules that govern non-default stream behavior?**

1. Operations issued in the same non-default stream will execute in parallel
2. Operations in the same stream will execute in issue order
3. No ordering is guaranteed between operations issued in different non-default streams
4. Operations in different non-default streams will always operate in parallel

**Answer: 2, 3**
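
The two rules can be illustrated with a minimal sketch. The kernels `addValue` and `doubleValues`, the buffer names, and the sizes are hypothetical and chosen only for this example; error checking is omitted for brevity. Two kernels issued into the same stream run in issue order, while a kernel issued into a second stream has no guaranteed ordering relative to them, so it is given its own buffer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds `value` to every element.
__global__ void addValue(int *data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += value;
}

// Hypothetical kernel: doubles every element.
__global__ void doubleValues(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

int main()
{
    const int n = 1 << 20;
    int *a, *b;
    cudaMallocManaged(&a, n * sizeof(int));
    cudaMallocManaged(&b, n * sizeof(int));
    for (int i = 0; i < n; ++i) { a[i] = 1; b[i] = 1; }

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Rule 1: same stream, so these run in issue order: a[i] = (1 + 1) * 2 = 4.
    addValue<<<blocks, threads, 0, s1>>>(a, n, 1);
    doubleValues<<<blocks, threads, 0, s1>>>(a, n);

    // Rule 2: different stream, so no ordering relative to the work in s1.
    // It may or may not overlap with s1; it only touches `b`, so either is safe.
    addValue<<<blocks, threads, 0, s2>>>(b, n, 5);

    cudaDeviceSynchronize();  // wait for both streams before reading on the host
    printf("a[0] = %d, b[0] = %d\n", a[0], b[0]);  // a[0] = 4, b[0] = 6

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Note that "no ordering is guaranteed" does not mean the work in `s2` will overlap with the work in `s1`; whether it does depends on the device and the resources each kernel uses.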

---

**Which of the following are true about the default stream? Choose all that apply.**

1. Operations in the default stream cannot execute at the same time as operations in any non-default stream
2. The default stream can be used to overlap memory copy and GPU compute
3. Kernel launches and many other CUDA runtime operations are run by default in the default stream
4. The default stream is also called "stream 0" or the "NULL stream"

**Answer: 1, 3, 4**
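
The blocking behavior of the default stream can be sketched as follows. The three single-thread kernels are hypothetical and exist only for illustration, and the sketch assumes the legacy default stream (the default when compiling with `nvcc` without `--default-stream per-thread`) and streams created with plain `cudaStreamCreate`. Under those assumptions, the default-stream kernel waits for the earlier work in `s1`, and the later work in `s2` waits for the default-stream kernel, so all three kernels can safely update the same value.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical single-thread kernels that each modify one integer.
__global__ void addOne(int *x)   { *x += 1; }
__global__ void timesTen(int *x) { *x *= 10; }
__global__ void addThree(int *x) { *x += 3; }

int main()
{
    int *x;
    cudaMallocManaged(&x, sizeof(int));
    *x = 0;

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    addOne<<<1, 1, 0, s1>>>(x);    // non-default stream s1
    timesTen<<<1, 1>>>(x);         // no stream argument: default stream
    addThree<<<1, 1, 0, s2>>>(x);  // non-default stream s2

    cudaDeviceSynchronize();

    // With the legacy default stream the kernels are serialized:
    // (0 + 1) * 10 + 3 = 13.
    printf("x = %d\n", *x);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(x);
    return 0;
}
```

If the middle kernel were instead launched in a third non-default stream, the three updates would race, because nothing would order them. The same caveat applies when the per-thread default stream is enabled or when streams are created with the `cudaStreamNonBlocking` flag.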

---

**Which of the following can be executed in a non-default stream? Use the [CUDA Runtime API docs](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html) for reference. Choose all that apply.**

1. cudaMalloc
2. cudaMemcpy
3. cudaMemcpyAsync
4. Kernel launches

**Answer: 3, 4**

Look for arguments of the type `cudaStream_t` to see which functions expect a stream argument. Kernel launches are also always performed in a stream.
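
As a minimal sketch of what this looks like in practice, the example below passes a stream as the last argument to `cudaMemcpyAsync`, while `cudaMalloc` has no stream parameter at all. The `scale` kernel and the buffer names are hypothetical, pinned host memory from `cudaMallocHost` is assumed so that the copies can run asynchronously, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by `factor`.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *host, *device;
    cudaMallocHost(&host, bytes);  // pinned host memory
    cudaMalloc(&device, bytes);    // takes no cudaStream_t argument
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // All three operations are issued into the same non-default stream,
    // so they execute in issue order: copy in, compute, copy out.
    cudaMemcpyAsync(device, host, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(device, n, 2.0f);
    cudaMemcpyAsync(host, device, bytes, cudaMemcpyDeviceToHost, stream);

    // The calls above return to the host before the work completes,
    // so wait for the stream before reading the result.
    cudaStreamSynchronize(stream);
    printf("host[0] = %.1f\n", host[0]);  // host[0] = 2.0

    cudaStreamDestroy(stream);
    cudaFree(device);
    cudaFreeHost(host);
    return 0;
}
```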

---

**Kernel launches always occur in a stream.**

1. True
2. False

**Answer: 1**

If a stream argument is not specified, kernel launches will occur in the default stream.

---

**How do programmers define which stream they would like a kernel to be launched in?**

1. By using the 3rd argument to the kernel's launch configuration
2. By using the 4th argument to the kernel's launch configuration
3. They cannot. Kernel launches always occur in the default stream

**Answer: 2**
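
The launch configuration takes up to four arguments: grid dimensions, block dimensions, dynamic shared memory size in bytes, and the stream. The sketch below uses all four so that the role of each position is visible. The `reverseInBlock` kernel and its sizes are hypothetical and chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses `n` elements using dynamic shared memory.
// Assumes a single block with at least `n` threads.
__global__ void reverseInBlock(int *data, int n)
{
    extern __shared__ int tile[];  // size comes from the 3rd launch argument
    int i = threadIdx.x;
    if (i < n) tile[i] = data[i];
    __syncthreads();
    if (i < n) data[i] = tile[n - 1 - i];
}

int main()
{
    const int n = 256;
    int *data;
    cudaMallocManaged(&data, n * sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // <<<grid, block, dynamic shared memory bytes, stream>>>
    reverseInBlock<<<1, n, n * sizeof(int), stream>>>(data, n);

    cudaDeviceSynchronize();
    printf("data[0] = %d\n", data[0]);  // data[0] = 255

    cudaStreamDestroy(stream);
    cudaFree(data);
    return 0;
}
```

A kernel that needs no dynamic shared memory still has to supply the third argument, typically `0`, in order to reach the fourth.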

---

<h2 style="color:#65AE11;">Next</h2>

Now that you have a high-level understanding of CUDA Stream behavior, in the next two sections you will learn the syntax for launching kernels and executing certain CUDA Runtime functions in non-default streams.

Please continue to the next section: [*Kernels in Streams*](../06_Kernels_in_Streams/Kernels_in_Streams.ipynb).

<h2 style="color:#65AE11;">Optional Further Study</h2>

The following are for students with time and interest to do additional study on topics related to this workshop.

* [The CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#asynchronous-concurrent-execution) gives extensive coverage of asynchronous concurrent code execution, including many additional details regarding the use of CUDA streams.
* CUDA 10 introduced [CUDA Graphs](https://developer.nvidia.com/blog/cuda-graphs/), which in some scenarios can be considered an alternative to CUDA streams. As discussed in [*CUDA 10 Features Revealed*](https://developer.nvidia.com/blog/cuda-10-features-revealed/), "a graph consists of a series of operations, such as memory copies and kernel launches, connected by dependencies and defined separately from its execution. Graphs enable a define-once-run-repeatedly execution flow."