Source: https://www.cnblogs.com/rossiXYZ/p/15339953.html

### 3. Splitting Data and Runtime System

In [None]:
import torch

##### Example 1

In [None]:
x = torch.randn(2)
y = torch.randn(22)

In [None]:
x_mean = x.mean()

In [None]:
y_mean = y.mean()

In [None]:
import torch

In [None]:
x.shape, y.shape

(torch.Size([2]), torch.Size([22]))

Compute the mean of `x` and `y` in parallel using CUDA stream

In [None]:
stream1 = torch.cuda.Stream()
stream2 = torch.cuda.Stream()

In [None]:
with torch.cuda.stream(stream1):
    x_mean = x.mean()

In [None]:
with torch.cuda.stream(stream1):
    y_mean = y.mean()

In [None]:
x_mean, y_mean

(tensor(-0.4989), tensor(-0.3499))

##### Example 2

In [None]:
from contextlib import contextmanager

Write context managers that work as bellow and explain it

In [None]:
@contextmanager
def first_context():
    # Code to be executed before the context
    print("Entering the context")
    
    # Yield the value to be used in the context
    yield
    
    # Code to be executed after the context
    print("Leaving the context")

**Explain**

`yield` is used to indicate the boundary between the setup (enter) and the cleanup (exit) phases of the context.

Since there is no value being yielded, the variable that is assigned the result of calling the function will be `None`.   

In [None]:
with first_context() as x:
    print(x)
    print(1 + 2)

Entering the context
None
3
Leaving the context


In [None]:
@contextmanager
def second_context():
    print("Entering the context")
    # Yield the value to be used in the context
    yield "Hello, World!"
    print("Leaving the context")

**Explain**

Because the `yield` statement returns the string "Hello, World!", so when the context is used, the value "Hello, World!" is available to the code that called the context manager.

In [None]:
with second_context() as x:
    print(x)
    print(1 + 2)

Entering the context
Hello, World!
3
Leaving the context


##### Example 3

In [None]:
a = torch.tensor([69, 69, 69])
b = torch.tensor([420, 420, 420])

In [None]:
x = a + b

In [None]:
from contextlib import contextmanager

Write a context manager that takes a stream and executes a block of code either on a CUDA stream or a CPU stream. Explain the code

In [None]:
@contextmanager
def use_stream(stream):
    if not isinstance(stream, torch.cuda.Stream):
        yield
        return # stop the function's execution
    
    with torch.cuda.stream(stream):
        yield

**Explain**

- `@contextmanager`: This is a decorator that turns the use_stream function into a context manager. It allows the function to be used in a with statement.

- `def use_stream(stream)`: This defines the use_stream function that takes a single argument called stream. This can be a CPU or CUDA stream.

- `if not isinstance(stream, torch.cuda.Stream):`: This line checks if the provided stream is a CPU stream. If the stream is not an instance of torch.cuda.Stream, it means it's a CPU stream.

- `yield`: This is the point in the context manager where the code inside the with block will be executed. If the stream is a CPU stream, the yield statement allows the code within the with block to execute without any additional setup.

- `return`: This line stops the execution of the function early when the stream is a CPU stream. This ensures that the code following the return statement, which sets up a CUDA stream context, is not executed.

- `with torch.cuda.stream(stream):`: This line is only executed if the stream is a CUDA stream. It creates a new context for the provided CUDA stream, ensuring that any code executed within this context will use the specified CUDA stream.

- `yield`: This yield statement is used when the stream is a CUDA stream. It allows the code within the with block to execute using the set up CUDA stream.

In [None]:
a, b

(tensor([69, 69, 69]), tensor([420, 420, 420]))

In [None]:
with use_stream(stream):
    x = a + b

In [None]:
x

tensor([489, 489, 489])

##### Example 4

In [None]:
xs = [1, 2, 3]

In [None]:
xs

[1, 2, 3]

Implement a queue data structure using Python's built-in module that stores all values in `xs` and works as bellow

In [None]:
from queue import Queue

In [None]:
q = Queue()

In [None]:
for x in xs:
    q.put(x)

In [None]:
q.get()

1

In [None]:
q.get()

2

In [None]:
q.get()

3

##### Example 5

In [None]:
device = torch.device("cuda:0")

In [None]:
x = torch.tensor([69, 69, 69])

In [None]:
total = x.sum()

In [None]:
import torch

In [None]:
x

tensor([69, 69, 69])

In [None]:
device

device(type='cuda', index=0)

Compute the sum of all elements in `x` on `device`

In [None]:
with torch.cuda.device(device):
    total = x.sum()

In [None]:
total

tensor(207)

##### Example 5.1

In [None]:
import time

In [None]:
def print_numbers():
    for i in range(5):
        print(i)
        time.sleep(1)

Run the function `print_numbers` on a new thread

In [None]:
import threading

In [None]:
thread = threading.Thread(target=print_numbers)

In [None]:
thread.start()

0
1
2
3
4


##### Example 5.2

In [None]:
xs = torch.tensor([69, 69, 69]).float()

In [None]:
device = torch.device("cuda:0")

In [None]:
mean = xs.mean()

In [None]:
import torch

In [None]:
xs

tensor([69., 69., 69.])

In [None]:
device

device(type='cuda', index=0)

Create a new CUDA stream on the device, and then calculate the average of `xs` using that CUDA stream.

In [None]:
stream = torch.cuda.Stream(stream)

In [None]:
with torch.cuda.device(device):
    with torch.cuda.stream(stream):
        mean = xs.mean()

In [None]:
mean

tensor(69.)

##### Example 6

In [None]:
import sys

In [None]:
sys.exc_info()

(None, None, None)

### Spawn Workers

In [25]:
def clock_cycles(m: int, n: int):
    """Generates schedules for each clock cycle."""
    # m: number of micro-batches
    # n: number of partitions
    # i: index of micro-batch
    # j: index of partition
    # k: clock number
    #
    # k (i,j) (i,j) (i,j)
    # - ----- ----- -----
    # 0 (0,0)
    # 1 (1,0) (0,1)
    # 2 (2,0) (1,1) (0,2)
    # 3       (2,1) (1,2)
    # 4             (2,2)
    for k in range(m+n-1):
        yield [(k-j, j) for j in range(max(1+k-m, 0), min(1+k, n))]

In [19]:
from threading import Thread
from queue import Queue
from contextlib import contextmanager

In [43]:
def worker(in_queue, out_queue, device):
    with torch.cuda.device(device):
        while True:
            task = in_queue.get()
            
            try:
                output = task()
            except Exception:
                out_queue.put(False)
                continue
            
            out_queue.put((True, output))

In [37]:
def hello():
    print("hello")

In [38]:
@contextmanager
def spawn_workers(devices):
    in_queues = []
    out_queues = []
    
    workers = {}
    
    for device in devices:
        in_queue = Queue()
        out_queue = Queue()
        workers[device] = (in_queue, out_queue)
        
        t = Thread(
            target=worker,
            args=(in_queue, out_queue, device),
            daemon=True
        )
        t.start()
        
        in_queues.append(in_queue)
        out_queues.append(out_queue)
    
    yield (in_queue, out_queue)

In [39]:
devices = [
    torch.device("cuda:0"),
    torch.device("cuda:1"),
    torch.device("cuda:2"),
    torch.device("cuda:3")
]

In [40]:
devices

[device(type='cuda', index=0),
 device(type='cuda', index=1),
 device(type='cuda', index=2),
 device(type='cuda', index=3)]

In [41]:
n_microbatches = 4
n_partritions = 3

In [45]:
def hello():
    print("Hello")

In [46]:
with spawn_workers(devices) as (in_queues, out_queues):
    while True:
        for in_queue in in_queues:
            in_queue.put(hello)
        time.sleep(1)

Exception in thread Exception in thread Thread-21:
Traceback (most recent call last):
  File "/Users/education/opt/anaconda3/envs/ml_engineering/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Exception in thread Thread-23:
Traceback (most recent call last):
  File "/Users/education/opt/anaconda3/envs/ml_engineering/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Thread-22:
Traceback (most recent call last):
  File "/Users/education/opt/anaconda3/envs/ml_engineering/lib/python3.8/threading.py", line 932, in _bootstrap_inner
Exception in thread Thread-24:
Traceback (most recent call last):
  File "/Users/education/opt/anaconda3/envs/ml_engineering/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/Users/education/opt/anaconda3/envs/ml_engineering/lib/python3.8/threading.py", line 870, in run
    self.run()
  File "/Users/education/opt/anaconda3/envs/ml_engineering/lib/python3.8/threading.py", line 870, in run
    self._target(*self._

TypeError: 'Queue' object is not iterable

        raise RuntimeError("PyTorch was compiled without CUDA support")
RuntimeError: PyTorch was compiled without CUDA support
    raise RuntimeError("PyTorch was compiled without CUDA support")
RuntimeError: PyTorch was compiled without CUDA support
raise RuntimeError("PyTorch was compiled without CUDA support")
RuntimeError: PyTorch was compiled without CUDA support
    raise RuntimeError("PyTorch was compiled without CUDA support")
RuntimeError: PyTorch was compiled without CUDA support
