
Quick Start

Mateusz Kapusta edited this page Jun 7, 2026 · 2 revisions

rixa can be used both with PyTorch and NVSHMEM.

PyTorch

rixa can start a PyTorch distributed job with a single call:

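import torch
import rixa

# Initialize the default process group with the gloo backend
rixa.pytorch.init_process_group("gloo")
ML_training_loop()  # placeholder for your training code

# Tear down the process group once training is done
torch.distributed.destroy_process_group()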

More options are available for the NCCL backend. When the nccl backend is selected and gpu_assign_method is "local_rank" (the default), each process is assigned a GPU based on its local rank.

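import torch
import rixa

rixa.pytorch.init_process_group("nccl", gpu_assign_method="local_rank")
# No need to call torch.cuda.set_device
GPU_training_loop()  # placeholder for your training code

# Tear down the process group once training is done
torch.distributed.destroy_process_group()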
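
To make the local_rank assignment more concrete, here is a minimal sketch of the idea, assuming one process per GPU on each node. The helper name device_for_local_rank is hypothetical and is not part of rixa; rixa's actual logic may differ, for example in how it handles more processes than GPUs.

```python
def device_for_local_rank(local_rank: int, visible_gpus: int) -> int:
    """Return the GPU index for a process, assuming one process per GPU.

    Hypothetical helper for illustration only; not part of rixa.
    """
    if visible_gpus <= 0:
        raise ValueError("no visible GPUs on this node")
    if not 0 <= local_rank < visible_gpus:
        raise ValueError(
            f"local rank {local_rank} has no matching GPU "
            f"(only {visible_gpus} visible)"
        )
    # With one process per GPU, the local rank is used directly as the index.
    return local_rank
```

Without automatic assignment, each process would pass an index like this to torch.cuda.set_device itself before creating the NCCL process group.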

NVSHMEM

Example:

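import rixa
import nvshmem.core as nvshmem  # assumption: finalize() comes from NVSHMEM4Py
from cuda.core import Device

store = rixa.PMIxStore(30)  # timeout in seconds
dev = Device(store.get_local_rank())  # use the local rank as the device index
rixa.nvshmem.init(dev, store)
# ... NVSHMEM work goes here ...
nvshmem.finalize()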
