typhoonzero/nccl_rdma_demo

NCCL multi-node communication demo

Demo code showing how to use NCCL2 to run AllReduce across multiple nodes.
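For readers new to the operation, here is a minimal sketch of what a sum AllReduce computes. It is written in plain Python so it runs without CUDA or NCCL. It is not the demo's code: the function name `all_reduce_sum` is hypothetical, and each inner list stands in for one GPU's buffer.

```python
from typing import List


def all_reduce_sum(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum AllReduce over per-rank buffers held on the host.

    Every rank contributes one buffer; afterwards every rank holds the
    element-wise sum of all contributions.
    """
    if not buffers:
        return []
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all ranks must contribute buffers of the same length")
    # Element-wise sum across ranks.
    total = [sum(values) for values in zip(*buffers)]
    # Each rank receives its own copy of the reduced result.
    return [list(total) for _ in buffers]
```

With two ranks holding `[1.0, 2.0]` and `[3.0, 4.0]`, both ranks end up with the same element-wise sum.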

Build

Install CUDA and NCCL first, then build:

git clone https://github.com/typhoonzero/nccl_rdma_demo.git
cd nccl_rdma_demo
make

Run

The demo uses Redis to broadcast the NCCL unique ID, so start a Redis instance before running it. If you have Docker, run:

docker run -d --name myredis -p 6379:6379 redis
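The unique-ID broadcast can be pictured as a simple publish-and-poll rendezvous. The sketch below is an illustration under stated assumptions, not the demo's actual code: it assumes node 0 creates the ID and the other nodes wait for it, the key name `nccl_unique_id` is hypothetical, and a plain dict stands in for Redis so the example runs without a server. A real deployment would issue Redis SET and GET commands instead.

```python
import time
from typing import Callable, MutableMapping

KEY = "nccl_unique_id"  # hypothetical key name, not taken from the demo


def exchange_unique_id(
    store: MutableMapping[str, bytes],
    node_id: int,
    make_id: Callable[[], bytes],
    timeout_s: float = 5.0,
    poll_s: float = 0.05,
) -> bytes:
    """Return the shared ID: node 0 publishes it, other nodes poll for it."""
    if node_id == 0:
        # Assumption: node 0 is the one that generates the ID.
        uid = make_id()
        store[KEY] = uid
        return uid
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        uid = store.get(KEY)
        if uid is not None:
            return uid
        time.sleep(poll_s)  # wait briefly before checking the store again
    raise TimeoutError("unique ID was not published before the timeout")
```

Polling keeps the sketch simple. Whichever node starts first, the others block until the ID appears or the timeout expires.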

Run ./demo with no arguments to print the help message:

Usage: demo [redisip:port] [node count] [node id]
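To make the three positional arguments concrete, here is a small sketch that parses and validates them. It only illustrates the usage line above. `DemoArgs` and `parse_demo_args` are hypothetical names, and the validation rules (node IDs counted from 0 and smaller than the node count) are assumptions based on the example invocations in this README.

```python
from typing import NamedTuple, Sequence


class DemoArgs(NamedTuple):
    redis_host: str
    redis_port: int
    node_count: int
    node_id: int


def parse_demo_args(argv: Sequence[str]) -> DemoArgs:
    """Parse '[redisip:port] [node count] [node id]' into typed fields."""
    if len(argv) != 3:
        raise ValueError("usage: demo [redisip:port] [node count] [node id]")
    # Split on the last colon so the port is always the trailing component.
    host, sep, port = argv[0].rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("first argument must look like host:port")
    node_count = int(argv[1])
    node_id = int(argv[2])
    if node_count < 1:
        raise ValueError("node count must be at least 1")
    if not 0 <= node_id < node_count:
        raise ValueError("node id must be in the range [0, node count)")
    return DemoArgs(host, int(port), node_count, node_id)
```

For instance, `parse_demo_args(["10.0.0.5:6379", "2", "1"])` describes the second of two nodes talking to Redis at 10.0.0.5 on port 6379.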

For example, with 2 nodes that each have 4 GPUs, run the demo as follows:

  • On node 1: ./demo [redis ip]:6379 2 0
  • On node 2: ./demo [redis ip]:6379 2 1
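In a setup like this, each GPU typically becomes one rank in a single communicator spanning all nodes. The sketch below shows one common layout, where ranks are numbered node by node. This layout is an assumption and is not confirmed by this README, and `global_ranks` is a hypothetical helper.

```python
from typing import List


def global_ranks(node_id: int, node_count: int, gpus_per_node: int) -> List[int]:
    """Return the global ranks owned by one node, numbering ranks node by node."""
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be at least 1")
    if not 0 <= node_id < node_count:
        raise ValueError("node_id must be in the range [0, node_count)")
    # Assumed layout: rank = node_id * gpus_per_node + local GPU index.
    return [node_id * gpus_per_node + gpu for gpu in range(gpus_per_node)]
```

Under this assumption, the 2-node, 4-GPU example has 8 ranks in total: node id 0 owns ranks 0 to 3 and node id 1 owns ranks 4 to 7.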
