Distributed training (multi-node) of a Transformer model
machine-learning
tutorial
deep-learning
pytorch
data-parallelism
model-parallelism
distributed-training
gradient-accumulation
distributed-data-parallel
collective-communication
Updated Apr 10, 2024 - Python