A fully distributed hyperparameter optimization tool for PyTorch DNNs
Mesh TensorFlow: Model Parallelism Made Easier
Performance test of MNIST handwritten-digit recognition using MXNet + TF
Distributed TensorFlow (model parallelism) example repository
Model parallelism for NN architectures with skip connections (e.g., ResNets, UNets)
A decentralized and distributed framework for training DNNs
Official implementation of DynPartition: Automatic Optimal Pipeline Parallelism of Dynamic Neural Networks over Heterogeneous GPU Systems for Inference Tasks
Fast and easy distributed model training examples.
Adaptive Tensor Parallelism for Foundation Models
PyTorch implementation of 3D U-Net with model parallelism across 2 GPUs for large models
WIP. Veloce is a low-code, Ray-based parallelization library for efficient and heterogeneous machine learning computation.
SC23 Deep Learning at Scale Tutorial Material
Distributed training (multi-node) of a Transformer model
NAACL '24 (Demo) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
Large-scale 4D-parallelism pre-training for 🤗 Transformers with Mixture of Experts *(still a work in progress)*
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Slicing a PyTorch Tensor Into Parallel Shards (a minimal sketch of this idea follows the list below)
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
PaddlePaddle (飞桨) large-model development suite, providing end-to-end development toolchains for large language models, cross-modal large models, biocomputing large models, and other domains.
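Several of the projects above revolve around the same core operation: splitting a tensor or a model across devices. As a minimal, hedged sketch of column-parallel tensor sharding in plain PyTorch (all sizes, device names, and variable names here are illustrative assumptions, not code from any repository listed):

```python
import torch

# Weight of a linear layer to be split column-wise across two GPUs.
# Since x @ W = [x @ W1 | x @ W2], sharding W along its columns and
# concatenating the partial outputs recovers the full product.
weight = torch.randn(1024, 4096)
devices = ["cuda:0", "cuda:1"]

# torch.chunk splits a tensor into near-equal views along one dimension;
# each shard is then moved to its own device.
shards = [s.to(d) for s, d in zip(torch.chunk(weight, len(devices), dim=1), devices)]

x = torch.randn(8, 1024)

# Each device computes a partial output; concatenating the partials
# along the feature dimension reassembles the full result on cuda:0.
partials = [x.to(d) @ s for s, d in zip(shards, devices)]
out = torch.cat([p.to(devices[0]) for p in partials], dim=1)

assert out.shape == (8, 4096)
```

Libraries such as Mesh TensorFlow, EPL, and LiBai automate this kind of sharding (and the accompanying cross-device communication) at scale; the sketch above only shows the underlying arithmetic.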