distributed-deep-learning

Here are 15 public repositories matching this topic...

Shigangli / WAGMA-SGD

WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. The synchronization is relaxed by making the collectives externally triggerable; that is, a collective can be initiated without requiring that all the processes enter it. It partially reduces the data within non-overlapping groups of processes, improving the…

distributed-deep-learning model-averaging partial-allreduce

Updated Jun 30, 2021
Python
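
The description above mentions averaging within non-overlapping groups of processes. As a rough illustration only, the sketch below averages parameter vectors inside fixed, non-overlapping groups in a single process. It is not the WAGMA-SGD implementation and does not model the wait-avoiding, externally triggered collectives; the function name `group_average` and the fixed contiguous grouping are assumptions made for this example.

```python
def group_average(models, group_size):
    """Average parameter vectors within non-overlapping groups.

    `models` is a list of equal-length parameter lists, one per simulated
    process. Hypothetical helper for illustration, not taken from WAGMA-SGD.
    """
    averaged = []
    for start in range(0, len(models), group_size):
        group = models[start:start + group_size]
        # Element-wise mean over the members of this group only.
        mean = [sum(vals) / len(group) for vals in zip(*group)]
        # Every member of the group ends up with the same averaged model.
        averaged.extend([list(mean) for _ in group])
    return averaged


# Four simulated processes, averaged in two groups of two.
models = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
result = group_average(models, group_size=2)
```

Here the first two processes share one average and the last two share another, so no data crosses group boundaries in this step.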

sotheanithsok / Image-Recognition-using-Distributed-ResNet-Model

An implementation of a distributed ResNet model for classifying CIFAR-10 and MNIST datasets.

python tensorflow mnist cifar10 distributed-deep-learning horovod

Updated Jun 6, 2022
Python

veritas9872 / Horovod-Pytorch-Tutorial

Horovod tutorial for PyTorch using NVIDIA-Docker.

docker pytorch nvidia-docker distributed-deep-learning horovod horovod-pytorch-tutorial horovod-tutorial horovod-pytorch horovod-example horovod-pytorch-example

Updated Feb 14, 2020
Python

sqaz91819 / Blockchain-NAS

A blockchain-based neural architecture search project.

blockchain distributed-deep-learning neural-architecture-search

Updated Jun 15, 2021
Python

Shigangli / eager-SGD

Eager-SGD is a decentralized asynchronous SGD. It uses novel partial collective operations to accumulate the gradients across all the processes.

distributed-deep-learning partial-allreduce gradient-averaging

Updated Nov 18, 2021
Python

ch3njust1n / smpl

Simultaneous Multi-Party Learning Framework

distributed-systems deep-neural-networks deep-learning artificial-intelligence sgd artificial-neural-networks gradient-descent evolutionary-algorithm hypergraph metaheuristic distributed-deep-learning hgsgd hypergraph-sgd asynchronous-sgd

Updated Sep 21, 2018
Python

siddhanthiyer-99 / Distributed-Training-of-GANs

Training strategies that reduce bottlenecks and improve training speed while maintaining the quality of the GANs.

python deep-learning tensorflow pytorch distributed-deep-learning

Updated Jul 31, 2023
Python

lancelee82 / necklace

Distributed deep learning framework based on PyTorch, Numba, NCCL, and ZeroMQ.

deep-learning mxnet pytorch distributed numba zerorpc distributed-deep-learning distributed-training nccl

Updated Aug 10, 2023
Python

StefanoFioravanzo / distributed-deeplearning-kubernetes

Collection of resources for automatic deployment of distributed deep learning jobs on a Kubernetes cluster

mxnet tensorflow kubernetes-operator distributed-deep-learning azure-kubernetes-service

Updated Sep 18, 2018
Python

Shigangli / Chimera

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines.

transformers distributed-deep-learning pipeline-parallelism

Updated Dec 5, 2023
Python

Shigangli / Ok-Topk

Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm (less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is established both theoretically and empirically.

distributed-deep-learning sparse-allreduce topk-sgd
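
To make the idea of training with sparse gradients more concrete, here is a minimal single-process sketch of generic top-k gradient sparsification followed by a naive sum of the sparse contributions. It is not the Ok-Topk sparse allreduce algorithm and makes no claim about its communication volume; the helper names `topk_sparsify` and `sparse_sum` are hypothetical and exist only for this illustration.

```python
def topk_sparsify(grad, k):
    """Keep the k entries of `grad` with the largest magnitude.

    Returns a dict mapping index -> value. Hypothetical helper for
    illustration, not part of Ok-Topk.
    """
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    return {i: grad[i] for i in order[:k]}


def sparse_sum(sparse_grads, length):
    """Naively sum sparse gradients from several workers into a dense list."""
    total = [0.0] * length
    for sparse in sparse_grads:
        for index, value in sparse.items():
            total[index] += value
    return total


# Two simulated workers, each contributing only its top-2 gradient entries.
worker_grads = [
    [0.1, -3.0, 0.5, 2.0],
    [4.0, 0.2, -0.3, 1.0],
]
sparse = [topk_sparsify(g, k=2) for g in worker_grads]
reduced = sparse_sum(sparse, length=4)
```

Each worker sends only index-value pairs for its largest entries, so small gradient components are dropped from this round's reduction.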