distributed-training

Here are 176 public repositories matching this topic...

GokuMohandas / Made-With-ML

Learn how to design, develop, deploy and iterate on production-grade ML applications.

python data-science machine-learning natural-language-processing deep-learning pytorch data-engineering ray data-quality distributed-training mlops distributed-ml llms

Updated Aug 18, 2024
Jupyter Notebook

huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Updated Feb 17, 2025
Python

PaddlePaddle / Paddle

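PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core 『飞桨』 (PaddlePaddle) framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)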

python machine-learning deep-learning neural-network scalability efficiency paddlepaddle distributed-training

Updated Feb 19, 2025
C++

PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with an 🤗 awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.

nlp search-engine compression sentiment-analysis transformers information-extraction question-answering llama pretrained-models embedding bert semantic-analysis distributed-training ernie neural-search uie document-intelligence paddlenlp llm

Updated Feb 19, 2025
Python

skypilot-org / skypilot

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 14+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

Updated Feb 19, 2025
Python

IDEA-CCNL / Fengshenbang-LM

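Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA Research Institute, positioned as infrastructure for Chinese AIGC and cognitive intelligence.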

transformers pytorch chinese-nlp pretrained-models distributed-training multimodal aigc

Updated Aug 13, 2024
Python

FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

machine-learning deep-learning inference-engine model-deployment model-serving distributed-training federated-learning mlops edge-ai ai-agent on-device-training

Updated Feb 19, 2025
Python

bytedance / byteps

A high-performance, generic framework for distributed DNN training

machine-learning deep-learning mxnet tensorflow keras pytorch distributed-training

Updated Oct 3, 2023
Python

tensorflow / adanet

Fast and flexible AutoML with learning guarantees.

python machine-learning deep-learning tensorflow gpu ensemble automl learning-theory neural-architecture-search distributed-training tpu

Updated Nov 30, 2023
Jupyter Notebook

alpa-projects / alpa

Training and serving large-scale neural networks with auto parallelization.

machine-learning deep-learning compiler distributed-computing high-performance-computing distributed-training jax alpa auto-parallelization llm

Updated Dec 9, 2023
Python

determined-ai / determined

Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.