Learn how to design, develop, deploy and iterate on production-grade ML applications.
-
Updated
Aug 18, 2024 - Jupyter Notebook
Learn how to design, develop, deploy and iterate on production-grade ML applications.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
Fengshenbang-LM(封神榜大模型)是IDEA研究院认知计算与自然语言研究中心主导的大模型开源体系,成为中文AIGC和认知智能的基础设施。
A high performance and generic framework for distributed DNN training
Fast and flexible AutoML with learning guarantees.
Training and serving large-scale neural networks with auto parallelization.
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
DLRover: An Automatic Distributed Deep Learning System
Library for Fast and Flexible Human Pose Estimation
DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted in incubation in LF AI & Data Foundation.
Efficient Deep Learning Systems course materials (HSE, YSDA)
Best practice for training LLaMA models in Megatron-LM
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
Resource-adaptive cluster scheduler for deep learning training.
LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training
Add a description, image, and links to the distributed-training topic page so that developers can more easily learn about it.
To associate your repository with the distributed-training topic, visit your repo's landing page and select "manage topics."