Distributed ML Training and Fine-Tuning on Kubernetes
Updated Nov 9, 2024 - Go
PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook's AI Research lab.
Automated Machine Learning on Kubernetes
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.
Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
Fabric for Deep Learning (FfDL, pronounced "fiddle") is a deep learning platform offering TensorFlow, Caffe, PyTorch, etc. as a service on Kubernetes
The open source, end-to-end computer vision platform. Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises.
Securely share and store AI/ML projects as OCI artifacts in your container registry.
Go binding for the PyTorch C++ API (libtorch)
LibTorch (PyTorch) bindings for Golang
Experiment tracking server focused on speed and scalability
Machine learning operator & controller for Kubernetes
PyTorch in Go, using LibTorch.
Example applications using the onnxruntime_go library.
Go binding for PyTorch
GPU scheduler for elastic/distributed deep learning workloads in Kubernetes clusters (IC2E'23)
Go client for the Hugging Face Hub, aiming for a minimal subset of the features of the `huggingface-hub` Python package
emergent GUI and other support for PyTorch networks: provides a NetView for Torch networks
A simple Go client for the clip-as-service server
Created by Facebook's AI Research lab (FAIR)
Released September 2016
Latest release 19 days ago