Tools for ML/MXNet on Kubernetes. A rework of the original tf-operator to support the MXNet framework.
Updated Sep 14, 2018 - Go
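Built on the kubernetes/client-go API: controls the lifecycle of GPU resources for distributed training, and supports real-time streaming of training logs for multiple users and multiple tasks through continuous websocket redirection.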
Fast and Adaptive Distributed Machine Learning for TensorFlow, PyTorch and MindSpore.
Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.