This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Feature

  1. Supports both GPU and CPU distributed training
  2. Automatically cleans up the PS (parameter server) once the whole FrameworkAttempt is completed
  3. No need to adjust the existing TensorFlow image
  4. No need to set up Kubernetes DNS or a Kubernetes Service
  5. Common Feature

Prerequisite

  1. See [PREREQUISITE] in each specific Framework yaml file
  2. Set up Kubernetes Cluster-Level Logging if you need to persist and expose logs for deleted Pods

Quick Start

  1. Common Quick Start
  2. CPU Example
  3. GPU Example