Tags: kubeflow/mpi-operator
* Features:
  * Support the ManagedBy feature (`.spec.runPolicy.managedBy`), inspired by the batch/v1 Job.
    * This allows MPIJobs to be dispatched to multiple clusters powered by Kueue's MultiKueue. (#650, @mszadkow)
* Clean ups:
  * Upgrade k8s libraries to v1.31 (#664, @ArangoGutierrez)
  * Upgrade the Debian version to bookworm; MPI versions are upgraded as follows: (#661, @tenzen-y)
    * OpenMPI: v4.1.0 -> v4.1.4
    * MPICH: 3.4.1 -> 4.0.2
v0.5.0

Changes since v0.4.0:

* Features:
  * Add support for MPICH (#562, @sheevy)
  * The `runLauncherAsWorker` field adds the launcher pod to the hostfile as a worker (#612, @kuizhiqing)
  * Add PodGroup minResources calculation for the Volcano integration (#566, @lowang-bh)
* Bug fixes:
  * Fix a panic when using PodGroups and PriorityClasses (#561, @tenzen-y)
  * Fix installation of the mpijob Python module (#579, @vsoch)
  * Fix the hostfile when jobs in different namespaces have the same name (#622, @kuizhiqing)
* Clean ups:
  * Upgrade k8s libraries to v1.29 (#633, @tenzen-y)
  * Make the mpi-operator binary fail if access to the API is denied (#619, @emsixteeen)
Changes since 0.3.0:

* Breaking changes:
  * Removed the v1 operator. To use MPIJob v1, use the training-operator instead.
  * Support for suspend semantics. Third-party controllers can use the suspend field to implement queuing and preemption for an MPIJob.
  * Support for the coscheduling plugins of the scheduler-plugins.
  * The operator supports multiple architectures (amd64, aarch64, and ppc64le).
* Bug fixes:
  * Fix support for elastic Horovod.