Tags: kubeflow/mpi-operator

v0.6.0

Verified

This tag was signed with the committer’s verified signature.
tenzen-y Yuki Iwai
* Features:
  * Support the ManagedBy feature (`.spec.runPolicy.managedBy`), inspired by the batch/v1 Job.
    * This allows MPIJobs to be dispatched to multiple clusters through Kueue's MultiKueue. (#650, @mszadkow)
* Clean ups:
  * Upgrade k8s libraries to v1.31 (#664, @ArangoGutierrez)
  * Upgrade the Debian version to bookworm and upgrade the MPI versions as follows: (#661, @tenzen-y)
    * OpenMPI: v4.1.0 -> v4.1.4
    * MPICH: 3.4.1 -> 4.0.2
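
As a rough illustration of the ManagedBy feature, the sketch below builds a minimal MPIJob manifest with `.spec.runPolicy.managedBy` set. Only the field path comes from the notes above. The `kubeflow.org/v2beta1` API version, the manager value `kueue.x-k8s.io/multikueue`, the job name, and the container image are assumptions chosen for illustration, so check them against the operator and Kueue documentation before use.

```python
import json


def mpijob_managed_by(name, manager="kueue.x-k8s.io/multikueue"):
    """Build a minimal MPIJob manifest that sets spec.runPolicy.managedBy.

    The default manager value and the image are assumptions for this sketch.
    """
    # Hypothetical container used for both the launcher and the workers.
    container = {"name": "mpi", "image": "example.com/mpi-app:latest"}
    pod_template = {"spec": {"containers": [container]}}
    return {
        "apiVersion": "kubeflow.org/v2beta1",  # assumed API version
        "kind": "MPIJob",
        "metadata": {"name": name},
        "spec": {
            # Field path taken from the release notes.
            "runPolicy": {"managedBy": manager},
            "mpiReplicaSpecs": {
                "Launcher": {"replicas": 1, "template": pod_template},
                "Worker": {"replicas": 2, "template": pod_template},
            },
        },
    }


manifest = mpijob_managed_by("pi")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.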

v0.5.0

Verified

This tag was signed with the committer’s verified signature.
tenzen-y Yuki Iwai

Changes since v0.4.0:

* Features:
  * Add support for MPICH (#562, @sheevy)
  * Add the `runLauncherAsWorker` field, which adds the launcher pod to the hostfile as a worker (#612, @kuizhiqing)
  * Add PodGroup minResources calculation for the Volcano integration (#566, @lowang-bh)
* Bug fixes:
  * Fix panic when using PodGroups and PriorityClasses (#561, @tenzen-y)
  * Fix installation of mpijob Python module (#579, @vsoch)
  * Fix hostfile when jobs in different namespaces have the same name (#622, @kuizhiqing)
* Clean ups:
  * Upgrade k8s libraries to v1.29 (#633, @tenzen-y)
  * Fail the mpi-operator binary if access to the API is denied (#619, @emsixteeen)
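
To make the hostfile-related items above more concrete, here is a small sketch of how such a hostfile might be assembled. It is not the operator's code. The host naming pattern (`<job>-worker-<i>.<job>.<namespace>.svc`), the `slots=` format for OpenMPI, and the `host:slots` format for MPICH are assumptions for illustration. The point is that including the namespace keeps same-named jobs in different namespaces distinct, and that `run_launcher_as_worker` (a hypothetical parameter mirroring `runLauncherAsWorker`) adds the launcher as an extra host.

```python
def hostfile_lines(job, namespace, workers, slots=1,
                   implementation="OpenMPI", run_launcher_as_worker=False):
    """Return hostfile lines for an MPI job (illustrative format only)."""

    def entry(host):
        # Qualify the host with job and namespace so that jobs with the
        # same name in different namespaces do not collide.
        fqdn = f"{host}.{job}.{namespace}.svc"
        if implementation == "OpenMPI":
            return f"{fqdn} slots={slots}"
        # Assumed format for MPICH-style hostfiles.
        return f"{fqdn}:{slots}"

    hosts = []
    if run_launcher_as_worker:
        # The launcher also runs ranks, so it is listed first.
        hosts.append(f"{job}-launcher")
    hosts.extend(f"{job}-worker-{i}" for i in range(workers))
    return [entry(h) for h in hosts]


for line in hostfile_lines("pi", "team-a", workers=2,
                           run_launcher_as_worker=True):
    print(line)
```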

v0.4.0

Changes since v0.3.0:

* Breaking changes
  * Removed the v1 operator. To use MPIJob v1, use the training-operator.
* Support for suspend semantics. Third-party controllers can use the `suspend` field to implement queuing and preemption for an MPIJob.
* Support for the coscheduling plugin of scheduler-plugins.
* The operator supports multi-architecture (amd64, aarch64, and ppc64le).
* Bug fixes
  * Fix support for elastic Horovod.

v0.3.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Bundle all controller versions in the image (#421)

v0.2.3

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Update ADOPTERS.md (#258)

v0.2.2

Update README to use the new single deploy config (#143)

v0.2.1

Update Dockerfile and examples to v1alpha2 (#130)

v0.2.0

MXNet distributed training (#122)

* MXNet distributed training

* change apiVersion

* Addressed some review comments

Newline related comments

* Revert "change apiVersion"

This reverts commit 163aed7.

0.1.0

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
add terrytangyuan as reviewer; remove inactive reviewers (#76)