Issues search results · repo:kubeflow/mpi-operator language:Go
287 results
Based on the Multi-Gaudi Workloads Example, I am trying to run an MPIJob with the following configuration:
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: mpijob
spec:
  slotsPerWorker: 2 ...
gera-aldama
- 3
- Opened on Jan 24
- #680
In #676, we downgraded Intel MPI to version 2021.13 because we hit an unresolved DNS name resolution issue:
https://github.com/kubeflow/mpi-operator/issues/675
Ideally, we want to use the latest Intel ...
help wanted
kind/bug
tenzen-y
- 5
- Opened on Jan 17
- #678
Intel MPI E2E tests failed in CI:
https://github.com/kubeflow/mpi-operator/blob/c738a83b185b4bf3bf7e6eca9d4503653294c995/test/e2e/mpi_job_test.go#L207-L272
== BEGIN pi-launcher-jlll8 pod logs ==
:: ...
kind/bug
tenzen-y
- 10
- Opened on Jan 1
- #675
Summary from Trivy scan:
Vulnerability information:
+---------------------------------+-----------------------------+----------+-------------------+---------------+----------------------------------------------------------------------------+--------------------------------------------+ ...
cmontemuino
- 1
- Opened on Dec 17, 2024
- #672
Hello team,
I followed the documentation to run the PI example, but when I tried to build my own images based on the docs at
https://github.com/kubeflow/mpi-operator/tree/master/examples/v2beta1/pi, I ...
luancaarvalho
- 1
- Opened on Oct 12, 2024
- #662
Currently we support OpenMPI on amd64, arm64, and ppc64le, but MPICH is only supported on amd64 and arm64. It would be
great to have feature parity between the two.
kind/feature
tenzen-y
- Opened on Oct 12, 2024
- #660
Hi team, I have an example based on the latest NVIDIA image nvcr.io/nvidia/tensorflow:24.07-tf2-py3, but when I run the MPI job
across different nodes, it complains that the launcher could not identify the worker. ...
yxusnapchat
- 2
- Opened on Oct 11, 2024
- #658
I propose a new v0.6.0 release once we resolve the following tasks.
Dependency Update
- [x] scheduler-plugins: https://github.com/kubeflow/mpi-operator/pull/653
- [x] volcano: #659 @tenzen-y ...
tenzen-y
- 7
- Opened on Oct 10, 2024
- #654
I'm curious whether a new release is planned soon.
I am particularly interested in a release that bumps the k8s libraries to 1.31 so that the PodTemplateSpec embedded
in an MPIJob would include ...
klueska
- 9
- Opened on Oct 9, 2024
- #652
Here is my YAML content:
kind: MPIJob
metadata:
  name: deepspeed-mpi
spec:
  slotsPerWorker: 1
  runPolicy:
    cleanPodPolicy: Running
    backoffLimit: 3
  mpiReplicaSpecs:
    Launcher:
      restartPolicy: ...
gyupup
- 6
- Opened on Oct 8, 2024
- #651
