issues Search Results · repo:kubeflow/mpi-operator language:Go

287 results

Based on Multi-Gaudi Workloads Example, I am trying to run an MPIJob with the following configuration: apiVersion: kubeflow.org/v2beta1 kind: MPIJob metadata: name: mpijob spec: slotsPerWorker: 2 ...
  • gera-aldama
  • 3
  • Opened on Jan 24
  • #680

In #676, we downgraded the Intel MPI version to 2021.13 because we hit an unresolved DNS name resolution issue: https://github.com/kubeflow/mpi-operator/issues/675 Ideally, we want to use the latest Intel ...
help wanted
kind/bug
  • tenzen-y
  • 5
  • Opened on Jan 17
  • #678

Intel MPI E2E tests failed in CI: https://github.com/kubeflow/mpi-operator/blob/c738a83b185b4bf3bf7e6eca9d4503653294c995/test/e2e/mpi_job_test.go#L207-L272 == BEGIN pi-launcher-jlll8 pod logs == :: ...
kind/bug
  • tenzen-y
  • 10
  • Opened on Jan 1
  • #675

Summary from Trivy scan: Vulnerability information: ...
  • cmontemuino
  • 1
  • Opened on Dec 17, 2024
  • #672

Hello team, I followed the documentation to run the PI example, but when I tried to build my own images based on the docs at https://github.com/kubeflow/mpi-operator/tree/master/examples/v2beta1/pi, I ...
  • luancaarvalho
  • 1
  • Opened on Oct 12, 2024
  • #662

Currently we support OpenMPI on amd64, arm64 and ppc64le, but MPICH is only supported on amd64 and arm64. It would be great if we had feature parity in that regard.
kind/feature
  • tenzen-y
  • Opened on Oct 12, 2024
  • #660

Hi team, I have an example based on the latest NVIDIA image nvcr.io/nvidia/tensorflow:24.07-tf2-py3, but it runs the MPI job on different nodes. However, it complains that the launcher could not identify the worker. ...
  • yxusnapchat
  • 2
  • Opened on Oct 11, 2024
  • #658

I would propose a new v0.6.0 release once we resolve the following tasks. Dependency Update - [x] scheduler-plugins: https://github.com/kubeflow/mpi-operator/pull/653 - [x] volcano: #659 @tenzen-y ...
  • tenzen-y
  • 7
  • Opened on Oct 10, 2024
  • #654

I'm curious whether a new release is planned soon. I am particularly interested in a release that would bump the k8s libraries to 1.31 so that the PodSpecTemplate embedded in an MPIJob would include ...
  • klueska
  • 9
  • Opened on Oct 9, 2024
  • #652

Here is my YAML content: kind: MPIJob metadata: name: deepspeed-mpi spec: slotsPerWorker: 1 runPolicy: cleanPodPolicy: Running backoffLimit: 3 mpiReplicaSpecs: Launcher: restartPolicy: ...
  • gyupup
  • 6
  • Opened on Oct 8, 2024
  • #651