mpi job not support hostNetwork: true pattern #778

Closed
guunergooner opened this issue Apr 15, 2020 · 9 comments
Labels
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/bug: Categorizes issue or PR as related to a bug.

Comments

@guunergooner
Contributor

guunergooner commented Apr 15, 2020

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

/kind bug

What happened:
When I ran the mpi-example with hostNetwork: true, the pods could not resolve the hostnames listed in /etc/volcano/mpiworker.host:

root@gpu-ser348-06:/etc/volcano# cat mpiworker.host  && echo
lm-mpi-job-mpiworker-0.lm-mpi-job
lm-mpi-job-mpiworker-1.lm-mpi-job
lm-mpi-job-mpiworker-2.lm-mpi-job
root@gpu-ser348-06:/etc/volcano# ping lm-mpi-job-mpiworker-0.lm-mpi-job
ping: unknown host lm-mpi-job-mpiworker-0.lm-mpi-job

What you expected to happen:
With hostNetwork: true, the hostnames in mpiworker.host should still resolve.

How to reproduce it (as minimally and precisely as possible):

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: lm-mpi-job
spec:
  minAvailable: 3
  schedulerName: volcano
  plugins:
    ssh: []
    svc: []
  tasks:
    - replicas: 1
      name: mpimaster
      policies:
        - event: TaskCompleted
          action: CompleteJob
      template:
        spec:
          hostNetwork: true
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  MPI_HOST=`cat /etc/volcano/mpiworker.host | tr "\n" ","`;
                  mkdir -p /var/run/sshd; /usr/sbin/sshd;
                  mpiexec --allow-run-as-root --host ${MPI_HOST} -np 2 mpi_hello_world > /home/re;
              image: volcanosh/example-mpi:0.0.1
              name: mpimaster
              ports:
                - containerPort: 22
                  name: mpijob-port
              workingDir: /home
          restartPolicy: OnFailure
    - replicas: 2
      name: mpiworker
      template:
        spec:
          hostNetwork: true
          containers:
            - command:
                - /bin/sh
                - -c
                - |
                  mkdir -p /var/run/sshd; /usr/sbin/sshd -D;
              image: volcanosh/example-mpi:0.0.1
              name: mpiworker
              ports:
                - containerPort: 22
                  name: mpijob-port
              workingDir: /home
          restartPolicy: OnFailure
---

Anything else we need to know?:

Environment:

  • Volcano Version:
kubectl -n volcano-system get pods -o json | jq '.items[].spec.containers[0] | "name:" + .name + " image:" + .image'
"name:admission image:volcanosh/vc-webhook-manager:v0.4"
"name:main image:volcanosh/vc-webhook-manager:v0.4"
"name:volcano-controllers image:volcanosh/vc-controller-manager:v0.4"
"name:volcano-scheduler image:volcanosh/vc-scheduler:v0.4"
  • Kubernetes version (use kubectl version):
kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14-mlpe-20200217", GitCommit:"883cfa7a769459affa307774b12c9b3e99f4130b", GitTreeState:"clean", BuildDate:"2020-02-17T14:06:28Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    description: Computer
    product: KVM
    vendor: Red Hat
    version: RHEL 7.0.0 PC (i440FX + PIIX, 1996)
  • OS (e.g. from /etc/os-release):
root@gpu-ser348-06:~$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
root@gpu-ser348-06:~$ uname -a
Linux gpu-ser348-06 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:
volcano-sh-bot added the kind/bug label on Apr 15, 2020
@k82cn
Member

k82cn commented Apr 16, 2020

/cc @hzxuzhonghu

It would be better to handle this in the job controller :)

@hzxuzhonghu
Collaborator

@tongchao199 You need to set dnsPolicy to ClusterFirstWithHostNet when you are using hostNetwork.
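
A minimal sketch of that change, assuming the pod template from the report above. Only the relevant fields are shown, and the same line would presumably be needed in the mpimaster task as well:

```yaml
# Fragment of a task's pod template (mpiworker shown as an example).
template:
  spec:
    hostNetwork: true
    # With hostNetwork: true and no explicit dnsPolicy, the pod falls back to
    # the node's DNS settings, so cluster-internal names such as
    # lm-mpi-job-mpiworker-0.lm-mpi-job are not resolvable.
    dnsPolicy: ClusterFirstWithHostNet
    containers:
      - name: mpiworker
        image: volcanosh/example-mpi:0.0.1
```

Kubernetes treats the default ClusterFirst policy as Default for hostNetwork pods, which would explain the "unknown host" error in the report.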

hzxuzhonghu added the help wanted label on Apr 16, 2020
@hzxuzhonghu
Collaborator

We should set this for the user when hostNetwork is enabled and dnsPolicy is not specified.

@guunergooner
Contributor Author

@tongchao199 You need to set dnsPolicy to ClusterFirstWithHostNet when you are using hostNetwork.

Thanks, the hostnames in the mpiworker.host file now resolve correctly.

@k82cn
Member

k82cn commented Apr 17, 2020

Let's also put this into the FAQ :)

@hzxuzhonghu
Collaborator

I meant that we should set ClusterFirstWithHostNet by default when hostNetwork is explicitly set.

@k82cn
Member

k82cn commented Apr 18, 2020

I meant that we should set ClusterFirstWithHostNet by default when hostNetwork is explicitly set.

ok :)

@k82cn
Member

k82cn commented May 4, 2020

What's the next step? Is it fixed by #779?

@guunergooner
Contributor Author

Fixed with #779
