Queue Capability bug #921

Closed
silenceli opened this issue Jul 13, 2020 · 5 comments

Comments

@silenceli

Queue capability may have a bug. In certain scenarios, a queue does not limit users' resource usage.

How to reproduce

  • Set the queue capability to nvidia.com/gpu = 2.
  • Create 4 jobs, each requesting nvidia.com/gpu = 1.
  • At this point the queue works as expected: 2 jobs are running and 2 are pending, as shown below.
[root@k8s-master-1-15-1ef36b060 ~]# kubectl describe queue training-jarvis-dev-team-k8s-training-test
Name:         training-jarvis-dev-team-k8s-training-test
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  scheduling.volcano.sh/v1beta1
Kind:         Queue
Metadata:
  Creation Timestamp:  2020-04-23T08:09:56Z
  Generation:          20
  Resource Version:    75588153
  Self Link:           /apis/scheduling.volcano.sh/v1beta1/queues/training-jarvis-dev-team-k8s-training-test
  UID:                 5ac970cf-9894-4e74-94fb-21e89cd55b68
Spec:
  Capability:
    Cpu:             80
    Memory:          399360Mi
    nvidia.com/gpu:  2
    qiyi.com/fuse:   10000
  Weight:            1
Status:
  Pending:  2
  Running:  2
  State:    Open
Events:     <none>
  • Stop one running job.
  • Both pending jobs then start running, so 3 jobs are running even though the capability is nvidia.com/gpu = 2:
[root@k8s-master-1-15-1ef36b060 ~]# kubectl describe queue training-jarvis-dev-team-k8s-training-test
Name:         training-jarvis-dev-team-k8s-training-test
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  scheduling.volcano.sh/v1beta1
Kind:         Queue
Metadata:
  Creation Timestamp:  2020-04-23T08:09:56Z
  Generation:          20
  Resource Version:    75589682
  Self Link:           /apis/scheduling.volcano.sh/v1beta1/queues/training-jarvis-dev-team-k8s-training-test
  UID:                 5ac970cf-9894-4e74-94fb-21e89cd55b68
Spec:
  Capability:
    Cpu:             80
    Memory:          399360Mi
    nvidia.com/gpu:  2
    qiyi.com/fuse:   10000
  Weight:            1
Status:
  Running:  3
  State:    Open
Events:     <none>
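
To make the expected behaviour concrete, here is a minimal sketch of a capability check. It is not Volcano's scheduler code: the `Queue` class and its `fits`, `schedule`, `submit`, and `stop_one` methods are hypothetical names used only for illustration, and treating a resource without a capability entry as unlimited is an assumption. It replays the steps above with jobs that each request 1 GPU.

```python
from dataclasses import dataclass, field

GPU = "nvidia.com/gpu"


@dataclass
class Queue:
    """Toy model of a queue with a hard capability (illustrative only)."""
    capability: dict                      # resource name -> hard limit
    running: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def allocated(self, resource):
        # Sum of the resource requested by all currently running jobs.
        return sum(job.get(resource, 0) for job in self.running)

    def fits(self, job):
        # A job fits only if no requested resource would exceed the capability.
        # Assumption: resources without a capability entry are unlimited.
        return all(
            self.allocated(res) + need <= self.capability.get(res, float("inf"))
            for res, need in job.items()
        )

    def schedule(self):
        # Admit pending jobs one by one, re-checking the capability each time.
        still_pending = []
        for job in self.pending:
            if self.fits(job):
                self.running.append(job)
            else:
                still_pending.append(job)
        self.pending = still_pending

    def submit(self, job):
        self.pending.append(job)
        self.schedule()

    def stop_one(self):
        # Stop one running job, then run another scheduling pass.
        self.running.pop()
        self.schedule()


queue = Queue(capability={GPU: 2})
for _ in range(4):
    queue.submit({GPU: 1})
before = (len(queue.running), len(queue.pending))

queue.stop_one()
after = (len(queue.running), len(queue.pending))

print("before stop:", before)
print("after stop: ", after)
```

Under this model, stopping one job should admit only one of the two pending jobs, leaving 2 running and 1 pending. The status above shows Running: 3 instead, which is the behaviour being reported.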
@k82cn
Member

k82cn commented Jul 13, 2020

/cc @Thor-wl

@Thor-wl
Contributor

Thor-wl commented Jul 14, 2020

@silenceli Is the status above, with 3 jobs running, the final status? Can this behaviour be reproduced reliably?

@Thor-wl
Contributor

Thor-wl commented Jul 16, 2020

@silenceli Please provide the Volcano version you are using.

@Thor-wl
Contributor

Thor-wl commented Jul 24, 2020

Fix PR: #949

@hzxuzhonghu
Collaborator

Fixed by #959 and #966.

@silenceli Could you please give it a try?

@Thor-wl We need an e2e test case to cover this scenario.
