handling no jobs in a QoS in identify_problems #11

Open
paciorek opened this issue Sep 20, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@paciorek
Collaborator

In identify_problems, the stanza beginning if pending_job['REASON'] in ['QOSGrpNodeLimit', 'QOSGrpCpuLimit', 'QOSGrpGRES']: raises an error when there are no running jobs in the QoS. That is, a job can't run because of, say, QOSGrpCpuLimit, yet no other jobs are in the QoS.

This should never happen but was happening for a user, perhaps related to Slurm issues after a downtime.

 Traceback (most recent call last):
 File "sq.py", line 527, in <module>
 File "sq.py", line 466, in display_queued_jobs
 File "pandas/core/frame.py", line 7547, in apply
 File "pandas/core/apply.py", line 180, in get_result
 File "pandas/core/apply.py", line 255, in apply_standard
 File "pandas/core/apply.py", line 284, in apply_series_generator
 File "sq.py", line 418, in inner
 TypeError: sequence item 0: expected str instance, float found

Here's the view from pdb:

Traceback (most recent call last):
  File "sq.py", line 530, in <module>
    if slurm_info.has_current_jobs(username) and (not args.all_jobs):
  File "sq.py", line 470, in display_queued_jobs
    df['PROBLEMS'] = df.apply(identify_problems(slurm_info), axis=1)
  File "/global/home/users/paciorek/.conda/envs/sq/lib/python3.8/site-packages/pandas/core/frame.py", line 7547, in apply
    return op.get_result()
  File "/global/home/users/paciorek/.conda/envs/sq/lib/python3.8/site-packages/pandas/core/apply.py", line 180, in get_result
    return self.apply_standard()
  File "/global/home/users/paciorek/.conda/envs/sq/lib/python3.8/site-packages/pandas/core/apply.py", line 255, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/global/home/users/paciorek/.conda/envs/sq/lib/python3.8/site-packages/pandas/core/apply.py", line 284, in apply_series_generator
    results[i] = self.f(v)
  File "sq.py", line 422, in inner
    qos_running_jobs_str = ', '.join(qos_running_jobs['JOBID'] + ' (' + qos_running_jobs.apply(lambda x: filter_keys(qos_resource_limit, parse_tres_queue_job(x)), axis=1).apply(display_grp_tres) + ')')
TypeError: sequence item 0: expected str instance, float found
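
The final TypeError is what str.join raises whenever an element of the sequence is not a string. A minimal reproduction outside sq.py, assuming the offending item is a NaN (the value pandas uses for missing data; that it is NaN here is a guess, not something the traceback confirms):

```python
# Assumption: the first item handed to join is a float NaN, not a string.
items = [float("nan")]

try:
    ", ".join(items)
except TypeError as exc:
    # Capture the message so it can be compared with the traceback above.
    message = str(exc)

print(message)
```
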

The .join fails because of the exact structure of its inputs when there are no jobs: the sequence it receives then contains a float (presumably NaN) rather than strings.
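
One possible guard, as a sketch only and not the actual fix in sq.py: build the string from plain records and fall back to a placeholder when nothing usable is found. The function name format_running_jobs, the describe_job callback (a stand-in for the filter_keys / parse_tres_queue_job / display_grp_tres chain) and the placeholder text are all hypothetical.

```python
import math


def format_running_jobs(rows, describe_job, placeholder="none found"):
    """Build 'JOBID (resources), ...' from an iterable of job records.

    rows: dict-like records, e.g. qos_running_jobs.to_dict('records').
    describe_job: hypothetical stand-in for the resource-formatting chain.
    """
    parts = []
    for row in rows:
        job_id = row.get("JOBID")
        # Skip records whose JOBID is missing or NaN, so join never sees a float.
        if job_id is None or (isinstance(job_id, float) and math.isnan(job_id)):
            continue
        parts.append(f"{job_id} ({describe_job(row)})")
    # An empty QoS yields the placeholder instead of raising TypeError.
    return ", ".join(parts) if parts else placeholder


# Example with made-up records; real ones would come from the squeue data.
jobs = [{"JOBID": "101", "CPUS": "4"}, {"JOBID": "102", "CPUS": "8"}]
print(format_running_jobs(jobs, lambda r: "cpu=" + r["CPUS"]))
print(format_running_jobs([], lambda r: ""))
```

Formatting each item with an f-string guarantees that join only ever receives strings, and the explicit empty case lets the PROBLEMS column say something sensible even in this should-never-happen state.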

I have a freeze directory for this example.

paciorek added the bug label Sep 20, 2023