Merge pull request #36 from bast/radovan/trap
show how to clean up a timed-out job
Mathias Bockwoldt committed Apr 15, 2018
2 parents 6c93c38 + 841f6a5 commit 29a91b8
Showing 2 changed files with 49 additions and 0 deletions.
13 changes: 13 additions & 0 deletions jobs/examples.rst
@@ -105,6 +105,19 @@ Example on how to allocate entire memory on one node
   :language: bash


How to recover files before a job times out
-------------------------------------------

You may want to clean up the work directory or save files needed for a
restart before a job times out. In this example we ask Slurm to send a
signal to our script 120 seconds before the time limit is reached,
giving us a chance to perform clean-up actions.

.. literalinclude:: files/slurm-timeout-cleanup.sh
   :language: bash
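
The trap-and-wait mechanism can also be tried without submitting a job.
The following is only a minimal sketch: a background subshell stands in
for Slurm and sends USR1 after one second, and ``cleanup``,
``handler_ran``, ``calc_pid`` and ``status`` are hypothetical names
chosen for this illustration.

```shell
#!/bin/bash

# hypothetical handler, for illustration only
handler_ran="no"
cleanup() {
    handler_ran="yes"
    echo "cleanup called at $(date)"
}

# run cleanup when this shell receives USR1
trap 'cleanup' USR1

# stand-in for Slurm: signal this shell after 1 second
# ($$ is the PID of the main script, even inside the subshell)
( sleep 1; kill -USR1 $$ ) &

# stand-in for the calculation, started in the background
sleep 30 &
calc_pid=$!

# wait is interrupted by the signal, then the handler runs
wait "${calc_pid}"
status=$?
echo "wait returned with status ${status}, handler ran: ${handler_ran}"

# stop the stand-in calculation so the script can exit cleanly
kill "${calc_pid}" 2>/dev/null
```

If the ``&`` and ``wait`` are replaced by a plain foreground ``sleep 30``,
bash defers the handler until that command has finished.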



OpenMP and MPI
==============

36 changes: 36 additions & 0 deletions jobs/files/slurm-timeout-cleanup.sh
@@ -0,0 +1,36 @@
#!/bin/bash -l

# job name
#SBATCH --job-name=example

# replace this by your account
#SBATCH --account=...

# one core only
#SBATCH --tasks=1

# we give this job 4 minutes
#SBATCH --time=0-00:04:00

# ask Slurm to send the USR1 signal 120 seconds before the end of the time limit
# the "B:" prefix delivers the signal to the batch shell only
#SBATCH --signal=B:USR1@120

# define the handler function
# note that this is not executed here, but rather
# when the associated signal is sent
your_cleanup_function()
{
    echo "function your_cleanup_function called at $(date)"
    # do whatever cleanup you want here
}

# call your_cleanup_function once we receive the USR1 signal
trap 'your_cleanup_function' USR1

echo "starting calculation at $(date)"

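# the calculation "computes" (in this case sleeps) for 1000 seconds
# but we asked Slurm for only 240 seconds, so it will not finish
# the "&" after the compute step and the "wait" are important:
# bash runs a trap handler only after the current foreground command
# has finished, whereas "wait" returns as soon as the signal arrives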
sleep 1000 &
wait
