SLURM_Array

Written by Shawn O'Neil, adapted to SLURM by Zhian Kamvar. Original implementation: https://github.com/oneilsh/SGE_Array

Submitting a list of commands as an array job to SLURM. Easily.

Note: This software is intended for use on the HCC infrastructure, and may not work well on other SLURM installations. It comes with no warranty or guarantee of effectiveness.

Overview

So you have a bunch of commands that you want to run via SLURM, maybe in a text file called commands.txt:

runAssembly sample_117.fasta -o sample_117.fasta.out
runAssembly sample_162.fasta -o sample_162.fasta.out
runAssembly sample_169.fasta -o sample_169.fasta.out
runAssembly sample_30.fasta -o sample_30.fasta.out
runAssembly sample_34.fasta -o sample_34.fasta.out
runAssembly sample_38.fasta -o sample_38.fasta.out
runAssembly sample_47.fasta -o sample_47.fasta.out
runAssembly sample_58.fasta -o sample_58.fasta.out
runAssembly sample_96.fasta -o sample_96.fasta.out

Or maybe you generated your commands from the list of file names with some clever use of awk or sed (I generated the above with ls -1 *.fasta | awk '{print "runAssembly " $1 " -o " $1 ".out"}' > commands.txt). From there you could just make the file an executable shell script, or even pipe it straight into bash with cat commands.txt | bash. But what if you want to run these via SLURM?

Probably, you should run them as an array job (because using a loop to submit them as individual jobs is NOT GOOD, ok?), but that means wrangling the clunky $SLURM_ARRAY_TASK_ID syntax, which only gives you a numeric index per task. SLURM_Array comes to your rescue: it takes a list of commands (either as a file, or on stdin) and turns them into an array job. Boom.
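For context, here is a rough sketch of the boilerplate you would otherwise write by hand (the script name run_array.sh is just an illustration, and the directives shown are a minimal subset): each array task uses $SLURM_ARRAY_TASK_ID to pick out its own line of commands.txt and run it.

```shell
# Write a minimal sbatch array script by hand (what SLURM_Array
# automates for you). Assumes commands.txt has one command per line.
cat > run_array.sh <<'EOF'
#!/bin/bash
#SBATCH --array=1-9
#SBATCH --mem=4gb
# Each array task runs the single line of commands.txt whose line
# number matches its task ID.
sed -n "${SLURM_ARRAY_TASK_ID}p" commands.txt | bash
EOF
# You would then submit it with: sbatch run_array.sh
```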

cat commands.txt | SLURM_Array

# or

SLURM_Array -c commands.txt

# what about?

ls -1 *.fasta | awk '{print "runAssembly " $1 " -o " $1 ".out"}' | SLURM_Array

Reasonable Defaults and Cool Features

By default, each command runs with a request of 4 gigs of RAM, and will be killed if it exceeds that. Each command will also be killed if it attempts to create a file larger than 500G. The maximum number of commands that can run simultaneously across any number of machines defaults to 1000 (to preserve network resources; this can be changed for IO-light commands).
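All of these caps map to flags shown in the help output below; for instance, an IO-heavy batch might raise the memory request, tighten the file-size limit, and lower the concurrency cap (the values here are purely illustrative):

```
SLURM_Array -c commands.txt -m 16gb -f 100G -b 10
```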

The default log directory (where the sbatch submit script is created, and where the stdout and stderr of each command are written) is jYEAR-MON-DAY_HOUR-MIN-SEC_cmd_etal, as in j2015-01-07_16-35-57_runAssembly_etal (the "cmd" part is even autogenerated from the first word of the first command), so that you can easily organize your log information by time, and move or remove things in an orderly fashion (you know, like deleting all of today's work with rm -rf j2015-01-07*).

Most of this can be changed; here's the help output:

usage: SLURM_Array.py [-h] [-c COMMANDSFILE] [-q QUEUE] [-m MEMORY] [-t TIME]
                      [-l MODULE [MODULE ...]] [-M MAIL] [--mailtype MAILTYPE]
                      [-f FILELIMIT] [-b CONCURRENCY] [-x MAXCOMMANDS]
                      [--duration DURATION] [-P PROCESSORS] [-r RUNDIR]
                      [-w WD] [-H] [--hold] [--hold_jids HOLD_JID_LIST]
                      [--hold_names HOLD_NAME_LIST] [-v] [-d]
                      [--showchangelog]

Runs a list of commands specified on stdin as a SLURM array job. Example
usage: `cat commands.txt | SLURM_Array` or `SLURM_Array -c commands.txt`

optional arguments:
  -h, --help            show this help message and exit
  -c COMMANDSFILE, --commandsfile COMMANDSFILE
                        The file to read commands from. Default: -, meaning
                        standard input.
  -q QUEUE, --queue QUEUE
                        The queue(s) to send the commands to. Default: all
                        queues you have access to.
  -m MEMORY, --memory MEMORY
                        Amount of free RAM to request for each command, and
                        the maximum that each can use without being killed.
                        Default: 4gb
  -t TIME, --time TIME  The maximum amount of time for the job to run in
                        d-hh:mm:ss. Default: 04:00:00
  -l MODULE [MODULE ...], --module MODULE [MODULE ...]
                        List of modules to load after preamble. Eg: R/3.3
                        python/3.6
  -M MAIL, --mail MAIL  Email address to send notifications to. Default: None
  --mailtype MAILTYPE   Type of email notification to be sent if -M is
                        specified. Options: BEGIN, END, FAIL, ALL. Default:
                        ALL
  -f FILELIMIT, --filelimit FILELIMIT
                        The largest file a command can create without being
                        killed. (Preserves fileservers.) Default: 500G
  -b CONCURRENCY, --concurrency CONCURRENCY
                        Maximum number of commands that can be run
                        simultaneously across any number of machines.
                        (Preserves network resources.) Default: 1000
  -x MAXCOMMANDS, --maxcommands MAXCOMMANDS
                        Maximum number of commands that can be submitted with
                        one submission script. If the number of commands
                        exceeds this number, they will be batched in separate
                        array jobs. Default: 900
  --duration DURATION   Duration expected for each of maxcommands to run in
                        d-hh:mm:ss. This will be multiplied by the number of
                        batches needed to run.
  -P PROCESSORS, --processors PROCESSORS
                        Number of processors to reserve for each command.
                        Default: 1
  -r RUNDIR, --rundir RUNDIR
                        Job name and the directory to create or OVERWRITE to
                        store log information and standard output of the
                        commands. Default: 'jYEAR-MON-DAY_HOUR-MIN-
                        SEC_<cmd>_etal' where <cmd> is the first word of the
                        first command.
  -w WD, --working-directory WD
                        Working directory to set. Defaults to nothing.
  -H                    Hold the execution for these commands until you
                        release them via scontrol release <JOB-ID>
  --hold                Hold the execution for these commands until all
                        previous jobs arrays run from this directory have
                        finished. Uses the list of jobs as logged to
                        .slurm_array_jobnums.
  --hold_jids HOLD_JID_LIST
                        Hold the execution for these commands until these
                        specific job IDs have finished (e.g. '--hold_jids
                        151235' or '--hold_jids 151235,151239')
  --hold_names HOLD_NAME_LIST
                        Hold the execution for these commands until these
                        specific job names have finished (comma-sep list);
                        accepts regular expressions. (e.g. 'SLURM_Array -c
                        commands.txt -r this_job_name --hold_names
                        previous_job_name,other_jobs_.+'). Uses job
                        information as logged to .slurm_array_jobnums.
  -v, --version         show program's version number and exit
  -d, --debug           Create the directory and script, but do not submit
  --showchangelog       Show the changelog for this program.

It can also be used to run non-array jobs (though it will run them as an array of one command anyway). The cool thing is that you no longer have to wrap the command in quotes (unless you are doing funky things like using environment variables right on the command line, or you want to use | or >, which you'll have to escape).

This means shell autocompletion will work!

echo runAssembly input.fasta -o assembly_output | SLURM_Array

If your command needs funky shell features, you'll have to make sure echo can print it properly, either by escaping or by falling back to quotes:

echo runAssembly input.fasta -o assembly_output \> log.txt | SLURM_Array
echo 'runAssembly input.fasta -o assembly_output > log.txt' | SLURM_Array

Future Directions

I'd like to have the program read a config file called .SLURM_Array in $HOME so that the defaults for the options (--queue, --memory, --rundir, etc.) can be adjusted to minimize typing in day-to-day usage.
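Such a config file might look something like the sketch below. To be clear, this is purely hypothetical: the feature is not implemented, and both the keys and the format are just one possible design.

```
# Hypothetical ~/.SLURM_Array (not implemented; format is a sketch)
memory = 8gb
queue = batch
concurrency = 10
```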

About

A wrapper script to submit SLURM array jobs
