Dirk edited this page Oct 28, 2015 · 13 revisions

How to use BatchJobs on LIDO

  • Get a LIDO account for yourself. Look here and here
  • Send a mail to Sebastian Krey to be added to our lido-users mailing list. Bernd and Michel sometimes post relevant information there. If you have questions, MAIL THIS LIST, not Bernd or anyone else individually.
  • Learn how software and modules work on LIDO. Here is the list of installed software. Most useful software must be loaded via module commands:
    • module avail lists all modules available to you.
    • module list lists all modules you have currently activated.
    • module add modul1 [modul2 ...] activates the module modul1 (and any further modules given).
    • module remove modul1 [modul2 ...] deactivates the module modul1 (and any further modules given).
    • module purge deactivates all activated modules.
    • To work with the queuing system, you have to load the torque and maui modules with module add torque maui. On the slave nodes, batchjobs_lido.tmpl (see below) loads the modules automatically, so you do not need to do this there. I have the following lines in my .bashrc; you probably want them as well:
# Only on the LIDO head nodes lidong1 and lidong2
case "$(hostname)" in
  lidong[12])
    module add python/2.7.2
    module add torque maui      # queuing system (qstat, qsub, ...)
    module add subversion
    module add git

    module add binutils
    module add gotoblas/shared/64/1.26
    module add gcc/4.8.5
    module add R/3.2.2-gcc48-base

    alias myjobs='qstat -u $USER'   # single quotes: $USER is expanded when myjobs runs
    TERM="xterm-256color"
    ;;
esac
  • Log into the LIDO head node lidong1.itmc.tu-dortmund.de via SSH.
  • Install BatchJobs and BatchExperiments from CRAN.
  • Learn which queues and resources exist on LIDO by reading this wiki page.
  • Read the documentation header of /home/groups/stattmpl/batchjobs_lido.tmpl to understand which job resources are available and how they work: less /home/groups/stattmpl/batchjobs_lido.tmpl
  • Read the configuration documentation. Then create a valid config file in your home directory, i.e. ~/.BatchJobs.R. Here is a template:
cluster.functions = makeClusterFunctionsTorque("/home/groups/stattmpl/batchjobs_lido.tmpl")
mail.start = "first+last"
mail.done = "first+last"
mail.error = "all"
mail.from = "<me@lidong1.itmc.tu-dortmund.de>"
mail.to = "<me@statistik.tu-dortmund.de>"
mail.control = list(smtpServer="mail.statistik.tu-dortmund.de")

default.resources = list(
  R = "R-3.2.2-gcc-4.8.5-base",
  modules = "",
  walltime = 3600L,
  memory = 2048L,
  # parcpus is mapped to the Torque resource 'nodes'. Better use this name,
  # so you don't have to change anything when you use our SLURM cluster
  parcpus = 1L
)

staged.queries = TRUE
debug = FALSE

If you want to use event emails, the sender address does not matter and does not even need to exist, but the receiver address must of course be valid. You probably need an @statistik address; otherwise, figure out which SMTP server to use.

When LIDO installs new R versions, you should probably update the R entry in default.resources accordingly. It should correspond to the R version you use on the master node.
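To check that the R entry in default.resources still matches the R on the master node, a small shell check like the following may help. It is a sketch under two assumptions: that resource names follow the pattern R-<version>-... seen in R-3.2.2-gcc-4.8.5-base, and that Rscript on the head node is the R you work with. Both function names are hypothetical.

```shell
# Hypothetical helper: extract the R version from a resource name such as
# "R-3.2.2-gcc-4.8.5-base". Assumes the naming scheme "R-<version>-..." used
# in the config template above.
r_resource_version() {
  local name="${1#R-}"          # drop the leading "R-"
  printf '%s\n' "${name%%-*}"   # keep everything up to the next "-"
}

# Hypothetical helper: compare that version with the R found on this node.
# Assumes Rscript is on PATH (e.g. after "module add R/...").
check_r_version() {
  local wanted have
  wanted="$(r_resource_version "$1")"
  have="$(Rscript -e 'cat(as.character(getRversion()))')" || return 1
  if [ "$wanted" = "$have" ]; then
    echo "OK: default.resources and this node both use R $have"
  else
    echo "MISMATCH: default.resources uses R $wanted, this node uses R $have" >&2
    return 1
  fi
}
```

On lidong1 you would then run, for example, check_r_version "R-3.2.2-gcc-4.8.5-base" after loading your usual R module.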

  • DO NOT change the first line in the config template above, and DO NOT COPY batchjobs_lido.tmpl to your home directory or write your own template. It is very likely that you do not understand the system in enough detail to do this properly, and a copy will not receive the updates Bernd and Michel make to the shared template.
  • Run a simple batchMap example. For the first try you should probably set debug = TRUE in the config, so you can better understand errors. If everything works, set debug back to FALSE.
  • On the bash console, the following is useful:
    • qstat displays all jobs.
    • qstat -u $USER displays your jobs (or define myjobs in your .bashrc, as above).
    • kill_all_jobs kills ALL of your jobs. It is a bash script by Sebastian Krey in /home/groups/stattmpl/bin.
    • show-queues displays a nice alternative status overview of the queues and your jobs. It is not perfect, but it mostly gets the job done. It is a Python script by Bernd in /home/groups/stattmpl/bin.
    • show-active-users displays a nice alternative overview of what users are currently doing. It is not perfect, but it mostly gets the job done. It is an R/shell script by Bernd in /home/groups/stattmpl/bin.
    • You can use the scripts in the bin directory of the group 'stattmpl' (statistics templates) by adding this line to your .bashrc: PATH=$PATH:/home/groups/stattmpl/bin
  • You must install and manage R packages yourself.
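The PATH line above appends the stattmpl directory again every time your .bashrc is sourced. A slightly more careful variant, offered as a sketch rather than something Bernd or Sebastian require, adds it only once:

```shell
# Add the stattmpl scripts (kill_all_jobs, show-queues, show-active-users) to PATH,
# but only if the directory is not already on it, so re-sourcing .bashrc is harmless.
STATTMPL_BIN=/home/groups/stattmpl/bin
case ":$PATH:" in
  *":$STATTMPL_BIN:"*) ;;              # already present: do nothing
  *) PATH="$PATH:$STATTMPL_BIN" ;;     # otherwise append it
esac
export PATH
```

Wrapping PATH in colons on both sides lets the pattern match the directory at the start, middle, or end of PATH without false matches on longer directory names.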

If you ever need to update the Rmpi package, you should do this:

 wget http://cran.r-project.org/src/contrib/Rmpi_0.6-5.tar.gz
 module add openmpi/ge/gcc4.8.x/64/1.6.4
 R CMD INSTALL Rmpi_0.6-5.tar.gz --configure-args=--with-mpi=/sysdata/shared/sfw/openmpi/gcc4.8.x/64/1.6.4

Of course you need to adjust the version number, module name, and paths in these commands. Look up the MPI module name in batchjobs_lido.tmpl.
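Because the Rmpi version, the MPI module name, and the MPI prefix each appear in the commands above, it can help to set them in one place. The following sketch wraps the three commands in a hypothetical function update_rmpi; the variable names RMPI_VERSION, MPI_MODULE, MPI_PREFIX and DRY_RUN are also hypothetical, and the defaults are simply the values from the example, which you must adjust. Note that CRAN keeps only the current version of a package under src/contrib; older tarballs move to src/contrib/Archive/Rmpi/, so the download URL may need adjusting too.

```shell
# Hypothetical helper: rebuild Rmpi against a given OpenMPI installation.
# The defaults are the values from the example above; override them via the
# environment variables RMPI_VERSION, MPI_MODULE and MPI_PREFIX.
# With DRY_RUN=1 the commands are only printed, not executed.
_rmpi_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

update_rmpi() {
  local version="${RMPI_VERSION:-0.6-5}"
  local mpi_module="${MPI_MODULE:-openmpi/ge/gcc4.8.x/64/1.6.4}"
  local mpi_prefix="${MPI_PREFIX:-/sysdata/shared/sfw/openmpi/gcc4.8.x/64/1.6.4}"
  local tarball="Rmpi_${version}.tar.gz"

  _rmpi_run wget "http://cran.r-project.org/src/contrib/${tarball}" || return 1
  # "module" is usually a shell function, so source this file into your
  # login shell instead of running it as a separate script
  _rmpi_run module add "$mpi_module" || return 1
  _rmpi_run R CMD INSTALL "$tarball" "--configure-args=--with-mpi=${mpi_prefix}"
}
```

Running DRY_RUN=1 update_rmpi first prints the three commands so you can check the names and paths before anything is downloaded or installed.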

How to use BatchJobs with our SLURM cluster

  • Read the additional documentation provided by Sebastian.
  • Send a mail to Sebastian Krey to be added to our lido-users mailing list and to get access to the cluster. Bernd and Michel sometimes post relevant information there. If you have questions, MAIL THIS LIST, not Bernd individually.
  • Log into shell.statistik.tu-dortmund.de via SSH.
  • Start an interactive job for a few hours by typing interactive.
  • Read the documentation header of dortmund_fk_statistik.tmpl to understand which job resources are available and how they work: less /opt/R/BatchJobs/dortmund_fk_statistik.tmpl
  • Read the configuration documentation. Then create a valid config file in your home directory, i.e. ~/.BatchJobs.R. Here is a template:
cluster.functions = makeClusterFunctionsSLURM("/opt/R/BatchJobs/dortmund_fk_statistik.tmpl")
mail.start = "first+last"
mail.done = "first+last"
mail.error = "all"
mail.from = "<me@shell>"
mail.to = "<me@statistik.tu-dortmund.de>"
mail.control = list(smtpServer="mail.statistik.tu-dortmund.de")

default.resources = list(
  walltime = 3600L,
  memory = 512L,
  # parcpus is mapped to the SLURM resource 'ntasks'. Better use this name,
  # so you don't have to change anything when you use our LIDO cluster
  parcpus = 1L,
  ncpus = 1L
)

staged.queries = TRUE
max.concurrent.jobs = 450
debug = FALSE

If you want to use event emails, the sender address does not matter and does not even need to exist, but the receiver address must of course be valid. You need an @statistik or Unimail address. Alternatively, figure out which SMTP server and login data to use for a different mail provider.

  • DO NOT change the first line in the config template above, and DO NOT COPY dortmund_fk_statistik.tmpl to your home directory or write your own template. It is very likely that you do not understand the system in enough detail to do this properly, and a copy will not receive Sebastian's updates to the shared template.
  • Run a simple batchMap example. For the first try you should probably set debug = TRUE in the config, so you can better understand errors. If everything works, set debug back to FALSE.
  • On the bash console, this stuff is useful:
    • squeue will display all jobs
    • squeue -u $USER will display your jobs
    • kill_all_jobs will kill ALL of your jobs (except for the interactive ones)
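The LIDO .bashrc snippet defines myjobs as an alias for qstat -u $USER. If you work on both clusters, a function that picks whichever scheduler command is available could replace that alias. This is a sketch that assumes only what is stated on this page: qstat is available on LIDO after module add torque, and squeue on the SLURM cluster. If your .bashrc already defines the myjobs alias, remove it (or run unalias myjobs) before defining this function, because bash expands aliases while reading the function definition.

```shell
# List your own jobs on whichever scheduler is available on this machine:
# squeue (SLURM cluster) or qstat (Torque on LIDO, after "module add torque").
myjobs() {
  if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER"
  elif command -v qstat >/dev/null 2>&1; then
    qstat -u "$USER"
  else
    echo "myjobs: neither squeue nor qstat found (on LIDO, try: module add torque maui)" >&2
    return 1
  fi
}
```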