Merge pull request #35 from mathiasbockwoldt/master
Removed hpn-ssh and added queue reason and state description
bast committed Apr 12, 2018
2 parents 45af585 + 7136d4c commit 6c93c38
Showing 2 changed files with 30 additions and 22 deletions.
30 changes: 30 additions & 0 deletions jobs/batch.rst
@@ -99,3 +99,33 @@ To find out whether all users within one project share the same priority, run::
For a given account (project) consider the column "RawShares". If the RawShares
for the users is "parent", they all share the same fairshare priority. If it is
a number, they have individual priorities.


Job status descriptions in squeue
=================================

When you run ``squeue`` (typically limiting the output with ``squeue -u <user_name>``), you get a list of all jobs currently running or waiting to start. Most of the columns should be self-explanatory, but the *ST* and *NODELIST (REASON)* columns can be confusing.

*ST* stands for *state*. The most important states are listed below. For a more comprehensive list, check the `squeue help page section Job State Codes <https://slurm.schedmd.com/squeue.html#lbAG>`_.

R
    The job is running
PD
    The job is pending (i.e. waiting to run)
CG
    The job is completing, meaning that it will be finished soon
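
The state codes above can be translated inside a script. The following is a minimal sketch, not part of Slurm: the function names ``describe_state`` and ``annotate_states`` are hypothetical, and only the three codes listed above are covered.

```shell
#!/bin/bash
# Hypothetical helper: translate a squeue ST code into a short description.
# Only the three codes listed in this section are handled explicitly.
describe_state() {
    case "$1" in
        R)  echo "running" ;;
        PD) echo "pending" ;;
        CG) echo "completing" ;;
        *)  echo "other (see the squeue manual)" ;;
    esac
}

# Hypothetical helper: read "JOBID ST" pairs from standard input
# and print each job ID followed by the description of its state.
annotate_states() {
    while read -r jobid state; do
        [ -n "$jobid" ] || continue   # skip empty lines
        echo "$jobid $(describe_state "$state")"
    done
}
```

On a login node this could be fed with ``squeue -h -u "$USER" -o "%i %t" | annotate_states``, where ``-h`` suppresses the header and ``%i`` and ``%t`` select the job ID and the compact state code.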

The *NODELIST (REASON)* column shows the compute nodes the job is running on if the job is running. If the job is pending, the column instead gives the reason why it is still pending. The most important reasons are listed below. For a more comprehensive list, check the `squeue help page section Job Reason Codes <https://slurm.schedmd.com/squeue.html#lbAF>`_.

Priority
    There is another pending job with higher priority.
Resources
    The job has the highest priority, but is waiting for some running job to finish.
QOS*Limit
    This should only happen if you run your job with ``--qos=devel``. In developer mode you may have only one job in the queue.
launch failed requeued held
    Job launch failed for some reason. This is normally due to a faulty node. Please contact us via support-uit@notur.no stating the problem, your user name, and the jobid(s).
Dependency
    The job cannot start before some other job is finished. This should only happen if you started the job with ``--dependency=...``.
DependencyNeverSatisfied
    Same as *Dependency*, but that other job failed. You must cancel the job with ``scancel JOBID``.
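
Jobs stuck with the reason *DependencyNeverSatisfied* can be picked out of the queue listing before cancelling them. The following is a minimal sketch under the assumption that the input consists of lines of the form ``JOBID REASON``; the function name ``never_satisfied_jobs`` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "JOBID REASON" lines from standard input and
# print the IDs of jobs whose reason is exactly DependencyNeverSatisfied.
never_satisfied_jobs() {
    awk '$2 == "DependencyNeverSatisfied" { print $1 }'
}
```

On a login node the input could come from ``squeue -h -u "$USER" -t PD -o "%i %r" | never_satisfied_jobs``, where ``-t PD`` restricts the listing to pending jobs and ``%r`` prints the reason. Review the printed IDs before passing any of them to ``scancel``.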
22 changes: 0 additions & 22 deletions storage/file_transfer.rst
@@ -44,28 +44,6 @@ Windows users may buy and install
High-performance tools
======================

OpenSSH with HPN
----------------
The default *ssh* client and server on stallo login nodes is the *openssh* package
with HPN patches applied. Using an hpnssh client on the other end of
the data transfer increases throughput.

To use this feature you must have an HPN-patched openssh version. You can
check whether your ssh client has HPN patches by issuing:

::

    ssh -V

If the output contains the word "hpn" followed by version and release,
then you can make use of the high-performance features.

Transfers can then be sped up either by disabling data encryption AFTER
you have been authenticated or logged into the remote host (NONE
cipher), or by spreading the encryption load over multiple threads
(using the MT-AES-CTR cipher).


NONE cipher
-----------
This cipher has the highest transfer rate. Keep in mind that data after