Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
NAME dsh - the distributed shell SYNOPSIS dsh [-a] [-e *'command'*] [-f] [-h] [-N *nodegroup1, nodegroup2, ...*] [-w *node1, node2, ...*] [*command*] DESCRIPTION dsh (the distributed shell) is a program which executes a single command on multiple remote machines. It can execute this command in parallel (i.e. on any number of machines at a time) or in serial (by specifying parallel execution of the command on 1 node at a time). It was originally designed to work with rsh, but has full support for ssh and with a little tweaking of the top part of the dsh executable, should work with any program that allows remote execution of a command without an interactive login. dsh takes a command and a set of nodes from the user; it checks the nodes to make sure they can be contacted, and then runs the command on each node. dsh displays the output (STDERR and STDOUT) of each node preceded by its name. The order the nodes are displayed is described in *"A Brief Description of How dsh Removes Duplicates"*, in the SPECIFYING NODES entry elsewhere in this document. dsh can also be run in interactive mode. This mode is entered when the user doesn't specify a command on the command line. In this mode, dsh will present the user with a prompt. Here the user can enter a command and dsh will run the command on the nodes that the user specified when dsh was started. After the command is run, the user will again be presented with a prompt and can enter another command. It is important to note that this mode does not actually log in to all of the nodes: it executes each command separately using the remote execution program specified (typically rsh or ssh). An important consequence of this is that any properties of a login shell, such as setting environment variables and job control, are not present. Although dsh does not actually log in to each node, commands entered at the dsh prompt are not interpreted by the local shell. So entering echo $PATH at the dsh prompt (*dsh> echo $PATH*) is different than entering echo $PATH at the regular command line (*dsh echo $PATH*). In the first case, the command echo $PATH is passed to the remote computer as is, and so the PATH variable refers to the variable on the remote machine. In the second case, the variable PATH is first interpreted by the local shell, which replaces $PATH with the value of PATH on the local machine before the command is passed to the remote machine. To simulate the first case at the regular command line, use the -e switch (*dsh -e 'echo $PATH'*) or quote the variable to avoid interpolation by the local shell (*dsh echo '$PATH'*). So, in short, commands typed at the dsh prompt are executed as if for each node you typed *rsh [remote computer] '[command you enter]'*, where the command you enter is placed in single quotes by dsh. If you chose to add support for the Term-ReadLine-Gnu Perl module during installation, then the dsh prompt has command recall (up and down arrows) and editing (as featured in the GNU readline library). Also, a local command can be executed by placing an exclamation point (!) in front of the command (e.g. *dsh> !ls* runs ls on the local computer). SPECIFYING NODES There are 4 ways to specify where dsh should execute a command: -a This switch adds all the nodes in the file $BEOWULF_ROOT/node_groups/ALL, where ALL is a file specifying one node per line (IP address or hostname). Lines that begin with a # and lines that have only whitespace are ignored. *example:* mtp22@front_end $ cat /usr/local/dsh/node_groups/ALL front_end # Linux nodes linux1 linux2 penguin1 penguin2 # OpenBSD nodes openbsd1 openbsd2 openbsd3 # FreeBSD nodes freebsd1 freebsd2 mtp22@front_end $ dsh -a ls This will run the command ls on front_end, linux1, linux2, penguin1, penguin2, openbsd1, openbsd2, openbsd3, freebsd1, and freebsd2 -N *nodegroup1, nodegroup2, nodegroup3, ...* This switch adds all the nodes in the file $BEOWULF_ROOT/node_groups/nodegroup1 and all the nodes in the file $BEOWULF_ROOT/node_groups/nodegroup2, etc. As with the ALL file, the format of these nodegroup files is one node per line (IP address or hostname). Lines that begin with a # and lines that have only whitespace are ignored. *example:* mtp22@front_end $ cat /usr/local/dsh/node_groups/linux # Linux part of the cluster, running Slackware 8.0 linux1 linux2 penguin1 penguin2 mtp22@front_end $ cat /usr/local/dsh/node_groups/openbsd # OpenBSD part of the cluster, running OpenBSD 2.9 openbsd1 openbsd2 openbsd3 mtp22@front_end $ dsh -N linux, openbsd ls This will run the command ls on linux1, linux2, penguin1, penguin2, openbsd1, openbsd2, and openbsd3 -w *node1, node2, node3, ...* This switch adds the nodes node1, node2, node3, etc. *example:* mtp22@front_end $ dsh -w freebsd1, freebsd2 ls This will run the command ls on freebsd1 and freebsd2 *example:* mtp22@front_end $ dsh -w 10.10.10.1, 10.10.10.2, 10.10.10.3 ls This will run the command ls on 10.10.10.1, 10.10.10.2, and 10.10.10.3 WCOLL If none of the above switches (-a, -N, -w) is specified, dsh defaults to adding the nodes specified in the file pointed to by WCOLL. dsh only looks at this environment variable when none of the 3 switches described above is specified. *example:* mtp22@front_end $ export WCOLL=/home/mtp22/bsd mtp22@front_end $ cat /home/mtp22/bsd openbsd1 openbsd2 openbsd3 freebsd1 freebsd2 mtp22@front_end $ dsh ls This will run the command ls on openbsd1, openbsd2, openbsd3, freebsd1, and freebsd2 Combining -a, -N, and -w You can combine the -a, -N, and -w switches in any order. *example:* mtp22@front_end $ dsh -N linux -w front_end ls This will run the command ls on linux1, linux2, penguin1, penguin2, and front_end Duplicate Nodes dsh has a somewhat complicated way of dealing with duplicate nodes. For example, if the node which has IP address 10.10.10.3 is called linux3 and is aliased to tux3, then entering *dsh -w linux3, tux3 ls* will only execute the command on 10.10.10.3 once. Similarly, *dsh -w linux3, 10.10.10.3 ls* will only execute the command on 10.10.10.3 once. A more complicated example is a computer which has more than one IP address. For example, say front_end has a local IP address 10.10.10.254 and a remote IP address 22.214.171.124. The following commands are equivalent in that they will execute the command ls on front_end only once (the IP address used to contact this node is the first IP address returned by *gethostbyname(3)*): dsh -w front_end ls dsh -w 10.10.10.254 ls dsh -w 126.96.36.199 ls dsh -w front_end, 10.10.10.254 ls dsh -w 10.10.10.254, front_end ls dsh -w front_end, 188.8.131.52 ls dsh -w 184.108.40.206, front_end ls dsh -w front_end, 10.10.10.254, 220.127.116.11 ls dsh -w front_end, 18.104.22.168, 10.10.10.254 ls dsh -w 10.10.10.254, front_end, 22.214.171.124 ls dsh -w 126.96.36.199, front_end, 10.10.10.254 ls In general, dsh will remove duplicates; however, there are cases where dsh won't remove duplicates because it cannot tell from IP addresses alone whether they refer to the same computer (i.e. dsh doesn't do reverse name lookups when looking for duplicates). For example, in the above scenario, the combinations which will result in dsh executing the command more than once are those which start with the two IP addresses: dsh -w 10.10.10.254, 188.8.131.52 ls dsh -w 184.108.40.206, 10.10.10.254 ls dsh -w 10.10.10.254, 220.127.116.11, front_end ls dsh -w 18.104.22.168, 10.10.10.254, front_end ls To see why, refer to the next section. A Brief Description of How dsh Removes Duplicates dsh looks at each node, in the order of -w nodes first, and -a and -N nodegroups second (the order of these last two depending on the order they appear on the command line). For each node, dsh resolves the name to a set of IP addresses using *gethostbyname(3)*; if any of these IP addresses has already been seen, the node is removed from the list (note: in the case of duplicates found in nodes specified by -a and -N, the nodes are not removed from the actual files, they are only removed from the current list, which is stored in RAM). *example:* mtp22@front_end $ dsh -w front_end, 22.214.171.124 ls This will run the command on front_end only once because dsh performs the following duplication analysis: it looks at front_end and resolves it into the IP addresses 10.10.10.254 and 126.96.36.199. Neither of these IP addresses have been seen before, so it doesn't remove front_end from the list. dsh then looks at 188.8.131.52 and resolves it into the IP address 184.108.40.206. This address has been seen before, so it removes 220.127.116.11 from the list. *example:* mtp22@front_end $ dsh -w 10.10.10.254, 18.104.22.168 ls This will run the command on front_end twice because dsh performs the following duplication analysis: it looks at 10.10.10.254 and resolves it into the IP address 10.10.10.254. This IP address hasn't been seen before, so it doesn't remove 10.10.10.254 from the list. dsh then looks at 22.214.171.124 and resolves it into the IP address 126.96.36.199. This address hasn't been seen before, so it doesn't remove 188.8.131.52 from the list. If you're not sure about duplicates, you can use the -q switch to see the list of nodes where dsh would execute the command without actually executing the command. ENVIRONMENT VARIABLES BEOWULF_ROOT This is the directory that dsh uses to search for node groups (specified with the -N switch) and the ALL file (specified with the -a switch). Note that whatever directory is specified in the variable BEOWULF_ROOT must have a subdirectory called node_groups where the various nodegroup files and the ALL file are located. The default value of BEOWULF_ROOT is /usr/local/dsh FANOUT This is the number of nodes to run the command on in parallel. The default value is all of the nodes; however, you may wish to limit this value if you have an extremely large number of nodes or if the machine where you are running dsh from has a small amount of free resources. For each node, 3 processes are forked, so to calculate the number of processes running at one time, multiply FANOUT by 3 and add 1 (for dsh) WCOLL See SPECIFYING NODE GROUPS SWITCHES -N *nodegroup1, nodegroup2, ...* See SPECIFYING NODE GROUPS -a See SPECIFYING NODE GROUPS -w *node1, node2, ...* See SPECIFYING NODE GROUPS -q Lists the nodes where dsh would execute the command without actually executing the command -e *'command'* Executes the command in quotes on all the specified nodes. The reason for this switch is that putting the command to execute at the end of the command line without single quotes can lead to metacharacter issues (see the QUOTING SHELL METACHARACTERS entry elsewhere in this document below). This way of specifying the command avoids this problem. -t *time_in_seconds* Specifies the time to wait for a node to respond before labeling it "unreachable". The default value is 5. Specifying 0 indicates no timeout. -f If this flag is specified, dsh won't prompt the user whether or not to continue if a node is unreachable or refusing a remote connection. dsh will assume it should continue, bypassing the node. This is useful if dsh is run in a non-interactive script. -h Displays a message containing a brief description of the command line switches QUOTING SHELL METACHARACTERS When dsh is used in the form *dsh [options] command*, there is an issue with shell metacharacters. That is, characters in *command* are interpreted by the local shell before they are passed to dsh. For example, *dsh -w 10.10.10.1, 10.10.10.2 find / -name bash > bash.find* will be interpreted as 'run dsh -w 10.10.10.1, 10.10.10.2 find / -name bash and place the results of this program in the file bash.find on the local computer'. This does not run find / -name bash on 10.10.10.1 and 10.10.10.2 and place the results of find in files named bash.find on the remote computers. To do that you would need to type *dsh -w 10.10.10.1, 10.10.10.2 find / -name bash '>' bash.find* to avoid interpretation of > by the local shell. Another way to avoid local metacharacter interpretation is to use the -e switch: *dsh -w 10.10.10.1, 10.10.10.2 -e 'find / -name bash > bash.find'*. I highly recommend using the -e switch to execute commands or running dsh in interactive mode because most modern shells have many metacharacters and quoting these can be tricky even if you do recognize all of them. EXIT CODES If dsh exits cleanly (that is, if dsh is allowed to call exit itself), the exit code of dsh will either be a 0 or a 1. 0 indicates that dsh at least ran the command "rsh (node) command" for every non-duplicate node. Note that an exit code of 0 does not mean that every rsh was successful nor does it mean that the command exited with a code of 0 on the remote computer :-\ . The way dsh executes rsh commands doesn't lend itself to capturing their exit codes. An exit code of 1 indicates that dsh did not attempt to run the command on any remote computers because of an error. WEBSITE http://dsh.sourceforge.net AUTHORS dsh was originally written by Jason Rappleye <firstname.lastname@example.org> at the Center for Computational Research at the University at Buffalo. Version 2.0 was written by Matthew T. Piotrowski <email@example.com> at the Center for Computational Research. Thanks to firstname.lastname@example.org for his contributions to the early stages of version 2.0. This manpage was written by Matthew T. Piotrowski, with help from the OpenSSH manpage (ssh.1), version 1.99, which was used as a reference. SEE ALSO *rsh(1)*, *gethostbyname(3)*, *ssh(1)*