simpleCluster

1. Introduction

This is a very simple job scheduling engine for small clusters. For such clusters, sophisticated systems like TORQUE or SLURM are just too complicated, as they require a lot of installation and configuration effort. Instead, we aim to develop a very simple system for distributing jobs that does not require any installation (except for Java) and has one single entry point for job submission and execution. We also do not care about rights management or system security, as this system is strictly for personal use.

The idea is that all worker computers mount a shared directory under exactly the same path. You should also mount this directory on your own PC. Inside this shared directory, we put the binaries and data of all jobs to be executed. For each job, you could create a directory and put the binary, the data, and a shell script for executing the job inside. There is also a job queue file to which jobs can be appended; these are then executed by the workers. No rights management, no ssh, nothing spectacular. Just a simple jar archive for both submitting jobs to the queue and for launching jobs in worker threads. Each job is processed by a single worker thread. Beyond that, no management of parallelism is done, i.e., jobs may be programs that spawn arbitrarily many threads of their own. However, you can tag a job as blocksMachine, meaning that no other job can be executed in parallel on the same machine.

There is no central scheduling. Instead, the workers will query the job queue file for new tasks. Via a lock file, it is ensured that only one worker can read the queue file at once. For each worker PC, at most one idle thread will be querying the central job queue file. Since there is no central scheduling, we do not need any cluster management software or resource management software.
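To make the idea of a lock-protected queue file concrete, here is a minimal shell sketch. It is not how simpleCluster itself is implemented (that is Java code inside the jar); the function name take_job, the one-job-per-line queue format, and the use of a lock directory created with mkdir are all assumptions made for illustration. Whether mkdir is reliably atomic on your network file system is something you would have to verify yourself.

```shell
#!/bin/sh
# Hypothetical sketch: take the oldest job from a shared queue file under a lock.
# The queue format (one job per line) and the ".lock" name are assumptions,
# not the files that simpleCluster really uses.

take_job() {
  queue="$1"          # text file with one job per line
  lock="$queue.lock"  # lock directory: mkdir either creates it or fails

  # Wait until we own the lock; only one worker can create the directory.
  until mkdir "$lock" 2>/dev/null; do
    sleep 1
  done

  job=""
  if [ -s "$queue" ]; then
    job=$(head -n 1 "$queue")            # read the oldest job
    tail -n +2 "$queue" > "$queue.tmp"   # keep the remaining jobs
    mv "$queue.tmp" "$queue"
  fi

  rmdir "$lock"                          # release the lock
  printf '%s\n' "$job"                   # empty line means: queue was empty
}
```

Every worker that calls take_job on the same queue file receives a different job, because reading and rewriting the file happen only while the lock is held. This also shows why performance drops with many workers or long queues: each query rewrites the whole file while everybody else waits.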

The whole cluster management software (if we want to call it that) is super-small, just 22 KiB in size. Don't expect any high performance or great scalability. I don't care about that. I just want a simple distributed job executor that only needs a minimum software installation (i.e., Java). It uses a simple text file and a lock file for queue management, so if you have more than 10 or so worker PCs and more than 1000 or so jobs at once, expect a significant performance decrease.

For any more sophisticated cluster usage, e.g., for one that will work with more nodes or has user management or other fancy features, please check this list.

This system has only been tested under Ubuntu Linux.

2. Usage

The concept of our simple scheduler builds on shared directories. All worker computers as well as the job-submitting computers must mount the shared directory of the cluster at exactly the same path. The simpleCluster.jar must be executed in this shared directory.

2.1. Job Submission

java -jar simpleCluster.jar submit cmd=COMMAND dir=DIRECTORY [times=TIMES] [blocksMachine]

Enters the command COMMAND, to be executed in directory DIRECTORY, into the job queue. If times is specified, the command is added TIMES times, which lets you submit the same command repeatedly. The directory DIRECTORY must exist on all worker computers under the same path. Usually, this would be a shared, mounted directory. The job executors will then start the shell in that directory and write COMMAND to its stdin. A job tagged with blocksMachine will block all other job execution on one machine: it only begins executing once all the threads on that machine are idle, and no thread will begin or query for a new job until the blocking job is completed. This is intended for jobs that spawn their own threads, require lots of memory, or do heavy I/O and thus might disturb other jobs running in parallel.
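As an illustration, the following sketch prepares a job directory and submits its script five times. The directory /cluster/jobs/experiment1, the script name run.sh, the output file results.txt, and the helper prepare_job are assumptions chosen for this example; only the submit command line itself follows the syntax given above.

```shell
#!/bin/sh
# Hypothetical sketch: create a job directory with a script, then submit it.
# All paths and file names below are examples, not requirements.

prepare_job() {
  job_dir="$1"
  mkdir -p "$job_dir"
  cat > "$job_dir/run.sh" <<'EOF'
#!/bin/sh
# Appends one line per run; results.txt ends up in the job directory,
# because the worker starts the shell inside that directory.
echo "run finished on $(hostname)" >> results.txt
EOF
  chmod +x "$job_dir/run.sh"
}

CLUSTER="${CLUSTER:-/cluster}"   # assumed mount point of the shared directory
if [ -f "$CLUSTER/simpleCluster.jar" ]; then
  prepare_job "$CLUSTER/jobs/experiment1"
  cd "$CLUSTER" || exit 1        # the jar is executed in the shared directory
  java -jar simpleCluster.jar submit cmd=./run.sh \
       dir="$CLUSTER/jobs/experiment1" times=5
fi
```

Keeping the actual work in a script and submitting only ./run.sh keeps the queued command short and free of spaces or quotes, which sidesteps any question of how the command line is parsed.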

2.2. Job Execution

java -jar simpleCluster.jar run [cores=nCORES] [sh=/path/to/shell]

On each worker PC, you should launch one instance of the job executor. It will start the worker threads that pick up the jobs and execute them one after the other. Via cores, you can define the number of workers to launch. If cores is not specified, the number of workers will be equal to the number of processor cores. If sh is specified, it must be the path to the shell receiving the commands. If sh is not specified, we will use the default shell. For every command received, a new instance of the shell is launched and the command is piped to it. Once the shell has terminated, the worker thread will query for the next command.
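What a worker thread does with one job can be approximated in a few lines of shell. This is only a sketch of the behavior described above, not the Java code in the jar; the function name run_job and the default of /bin/sh are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: run one queued command the way a worker thread would.
# A fresh shell is started in the job directory and the command is piped
# to its stdin; the function returns once that shell has terminated.

run_job() {
  dir="$1"                 # job directory (dir=... at submission)
  cmd="$2"                 # command text (cmd=... at submission)
  shell="${3:-/bin/sh}"    # shell to use (sh=... at launch), assumed default

  ( cd "$dir" && printf '%s\n' "$cmd" | "$shell" )
}
```

For example, run_job /cluster/jobs/experiment1 ./run.sh would start /bin/sh inside that directory and let it execute ./run.sh. Because the command arrives on stdin, anything that the shell accepts as input works, including pipelines and redirections.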

2.3. Creating Shared Directory under Ubuntu

Our scheduler is based on the use of shared directories. On a server computer, a directory exists and is shared. The client computers "mount" this shared directory under the same path. All files copied to the shared directory by any actor (clients, workers, servers) who can access it will then be visible to all of them.

First, we create a permanently shared directory on the server, which is nicely discussed on this website and which I summarize here for Ubuntu.

  1. Install Samba via sudo apt-get install samba samba-common python-glade2 system-config-samba.
  2. Open the Samba configuration file via sudo nano /etc/samba/smb.conf.
  3. Add text like the following to the bottom of the file, where /cluster is the path of the directory to share, cluster is the name under which it will be shared, and USER is the user name that you will use:
[cluster]
   path = /cluster
   writable = yes
   guest ok = yes
   guest only = yes
   read only = no
   create mode = 0777
   directory mode = 0777
   force user = USER
  4. Save the changes and close nano.
  5. If the directory does not yet exist, create it via sudo mkdir -p /cluster.
  6. Do sudo chown -R USER /cluster.
  7. Do sudo chmod -R 0775 /cluster.
  8. Do sudo service smbd restart.

2.4. Mounting Shared Directories under Ubuntu

To permanently mount a shared directory under Linux, proceed as follows.

  1. Create the shared directory on the main file server computer as discussed in the previous section; let's call the share cluster.
  2. On every worker computer and on the computer from which you want to submit jobs, proceed as follows:
    1. sudo mkdir -p /cluster (create the local cluster directory)
    2. sudo chown USER /cluster, where USER is the user under which the cluster engine is executed
    3. sudo nano /etc/fstab to edit the file system list
    4. add the line //SERVER_IP/cluster /cluster/ cifs guest,username=USER,password=PASSWORD,iocharset=utf8,file_mode=0777,dir_mode=0777,noperm 0 0, where SERVER_IP is the IP address of the file server, and USER and PASSWORD are the user name and password.
    5. save and exit nano
    6. do sudo mount -a

You can now access the same shared directory, /cluster, from your job submission PC and from all worker PCs. Now copy simpleCluster.jar there and start it in Job Execution mode on each worker. You can then create a sub-directory for each job under /cluster and put, say, shell scripts and data in there. From the job submission PC, you can enqueue these scripts using the Job Submission mode.
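Before starting the workers, it may be worth checking on each machine that the shared directory really exists and is writable. The following is a small sketch for that; the function name check_shared_dir and the marker file name are assumptions, and the check is not part of simpleCluster.

```shell
#!/bin/sh
# Hypothetical sketch: verify that a directory exists and accepts writes.
# Usage on a worker or submission PC (mountpoint is part of util-linux):
#   mountpoint -q /cluster && check_shared_dir /cluster

check_shared_dir() {
  dir="$1"
  [ -d "$dir" ] || { echo "missing: $dir"; return 1; }

  # Create and remove a marker file to prove write access.
  marker="$dir/.write_test_$$"
  if touch "$marker" 2>/dev/null; then
    rm -f "$marker"
    echo "ok: $dir is writable"
  else
    echo "not writable: $dir"
    return 1
  fi
}
```

If the check fails on one machine only, the fstab entry or the mount on that machine is the first place to look.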

3. Licensing

This software is licensed under the GNU General Public License 3.0.

4. Contact

If you have any questions or suggestions, please contact Prof. Dr. Thomas Weise of the Institute of Applied Optimization at Hefei University in Hefei, Anhui, China via email to tweise@hfuu.edu.cn with CC to tweise@ustc.edu.cn.