A Python server-client pair for submitting HTCondor glidein jobs on remote batch systems.
As pictured above, pyglidein is used to run glideins on remote sites, adjusting for pool demand automatically. It consists of a server running on the central HTCondor submit machine and a number of clients on remote submit machines. The client will submit glideins which connect back to the central HTCondor machine and advertise slots for jobs to run in. Jobs then run as normal.
RHEL 6 users must first run pip install setuptools==36.8.0. Version 36.8.0 is the last version with RHEL 6 support.
To install, run pip install pyglidein.
Running the server is fairly simple:
$ pyglidein_server -p PORT_NUMBER
This will start the server with default options, with the server listening on the specified port for requests from the client.
Once you're satisfied that the server is working, it is best to run it under nohup so that it keeps running after you log out.
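One quick way to confirm that the server is listening is to attempt a TCP connection to its port. The following is a minimal sketch and is not part of pyglidein: the helper name port_is_open is hypothetical, and the host and port you pass are whatever you used for PORT_NUMBER. It only checks that something accepts connections on that port, not that the JSON-RPC service behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration; it does not speak the
    pyglidein protocol, it only tests that the port accepts connections.
    """
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

For example, port_is_open("localhost", PORT_NUMBER) run on the server machine should return True once the server has started.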
The client can be set up in a number of ways, but the simplest invocation is:
$ pyglidein_client --config=CLUSTER_CONFIG_FILE --secrets=SECRETS_CONFIG_FILE
All settings are stored in the config file. A list of available configuration options can be found here. A list of available secret options can be found here. The important settings are:
[Glidein]
# full server url to jsonrpc
address = SERVER_URL
[Cluster]
# scheduler types (htcondor, pbs, slurm, ...)
scheduler = htcondor
# cmd to submit a job
submit_command = condor_submit
# cmd to determine how many jobs are on the cluster
running_cmd = condor_q|wc -l
# queue limits
max_total_jobs = 1000
limit_per_submit = 50
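To make the role of these settings concrete, here is a rough Python sketch that reads the same keys with the standard configparser module, runs running_cmd as a shell pipeline, and caps a submission against the two queue limits. It is an illustration under stated assumptions, not pyglidein's own code: the function names are hypothetical, and the rule for combining max_total_jobs and limit_per_submit is an assumed reading of the option names rather than documented behavior.

```python
import configparser
import subprocess

# Same keys as the config above; SERVER_URL is left as a placeholder.
SAMPLE = """\
[Glidein]
address = SERVER_URL

[Cluster]
scheduler = htcondor
submit_command = condor_submit
running_cmd = condor_q|wc -l
max_total_jobs = 1000
limit_per_submit = 50
"""


def load_settings(text: str) -> dict:
    """Parse the INI text and return the important settings."""
    # Interpolation is disabled so shell commands are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    cluster = parser["Cluster"]
    return {
        "address": parser["Glidein"]["address"],
        "scheduler": cluster["scheduler"],
        "submit_command": cluster["submit_command"],
        "running_cmd": cluster["running_cmd"],
        "max_total_jobs": cluster.getint("max_total_jobs"),
        "limit_per_submit": cluster.getint("limit_per_submit"),
    }


def count_jobs(running_cmd: str) -> int:
    """Run the shell pipeline and parse its output as an integer."""
    result = subprocess.run(
        running_cmd, shell=True, check=True, capture_output=True, text=True
    )
    return int(result.stdout.strip())


def glideins_to_submit(demand, running, max_total_jobs, limit_per_submit):
    """Assumed policy: respect both the total cap and the per-submit cap."""
    room = max(0, max_total_jobs - running)
    return max(0, min(demand, room, limit_per_submit))
```

Under this assumed policy, a site with 980 jobs already queued and a demand of 200 would submit 20 glideins in one cycle, because only 20 slots remain below max_total_jobs.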
The client is routinely run from a cron job, but it can also run continuously by setting a delay:
[Glidein]
# run every 60 seconds
delay = 60
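Conceptually, a delay setting turns a one-shot run into a loop that performs one cycle, sleeps, and repeats. The sketch below illustrates that pattern only; run_with_delay and its parameters are hypothetical names, and the client's actual loop may differ in detail.

```python
import time


def run_with_delay(cycle, delay, max_cycles=None, sleep=time.sleep):
    """Call cycle() repeatedly, sleeping `delay` seconds between calls.

    max_cycles=None loops forever; a number stops after that many cycles.
    The sleep function is injectable so the loop can be tested quickly.
    """
    done = 0
    while max_cycles is None or done < max_cycles:
        cycle()
        done += 1
        # Skip the final sleep when the loop is about to exit.
        if max_cycles is None or done < max_cycles:
            sleep(delay)
    return done
```

With cron, the scheduler provides the repetition and each invocation performs a single cycle; with a delay, one long-lived process does both.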
Detailed documentation is available here.