A simple container to trigger a SLURM job on Rivanna whenever a GitHub repository receives a push or publishes a release.
To link GitHub and Rivanna, this architecture uses a message queuing service, Amazon SQS. Activity in GitHub triggers a message to be sent to an SQS queue. Messages in the queue are then picked up by a constantly cycling container in DCOS that watches for new messages. Upon receipt of a message, the container gathers specific variables from it and acts accordingly.
A pull or "polling" design is useful here for two reasons:
- GitHub and Travis-CI sit outside of the UVA networks and cannot directly reach a Rivanna interactive node.
- Should Rivanna be offline (maintenance, updates, etc.), messages in the queue continue to accumulate and can be processed later.
Travis-CI is an easy solution for this step since it can act programmatically on elements of your GitHub repository and variables related to it (version, committer, commit hash, branch, tag, release, etc.). See the included .travis.yml file for inclusion in your source code repository. That repository can trigger any number of actions using Travis, such as unit tests, builds, compiles, and file shipping (to someplace like S3), as well as sending an SQS message. The aws sqs command in that template also shows how to pass along custom MessageAttributes.
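Rather than hand-editing the attributes JSON for each build, it can be assembled in shell. The attr_json helper below is a hypothetical sketch (it is not part of this repository) that builds the payload for a single String-typed attribute:

```shell
# Hypothetical helper: build the --message-attributes JSON for one
# String-typed attribute from shell variables (e.g. a Travis build variable).
attr_json() {
  # $1 = attribute name, $2 = attribute value
  printf '{"%s": {"StringValue":"%s","DataType":"String"}}' "$1" "$2"
}

attr_json VERSION 1.0.3
# → {"VERSION": {"StringValue":"1.0.3","DataType":"String"}}
```

The result can then be passed straight to the CLI, e.g. `aws sqs send-message ... --message-attributes "$(attr_json VERSION "$TRAVIS_TAG")"` (TRAVIS_TAG is one of the standard variables Travis sets in the build environment).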
In order for Travis to send SQS messages you will need three environment variables in the Travis environment:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION - should be set to us-east-1
You will also need to set the URL of the Amazon SQS queue in your .travis.yml file.
```yaml
language: bash

services:
  - docker

before_install:
  - sudo pip install --upgrade pip
  - pip install --user awscli
  - export PATH=$PATH:$HOME/.local/bin

install:
  - aws sqs send-message --queue-url 'https://queue.amazonaws.com/123456789012/queue-name' --message-body 'release' --message-attributes '{"item1": {"StringValue":"this-is-value-1","DataType":"String"}, "item2": {"StringValue":"this-is-value-2","DataType":"String"}, "item3": {"StringValue":"this-is-value-3","DataType":"String"}}' || exit 1;

notifications:
  email:
    on_success: change
    on_failure: always
    recipients:
      - mst3k@virginia.edu
```
The container in this repository is designed for multiple uses and can be adapted to do a number of things. But the central idea is (A) to look for messages on a continual basis (see the run command below); (B) pick up and parse a message when one is available in the queue, then do something with that information; and (C) delete the message when B has completed successfully.
In order to run on a continuous cycle, the container can be run with this command:

```shell
while [ true ]; do /bin/sh /run.sh; sleep 30; done
```

The container also requires several variables in order to do its work. These can be set as environment variables within your container platform (DCOS, Kubernetes, Docker Swarm, etc.), or most of them can also be sent as MessageAttributes within the SQS message itself.
- AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY - credentials with SQS read/write access
- QURL - Amazon SQS message queue URL
- USERID - e.g. mst3k for UVA
- RIVANNA_SCRIPT - full path to a bash script to be executed. This script should invoke sbatch according to your parameters -- allocation, partition, script to execute, etc.
- GIT_REPO - the org/repo for the GitHub repository, "organization/repo-name"
- SQS Attribute: VERSION - specify release number, i.e. 0.1.0, 1.0.3, 5.0.0, etc.
- id_rsa - private key for Rivanna access (can also be used for pulling from a private Git repo). In DCOS this should be passed to the container as a secret.
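When variables ride along as MessageAttributes, step (B) of the cycle has to pull them out of the JSON that aws sqs receive-message returns. A minimal sketch, assuming flat String attributes (a real run.sh would more likely use jq):

```shell
# Hypothetical parser: extract one MessageAttribute's StringValue from the
# raw JSON returned by `aws sqs receive-message`. Assumes simple, un-nested
# String attributes; jq would be the robust choice.
get_attr() {
  # $1 = JSON text, $2 = attribute name
  echo "$1" | sed -n "s/.*\"$2\"[^{]*{[^}]*\"StringValue\": *\"\([^\"]*\)\".*/\1/p"
}

# Real usage would look roughly like:
#   MSG=$(aws sqs receive-message --queue-url "$QURL" --message-attribute-names All)
#   VERSION=$(get_attr "$MSG" VERSION)
# ...and, once the work succeeds, step (C) deletes the message:
#   aws sqs delete-message --queue-url "$QURL" --receipt-handle "$HANDLE"

SAMPLE='{"MessageAttributes": {"VERSION": {"StringValue": "1.0.3", "DataType": "String"}}}'
get_attr "$SAMPLE" VERSION
# → 1.0.3
```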
Several options are stubbed out in the run.sh script in this repository:
- Clone the repository and do something with the code.
- Fetch a release.
- Parse variables from GitHub or SQS MessageAttributes.
- Submit a SLURM job via SSH.
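The last option, submitting over SSH, boils down to one remote command. A hedged sketch: the helper below only assembles the command string so the pieces are visible, and both the helper name and the rivanna.hpc.virginia.edu hostname are assumptions (substitute your actual login node).

```shell
# Hypothetical helper that assembles the SSH submission command from the
# container's USERID and RIVANNA_SCRIPT variables. The hostname is an
# assumption; /id_rsa is the key passed to the container as a secret.
build_submit_cmd() {
  # $1 = user id, $2 = remote script path
  echo "ssh -i /id_rsa $1@rivanna.hpc.virginia.edu bash $2"
}

build_submit_cmd mst3k /home/mst3k/submit-job.sh
# → ssh -i /id_rsa mst3k@rivanna.hpc.virginia.edu bash /home/mst3k/submit-job.sh
```

run.sh would then execute the resulting command; the remote script is the RIVANNA_SCRIPT described above, which calls sbatch with your allocation and partition options.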
An alternate version of this design would send SQS messages from a source other than GitHub. For example, in DCOS a Job can be set up that sends messages on a fixed schedule, e.g. at 2am every day. From there, the worker container described above picks up the message as usual and submits SLURM jobs, etc.
