Deployment playbook for the Personal Cancer Genome Reporter: https://github.com/sigven/pcgr

Personal Cancer Genome Reporter deployment recipes

Introduction

Cancer reporting systems require prepopulating several gigabytes of genomic reference data and provisioning all software pieces, docker containers and configuration.

PCGR eases that; pcgr-deploy simplifies it further.

This Ansible playbook contains tasks to deploy PCGR to Amazon and OpenStack clouds, with HPC-specific tasks (mainly NFS mounting) added as a module.

Quickstart

Tweak ansible/group_vars/all and the roles section of ansible/site.yml according to your needs (are you an HPC or an AWS user?).

The following lines will install the deployment modules, deploy PCGR and run its built-in example as a validation:

python3 -m venv venv && source venv/bin/activate && pip install ansible
ansible-playbook aws.yaml -e 'ansible_python_interpreter=/usr/bin/python3'
ssh ubuntu@<AWS INSTANCE>
cd /mnt/pcgr
./pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna examples/tumor_sample.COAD.cna.tsv /mnt/pcgr-* output tumor_sample.COAD

Amazon or OpenStack or HPC?

This playbook supports all of them; it has been tested on Tenjin, the private cloud of the Australian NCI supercomputing centre.

The only changes needed are in ansible/group_vars/all, as mentioned in the Quickstart, and in site.yml, which must be rearranged so that it includes the hpc role after common and databundle. Running the playbook in the following way should then deploy PCGR in your (OpenStack?) VM:

ansible-playbook site.yml -e 'ansible_python_interpreter=/usr/bin/python3' -i <YOUR CLUSTER IP/HOSTNAME>,

Alternatively, if you have python3 already installed in your virtual environment, instantiating and deploying to OpenStack is as easy as:

ansible-playbook openstack.yml

Assuming you are employed by the University of Melbourne and running on Tenjin, that's all you need to do ;)

(Optional) Amazon: Saving money with Spot instances

The following script, included in ansible, queries AWS's spot price history and estimates how long the instance we are asking for would have stayed available at a given bid. For instance, we can run the script with a 0.08 AUD asking price:

python ~/bin/get_spot_duration.py \
	--region ap-southeast-2 \
	--product-description 'Linux/UNIX' \
	--bids c4.large:0.08

That is 168 hours of uptime at that asking price in ap-southeast-2c, or ~87% savings at the time of writing:

$ ./get_spot_duration.sh
Duration    Instance Type    Availability Zone
168.0    c4.large    ap-southeast-2c
108.2    c4.large    ap-southeast-2a
15.7    c4.large    ap-southeast-2b
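
To make the "Duration" column more concrete, here is a minimal sketch of one way such a figure could be computed from spot price history. It is not the code of get_spot_duration.py: the function name hours_below_bid, the (timestamp, price) input shape and the sample prices are assumptions made for illustration only.

```python
from datetime import datetime


def hours_below_bid(history, bid, now):
    """Hours, counted back from `now`, during which the price never exceeded `bid`.

    `history` is a list of (timestamp, price) tuples, one per price change;
    each price is assumed to hold until the next change.
    """
    start = now
    # Walk from the newest price change to the oldest.
    for timestamp, price in sorted(history, reverse=True):
        if price > bid:
            # The bid would have been outbid here, so the window ends.
            break
        start = timestamp
    return (now - start).total_seconds() / 3600.0


# Made-up price changes for a single availability zone (not real AWS data).
sample_history = [
    (datetime(2018, 1, 1), 0.07),
    (datetime(2018, 1, 5), 0.06),
]
print(hours_below_bid(sample_history, bid=0.08, now=datetime(2018, 1, 8)))  # 168.0
```

In practice the history would come from the AWS spot price history API for each availability zone; the sketch only covers the arithmetic once those records are in hand.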

Kubernetes

Open-ended experiment for now; there are some errors that still need attention.

FAQ

The error "ERROR: package is not a legal parameter in an Ansible task or handler" is a symptom of an Ansible version that is too old (probably 1.9.x). You need Ansible >=2.x to deploy this.
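
A quick way to catch this before running the playbook is to check the version reported by `ansible --version`. The sketch below parses that text; the helper name ansible_version_ok is hypothetical, and it assumes the first dotted number in the output is the Ansible version.

```python
import re


def ansible_version_ok(version_output, minimum=(2, 0)):
    """Return True if the version found in `version_output` is at least `minimum`.

    `version_output` is expected to be the text printed by `ansible --version`.
    """
    # Assumption: the first "major.minor" number in the text is the version.
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    found = (int(match.group(1)), int(match.group(2)))
    return found >= minimum


print(ansible_version_ok("ansible 1.9.4"))    # False
print(ansible_version_ok("ansible 2.4.2.0"))  # True
```

If the check fails, upgrading inside the virtual environment from the Quickstart (pip install --upgrade ansible) is usually the least intrusive fix.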