Skip to content
master
Switch branches/tags
icgc_rnaseq_align/cghub_submit/
icgc_rnaseq_align/cghub_submit/
This branch is 27 commits ahead of kellrott:master.

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.

README

Scripts and supporting template metadata XML files to generate, 
validate and submit the metadata for PCAWG RNA-Seq realigned BAMs 
to CGHub; and also to upload the BAMs themselves.

This method keeps the experiment.xml and run.xml files the same
as the original ones as nothing relating to the sequencing has changed.
A new analysis.xml will be generated from the template file passed 
in and from substituting in the filename and MD5 checksum which are also passed in.

to run (generically) using the Dockerfile:

sudo docker run -i -t --rm -v /path/to/data:/data -v /path/to/keys/:/keys cghub_submit /bin/bash

generate_and_submit_to_cghub.pl /opt/analysis.pcawg_rnaseq.TOPHAT2_template.xml /data/<bam_list_file> /key/<pcawg_upload_key_file>.key

where /data should be the place where the bams are and /data/<bam_list_file>
should contain at least one line formatted like this:
<original_cghub_analysis_id>TAB</data/bam_file_name>TAB<bam_file_md5_checksum>

Example line:
4a863729-23fd-4d2f-9ff5-e28b43c67de9    /data/test.bam    7e45fb0ef85576fceb97ec89b5894710

note that the 2nd field must reference the mounted filepath on the
docker image (as I have here), not the original full path to the bam
file.

Use can use either analysis.pcawg_rnaseq.TOPHAT2_template.xml or 
analysis.pcawg_rnaseq.STAR_template.xml depending on which aligner was used.

The default CGHub study/project that these are setup to upload to
is "PCAWG_TEST" for the time being.  This can be changed later.

Also /data needs to be writable by the docker instance.

The script will write out an analysis_id map file 
(located at /data/<bam_list_file>.id_map.tsv) which maps original analysis_id
and the newly generated one which is what will identifiy the uploaded file in CGHub.

Here's an actual example which will work on UCSC's pod cluster environment (validates submission
xml, but doesn't actually submit/upload):

sudo docker run -i -t --rm -v /pod/home/cwilks/p/icgc_rnaseq_align:/test -v /pod/home/kellrott/keys/:/keys cghub_submit /bin/bash

generate_and_submit_to_cghub.pl /opt/analysis.pcawg_rnaseq.TOPHAT2_template.xml /test/uuids


If you're not using the Dockerfile, you may find this relevant if running the script:

No special Perl modules are needed and only lxml for Python should be needed.
Both Perl and Python executables are expected to be on the running PATH.

Further, two CGHub programs are required:

1) cgsubmit
2) gtupload

you can download them here:
https://cghub.ucsc.edu/software/submitters.html

The script expects both to be in the running PATH, 
if not, you will need to modify generate_and_submit_to_cghub.pl.