
Chr21 simulation experiment on AWS

Glenn Hickey edited this page Jun 8, 2018 · 9 revisions

This sequence of commands constructs graphs, simulates reads, and generates mapping and variant-calling ROC plots for sample HG00096 on chromosome 21. To use another chromosome or sample (from the 1000 Genomes Project or HG002), replace every occurrence of the chromosome number and sample name in the commands below.
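
You can generate the chromosome- and sample-specific names from two variables instead of editing each command by hand. The following is a minimal sketch and is not part of toil-vg. The function name experiment_names is hypothetical. The sketch assumes that the naming patterns used for chromosome 21 and HG00096 on this page carry over unchanged to other chromosomes and 1000 Genomes samples: the graph names, the per-chromosome hs37d5 FASTA path, and the 1000 Genomes truth VCF file name. Check that each generated path exists before you use it. HG002 uses a different truth set, so the sketch does not cover it.

```shell
# Hypothetical helper (not part of toil-vg): print the chromosome- and
# sample-specific names used in this walkthrough as key=value lines.
# ASSUMPTION: the chr21/HG00096 naming patterns on this page hold for other
# chromosomes and 1000 Genomes samples; verify that each path exists.
experiment_names() {
    chrom="$1"   # chromosome as passed to --chroms, e.g. 21
    sample="$2"  # 1000 Genomes sample name, e.g. HG00096

    # Test graph name passed to mapeval-ec2.py and calleval-ec2.py
    echo "graph=snp1kg_${chrom}"
    # Thread (haplotype) graph prefix passed to sim-ec2.py
    echo "haplo=s3://my-out-store/baseline_${sample}_haplo"
    # Per-chromosome reference FASTA passed to --fasta
    echo "fasta=s3://cgl-pipeline-inputs/vg_cgl/HS37D5/HS37D5_chr${chrom}.fa"
    # Per-sample truth VCF passed to --truth
    echo "truth=s3://my-out-store/ALL.chr${chrom}.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes_${sample}.vcf.gz"
}
```

For example, `eval "$(experiment_names 22 HG00097)"` sets the shell variables graph, haplo, fasta and truth. You can then use them in the commands as "$graph", "$fasta" and so on. The values contain no spaces, so eval assigns them directly.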

 #Create an EC2 leader node on which to run all toil-vg commands:
 scripts/create-ec2-leader.sh leader my-keypair-name

 #Construct the thread graphs from which reads will be simulated
 scripts/construct-hs37d5-ec2.py my-job-store my-out-store --leader leader --chroms 21 --sample HG00096 --haplo_graph --xg --out_name baseline

 #Construct test graphs to use for mapping
 scripts/construct-hs37d5-ec2.py my-job-store my-out-store --leader leader --chroms 21 --control HG00096 --gcsa --xg

 #Simulate ~50X coverage reads from the thread graphs
 scripts/sim-ec2.py my-job-store my-out-store s3://my-out-store/baseline_HG00096_haplo 65000000 --leader leader

 #Simulate 6.5M error-free reads (a tenth as many as above) from the thread graphs
 scripts/sim-ec2.py my-job-store my-out-store s3://my-out-store/baseline_HG00096_haplo 6500000 --leader leader --sim_opts "-p 570 -v 65 -S 0 -i 0 -I"

 #Run mapping evaluation on the error-free reads
 scripts/mapeval-ec2.py my-job-store my-out-store snp1kg_21 baseline_HG00096_21_HG00096_haplo_sim_6.5M_trained-p570-v65-S0-i0-I.gam --leader leader --fasta s3://cgl-pipeline-inputs/vg_cgl/HS37D5/HS37D5_chr21.fa --names primary HG00096 --outname trained_no_error

 #Run calling evaluation on the error-free reads
 scripts/calleval-ec2.py my-job-store my-out-store snp1kg_21 --leader leader --fasta s3://cgl-pipeline-inputs/vg_cgl/HS37D5/HS37D5_chr21.fa --truth s3://my-out-store/ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes_HG00096.vcf.gz --chroms 21 --names primary HG00096 --outname trained_570_65

 #Terminate the leader
 toil destroy-cluster leader
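
You can sanity-check the coverage figures given for the simulation steps with the usual depth estimate: depth ≈ (number of reads × read length) / length of the simulated region. The following is a minimal sketch of that arithmetic. The function name expected_coverage is hypothetical. You must supply the read length yourself, because this page does not state which read length sim-ec2.py uses.

```shell
# Hypothetical helper: expected sequencing depth of a simulated read set.
#   depth = reads * read_length / region_length
# ASSUMPTION: the caller supplies the simulator's real read length.
expected_coverage() {
    local reads="$1"      # number of simulated reads, e.g. 65000000
    local read_len="$2"   # bases per read (not given on this page)
    local region_len="$3" # length of the simulated region in bases
    # awk does the floating-point division and prints one decimal place
    awk -v n="$reads" -v l="$read_len" -v g="$region_len" \
        'BEGIN { printf "%.1f\n", n * l / g }'
}
```

For chromosome 21 of hs37d5 (48,129,895 bp), call it as `expected_coverage 65000000 READ_LENGTH 48129895` and substitute the simulator's actual read length. If the read count given to sim-ec2.py counts read pairs rather than single reads, double it first.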