mabs

Basic Setup

git clone https://github.com/waglecn/mabs.git

Conda and snakemake

Miniconda available from: https://docs.conda.io/en/latest/miniconda.html

Python 3.8.3 Miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh  
bash Miniconda3-latest-Linux-x86_64.sh
conda env create --name mabs --file environment.yaml
conda activate mabs

- note that the version of Python installed in the mabs environment is not necessarily the same as the default Miniconda Python version
- asking for ete3 in the default environment will require Python 3.6 (200921)

Required files:

GATK3 jar file
- available from https://console.cloud.google.com/storage/browser/gatk-software/package-archive/gatk
- used GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2
- see config.yaml

Adapters for trimming
- see config.yaml
- look for adapter files bundled with trimmomatic, e.g.

locate TruSeq3-PE.fa
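Where `locate` is not installed or its database is out of date, the same search can be done from Python. This is a minimal sketch: `find_adapter_files` is a hypothetical helper (not part of this repository), and the directory to search (for example the Miniconda install directory) is an assumption you need to supply.

```python
from pathlib import Path


def find_adapter_files(search_root, filename="TruSeq3-PE.fa"):
    """Return sorted paths of files named `filename` anywhere under `search_root`.

    Hypothetical helper: `search_root` is whatever directory is expected to
    contain the trimmomatic adapters, e.g. the conda installation directory.
    """
    root = Path(search_root)
    # rglob walks the tree recursively; keep regular files only.
    return sorted(p for p in root.rglob(filename) if p.is_file())
```

The returned path can then be copied into the adapter entry of config.yaml.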

Kraken database, available from ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/

wget ftp://ftp.ccb.jhu.edu/pub/data/kraken2_dbs/minikraken_8GB_202003.tgz

How to run

snakemake --configfile config.yaml --cores 8 --use-conda --conda-prefix /path/to/.snakemake/conda

Use config.default.yaml as a template for other config files.

Notes

200915

Strange bug causing an infinite loop in snakemake when downloading RefSeq genomes. I think this is caused by the dynamic() output/input in rules. Checking whether the bug also happens when I run the entire pipeline from scratch.

200917

Noticed a bug when running shovill; increased expected memory usage. Shovill version 0.9.0 was running from an older miniconda. Removed miniconda, started from scratch, and pinned Shovill 1.1.0 in shovill.yaml.
After the fix, rerunning seems to work with the example data, and then works after deleting the mashtree and refseq_download directories.

210302

On masking before gubbins vs. after, see nickjcroucher/gubbins#275

TODO 200902

TODO 200911

add trimming parameters to config file - 200921

TODO 200914

sub-species type assemblies are hard-coded in scripts/tree_MRCA.py, it would be useful for these to be configurable but adds layers of complexity to snakefile

TODO 200920

Added GATK info to REQUIREMENTS, and config.yaml

TODO 200926

Tune variant filtering
TODO big question here - use stats from part 1 to make new sample_sheet with QC pass samples? No
- make list to prune from SNP alignment - not needed 201012
Need a separate list of incomplete genomes, as MRCA-guided MLST didn't work as expected: the tree has the wrong structure (samples from pt 29 should be mmas). Fixed 201006; gbff files need to be converted before mashtree can read them.

TODO 201010

start density filter
merge completed results without recalculating shovill assemblies for old samples - 201010
merge 0-coverage bed files and PE_PPE bed files 201013
filter merged bed from vcf
- compress vcf with bcftools

TODO 201013

complete density filter - 20-11-23

TODO 201015

incorporate https://github.com/phac-nml/mab_mabscessus 211021

210323

merging script
copy results_folder1 and results_folder2 into results_merge folder
remove the gubbins folder
remove the SNP_phylo folder
remove the files in MRCA_ref_folder, but keep the individual reference sub-folders
remove the mashtree folder
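The copy and clean-up steps above can be sketched in Python. This is a minimal sketch and not a script shipped with the repository: `prepare_merge` is a hypothetical name, it assumes the contents of both results folders are merged into a single tree (later folders overwrite files with the same relative path), and it takes the folder names from the notes above. It needs Python 3.8 or newer for `dirs_exist_ok`.

```python
import shutil
from pathlib import Path

# Folders named in the notes as needing removal before re-running.
DERIVED_DIRS = ("gubbins", "SNP_phylo", "mashtree")


def prepare_merge(result_dirs, merge_dir, mrca_dir_name="MRCA_ref_folder"):
    """Copy each results folder into merge_dir, then remove derived outputs.

    Assumption: folder contents are merged into one tree rather than kept
    as separate sub-folders of merge_dir.
    """
    merge = Path(merge_dir)
    merge.mkdir(parents=True, exist_ok=True)
    for src in result_dirs:
        shutil.copytree(src, merge, dirs_exist_ok=True)

    # Remove the gubbins, SNP_phylo and mashtree folders if present.
    for name in DERIVED_DIRS:
        shutil.rmtree(merge / name, ignore_errors=True)

    # Remove top-level files in the MRCA folder, keep reference sub-folders.
    mrca = merge / mrca_dir_name
    if mrca.is_dir():
        for entry in list(mrca.iterdir()):
            if entry.is_file() or entry.is_symlink():
                entry.unlink()
    return merge
```

A call such as `prepare_merge(["results_folder1", "results_folder2"], "results_merge")` would leave `results_merge` ready for the snakemake targets that follow.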

run snakemake with the following targets, in this order:

mashtree/assembly_mashtree.complete.tree
stage1

touch ./MRCA_ref_mapping//tempRGSC.merged..sorted.bam.bai
touch ./MRCA_ref_mapping//.intervals
touch ./MRCA_ref_mapping//.RG_SC_RA.bam
touch ./MRCA_ref_mapping//.RG_SC_RA.mpileup
touch ./MRCA_ref_mapping//.RG_SC_RA_filter.vcf.gz
touch ./MRCA_ref_mapping//.RG_SC_RA_filter.failed.vcf.gz
touch ./MRCA_ref_mapping//.RG_SC_RA_filter.AD_failed.vcf.gz
touch ./MRCA_ref_mapping//.RG_SC_RA_filter.hvar.vcf.gz
touch ./MRCA_ref_mapping//.RG_SC_RA.0cov.bed
touch ./MRCA_ref_mapping//.RG_SC_RA_filter.hvar_DF.bed

stage2
stage3 to generate the merged output (gubbins, SNP phylo, merged beds, etc)
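The ordered snakemake runs can be scripted. The sketch below only builds the command lines, one per target, reusing the options from "How to run"; `build_commands` and `run_in_order` are hypothetical helpers, and the config file name and core count are assumptions to adjust. It does not automate the manual `touch` step between stage1 and stage2.

```python
import subprocess

# Targets in the order given in the notes.
TARGETS = (
    "mashtree/assembly_mashtree.complete.tree",
    "stage1",
    "stage2",
    "stage3",
)


def build_commands(targets=TARGETS, configfile="config.yaml", cores=8, extra_args=()):
    """Return one snakemake argument list per target, in order."""
    base = ["snakemake", "--configfile", configfile,
            "--cores", str(cores), "--use-conda", *extra_args]
    # The target is passed as the final positional argument.
    return [base + [target] for target in targets]


def run_in_order(commands):
    """Run each command and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Running the targets separately, for example `run_in_order(build_commands(targets=TARGETS[:2]))`, leaves room to perform the `touch` step by hand before continuing with stage2 and stage3.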
