In the pitfalls paper, we demonstrated the benefits of using positive controls in shotgun metagenomic sequencing for high-volume analysis. In particular, we used a marine Vibrio species (Vibrio campbellii) as the positive control for all of our HiSeq runs at PCMP. Because the positive control samples have high sequencing coverage and low diversity, we are able to assemble the Vibrio campbellii genome de novo and study the distribution of SNPs when the same genome is sequenced multiple times.
MAG: metagenome-assembled genomes
In this repo, we start with assembled and taxonomically annotated contigs from the sunbeam pipeline:
- extract V. campbellii contigs using sbx_contigs.
- assess the quality of the draft assemblies using checkm.
- pangenome analysis using roary.
- extract core genes using an in-house R script.
- calculate SNPs for each core gene using snp-sites.
- muscle (?) or at least subset muscle
- calculate Hamming distance.
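The last step above can be illustrated with a minimal Python sketch. It is not the repo's own script: it assumes the input is a multi-FASTA alignment of equal-length sequences (for example, the kind of SNP alignment snp-sites writes), and the function names `parse_fasta`, `hamming` and `pairwise_hamming`, the sample names, and the choice to skip gap (`-`) and `N` positions are all assumptions made for illustration.

```python
from itertools import combinations


def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def hamming(a, b, ignore="-N"):
    """Count mismatched positions between two aligned sequences.

    Positions where either sequence has a character in `ignore`
    are skipped (an assumption, not necessarily the repo's choice).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x not in ignore and y not in ignore
    )


def pairwise_hamming(records):
    """Return {(name1, name2): distance} for every pair of records."""
    return {
        (p, q): hamming(records[p], records[q])
        for p, q in combinations(sorted(records), 2)
    }


# Hypothetical toy alignment: three runs of the same positive control.
EXAMPLE = """>run1
ACGTAC
>run2
ACGTTC
>run3
A-GTTA
"""

if __name__ == "__main__":
    for pair, dist in pairwise_hamming(parse_fasta(EXAMPLE)).items():
        print(pair, dist)
```

Whether gaps and ambiguous bases should count as differences depends on the analysis; passing `ignore=""` to `hamming` counts every mismatching column.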
conda env update --name=coresnps --quiet --file env.yml
Checkm requires python2.7, and we also take care of the precalculated checkm-database in the checkm_dataset rule.
snakemake --configfile config.yml _run_checkm --use-conda --cores 8
snakemake --configfile config.yml run_roary --cores 8