xiucz edited this page May 12, 2017 · 2 revisions

A range of programs is available for gene finding (Metagene Annotator, MetaGeneMark, Orphelia, FragGeneScan). Here we will use Prodigal, which is very fast and has recently been enhanced for metagenomics.

The contigs we use were constructed with Ray, using a k-mer size of 41 and no scaffolding. Only contigs of at least 1000 bases are in this file. A coassembly is used because it gives an idea of the entire metagenome across multiple samples. By mapping the reads back per sample, we can then compare the coverage of each contig between samples.
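The length cutoff mentioned above can be illustrated with a small shell function. This is only a sketch of how such a filter could be written, not the command that was actually used to build the file; the function name `filter_fasta_by_length` is hypothetical.

```shell
# Hypothetical helper: print only the FASTA records whose sequence is at
# least MIN_LEN bases long. Multi-line sequences are joined onto one line.
# Usage: filter_fasta_by_length MIN_LEN FILE
filter_fasta_by_length() {
    awk -v min="$1" '
        # Print the record collected so far if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min) print header "\n" seq
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
             { seq = seq $0 }                          # append sequence line
        END  { emit() }                                # last record
    ' "$2"
}
```

For example, `filter_fasta_by_length 1000 contigs.fa > contigs.1000.fa` would keep contigs of 1000 bases or more, where `contigs.fa` is a placeholder file name.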

mkdir -p ~/work_dir/prodigal
cd ~/work_dir/prodigal
cp /proj/g2014113/metagenomics/cfa/assembly/baltic-sea-ray-noscaf-41.1000.fa .
prodigal -a baltic-sea-ray-noscaf-41.1000.aa.fa \
         -d baltic-sea-ray-noscaf-41.1000.nuc.fa \
         -i baltic-sea-ray-noscaf-41.1000.fa \
         -f gff -p meta \
         > baltic-sea-ray-noscaf-41.1000.gff
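Once Prodigal has finished, a quick sanity check is to count how many genes it predicted. The sketch below assumes that each predicted gene appears as a `CDS` feature in the third column of the GFF output and as one header line in the protein FASTA file; the function names `count_cds` and `count_fasta` are hypothetical helpers, not part of Prodigal.

```shell
# Hypothetical helper: count CDS features in a GFF file.
# Column 3 of a GFF record holds the feature type; comment lines are skipped.
count_cds() {
    awk -F'\t' '!/^#/ && $3 == "CDS" { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: count sequences in a FASTA file via its header lines.
count_fasta() {
    grep -c '^>' "$1"
}

# Usage with the files written by the prodigal command above:
#   count_cds   baltic-sea-ray-noscaf-41.1000.gff
#   count_fasta baltic-sea-ray-noscaf-41.1000.aa.fa
```

If the assumptions hold, both commands should report the same number, since every predicted gene is written to the GFF file and to the protein file.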

Question: What could be a possible advantage or disadvantage for the assembly process when assembling multiple samples at one time?
Answer: Advantage: more coverage. Disadvantage: more related strains and species, which makes graph traversal harder.

Question: Can you think of other approaches to get a coassembly?
Answer: One option is to map the contigs against each other and merge them in that way, preferably taking coverages into account.
