
coverm filter #40

Closed

473021677 opened this issue Oct 24, 2020 · 6 comments

@473021677

Hi,
I am using `coverm filter` to remove alignments with insufficient identity. The input BAM file is 143,374,586,594 bytes, but the output file is only 4,129,795 bytes. I then used `coverm contig --methods trimmed_mean` to calculate the mean coverage for each contig from that small output file. I am not sure if something went wrong. Could you give some suggestions? Thanks

Best regards

@wwood
Owner

wwood commented Oct 24, 2020

Sorry, I'm not sure what your question is, specifically.

If you only aim to get coverage with some alignment thresholding, why not just specify those thresholds when you run `coverm contig`?

Thanks.
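A minimal sketch of that suggestion, assuming `coverm contig` accepts the same filtering flags as `coverm filter` (flag names should be checked against `coverm contig --help`; the BAM path and output file name here are hypothetical):

```shell
# Sketch: compute trimmed-mean coverage with alignment thresholds in one step,
# instead of running `coverm filter` first and writing an intermediate BAM.
coverm contig \
  --methods trimmed_mean \
  --bam-files sample.sorted.bam \
  --min-read-percent-identity 95 \
  --min-read-aligned-percent 90 \
  -t 20 \
  > sample_trimmed_mean.tsv
```

This avoids the large intermediate filtered BAM entirely, since the thresholds are applied on the fly during coverage calculation.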

@473021677
Author

Sorry, I hadn't pasted the complete commands. The `coverm filter` command was:

```
coverm filter -b LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sorted.bam -o LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sorted.filtered.bam --min-read-percent-identity 0.95 --min-read-aligned-percent 0.9 -t 20
```

The `coverm contig --methods trimmed_mean` command was:

```
coverm contig --methods trimmed_mean --bam-files LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sorted.filtered.bam -t 20
```

What I meant was that filtering at 95% nucleotide identity and 90% alignment fraction should not change the size of the BAM file that much. Thanks.

@wwood
Owner

wwood commented Oct 24, 2020 via email

@473021677
Author

I used bowtie2 with default parameters to map the metagenomic reads to the 273 prokaryotic genomes, generating the SAM file. Then I converted and sorted it with:

```
samtools view -bS LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sam > LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.bam
samtools sort LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.bam -o LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sorted.bam
```

I am not sure whether the original file contained many unmapped reads. But when I used the filtered and unfiltered BAM files to calculate the mean coverage per contig with `coverm contig --methods trimmed_mean`, the results were almost the same; the mean coverage per contig for the filtered BAM was slightly less than for the unfiltered one. Thanks

@wwood
Owner

wwood commented Oct 25, 2020

Sorry, I'm still confused about where you think the bug is in CoverM - it seems likely there were unmapped reads in the unfiltered BAM file, which would explain the size discrepancy. You can check with `samtools flagstat`, for instance.
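A quick way to run that check, using the file name from earlier in this thread (`samtools flagstat` and the `-F 4` flag are standard samtools usage; the `LYT19_1_mapped_only.bam` output name is hypothetical):

```shell
# Report counts of mapped vs. unmapped records in the unfiltered BAM; a large
# unmapped fraction would account for most of the size difference seen after
# filtering, since `coverm filter` discards those records.
samtools flagstat LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sorted.bam

# Optionally drop unmapped reads up front (-F 4 excludes records with the
# "read unmapped" flag) to get a fairer size comparison with the filtered BAM.
samtools view -b -F 4 LYT19_1_bowtie2_final_freephages_prophages_reformat_95-80.sorted.bam \
  > LYT19_1_mapped_only.bam
```

If `flagstat` shows most records are unmapped, the shrinkage from 143 GB to 4 MB is expected behaviour rather than a bug.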

@473021677
Author

473021677 commented Oct 26, 2020 via email

@wwood wwood closed this as completed Oct 26, 2020