Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coverm filter inverse paired reads #159

Closed
Rridley7 opened this issue Mar 6, 2023 · 4 comments
Closed

Coverm filter inverse paired reads #159

Rridley7 opened this issue Mar 6, 2023 · 4 comments

Comments

@Rridley7
Copy link

Rridley7 commented Mar 6, 2023

Hello,
I had a question about the filtering module of coverm, I'm guessing the answer may have to do with the SAM flags within the files passed to coverm filter. When running coverm filter on a bam file (mapped with only paired reads) with the --inverse --proper-pairs-only flags, I would expect that the output file contains all singletons and unmapped pairs of the original bam file, and any other reads not passing filtering, thus being 'proper unmapped pairs'. However, when I look to output all of these unmapped reads to a fastq file using samtools fastq -1 r1.fq -2 r2.fq, I get an uneven number of reads in the forward and reverse file. Is there a reason for this?

I understand this issue may not be on the side of coverm, but rather on the samtools call, and can ask this elsewhere if that is the case. Thanks!

Example:

coverm filter --inverse --min-read-percent-identity 0.95 --min-read-aligned-percent 0.7 -t 24 --proper-pairs-only -b coverm-genome.S03_5e9863-t_1.fq.ump.bam  -o out_filtered.bam

samtools collate  -O out_filtered.bam | samtools fastq -1 r1.fq -2 r2.fq -

wc -l < r1.fq
#42333736
wc -l < r2.fq
#42487636
@Rridley7 Rridley7 changed the title Coverm filter inverse singletons Coverm filter inverse paired reads Mar 6, 2023
@wwood
Copy link
Owner

wwood commented Mar 7, 2023

Hi,

Good point. The way coverm is interpreting this is that you want a BAM file with only proper pairs that don't pass the --min-read-percent-identity 0.95 --min-read-aligned-percent 0.7 thresholds. It doesn't "invert" the --proper-pairs flag.

I suggest taking a two step approach - grab reads that fail in the same way you did above, and then concatenate unpaired reads and unmapped reads with samtools and its flags for filtering.

Does that make sense? I understand this is a bit confusing, and the documentation could be better - will fix.
ben

@wwood wwood closed this as completed in 3a44f3e Mar 7, 2023
@Rridley7
Copy link
Author

Rridley7 commented Mar 7, 2023

Thanks for this quick and helpful response! I see, I think that makes sense. One follow up question, I guess I would then expect the bam file to still contain a matching number of reads in the forward and reverse file with this proper pairs flag, rather than having a different number between the files. I.e., if only proper pairs are passed to this output bam file, and the bam file contains output of properly paired reads which do not meet the identity and alignment requirements, should the fwd and rev fastq files have the same number of reads? Could you explain this piece a bit more?

@wwood
Copy link
Owner

wwood commented Mar 7, 2023

Good q.

I think what is happening is that they are aligned as a proper pair, but then since you used thresholds which apply to a single side of each pair, then only one of the pair is present in the output BAM. Does that make sense?

I have to admit it has been a while since I looked at that code though. Perhaps it would be worth verifying by inspecting the mappings of now-singleton mappings in the original file?

@Rridley7
Copy link
Author

Rridley7 commented Mar 7, 2023

I see, yes that does make sense. I'll definitely inspect the singletons to see if that is the case. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants