Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

properly handle pairing information in surjection #25

Closed
ekg opened this issue Apr 16, 2015 · 7 comments
Closed

properly handle pairing information in surjection #25

ekg opened this issue Apr 16, 2015 · 7 comments

Comments

@ekg
Copy link
Member

ekg commented Apr 16, 2015

This is now included in the GAM stream but not handled by surject.

@adamnovak
Copy link
Member

I encountered this issue today trying to use vg to align to the HGVM bake-off graphs.

From looking at the SAM spec, it seems that it may be legal to set RNEXT to the next read's name while leaving PNEXT as 0 ("unavailable"), since it's potentially expensive to find the other alignment and work out where it mapped.

@ekg
Copy link
Member Author

ekg commented Aug 8, 2015

That seems reasonable. We may need to make a second pass to resolve things.
This could possibly be done with
https://github.com/pezmaster31/bamtools/blob/master/src/toolkit/bamtools_resolve.cpp

On Fri, Aug 7, 2015 at 8:01 PM, adamnovak notifications@github.com wrote:

I encountered this issue today trying to use vg to align to the HGVM
bake-off graphs.

From looking at the SAM spec, it seems that it may be legal to set RNEXT
to the next read's name while leaving PNEXT as 0 ("unavailable"), since
it's potentially expensive to find the other alignment and work out where
it mapped.


Reply to this email directly or view it on GitHub
#25 (comment).

@adamnovak
Copy link
Member

Sorry, what I proposed is completely wrong, because I don't understand SAM.

In SAM, the QNAME field specifies the name of the fragment, not of the read. So the two ends of a paired end read are linked together by sharing a QNAME, and there's no need to actually fill in the RNEXT and PNEXT fields unless you want to be efficient.

RNEXT doesn't hold the name of the next read on the fragment, but rather the reference contig against which the next read on the fragment was aligned. If it's the same as the reference for the current fragment it can be "=".

It looks like what really needs to happen is that vg surject needs to strip off the "/1" and "/2" that get added to the ends of fragment names when paired end reads are imported from FASTQ. You would still need another pass for the RNEXT/PNEXT fields and the flags.

Interestingly, right now paired-end information isn't read from BAM, and the "/1" and "/2" aren't added, so if you throw in a paired-end BAM you get multiple VG alignments with the same name.

@ekg
Copy link
Member Author

ekg commented Nov 2, 2015

@adamnovak you haven't addressed this have you? Is it still a problem on your end?

@adamnovak
Copy link
Member

I don't think I actually need to surject paired reads for anything I am doing. @glennhickey may want it if he wants to run a read-pair-aware BAM-based variant caller as a control.

I haven't checked recently, but I think the handling of the BAM fragment names (not giving paired reads the same fragment name) is still wrong. With the right fragment names you can go through and reconstruct the mate-finding fields, but without it you have a BAM file that doesn't express pairing at all.

@ekg
Copy link
Member Author

ekg commented Nov 16, 2015

I think the problem is how the reads are pulled out of the bam. The
converter I've used in samtools generates suffixes for the reads depending
on which part of the fragment they represent. vg needs to know to trim the
/1 or /2 from the fragment ends.
On Nov 16, 2015 7:47 PM, "adamnovak" notifications@github.com wrote:

I don't think I actually need to surject paired reads for anything I am
doing. @glennhickey https://github.com/glennhickey may want it if he
wants to run a read-pair-aware BAM-based variant caller as a control.

I haven't checked recently, but I think the handling of the BAM fragment
names (not giving paired reads the same fragment name) is still wrong. With
the right fragment names you can go through and reconstruct the
mate-finding fields, but without it you have a BAM file that doesn't
express pairing at all.


Reply to this email directly or view it on GitHub
#25 (comment).

@adamnovak
Copy link
Member

We now do this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants