properly handle pairing information in surjection #25

ekg · 2015-04-16T10:10:24Z

This is now included in the GAM stream but not handled by surject.

adamnovak · 2015-08-07T18:01:10Z

I encountered this issue today trying to use vg to align to the HGVM bake-off graphs.

From looking at the SAM spec, it seems that it may be legal to set RNEXT to the next read's name while leaving PNEXT as 0 ("unavailable"), since it's potentially expensive to find the other alignment and work out where it mapped.

ekg · 2015-08-08T08:14:10Z

That seems reasonable. We may need to make a second pass to resolve things.
This could possibly be done with
https://github.com/pezmaster31/bamtools/blob/master/src/toolkit/bamtools_resolve.cpp

adamnovak · 2015-08-10T22:50:32Z

Sorry, what I proposed is completely wrong, because I don't understand SAM.

In SAM, the QNAME field specifies the name of the fragment, not of the read. So the two ends of a paired end read are linked together by sharing a QNAME, and there's no need to actually fill in the RNEXT and PNEXT fields unless you want to be efficient.

RNEXT doesn't hold the name of the next read on the fragment, but rather the reference contig against which the next read on the fragment was aligned. If it's the same as the reference for the current read (RNAME), it can be "=".

It looks like what really needs to happen is that vg surject needs to strip off the "/1" and "/2" that get added to the ends of fragment names when paired end reads are imported from FASTQ. You would still need another pass for the RNEXT/PNEXT fields and the flags.
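To illustrate the suffix stripping described above, here is a minimal sketch in Python. It is not vg's implementation; `split_pair_suffix` is a hypothetical helper, and it assumes the only suffixes in play are a trailing "/1" or "/2".

```python
def split_pair_suffix(read_name):
    """Split a FASTQ-style read name into (fragment_name, mate_number).

    Hypothetical helper: mate_number is 1 or 2 when the name ends in
    "/1" or "/2", and 0 when no such suffix is present.
    """
    for suffix, mate in (("/1", 1), ("/2", 2)):
        # Require something before the suffix so a bare "/1" is left alone.
        if read_name.endswith(suffix) and len(read_name) > len(suffix):
            return read_name[:-len(suffix)], mate
    return read_name, 0
```

Both ends of a pair then map to the same fragment name, which is what SAM expects in QNAME; the returned mate number can be kept to set the first/last-segment flags later.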

Interestingly, right now paired-end information isn't read from BAM, and the "/1" and "/2" aren't added, so if you throw in a paired-end BAM you get multiple VG alignments with the same name.

ekg · 2015-11-02T15:00:58Z

@adamnovak you haven't addressed this, have you? Is it still a problem on your end?

adamnovak · 2015-11-16T18:47:26Z

I don't think I actually need to surject paired reads for anything I am doing. @glennhickey may want it if he wants to run a read-pair-aware BAM-based variant caller as a control.

I haven't checked recently, but I think the handling of the BAM fragment names (not giving paired reads the same fragment name) is still wrong. With the right fragment names you can go through and reconstruct the mate-finding fields, but without them you have a BAM file that doesn't express pairing at all.
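As a rough sketch of that reconstruction pass, assuming the records already share a fragment name: the code below groups records by QNAME and fills in RNEXT, PNEXT and the pairing flags from the mate. The record layout (plain dicts with `qname`, `flag`, `rname`, `pos` and a `mate` number) and the function name are assumptions for illustration, not vg or samtools structures. The flag bit values are those defined in the SAM specification.

```python
from collections import defaultdict

# FLAG bits as defined in the SAM specification.
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
REVERSE, MATE_REVERSE = 0x10, 0x20
FIRST, LAST = 0x40, 0x80


def fill_mate_fields(records):
    """Fill rnext/pnext and pairing flags for records sharing a qname.

    Hypothetical record layout: dicts with keys qname, flag, rname,
    pos (1-based, 0 if unavailable) and mate (1 or 2).
    """
    by_fragment = defaultdict(list)
    for rec in records:
        by_fragment[rec["qname"]].append(rec)

    for mates in by_fragment.values():
        if len(mates) != 2:
            continue  # unpaired or ambiguous: leave the records untouched
        # Visit each record together with its mate.
        for rec, other in (mates, mates[::-1]):
            rec["flag"] |= PAIRED | (FIRST if rec["mate"] == 1 else LAST)
            if other["flag"] & UNMAPPED:
                rec["flag"] |= MATE_UNMAPPED
            if other["flag"] & REVERSE:
                rec["flag"] |= MATE_REVERSE
            if other["rname"] == "*":
                rec["rnext"] = "*"  # mate has no reference
            elif other["rname"] == rec["rname"]:
                rec["rnext"] = "="  # same reference as this read
            else:
                rec["rnext"] = other["rname"]
            rec["pnext"] = other["pos"]
    return records
```

This sketch does not set the "properly paired" bit (0x2) or TLEN, since both need a decision about expected orientation and insert size.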

ekg · 2015-11-16T20:48:43Z

I think the problem is how the reads are pulled out of the BAM. The converter I've used in samtools generates suffixes for the reads depending on which part of the fragment they represent. vg needs to know to trim the /1 or /2 from the ends of the fragment names.

adamnovak · 2017-07-07T17:39:20Z

We now do this.

adamnovak mentioned this issue Mar 21, 2016

vg index: xg construction memory usage increase #267

Closed

adamnovak closed this as completed Jul 7, 2017

ChriKub mentioned this issue Feb 19, 2018

vg call fails: Address not mapped to object #1459

Open
