Flags in the info field #25

Closed
abolia opened this issue Mar 2, 2016 · 4 comments

Comments

@abolia

abolia commented Mar 2, 2016

Hi Zev,

I am trying to understand the various flags in the info field, and I have a few questions about some of them.

*1) What is the difference between NC and NS?*
To my understanding, both seem to count soft-clipped reads.
NS means "The number of primary reads supporting with a soft clip at POS", i.e. reads whose soft clipping starts exactly at the breakpoint.

NC means "The number of soft-clipped segments that were collapsed into the consensus sequence". Does that mean it counts the reads whose soft-clipped segment passes over the breakpoint? For example, if "." denotes the breakpoint, "r" is the aligned part of a read and "s" is the soft-clipped part, then the reads counted in the NC flag should look like this diagram:
.
sssssrrrrr
rrrrrrrsssssssss
sssssrrrrrrrrrrrrrrrrrrrrrrrrrrrr
ssssssssssrrrrrr
rrrssssssssssssssss

The first 3 reads have soft clipping exactly at the breakpoint, so they are counted in NS, whereas the last two reads have a soft-clipped segment spanning the breakpoint.
Also, in my Tx call outputs, I see that NS has a higher value than NC. That would mean more reads support the exact breakpoint than have a soft-clipped section passing over the breakpoint.

Is my interpretation correct?

*2) What is the difference between NA and NS?*
NA is "The number of reads that support the structural variant listed in ALT". Does this also mean the number of reads that have soft clipping at the breakpoint? I see that this number is always higher than the NS value. Is that because it counts reads that might not pass the MQ filter, etc.? In other words, is NA the total number of reads supporting the breakpoint, and NS the number of reads that pass the MQ and BQ threshold filters and support the breakpoint?

Can you correct me if I am wrong in my interpretation?

Thank you so much for all your help.
Ashini

@zeeev
Owner

zeeev commented Mar 3, 2016

NC is the number of reads soft-clipped at the POS.

    X
ssssrrrrrrrr
 sssrrrrrrrrr
  ssrrrrrrrrrr

NC is 3 in this case. Three reads soft-clip at the same position.

NS is the same as NC, but on a per-individual basis (genotype field).

If you’re only calling one genome, NC == NS. If you’re joint calling, NC may or may not equal NS.

##FORMAT=<ID=NS,Number=1,Type=Integer,Description="Number of reads with a softclip at POS for individual">
##INFO=<ID=NC,Number=1,Type=String,Description="Number of soft clipped sequences collapsed into consensus">
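To compare the two counts in practice, NC has to be read from the INFO column and NS from each sample's genotype column. The following is a minimal sketch in plain Python, not part of the caller itself; the helper names (`parse_info`, `parse_sample`, `softclip_counts`) and the toy record are made up for illustration, and only the NC and NS field names come from the header lines above.

```python
def parse_info(info):
    """Turn an INFO string such as 'A=1;B=2;FLAG' into a dict (bare flags map to True)."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_sample(format_col, sample_col):
    """Pair FORMAT keys (e.g. 'GT:NS') with one sample's colon-separated values."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def softclip_counts(vcf_line):
    """Return (NC from INFO, list of per-sample NS values) for one tab-separated record."""
    cols = vcf_line.rstrip("\n").split("\t")
    nc = int(parse_info(cols[7])["NC"])
    ns = [int(parse_sample(cols[8], sample)["NS"]) for sample in cols[9:]]
    return nc, ns


# Toy single-sample record (hypothetical values, not real caller output).
record = "\t".join(
    ["1", "1000", ".", "N", "<SV>", ".", ".", "NC=3;RD=10",
     "GT:NR:NA:NS:RD", "0/1:7:3:3:10"]
)
nc, ns = softclip_counts(record)
print(nc, ns)  # 3 [3]
```

With several samples in one file, `ns` holds one value per sample column, so each can be compared against the site-level NC.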

I will try to make the docs clearer.

Does this help you?

—Zev

Zev Kronenberg Ph.D.


@abolia
Author

abolia commented Mar 3, 2016

Hi Zev,

Thanks for your reply. I have been calling a single genome (single-sample studies for translocation calling), and I never see NC == NS, which should hold according to your explanation. For example, here are two Tx calls that are true for an ALK-EML4 translocation sample.

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ALK_S4.bam
2 29448159 . N TGGTGAACATTTTAATGGTTCTGTAGATACTCTCAACNTCCACTTACNCACTTAAAAGATTACAAATTA . . LRT=0;WAF=.,0.500001,0.500001;GC=0,1;AT=0.996262,0.0598131,0.011215,0.0429907,0,0.00747664,0.00373832,0,0,0,0,0,0.102804,0.0224299,9.38974;CF=0.00186916;CISTART=29448141,29448175;CIEND=42525058,42525310;PU=90;SU=3;CU=94;RD=535;NC=77;MQ=60;MQF=0;SP=21,4,0;CHR2=2;DI=f;END=42525185;SVLEN=13077027 GT:GL:NR:NA:NS:RD 0/1:-3232.83,-370.834,-4158.47:301:234:150:535

2 42525164 . N AACCTTCCCCCCACNAGAGCAGCTGCAGTTNCCNGAGGAGCCCCTGATTCTGCACCTCAGNNNNNNNNNNANNN . . LRT=0;WAF=.,1,1;GC=0,1;AT=1,0.761905,0,0.761905,0,0,0,0,0,0,0,0,0.809524,0,0.368569;CF=0;CISTART=42525162,42525164;CIEND=29448006,29448148;PU=20;SU=0;CU=16;RD=21;NC=16;MQ=60;MQF=0;SP=12,0,0;CHR2=2;DI=b;END=29448078;SVLEN=13077085 GT:GL:NR:NA:NS:RD 1/1:-255,-255,-2.1e-05:0:21:20:21

In the first call, NC=77 and NS=150; in the second call, NC=16 and NS=20.

Ideally they should be equal in this case, but I don't understand why they are not.

Also, can you please help me understand the difference between NA and NS?

Thank you so much.
Ashini

@zeeev
Owner

zeeev commented Mar 10, 2016

Ashini,

  1. As we talked about, here are the metrics that classify NA vs NR:
  • same strand
  • soft clipped within 5bp of breakpoint (this is NS)
  • a read pair 2.5 SD outside normal mapping range
  • mate pair mapped to another chromosome
  2. Read depth includes supplementary reads. Only reads with three CIGAR operations are filtered.
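The criteria in the list above can be sketched as a small classifier. This is only an illustration under several assumptions, not the caller's actual code: the `Read` class and its fields are hypothetical, "same strand" is assumed to mean that a read and its mate map to the same strand, the four criteria are assumed to be combined with a logical OR, and the 5bp window and 2.5 SD cutoff are taken from the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:
    """Hypothetical, simplified view of one aligned read and its mate."""
    chrom: str
    mate_chrom: str
    is_reverse: bool
    mate_is_reverse: bool
    softclip_pos: Optional[int]  # reference position of the soft clip, or None
    insert_size: int


def near_softclip(read, breakpoint_pos, window=5):
    """True if the read is soft clipped within `window` bp of the breakpoint (NS)."""
    return read.softclip_pos is not None and abs(read.softclip_pos - breakpoint_pos) <= window


def supports_alt(read, breakpoint_pos, insert_mean, insert_sd, sd_cutoff=2.5):
    """Assumption: a read counts toward NA if ANY of the four criteria holds."""
    same_strand = read.is_reverse == read.mate_is_reverse
    odd_insert = abs(abs(read.insert_size) - insert_mean) > sd_cutoff * insert_sd
    mate_elsewhere = read.mate_chrom != read.chrom
    return same_strand or near_softclip(read, breakpoint_pos) or odd_insert or mate_elsewhere


def count_support(reads, breakpoint_pos, insert_mean, insert_sd):
    """Tally NA (alt-supporting), NR (everything else) and NS (soft clipped near POS)."""
    counts = {"NA": 0, "NR": 0, "NS": 0}
    for read in reads:
        if supports_alt(read, breakpoint_pos, insert_mean, insert_sd):
            counts["NA"] += 1
        else:
            counts["NR"] += 1
        if near_softclip(read, breakpoint_pos):
            counts["NS"] += 1
    return counts


reads = [
    Read("2", "2", False, True, None, 300),   # ordinary pair: reference support
    Read("2", "2", False, True, 1003, 300),   # soft clipped 3 bp from the breakpoint
    Read("2", "5", False, True, None, 0),     # mate on another chromosome
    Read("2", "2", False, True, None, 500),   # insert size far outside the normal range
]
print(count_support(reads, breakpoint_pos=1000, insert_mean=300, insert_sd=30))
# {'NA': 3, 'NR': 1, 'NS': 1}
```

Under these assumptions NS is a subset of NA, which would be consistent with NA being at least as large as NS in the calls quoted earlier in the thread.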

@abolia
Author

abolia commented Mar 14, 2016

Thanks Zev. This is very helpful. I don't understand what "same strand" means, though. Aren't all the reads at the breakpoint on the same strand anyway? Also, I see that NR + NA = RD in most of my cases, which makes sense.
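The NR + NA = RD relationship can be checked against the two calls quoted earlier in this thread. The sketch below hard-codes the NR, NA, NS, RD and NC values copied from those two records; the function name `depth_is_consistent` is made up for illustration.

```python
# Values copied from the two ALK-EML4 records quoted earlier in this thread.
calls = [
    {"POS": 29448159, "NR": 301, "NA": 234, "NS": 150, "RD": 535, "NC": 77},
    {"POS": 42525164, "NR": 0, "NA": 21, "NS": 20, "RD": 21, "NC": 16},
]


def depth_is_consistent(call):
    """True if reference-supporting plus alt-supporting reads add up to read depth."""
    return call["NR"] + call["NA"] == call["RD"]


for call in calls:
    print(call["POS"], depth_is_consistent(call), call["NC"] == call["NS"])
# 29448159 True False
# 42525164 True False
```

Both calls satisfy NR + NA = RD (301 + 234 = 535 and 0 + 21 = 21), while NC and NS differ in both.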

For directionality, the DI field tells whether the breakpoint is supported on the 5' end or the 3' end of the pileup for the "POS" position. However, is there a way to find this out for the "END" breakpoint too, even if the reciprocal translocation is not called?

Thanks again,
Ashini

@zeeev zeeev closed this as completed May 11, 2016