You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I want to create a VCF from the subsampled graph to call bubbles relative to the reference genome (CHM13).
3. What actually happened?
The commands run successfully, but I noticed some issue with the resulting VCF file. According to the AT fields in the VCF, some reference alleles traverse nodes previously not traversed by the reference path in the GFA. In other words, nodes appear in these AT traversals that are not annotated as reference nodes in the GFA (missing SN:Z and SO:i tags) and additionally are not contained in the W line of the respective reference path. One such example is the node with ID "8521880". I attached a file containing an example of such a line in the VCF (for node: 8521880). I'm confused how this can happen. Did I do anything wrong?
4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:
not applicable
5. What data and command can the vg dev team use to make the problem happen?
see above (1.)
6. What does running vg version say?
vg version v1.54.0 "Parafada"
Compiled with g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on Linux
Linked against libstd++ 20230528
Built by xian@octo
This looks like the usual confusion about the node ids being different between the GFA (unclipped) and GBZ (clipped to 1024bp). You can generally get the GFA node ids out of vg deconstruct by using its -O option
-O, --gbz-translation Use the ID translation from the input gbz to apply snarl names to snarl names and AT fields in output
A few more details about node clipping in general can be found here:
1. What were you trying to do?
I'm trying to run the haplotype sampling algorithm to create a VCF of the resulting subgraph with vg deconstruct. I used the commands shown below based on (CHM13-based) MC graph "hgsvc3_and_hprc-dec16-mc-chm13" available here: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=pangenomes/scratch/2023_12_16_minigraph_cactus_hgsvc3_and_hprc/, and Illumina short reads for sample HG00733 from the 1000G.
2. What did you want to happen?
I want to create a VCF from the subsampled graph to call bubbles relative to the reference genome (CHM13).
3. What actually happened?
The commands run successfully, but I noticed some issue with the resulting VCF file. According to the AT fields in the VCF, some reference alleles traverse nodes previously not traversed by the reference path in the GFA. In other words, nodes appear in these AT traversals that are not annotated as reference nodes in the GFA (missing SN:Z and SO:i tags) and additionally are not contained in the W line of the respective reference path. One such example is the node with ID "8521880". I attached a file containing an example of such a line in the VCF (for node: 8521880). I'm confused how this can happen. Did I do anything wrong?
4. If you got a line like
Stack trace path: /somewhere/on/your/computer/stacktrace.txt
, please copy-paste the contents of that file here:5. What data and command can the vg dev team use to make the problem happen?
see above (1.)
6. What does running
vg version
say?vcf-line.txt
The text was updated successfully, but these errors were encountered: