Does table_annovar.pl -vcfinput intentionally change character encoding? #41

Closed
tkoomar opened this issue Sep 7, 2018 · 2 comments

tkoomar commented Sep 7, 2018

I do not know whether this is a feature, a bug, or user error. When I provide a VCF to ANNOVAR via table_annovar.pl -vcfinput, the output VCF escapes punctuation that is part of some annotations: ";" becomes "\x3b" and "=" becomes "\x3d" (see the excerpts from the multianno.txt and .vcf files below).

According to the VCF specification, these characters are reserved as delimiters and should not appear within individual INFO values. So, while I can see that this behavior may be intentional, I could not determine from the documentation whether that is indeed the case.
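
To illustrate why an unescaped ";" inside a value would be a problem, here is a minimal Python sketch of how a reader might split an INFO string. `split_info` is a hypothetical helper written for this example; it is not part of ANNOVAR or of any VCF library, and real parsers handle more cases.

```python
def split_info(info: str) -> dict:
    """Split a VCF INFO string on ';', then on the first '=' of each entry."""
    fields = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        # Entries without '=' are treated as flags and stored as True.
        fields[key] = value if sep else True
    return fields


# Escaped form, as written to multianno.vcf in the excerpt.
escaped = r"Func.refGene=intergenic;Gene.refGene=NONE\x3bDDX11L1"
# The same annotation if the ';' had been written literally.
unescaped = "Func.refGene=intergenic;Gene.refGene=NONE;DDX11L1"

print(split_info(escaped))    # Gene.refGene keeps its full (escaped) value
print(split_info(unescaped))  # 'DDX11L1' is misread as a separate flag
```

With the literal ";" the gene name is cut in two, which is presumably what the escaping is meant to prevent.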

multianno.txt

Chr     Start   End     Ref     Alt     Func.refGene    Gene.refGene    GeneDetail.refGene
1       10469   10469   C       0       intergenic      NONE;DDX11L1    dist=NONE;dist=1405
1       10469   10469   C       G       intergenic      NONE;DDX11L1    dist=NONE;dist=1405 

multianno.vcf

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO 
1       10469   rs370233998     C       *       2026.12 GATKCutoffSNP   ANNOVAR_DATE=2018-04-16;Func.refGene=intergenic;Gene.refGene=NONE\x3bDDX11L1;GeneDetail.refGene=dist\x3dNONE\x3bdist\x3d1405
1       10469   rs370233998     C       G       2026.12 GATKCutoffSNP   ANNOVAR_DATE=2018-04-16;Func.refGene=intergenic;Gene.refGene=NONE\x3bDDX11L1;GeneDetail.refGene=dist\x3dNONE\x3bdist\x3d1405

kaichop commented Sep 8, 2018 via email


tkoomar commented Sep 9, 2018

Excellent, thank you for the clarification.

A note about this (perhaps in docs/articles/VCF.md) would be a helpful heads-up for new users like myself, but any issue one might have with this behavior can be solved easily enough by running sed on the resulting VCF before bgzipping it. Closing this issue.
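
For readers who prefer a script to sed, the same substitution can be sketched in Python. This assumes the only escapes present are of the `\xHH` form shown in the excerpt; `decode_hex_escapes` is a hypothetical name chosen for this example, not an ANNOVAR function.

```python
import re

# Matches a backslash, 'x', and exactly two hex digits, e.g. \x3b or \x3d.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def decode_hex_escapes(text: str) -> str:
    """Replace each \\xHH escape with the character it encodes."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Value of GeneDetail.refGene taken from the multianno.vcf excerpt.
value = r"dist\x3dNONE\x3bdist\x3d1405"
print(decode_hex_escapes(value))  # dist=NONE;dist=1405
```

Decoding is probably safest on individual values after the INFO column has been split, since restoring ";" and "=" across a whole line would reintroduce the reserved delimiters into the INFO field.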
