IUPAC codes in VCF REF field #268
As you noted in ACEnglish/kanpig#1, the error is being triggered because the input is invalid. Reference bases must be I think the solution is to integrate an ambiguity base resolver into noodles-vcf's record reference bases writer. This likely covers the vast majority of "invalid" reference bases. Although this behavior is undefined in VCF < 4.3, it is valid to apply to those versions.
This isn't really ideal because the given record's in-memory format is opaque, i.e., records are implementations of
For this approach, my recommendation would be to create a wrapper around
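To make the ambiguity base resolver idea concrete, here is a minimal sketch in plain Rust. It assumes the VCF 4.3 rule that an ambiguous reference base is reduced to the alphabetically first concrete base it represents (so `R` becomes `A`). The function names `resolve_base` and `resolve_reference_bases` are hypothetical and are not part of the noodles-vcf API.

```rust
/// Reduces one IUPAC nucleotide code to a base that is valid in the REF field.
///
/// A, C, G, T and N pass through unchanged. An ambiguity code is replaced by
/// the alphabetically first base it stands for. Case is preserved. Anything
/// else (for example `U` or `-`) yields `None`.
fn resolve_base(b: u8) -> Option<u8> {
    let resolved = match b.to_ascii_uppercase() {
        b'A' | b'C' | b'G' | b'T' | b'N' => return Some(b),
        // R = A/G, M = A/C, W = A/T, D = A/G/T, H = A/C/T, V = A/C/G
        b'R' | b'M' | b'W' | b'D' | b'H' | b'V' => b'A',
        // Y = C/T, S = C/G, B = C/G/T
        b'Y' | b'S' | b'B' => b'C',
        // K = G/T
        b'K' => b'G',
        _ => return None,
    };

    Some(if b.is_ascii_lowercase() {
        resolved.to_ascii_lowercase()
    } else {
        resolved
    })
}

/// Resolves every base in a REF string, or returns `None` if any character
/// is not an IUPAC nucleotide code that this sketch knows how to reduce.
fn resolve_reference_bases(reference_bases: &str) -> Option<String> {
    reference_bases
        .bytes()
        .map(|b| resolve_base(b).map(char::from))
        .collect()
}
```

For instance, `resolve_reference_bases("ARYK")` gives `Some("AACG".to_string())`, while a string containing a character outside the IUPAC nucleotide alphabet gives `None`, leaving the caller to decide whether to drop or report the record.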
Hello,
First of all, thank you for supporting this package. I've had fun using it.
I received an issue on my tool that uses `noodles-vcf` in which variant records with IUPAC codes in the REF were being thrown out. In an attempt to fix the problem I started by moving to the latest version of `noodles-vcf` (0.57) and found that the reading error was no longer being thrown.

However, when attempting to write the record back out, the error is being thrown. This would be fine except noodles writes partial lines in the output VCF which are truncated at the offending bases. For example:
(note that the rest of the entry @ 24408077 was written, just truncated here for readability)
Is it possible for noodles-vcf to raise the error before writing in order to prevent the corrupted lines? My idea is that I can catch the error, attempt to fix the VCF entry, and retry writing.
I think "no" is a more than fair answer to this request. If so, I'd just need to validate all entries before trying to write, which would incur some overhead from checks that duplicate those noodles already performs.
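The pre-validation fallback described above could be as small as the following sketch. It assumes that the only check needed is that every REF character is one of A, C, G, T or N (case-insensitive). The function name `first_invalid_reference_base` is hypothetical and does not come from noodles-vcf.

```rust
/// Returns the byte offset and value of the first character in a REF string
/// that is not A, C, G, T or N (case-insensitive), or `None` if all are valid.
///
/// Note: an empty string also yields `None`. This sketch does not check
/// that REF is non-empty.
fn first_invalid_reference_base(reference_bases: &str) -> Option<(usize, char)> {
    reference_bases
        .char_indices()
        .find(|(_, c)| !matches!(c.to_ascii_uppercase(), 'A' | 'C' | 'G' | 'T' | 'N'))
}
```

A caller would run this on each record before handing it to the writer and repair or skip the record when it returns `Some(..)`. The cost is one extra pass over REF per record, which is the redundant overhead mentioned above.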
If the answer is yes, but you don't have time, I'm willing to try creating a pull request. My idea would be to edit `io/writer/record` so that the `write_*` calls are sent a temporary buffer instead of the final writer. Then, at the end of the method the buffer is sent to the final writer. However, I'm not familiar enough with the code base to know if this would be sufficient. So any guidance on what all would need to change would be helpful.
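The temporary-buffer idea can be sketched independently of noodles using only `std::io`. Here `write_atomically` and the `serialize` closure are hypothetical stand-ins for the record writer and its `write_*` calls. This is not how noodles-vcf is actually structured, only an illustration of the pattern.

```rust
use std::io::{self, Write};

/// Serializes one record into `scratch` first and copies it to `dst` only if
/// serialization succeeded, so a failing record never leaves a partial line.
///
/// `scratch` is passed in by the caller so its allocation can be reused
/// across records.
fn write_atomically<W, F>(dst: &mut W, scratch: &mut Vec<u8>, serialize: F) -> io::Result<()>
where
    W: Write,
    F: FnOnce(&mut Vec<u8>) -> io::Result<()>,
{
    // Discard whatever the previous record (successful or not) left behind.
    scratch.clear();

    // On error, return early: nothing has been written to `dst` yet.
    serialize(scratch)?;

    // The whole line is known to be valid, so emit it in one call.
    dst.write_all(scratch)
}
```

With this shape, a caller can catch the error, repair the record, and retry, because the destination has not been touched by the failed attempt. The trade-off is one extra copy per record. Note that `write_all` on the destination can still fail partway for I/O reasons, which this pattern does not address.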