You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I was having problems using a BCF file produced by noodles in bcftools. After a bit of digging, I believe my issue has to do with the handling of the IDX field in the header contig field. To illustrate, running
echo -e '##fileformat=VCFv4.3\n##contig=<ID=1>\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample'| bcftools view -O b > test.bcf.gz
and then
use std::{error, fs, io};use noodles_bcf as bcf;use noodles_vcf as vcf;fnmain() -> Result<(),Box<dyn error::Error>>{letmut reader = fs::File::open("test.bcf.gz").map(bcf::Reader::new)?;
reader.read_file_format()?;let raw_header = reader.read_header()?;let header: vcf::Header = raw_header.parse()?;letmut writer = vcf::Writer::new(io::stdout());
writer.write_header(&header)?;Ok(())}
prints something like
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed",IDX=0>
##contig=<ID=1,IDX="0">
##bcftools_viewVersion=1.14-25-gc304cb1+htslib-1.14-9-ge769401
##bcftools_viewCommand=view -O b; Date=Tue Jan 4 13:22:23 2022
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
Note in particular that the value "0" of the IDX field on the contig record is quoted ##contig=<ID=1,IDX="0">, since it is encoded as a regular entry of a IndexMap<String, String> rather than as a particular Option<usize> field as used e.g. for filters.
When writing BCFs with noodles, this causes issues when interfacing e.g. with bcftools. Modifying the above slightly to write a BCF,
Code
use std::{error, fs, io};use noodles_bcf as bcf;use noodles_vcf as vcf;fnmain() -> Result<(),Box<dyn error::Error>>{letmut reader = fs::File::open("test.bcf.gz").map(bcf::Reader::new)?;
reader.read_file_format()?;let raw_header = reader.read_header()?;let header: vcf::Header = raw_header.parse()?;letmut writer = bcf::Writer::new(io::stdout());
writer.write_file_format()?;
writer.write_header(&header)?;Ok(())}
and running for instance
cargo run | bcftools view
produces
[W::bcf_hdr_register_hrec] Error parsing the IDX tag, skipping
##fileformat=VCFv4.3
##FILTER=<ID=PASS,Description="All filters passed">
##bcftools_viewVersion=1.14-25-gc304cb1+htslib-1.14-9-ge769401
##bcftools_viewCommand=view -O b; Date=Tue Jan 4 13:22:23 2022
##bcftools_viewCommand=view; Date=Tue Jan 4 13:38:29 2022
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
where the contig entry has been skipped due to an error parsing the string IDX tag.
Digging around in the BCF specs sections 6.2.1-6.2.2, it's not clear to me how, if at all, the IDX field should work for contigs. My impression from reading samtools/hts-specs#591, however, is that contig IDXs should somehow be handled, albeit separately from other header records? I don't understand the details of the BCF spec well enough to be sure that I completely follow that thread, however.
Feel free to close if you think this is a problem in bcftools or with the spec, rather than with noodles - I figured either way it might save someone else a bit of time repeating the above investigation if there's a searchable issue describing the problem.
The text was updated successfully, but these errors were encountered:
Thanks for the detailed report. I see the issue here.
It was an unfortunate decision to entangle the format/spec with an implemntation detail. A few changes will have to be made to provide compatibility with htslib.
bcf::header::StringMap needs to be reworked to allow nonsequential entries. This means it can no longer be backed by an IndexSet. (4291d88 and 4e39a77)
A StringMap will need to be created for contig header records. Given that absolute positions can override relative positions, the order of vcf::header::Contigs cannot be used. (17397d7)
Document that a record chromosome ID (chrom) is not an index into vcf::header::Contigs but an index into the contig StringMap (6b73f93).
Writing a VCF record needs to use the contig StringMap instead of the VCF header contigs (9da12e7).
Converting a BCF record to BCF record needs to the contig StringMap instead of the VCF header contigs (9c97ba8).
Parse IDX field from raw record in vcf::header::Contig (22046f0).
noodles 0.17.0 includes the changes that resolve this issue, mainly parsing the IDX field on VCF header contig records and also building a string map for those records.
I was having problems using a BCF file produced by
noodles
inbcftools
. After a bit of digging, I believe my issue has to do with the handling of theIDX
field in the header contig field. To illustrate, runningand then
prints something like
Note in particular that the value
"0"
of theIDX
field on the contig record is quoted##contig=<ID=1,IDX="0">
, since it is encoded as a regular entry of aIndexMap<String, String>
rather than as a particularOption<usize>
field as used e.g. for filters.When writing BCFs with
noodles
, this causes issues when interfacing e.g. withbcftools
. Modifying the above slightly to write a BCF,Code
and running for instance
cargo run | bcftools view
produces
where the contig entry has been skipped due to an error parsing the string IDX tag.
Digging around in the BCF specs sections 6.2.1-6.2.2, it's not clear to me how, if at all, the IDX field should work for contigs. My impression from reading samtools/hts-specs#591, however, is that contig IDXs should somehow be handled, albeit separately from other header records? I don't understand the details of the BCF spec well enough to be sure that I completely follow that thread, however.
Feel free to close if you think this is a problem in
bcftools
or with the spec, rather than withnoodles
- I figured either way it might save someone else a bit of time repeating the above investigation if there's a searchable issue describing the problem.The text was updated successfully, but these errors were encountered: