Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fetch_refseq:40] Error, cannot retrieve reference chr1:0-0. #3

Open
ttriche opened this issue Jan 29, 2016 · 16 comments
Open

[fetch_refseq:40] Error, cannot retrieve reference chr1:0-0. #3

ttriche opened this issue Jan 29, 2016 · 16 comments

Comments

@ttriche
Copy link
Contributor

ttriche commented Jan 29, 2016

After marking dupes, the following

biscuit pileup -q 16 -r hg19.fa -i WGBS_$SAMPLE.mkdups.hg19.bam -o WGBS_$SAMPLE.mkdups.hg19.vcf

fails on a number of samples (4/6, not all though). Is there a straightforward way to debug this? Or just to grep out the offending read[s]? The runs that fail are all from the same flow cell IIRC. I don't see this behavior elsewhere and can't seem to find anything on Google. Eventually I'll look into the source... :-/

@rbpisupati
Copy link

rbpisupati commented May 25, 2016

Hi ttriche, Did you find any way to fix this error?

@zwdzwd
Copy link
Collaborator

zwdzwd commented May 25, 2016

Hi rahbz, are you seeing this? Could you elaborate on how to reproduce? I tried to reproduce this on my side but in vain. I must be missing something.

@rbpisupati
Copy link

rbpisupati commented May 25, 2016

Hi,
I have run it with this command and it ended up giving me an error.

~/Softwares/biscuit-0.1.4.20160330/bin/biscuit pileup -r ~/whole_genome_TAIR10/TAIR10_all.fa -i 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified.bam
##fileformat=VCFv4.1
##reference=/home/GMI/rahul.pisupati/whole_genome_TAIR10/TAIR10_all.fa
##source=biscuitV0.1.3.20160324
##contig=<ID=Chr1,length=30427671>
##contig=<ID=Chr2,length=19698289>
##contig=<ID=Chr3,length=23459830>
##contig=<ID=Chr4,length=18585056>
##contig=<ID=Chr5,length=26975502>
##contig=<ID=ChrC,length=154478>
##contig=<ID=ChrM,length=366924>
##program=<cmd=biscuit pileup -r /home/GMI/rahul.pisupati/whole_genome_TAIR10/TAIR10_all.fa -i 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified.bam>
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=CX,Number=1,Type=String,Description="Cytosine context (CG, CHH or CHG)">
##INFO=<ID=N5,Number=1,Type=String,Description="5-nucleotide context, centered around target cytosine">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##FORMAT=<ID=SP,Number=R,Type=String,Description="Allele support (before bisulfite conversion, with filtering)">
##FORMAT=<ID=CV,Number=1,Type=Integer,Description="Effective (strand-specific) coverage on cytosine">
##FORMAT=<ID=BT,Number=1,Type=Float,Description="Cytosine methylation fraction (aka beta value, with filtering)">
##FORMAT=<ID=GT,Number=1,Type=Integer,Description="Genotype from normal">
##FORMAT=<ID=GL,Number=G,Type=Integer,Description="Genotype likelihoods">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype quality (phred-scaled)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified
[fetch_refseq:62] Error, cannot retrieve reference Chr1:1-0.

But I think I have figured out the error. If I change the reference genome modifying the header of fasta to just the Chromosome name instead of all the information I seemed to get no error. I dont know why that is the case.
So @ttriche, I think you should modify your genome fasta headers to just the chromosome name.

Thanks,
Rahul

@ttriche
Copy link
Contributor Author

ttriche commented May 25, 2016

it disappeared once I updated biscuit and sorted the BAMs (verified sort
order by indexing the result prior to bedgraph/vcf production)

--t

On Wed, May 25, 2016 at 6:12 AM, Wanding Zhou notifications@github.com
wrote:

Hi rahbz, are you seeing this? Could you elaborate on how to reproduce? I
tried to reproduce this on my side but in vain. I must be missing something.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#3 (comment)

@ttriche
Copy link
Contributor Author

ttriche commented May 25, 2016

re: headers: I just used hg19 from UCSC, so that reason seems unlikely

--t

On Wed, May 25, 2016 at 6:33 AM, Rahul Bharadwaj notifications@github.com
wrote:

Hi,
I have run it with this command and it ended up giving me an error.
~/Softwares/biscuit-0.1.4.20160330/bin/biscuit pileup -r
~/whole_genome_TAIR10/TAIR10_all.fa -i
36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified.bam
##fileformat=VCFv4.1
##reference=/home/GMI/rahul.pisupati/whole_genome_TAIR10/TAIR10_all.fa
##source=biscuitV0.1.3.20160324
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##contig=
##program=
##INFO=
##INFO=
##INFO=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
36841_GCCAAT_C92VAANXX_1_20160229B_20160229.modified
[fetch_refseq:62] Error, cannot retrieve reference Chr1:1-0.

But I think I have figured out the error. If I change the reference genome
modifying the header of fasta to just the Chromosome name instead of all
the information I seemed to get no error. I dont know why that is the case.
So @ttriche https://github.com/ttriche, I think you should modify your
genome fasta headers to just the chromosome name.

Thanks,
Rahul


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#3 (comment)

@rbpisupati
Copy link

Yeah even I thought the same, I have the same headers from Arabidopsis, tair10. Dont know really why that was the case.

@jleluyer
Copy link

jleluyer commented Oct 23, 2016

Hello,

I have the exact same problem even after cloning the latest version of biscuit and sorting/indexing my bam files. Did someone figure out what would be the issue ?

Thanks

@rbpisupati
Copy link

Hello jleluyer,
try to rename headers in your reference fasta file with just chromosome names. it should work then!

@jleluyer
Copy link

Hello,

They are just with scaffold names: i.e >sp_scaff00001. Is the way there are called a problem ?
thanks

@ZhenyuZ
Copy link

ZhenyuZ commented Aug 29, 2017

I got similar error here "[fetch_refcache:94] Error, cannot retrieve reference"
not chr1 though. It errors out in either HPV-mCG2 or HPV-mCG3 or both in the GDC reference genome https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files

Modification of fasta header to only keep the contig name did solve the problem

@minghao4
Copy link

minghao4 commented Mar 9, 2018

Hello, I'm getting a similar error:
Command:
biscuit pileup -g "chr6:1-10000" -o out.vcf GRCh38_p12_chr6.fa input_1_sorted.bam input_2_sorted.bam input_3_sorted.bam input_4_sorted.bam

Error:
[refcache_fetch:85] Error, cannot retrieve reference chr6:0-0

The reference fasta I'm using only contains the GRCh38 assembled chromosome 6. I changed the header from ">CM000668.2 Homo sapiens chromosome 6, GRCh38 reference primary assembly" to just ">chr6" and it still does not resolve the problem.

EDIT: I've figured out the issue. I didn't index the reference fasta again after changing the header.

@smcnew
Copy link

smcnew commented Jul 10, 2019

Hi all: I'm having the same error.
Command:
biscuit pileup gamo_ref2.fa DC050biscuitoutput.bam

Error:
[refcache_fetch:85] Error, cannot retrieve reference Chr1:0-0.

The resulting output vcf has correct headers but no data.
I previously ran Biscuit pileup correctly using a consensus pileup reference created from my reads, I have also run it without problems using a zebra finch genome. However, I recently tried to re-run the analysis using a recently created genome for my species and have been unable to correctly run the pileup command.

Following the advice above in the thread I renamed all the chromosomes to have simpler names: the reference fasta now has the following chromosomes:
`>Chr1

Chr1A
Chr2
Chr3
Chr4
Chr4A
Chr5
Chr6
Chr7
Chr8
Chr9
Chr10
Chr11
Chr12
Chr13
Chr14
Chr15
Chr17
Chr18
Chr19
Chr20
Chr21
Chr22
Chr23
Chr24
Chr25
Chr26
Chr27
Chr28
ChrZ
ChrSS1
ChrSS2
ChrSS3
ChrSS4
ChrUS1
ChrUS2
ChrUS3
ChrUS4
ChrUS5`

I indexed the fasta and aligned my reads again and pileup still will not work.

Any suggestions would be greatly appreciated.

@njspix
Copy link

njspix commented Mar 31, 2023

I have run into this issue quite a bit, oddly, it doesn't seem to be deterministic (sometimes it works, sometimes not). Say I have the following files in my working directory:

sample.bam
ref.fa

if I run biscuit pileup ref.fa sample.bam, I might get the following output:

[fai_load] build FASTA index.
...
[fai_load] build FASTA index.
[refcache_fetch:96] Error, cannot retrieve reference chr1:0-0.

However, if i run samtools faidx in the directory (creating ref.fa.fai), then run biscuit pileup as above, it seems to work without error.

Based on this I'm guessing that the built-in faidx generation in biscuit is buggy (perhaps using an out-of-date htslib function?). Suggest generating a fresh faidx using samtools before running biscuit pileup.

@zwdzwd
Copy link
Collaborator

zwdzwd commented Mar 31, 2023

I don't recall biscuit generates faidx in any of its function. Yeah use samtools.

@njspix
Copy link

njspix commented Mar 31, 2023

Perhaps we should document that a faidx is necessary to run the command, even though it's not specified in the command...

@somnya
Copy link

somnya commented Jan 29, 2024

Thank you @njspix !
I followed your suggestion and it worked for me: Using "samtools faidx" instead of "biscuit index"

I hope this modification can be documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants