Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

whatshap haplog with a cram file #425

Closed
asylvz opened this issue Dec 20, 2022 · 1 comment
Closed

whatshap haplog with a cram file #425

asylvz opened this issue Dec 20, 2022 · 1 comment

Comments

@asylvz
Copy link

asylvz commented Dec 20, 2022

Hello,

With "whatshap haplog" using a cram file, I got the following error.

The reference is the one used with read alignment and the md5sums of cram header and the reference genome matches. Then we found out that there is a similar issue here: samtools/htslib#1015

I solved it by two different ways:

  1. seq_cache_populate.pl -root /xxx/phasing /xxx/phasing/GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta

export REF_PATH=/xxx/phasing/%2s/%2s/%s:http://www.ebi.ac.uk/ena/cram/md5/%s
export REF_CACHE=/xxx/phasing/%2s/%2s/%s

  1. Converting CRAM file to BAM

We discussed this with @tobiasmarschall and decided to open an issue to suggest outputting a more explanatory error message for such an issue.

Thank you,
Arda

whatshap haplotag NA12878_longread.vcf.gz NA12878.cram -o NA12878_longread_phase.bam --reference GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta

Found 3 samples in input VCF
Keeping 3 samples for haplo-tagging
Found 1 samples in BAM file
Reading alignments for sample 'NA12878' and detecting alleles ...
Found 222449 reads covering 167992 variants
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/6aef897c3d6ff0c78aff06ac189178dd": Protocol not supported
[E::cram_get_ref] Failed to populate reference for id 0
[E::cram_decode_slice] Unable to fetch reference #0 10001..425038
[E::cram_next_slice] Failure to decode slice
Traceback (most recent call last):
 File "/software/WhatsHap/2020_03_02/login/bin/whatshap", line 8, in <module>
  sys.exit(main())
 File "/software/WhatsHap/2020_03_02/login/lib/python3.6/site-packages/whatshap/__main__.py", line 83, in main
  module.main(args)
 File "/software/WhatsHap/2020_03_02/login/lib/python3.6/site-packages/whatshap/cli/haplotag.py", line 622, in main
  run_haplotag(**vars(args))
 File "/software/WhatsHap/2020_03_02/login/lib/python3.6/site-packages/whatshap/cli/haplotag.py", line 574, in run_haplotag
  for alignment in bam_reader.fetch(contig=chrom, start=start, stop=end):
 File "pysam/libcalignmentfile.pyx", line 2092, in pysam.libcalignmentfile.IteratorRowRegion.__next__
OSError: truncated file
@marcelm
Copy link
Contributor

marcelm commented Jan 9, 2023

This is actually not just a bad error message, but a bug because haplotag should just use the provided FASTA file instead of trying to retrieve the sequences from REF_PATH.

While fixing this, I noticed that CRAM files with unmapped reads will lead to a crash at the moment. That’s a different issue that needs to be fixed separately.

@marcelm marcelm closed this as completed in 32e03b3 Jan 9, 2023
marcelm added a commit that referenced this issue Jan 9, 2023
This should make haplotagging CRAM files work.

Closes #425
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants