Lack of variation causes segfault in polyphase #496

Closed
jan-glx opened this issue Nov 2, 2023 · 2 comments · Fixed by #497

jan-glx commented Nov 2, 2023

WhatsHap polyphase segfaults if a chromosome contains variants but none of them are phased (or something like that).
Reproducible example (same data, reference, and locus as in #493, but a different VCF file):

wget https://hgdownload.cse.ucsc.edu/goldenpath/mm10/chromosomes/chr19.fa.gz && gunzip chr19.fa.gz
samtools faidx chr19.fa
bind 'set disable-completion on'
cat <<'EOF' > unphased_freebayes.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this location by REF and ALT">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##contig=<ID=chr19,length=61431566>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	CGAAGAGGTAGGTGCGAG-1
chr19	55910646	.	ACAAATCCCCCATC	AAATC,ACAAATCCC	1486.28	.	AC=1,1	GT:DP:AD	1/2:83:0,35,44
EOF

cat <<'EOF' > reads.sam
@HD	VN:1.4	SO:coordinate
@SQ	SN:chr19	LN:61431566
@RG	ID:CGAAGAGGTAGGTGCGAG-1	SM:CGAAGAGGTAGGTGCGAG-1
@PG	ID:bwa	VN:0.7.12-r1039	CL:bwa mem -C -M -t 32 -H .sam.header .fa .fastq.gz .fastq.gz	PN:bwa
A01382:376:H5HVGDRX3:1:2131:30807:22842	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTCTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C54T39	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:11	MQ:i:60	AS:i:135	XS:i:19
A01382:376:H5HVGDRX3:1:2252:12048:7200	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:1:2264:23258:10457	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFF,FFFFFFFFFFFFFF:FFFFFFF:FFFF:FFF:,FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF:FF,FFFFFFFFFFF,FFF:F:FF:FFFFFF:FFFFF:F:,:FF,FFFF:FFFFFFF:FFFFFF:FFFF,FFFFFFFF:FFF,FFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:2:2105:19253:5024	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGCACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFF:F:FFF::FFFFFFFFFFFFFFF:FFF:FFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C18T75	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:11	MQ:i:60	AS:i:135	XS:i:19
A01382:376:H5HVGDRX3:2:2105:19271:5431	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGCACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C18T75	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:11	MQ:i:60	AS:i:135	XS:i:19
A01382:376:H5HVGDRX3:2:2167:16938:3302	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGACATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTGGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FF:FFFFF,F:FFFFFFFFFFF::FF:FFFFFFF,F,,FFFFFFFFFFFFFFFF,FF:FFFF,FFFFF:FFFFFFFFFFF:FFF,FF:FFFFFFFFFF:FFFFFFFF:::FFFFFFFF:FFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF	MC:Z:109M	MD:Z:25T35A3^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:12	MQ:i:60	AS:i:130	XS:i:20
A01382:376:H5HVGDRX3:2:2231:9245:11741	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:2:2236:29125:20431	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFF,F,F:,FFFFFFF:FFFFFFF:F:FFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF,:,:FFFFFFFFF:FFFFFFFFFFFFFFF:FFFF::FFFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:2:2259:3025:9549	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF,FFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:2:2274:11460:6934	147	chr19	55910582	60	65M9D95M	=	55910535	-216	TCCCAAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGAAATCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF	MC:Z:109M	MD:Z:65^CAAATCCCC0C94	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:10	MQ:i:60	AS:i:140	XS:i:20
A01382:376:H5HVGDRX3:1:2247:22309:12164	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:5	MQ:i:60	AS:i:149	XS:i:20
A01382:376:H5HVGDRX3:2:2137:18249:27336	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	:FFFFFFFFFFFFFFFFFFFFFF,FFFFFFFF:F:,FFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,F:FFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:5	MQ:i:60	AS:i:149	XS:i:20
A01382:376:H5HVGDRX3:2:2206:5638:6026	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF:,,F:FF:F,FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:5	MQ:i:60	AS:i:149	XS:i:20
A01382:376:H5HVGDRX3:2:2227:24740:17628	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF::F,FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F,FFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:5	MQ:i:60	AS:i:149	XS:i:20
A01382:376:H5HVGDRX3:2:2252:15700:25144	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:5	MQ:i:60	AS:i:149	XS:i:20
A01382:376:H5HVGDRX3:2:2255:32018:34898	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAACCCCCGCTAGGATGGTTGGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	,:FFFFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFF:F,:FFFF:FF	MC:Z:109M	MD:Z:65T2^CCCAT14A77	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:7	MQ:i:60	AS:i:139	XS:i:19
A01382:376:H5HVGDRX3:2:2270:19180:5181	147	chr19	55910586	60	68M5D92M	=	55910535	-216	AAGGCCTCCGCACCCTCCAGATATCTCTCCATATTACCCGCTGTCGCCCGGCACCGTAGGACAAATCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTCTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	FF,FFFFF:FFFFFFFFFFFF,FF:FFFFFFF:F,FFFFFFFFFFFFFFF:FFFFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFF:FFFFFFFFFFFFFFFF:FF,F:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	MC:Z:109M	MD:Z:68^CCCAT56T35	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:6	MQ:i:60	AS:i:144	XS:i:19
A01382:376:H5HVGDRX3:1:2145:13051:6527	147	chr19	55910620	60	34S34M5D92M	=	55910535	-216	CCCGCCCCCCCACCCACCAGATCTCTCGCCACACTACCCGCCGTCGCCCGGCACCGTAGGACAACTCCCCGCTAGGATGGTTAGTACCACAGTAAGCAATTCCAGTTTTTAATTCCCTTTTTGTTTCTTGCATGAGCATGCTTATTAGTTTACGTGTACG	:,,,FF,FF,F,FFF,FFF,:,,,F,F,FF,,,,,FFFFFF,F:F:FFFFFF:FFF:FF:FF:F,FFFFF,FFFF,:FF:F:FF,FFFF:FF,F:F:FF:FFFFFFFF:F,::FFF::FF:FFF,FFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFF	MC:Z:109M	MD:Z:7T22A3^CCCAT92	RG:Z:CGAAGAGGTAGGTGCGAG-1	NM:i:7	MQ:i:60	AS:i:105	XS:i:20
EOF

bind 'set disable-completion off'

samtools view reads.sam -b > reads.bam
samtools index reads.bam

whatshap --debug polyphase --ploidy 2 --reference=chr19.fa unphased_freebayes.vcf reads.bam

Output:

This is WhatsHap (polyploid) 2.1 running under Python 3.12.0
DEBUG: Read groups in CRAM/BAM header: [{'ID': 'CGAAGAGGTAGGTGCGAG-1', 'SM': 'CGAAGAGGTAGGTGCGAG-1'}]
DEBUG: Reading the input VCF to find possibly missing headers
DEBUG: Missing contigs: []
DEBUG: Missing formats: ['AD']
DEBUG: Missing infos: []
DEBUG: Found 1 sample(s) in the VCF file.
DEBUG: Parsed 0 SNVs and 1 non-SNVs. Also found 1 multi-ALTs.
======== Working on chromosome 'chr19'
---- Processing individual CGAAGAGGTAGGTGCGAG-1
Number of variants skipped due to missing genotypes: 0
Number of remaining heterozygous variants: 1
DEBUG: Reading alignments for sample 'CGAAGAGGTAGGTGCGAG-1' on chromosome chr19 and detecting alleles ...
Found 7 reads covering 1 variants
Kept 0 reads that cover at least two variants each
Segmentation fault

I am using commit 07129fd. See also #441, which seems to be the same issue, but wasn't that one already resolved?
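
For anyone scripting the reproduction above, a small sketch of how the crash could be told apart from an ordinary error exit. It relies on the shell convention that a command killed by signal N is reported with exit status 128 + N. The helper names `classify_exit` and `run_and_classify` are made up for this illustration and are not part of WhatsHap.

```shell
# Translate an exit status into a short description.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
classify_exit() {
    if [ "$1" -eq 0 ]; then
        echo "ok"
    elif [ "$1" -gt 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
    else
        echo "failed with exit code $1"
    fi
}

# Hypothetical wrapper: run the given command, then describe how it ended.
run_and_classify() {
    "$@"
    classify_exit "$?"
}
```

Usage with the command from the report would be `run_and_classify whatshap --debug polyphase --ploidy 2 --reference=chr19.fa unphased_freebayes.vcf reads.bam`. On Linux, SIGSEGV is signal 11, so an affected build would be expected to end with "killed by signal 11" after the log lines shown above.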

marcelm added a commit that referenced this issue Nov 3, 2023

marcelm commented Nov 3, 2023

The fix in #441 was apparently never merged. I’ve opened #497.

schrins commented Nov 3, 2023

Oh, that was my fault. I will likely not have time to fix it today, but I should get to it at some point over the weekend.
