vartrix problem #126

Open
sanchezy opened this issue Dec 6, 2021 · 26 comments
@sanchezy

sanchezy commented Dec 6, 2021

Hi @wheaton5

I ran the souporcell_latest.sif pipeline (using singularity) successfully for 14 of my 16 libraries. For the other two I got an error and tracked it back to vartrix (in vartrix.err). The error is this:

Traceback (most recent call last):
  File "/opt/souporcell/souporcell_pipeline.py", line 589, in <module>
    vartrix(args, final_vcf, bam)
  File "/opt/souporcell/souporcell_pipeline.py", line 512, in vartrix
    subprocess.check_call(cmd, stdout = out, stderr = err)
  File "/usr/local/envs/py36/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['vartrix', '--mapq', '30', '-b', '/home/yaraScratch/souporcell-F1678CM-AB2-Sc-4-1/souporcell_minimap_tagged_sorted.bam', '-c', '/home/yara/Scratch/souporcell-F1678CM-AB2-Sc-4-1/barcodes.tsv', '--scoring-method', 'coverage', '--threads', '8', '--ref-matrix', '/home/yara/Scratch/souporcell-F1678CM-AB2-Sc-4-1/ref.mtx', '--out-matrix', '/home/yara/Scratch/souporcell-F1678CM-AB2-Sc-4-1/alt.mtx', '-v', '/home/yara/Scratch/souporcell-F1678CM-AB2-Sc-4-1/souporcell_merged_sorted_vcf.vcf.gz', '--fasta', '/home/yara/Scratch/references/refdata-cellranger-GRCh38-3.0.0/fasta/genome.fa', '--umi']' returned non-zero exit status 101.

I emailed the crash reports to the authors and they replied that I should try a newer version of vartrix (https://github.com/10XGenomics/vartrix/releases/tag/v1.1.22).
So, my questions are: Is there a way around this? How could I do it? Would it be possible for you to add the newer vartrix to souporcell_latest.sif?

Many thanks for your help!

@wheaton5
Owner

wheaton5 commented Dec 6, 2021

I'll do this as soon as I can. I need a computer with admin access (work computer doesn't have that) and I'm trying to find the charger to my 2011 macbook air lol. I just moved and it wasn't in the same box as the computer... You could run vartrix manually, add the output files to that folder as well as a vartrix.done file, and then restart the pipeline. It will see the vartrix.done file and continue from the next step. Just use the same arguments as in the error message above.
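
In case it helps, a rough sketch of that manual run (the flags are taken from the error message above; the output directory and reference paths are placeholders for your own run):

# sketch only: re-run the failed vartrix step by hand, then mark it done so the
# pipeline resumes from the next step
OUTDIR=/path/to/souporcell_output
REF=/path/to/refdata-cellranger-GRCh38-3.0.0/fasta/genome.fa

vartrix --mapq 30 --umi \
    -b $OUTDIR/souporcell_minimap_tagged_sorted.bam \
    -c $OUTDIR/barcodes.tsv \
    --scoring-method coverage \
    --threads 8 \
    --ref-matrix $OUTDIR/ref.mtx \
    --out-matrix $OUTDIR/alt.mtx \
    -v $OUTDIR/souporcell_merged_sorted_vcf.vcf.gz \
    --fasta $REF

# once ref.mtx and alt.mtx exist, create the sentinel file and restart the pipeline
touch $OUTDIR/vartrix.done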

@changostraw

I also keep getting a vartrix crash. However, I cannot find the crash report.

Well, this is embarrassing.

vartrix had a problem and crashed. To help us diagnose the problem you can send us a crash report.

We have generated a report file at "/tmp/report-cb2042fa-804e-491a-bc56-91f750318372.toml". Submit an issue or email with the subject of "vartrix Crash Report" and include the report as an attachment.

We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.

Thank you kindly!

There is no tmp/ directory in the working directory, so I am not sure where the report was saved. Thanks!

@wheaton5
Owner

Is there a vartrix.err file?

@changostraw

Yes, that is all the vartrix.err file contains.

"^[[0m^[[0m^[[31mWell, this is embarrassing.

vartrix had a problem and crashed. To help us diagnose the problem you can send us a crash report.

"We have generated a report file at "/tmp/report-cb2042fa-804e-491a-bc56-91f750318372.toml". Submit an issue or email with the subject of "vartrix Crash Report" and include the report as an attachment.

Authors: Ian Fiddes ian.fiddes@10xgenomics.com, Patrick Marks patrick@10xgenomics.com
We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.

Thank you kindly!
^[[0m"

I cannot locate the report anywhere - at least not in directories I have permissions for.

@wheaton5
Owner

wheaton5 commented Jun 2, 2022

It might be something upstream of vartrix, and we are giving vartrix bad input. What does the vcf look like? Can you try running vartrix manually?

@changostraw

I think it was my vcf file. It had been aligned to hg19 by the sequencing centre, not hg38 like my bam file. It is running fine now that I lifted it over to hg38. I am now having a problem with the clustering, but I will open another issue for that. Thanks!
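
For anyone hitting the same mismatch, a minimal sketch of one way to do the liftover (this assumes Picard and a UCSC hg19-to-hg38 chain file are available; the file names below are placeholders):

# lift an hg19 VCF over to the hg38 coordinates used by the bam
picard LiftoverVcf \
    I=donor_genotypes_hg19.vcf \
    O=donor_genotypes_hg38.vcf \
    CHAIN=hg19ToHg38.over.chain.gz \
    REJECT=rejected_variants.vcf \
    R=GRCh38_genome.fa

It is worth checking afterwards that the lifted contig names (chr1 vs 1) match the reference the bam was aligned to.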

@LorenzoMerotto

I have the same problem here.
I have 4 libraries, analyzed with the same CellRanger version and the same reference genome. The analysis completed for 3 out of 4 samples, while one of them crashed with the same error message.

@wheaton5
Owner

Can you provide the contents of any of the .err files?

@LorenzoMerotto

LorenzoMerotto commented Sep 30, 2022

  • This is the error message in the vartrix.err file
vartrix had a problem and crashed. To help us diagnose the problem you can send us a crash report.

We have generated a report file at "/tmp/1555006.1.bigmem.q/report-be2ad8f9-943e-415c-87b3-02b37277f039.toml". Submit an issue or email with the subject of "vartrix Crash Report" and include the report as an attachment.

- Authors: Ian Fiddes <ian.t.fiddes@gmail.com>, Patrick Marks <patrick@10xgenomics.com>

We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.

Thank you kindly!

However, the temporary directory does not seem to exist.

  • This is the content of the retag.err file
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[bam_sort_core] merging from 2 files and 1 in-memory blocks...
[bam_sort_core] merging from 3 files and 1 in-memory blocks...
[bam_sort_core] merging from 3 files and 1 in-memory blocks...
  • This is the content of the bcftools.err
Writing to /tmp/bcftools-sort.d0HlEo
Merging 1 temporary files
Cleaning
Done

@drneavin

@LorenzoMerotto and @wheaton5, did you work this out? We are running into a similar issue where most pools have executed correctly but a couple haven't. They have been processed the same way upstream of this, so the reason for the failure is not clear. Any input you have would be fantastic!

Thanks for your help!

@wheaton5
Owner

I think I need more information. Usually when vartrix fails, it's due to a previous error, probably freebayes failing. Can you check all .err files and also whether the vcf output from freebayes is empty?
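
For example, something like this gives a quick read on whether the vcf feeding vartrix is empty (a sketch; the exact file name depends on whether --common_variants was used):

# count non-header records in the vcf passed to vartrix
zcat souporcell_merged_sorted_vcf.vcf.gz | grep -vc '^#'

# skim every .err file in the output directory for earlier failures
tail -n +1 *.err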

@drneavin

Thanks for the fast response @wheaton5 !

I can't see anything in particular that jumps out as a problem with any of the preceding steps but I've put details below so hopefully you see something that we've missed.

Here's a summary of the files generated in the failed pool:

-rw-r--r-- 1        27794 Oct 27 13:47 fastqs.done
-rw-r--r-- 1         4698 Oct 27 17:25 minimap.err
-rw-r--r-- 1         2182 Oct 27 17:25 remapping.done
-rw-r--r-- 1         1023 Oct 27 17:51 retag.err
-rw-r--r-- 1      43123150821 Oct 27 20:52 souporcell_minimap_tagged_sorted.bam
-rw-r--r-- 1      6012584 Oct 27 21:08 souporcell_minimap_tagged_sorted.bam.bai
-rw-r--r-- 1           0 Oct 27 21:09 retagging.done
-rw-r--r-- 1     62832057 Oct 27 21:30 depth_merged.bed
-rw-r--r-- 1    367251435 Oct 27 21:31 common_variants_covered_tmp.vcf
-rw-r--r-- 1    367258040 Oct 27 21:31 common_variants_covered.vcf
-rw-r--r-- 1          135 Oct 27 21:31 variants.done
-rw-r--r-- 1            0 Oct 27 21:31 vartrix.out
-rw-r--r-- 1          605 Oct 27 22:37 vartrix.err

Here are the contents of each of the error files.

  • minimap.err:
[M::mm_idx_gen::60.303*1.79] collected minimizers
[M::mm_idx_gen::68.434*2.98] sorted minimizers
[M::main::68.434*2.98] loaded/built the index for 194 target sequence(s)
[M::mm_mapopt_update::68.434*2.98] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 194
[M::mm_idx_stat::77.292*2.75] distinct minimizers: 381286575 (95.43% are singletons); average occurrences: 1.291; average spacing: 6.295
[M::worker_pipeline::84.520*3.71] mapped 263158 sequences
[M::worker_pipeline::90.321*4.51] mapped 263158 sequences
[M::worker_pipeline::94.441*5.02] mapped 263158 sequences
[M::worker_pipeline::100.680*5.71] mapped 263158 sequences
[M::worker_pipeline::108.147*6.41] mapped 263158 sequences
[M::worker_pipeline::114.296*6.94] mapped 263158 sequences
[M::worker_pipeline::121.092*7.45] mapped 263158 sequences
[M::worker_pipeline::128.174*7.93] mapped 263158 sequences
[M::worker_pipeline::136.020*8.41] mapped 263158 sequences
[M::worker_pipeline::141.022*8.69] mapped 263158 sequences
[M::worker_pipeline::148.326*9.05] mapped 263158 sequences
[M::worker_pipeline::154.288*9.32] mapped 263158 sequences
[M::worker_pipeline::161.255*9.62] mapped 263158 sequences
[M::worker_pipeline::169.079*9.92] mapped 263158 sequences
[M::worker_pipeline::173.876*10.10] mapped 263158 sequences
[M::worker_pipeline::177.756*10.23] mapped 263158 sequences
[M::worker_pipeline::189.109*10.57] mapped 263158 sequences
[M::worker_pipeline::192.414*10.67] mapped 263158 sequences
[M::worker_pipeline::196.717*10.79] mapped 263158 sequences
[M::worker_pipeline::200.775*10.90] mapped 263158 sequences
[M::worker_pipeline::208.185*11.08] mapped 263158 sequences
[M::worker_pipeline::215.253*11.25] mapped 263158 sequences
[M::worker_pipeline::220.922*11.38] mapped 263158 sequences
[M::worker_pipeline::227.564*11.52] mapped 263158 sequences
[M::worker_pipeline::232.541*11.62] mapped 263158 sequences
[M::worker_pipeline::238.969*11.74] mapped 263158 sequences
[M::worker_pipeline::246.172*11.87] mapped 263158 sequences
[M::worker_pipeline::251.254*11.96] mapped 263158 sequences
[M::worker_pipeline::255.360*12.03] mapped 263158 sequences
[M::worker_pipeline::261.302*12.12] mapped 263158 sequences
[M::worker_pipeline::265.206*12.18] mapped 263158 sequences
[M::worker_pipeline::271.390*12.27] mapped 263158 sequences
[M::worker_pipeline::278.991*12.38] mapped 263158 sequences
[M::worker_pipeline::284.412*12.45] mapped 263158 sequences
[M::worker_pipeline::290.930*12.54] mapped 263158 sequences
[M::worker_pipeline::296.213*12.60] mapped 263158 sequences
[M::worker_pipeline::303.196*12.68] mapped 263158 sequences
[M::worker_pipeline::308.148*12.74] mapped 263158 sequences
[M::worker_pipeline::314.887*12.81] mapped 263158 sequences
[M::worker_pipeline::321.757*12.88] mapped 263158 sequences
[M::worker_pipeline::326.717*12.93] mapped 263158 sequences
[M::worker_pipeline::331.553*12.98] mapped 263158 sequences
[M::worker_pipeline::337.556*13.04] mapped 263158 sequences
[M::worker_pipeline::344.316*13.10] mapped 263158 sequences
[M::worker_pipeline::349.462*13.15] mapped 263158 sequences
[M::worker_pipeline::354.906*13.19] mapped 263158 sequences
[M::worker_pipeline::368.196*13.30] mapped 263158 sequences
[M::worker_pipeline::373.878*13.34] mapped 263158 sequences
[M::worker_pipeline::377.816*13.37] mapped 263158 sequences
[M::worker_pipeline::384.011*13.42] mapped 263158 sequences
[M::worker_pipeline::390.742*13.47] mapped 263158 sequences
[M::worker_pipeline::395.201*13.50] mapped 263158 sequences
[M::worker_pipeline::400.524*13.54] mapped 263158 sequences
[M::worker_pipeline::406.876*13.58] mapped 263158 sequences
[M::worker_pipeline::411.816*13.62] mapped 263158 sequences
[M::worker_pipeline::415.891*13.64] mapped 263158 sequences
[M::worker_pipeline::424.468*13.69] mapped 263158 sequences
[M::worker_pipeline::428.557*13.68] mapped 110770 sequences
[M::main] Version: 2.7-r654
[M::main] CMD: minimap2 -ax splice -t 16 -G50k -k 21 -w 11 --sr -A2 -B8 -O12,32 -E2,1 -r200 -p.5 -N20 -f1000,5000 -n2 -m20 -s40 -g2000 -2K50m --secondary=no genome.fa tmp.fq
[M::main] Real time: 429.353 sec; CPU: 5863.578 sec
mapping
minimap2 -ax splice -t 16 -G50k -k 21 -w 11 --sr -A2 -B8 -O12,32 -E2,1 -r200 -p.5 -N20 -f1000,5000 -n2 -m20 -s40 -g2000 -2K50m --secondary=no genome.fa tmp.fq
  • retag.err:
[bam_sort_core] merging from 9 files and 1 in-memory blocks...
[bam_sort_core] merging from 16 files and 1 in-memory blocks...
[bam_sort_core] merging from 17 files and 1 in-memory blocks...
[bam_sort_core] merging from 20 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 21 files and 1 in-memory blocks...
[bam_sort_core] merging from 23 files and 1 in-memory blocks...
[bam_sort_core] merging from 23 files and 1 in-memory blocks...
[bam_sort_core] merging from 24 files and 1 in-memory blocks...
[bam_sort_core] merging from 33 files and 1 in-memory blocks...
  • vartrix.err:
Well, this is embarrassing.

vartrix had a problem and crashed. To help us diagnose the problem you can send us a crash report.

We have generated a report file at "/tmp/report-4980e9d6-bfc5-407e-9cec-3ba62c19145b.toml". Submit an issue or email with the subject of "vartrix Crash Report" and include the report as an attachment.

- Authors: Ian Fiddes <ian.fiddes@10xgenomics.com>, Patrick Marks <patrick@10xgenomics.com>

We take privacy seriously, and do not perform any automated error collection. In order to improve the software, we rely on people to submit reports.

Thank you kindly!

The freebayes vcf looks normal to me and has 1,181,356 variants. Here's the top and bottom of the file:

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##filedate=2022.8.29
##source=Minimac4.v1.0.2
##INFO=<ID=AF,Number=1,Type=Float,Description="Estimated Alternate Allele Frequency">
##INFO=<ID=MAF,Number=1,Type=Float,Description="Estimated Minor Allele Frequency">
##INFO=<ID=R2,Number=1,Type=Float,Description="Estimated Imputation Accuracy (R-square)">
##INFO=<ID=ER2,Number=1,Type=Float,Description="Empirical (Leave-One-Out) R-square (available only for genotyped variants)">
##INFO=<ID=IMPUTED,Number=0,Type=Flag,Description="Marker was imputed but NOT genotyped">
##INFO=<ID=TYPED,Number=0,Type=Flag,Description="Marker was genotyped AND imputed">
##INFO=<ID=TYPED_ONLY,Number=0,Type=Flag,Description="Marker was genotyped but NOT imputed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Estimated Alternate Allele Dosage : [P(0/1)+2*P(1/1)]">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Estimated Posterior Probabilities for Genotypes 0/0, 0/1 and 1/1">
##contig=<ID=chr1>
##contig=<ID=chr10>
##contig=<ID=chr11>
##contig=<ID=chr12>
##contig=<ID=chr13>
##contig=<ID=chr14>
##contig=<ID=chr15>
##contig=<ID=chr16>
##contig=<ID=chr17>
##contig=<ID=chr18>
##contig=<ID=chr19>
##contig=<ID=chr2>
##contig=<ID=chr20>
##contig=<ID=chr21>
##contig=<ID=chr22>
##contig=<ID=chr3>
##contig=<ID=chr4>
##contig=<ID=chr5>
##contig=<ID=chr6>
##contig=<ID=chr7>
##contig=<ID=chr8>
##contig=<ID=chr9>
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  MP11    MP12    MP13    MP14    MP15    MP16    MP17    MP18    MP19    MP20    MP21
chr1    788439  1:788439:T:A    T       A       .       PASS    AF=0.07287;MAF=0.07287;R2=0.47297;IMPUTED;AC=2;AN=22    GT:DS:GP        0|0:0.038:0.962,0.037,0 0|0:0.182:0.823,0.171,0.005 1|0:0.893:0.107,0.893,0 0|0:0.059:0.942,0.057,0.001     0|0:0.072:0.93,0.069,0.001      0|0:0.173:0.834,0.158,0.007     0|1:0.68:0.383,0.553,0.064      0|0:0.055:0.945,0.054,0.001 0|0:0.058:0.943,0.057,0.001     0|0:0.059:0.942,0.057,0.001     0|0:0.058:0.943,0.056,0.001
chr1    791101  1:791101:T:G    T       G       .       PASS    AF=0.83234;MAF=0.16766;R2=0.41176;IMPUTED;AC=19;AN=22   GT:DS:GP        1|1:1.676:0.026,0.272,0.702     1|1:1.483:0.035,0.448,0.517     1|0:0.964:0.036,0.964,0 1|1:1.841:0.006,0.146,0.848     1|1:1.83:0.007,0.156,0.837      1|0:1.227:0.098,0.578,0.325     1|0:1.139:0.126,0.609,0.265     1|1:1.852:0.005,0.137,0.857     1|1:1.843:0.006,0.145,0.849     1|1:1.85:0.006,0.139,0.855      1|1:1.844:0.006,0.144,0.85

...

chr9    138122079       9:138122079:C:T C       T       .       PASS    AF=0.80279;MAF=0.19721;R2=0.86081;IMPUTED;AC=13;AN=22     GT:DS:GP        0|1:0.978:0.022,0.978,0 1|1:1.92:0,0.079,0.921  0|1:1.021:0.001,0.977,0.022     0|1:0.979:0.022,0.977,0.001       0|1:0.969:0.031,0.969,0 0|1:1.018:0.002,0.979,0.019     0|1:0.986:0.029,0.956,0.015    0|1:0.975:0.025,0.975,0   1|1:1.862:0.004,0.129,0.866     0|1:0.994:0.011,0.985,0.005     0|1:0.988:0.012,0.988,0
chr9    138123517       9:138123517:C:T C       T       .       PASS    AF=0.36297;MAF=0.36297;R2=0.85195;IMPUTED;AC=7;AN=22      GT:DS:GP        0|1:0.985:0.015,0.985,0 0|0:0.25:0.751,0.248,0.001      0|1:0.983:0.017,0.982,0 0|0:0.005:0.995,0.005,0   0|1:0.98:0.02,0.98,0    0|1:0.997:0.01,0.982,0.007      0|1:0.976:0.024,0.976,0 0|0:0.003:0.997,0.003,0 0|1:1.155:0.044,0.758,0.198       0|1:0.981:0.019,0.98,0  0|0:0.003:0.997,0.003,0

Let me know if you see something that we're missing or if there are additional details we can provide to help identify the issue.

@wheaton5
Owner

What is the deal with the multisample vcf? Freebayes is run in a mode for unknown mixed samples and outputs a single-sample vcf, I thought.

@wheaton5
Owner

Are you using known_genotypes? Can you post your command line arguments?

@drneavin

We are not using known_genotypes, but we are using common_variants, and the vcf we're passing contains the variants for the individuals in the pool. This is typically how we run souporcell, so I don't think that is likely to be causing the error. Here's the command being run:

souporcell_pipeline.py \
-i $BAM \
-b $BARCODES \
-f $FASTA \
-t $THREADS \
-o $SOUPORCELL_OUTDIR \
-k $N \
--common_variants $VCF

@wheaton5
Owner

You could try running vartrix manually with the latest vartrix? I made a new singularity build recently to include hisat2, which gives better alignments for variant calling, and I could update vartrix as well if that fixes things.
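
If you want to try that, something along these lines should work (a sketch; check the v1.1.22 release page for the exact prebuilt binary, as the asset name below is an assumption):

# grab a newer vartrix build and re-run it with exactly the arguments from vartrix.err
wget https://github.com/10XGenomics/vartrix/releases/download/v1.1.22/vartrix_linux
chmod +x vartrix_linux
./vartrix_linux --mapq 30 --umi -b ... -c ... --scoring-method coverage ...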

@drneavin

We're trying that now - will let you know how it goes.

I hadn't seen that you had made a new singularity build. We'll take a look and see if the updated version helps sort things out.

@wheaton5
Owner

It's not up yet. I'm testing it now.

@LorenzoMerotto

@drneavin I solved it by running the analysis through conda. I created a new env and installed the required dependencies.
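
For reference, a minimal sketch of setting up such an env (package names and channels here are assumptions; adjust to the dependencies your souporcell version actually needs, the key point being a recent vartrix):

conda create -n souporcell -c bioconda -c conda-forge \
    "vartrix>=1.1.22" freebayes minimap2 samtools bcftools
conda activate souporcell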

@drneavin

Great, thanks both! I can confirm that the issue was resolved with the newest version of vartrix. @wheaton5, might be good to update it in the new image as well.

@wheaton5
Owner

Thanks, I will update it in the new singularity build.

@changostraw

I am also having this issue. I also posted on the Demuxafy board as I am using the Demuxafy singularity image to run souporcell. Have the images been updated since this discussion? Or should I also run vartrix separately? Thanks!

@Angel-Wei

Hi @drneavin, may I ask how the assignment of clusters to individuals in the pool is usually done in your case following this command? I'm a bit confused by the VCF files given to known_genotypes and common_variants. On some of my pooled samples, my initial attempts including known_genotypes and known_genotypes_sample_names couldn't complete and stalled at the clustering step. I'd like to give the command option you recommended a try if it works well. Thank you so much!

We are not using known_genotypes but we are using common_variants and the vcf we're using is a vcf that has the variants for the individuals in the pool. This is typically how we run souporcell so I don't think that is likely to be causing the error. Here's the command being run:

souporcell_pipeline.py \
-i $BAM \
-b $BARCODES \
-f $FASTA \
-t $THREADS \
-o $SOUPORCELL_OUTDIR \
-k $N \
--common_variants $VCF

@drneavin

Hi @Angel-Wei, I have put together some wrappers for demultiplexing and doublet-detecting methods in Demuxafy. The script I think you're looking for will correlate the genotypes in the vcf output by souporcell with your own vcf after running souporcell, which you can find here. Or, if you just want to run the script without downloading the Demuxafy singularity image, you can find that script here.

If you have any followup questions about Demuxafy or this script, it would probably be best to open an issue here.

@Angel-Wei

Hi @drneavin! Thank you so much for the quick response! Yes, I was also looking at Demuxafy and the documentation was really clear to follow. I had misunderstood and thought there was another pipeline besides Demuxafy that I wasn't aware of. I can surely proceed with that. Thank you so much!

@Angel-Wei

Hi @drneavin! Sorry to bug you again, but if you don't mind, can I ask one more question? Is there supposed to be any difference between using common_variants and not using it when running the pipeline in a genotype-free manner (i.e., not using known_genotypes and known_genotypes_sample_names)? My attempt with this recommended command hasn't completed yet, but I assume including common_variants will output a common_variants_covered.vcf file, compared to not including it? Thank you so much!
