Feature request: Accept jellyfish kmer counts for vg haplotypes #4215

JosephLalli · 2024-01-25T17:02:28Z

For more background see eblerjana/pangenie#62.

Long story short, I'm trying to replicate and make use of the personalized pangenome pipeline described in your recent paper (https://www.biorxiv.org/content/10.1101/2023.12.13.571553v2.full).

When I use Pangenie to genotype the graph created by vg haplotypes from a human 30x Illumina FASTQ dataset, a representative run spends 1484 s of its 1910 s total runtime (about 78%) counting k-mers in the reads. Pangenie can accept pre-counted k-mer files, but only in Jellyfish2's format. Internally, Pangenie uses the Jellyfish API for k-mer management.

It seems that using KFF files is difficult for Pangenie, since they do not appear to allow random access. So, maybe we could use Jellyfish to count k-mers and provide those counts to vg haplotypes? That would avoid counting the same k-mers twice with two different tools.

Best,
Joe

jltsiren · 2024-01-25T18:48:50Z

We chose KFF because we wanted to avoid adding yet another major dependency. VG already has too many of them, making the build system fragile.

As for random access, we also need it in vg haplotypes. We simply load the kmer counts into a hash map. On my laptop, that takes ~100 seconds for the counts from 30x reads: 25 seconds for prepopulating the hash map with the kmers we are interested in and 75 seconds for multithreaded reading.
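For readers following along, the load-into-a-hash-map approach described above can be sketched roughly as follows. This is only an illustration in Python, not the vg implementation: the function names, the `(kmer, count)` record layout, and the idea of splitting the input into blocks are all assumptions made for the example, and no real KFF parsing is shown.

```python
from concurrent.futures import ThreadPoolExecutor


def prepopulate(kmers_of_interest):
    """Create the table with every k-mer we care about, count 0."""
    return {kmer: 0 for kmer in kmers_of_interest}


def fill_counts(table, records):
    """Copy counts for k-mers already in the table; ignore all others.

    Assumes each k-mer appears at most once in the whole input, so
    workers never update the same key concurrently.
    """
    for kmer, count in records:
        if kmer in table:
            table[kmer] = count


def load_counts(kmers_of_interest, blocks, threads=4):
    """Prepopulate the table, then read the blocks with several threads.

    `blocks` is a hypothetical stand-in for chunks of a k-mer count
    file, each an iterable of (kmer, count) pairs.
    """
    table = prepopulate(kmers_of_interest)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # The key set is fixed after prepopulation; workers only
        # overwrite values of existing keys.
        list(pool.map(lambda block: fill_counts(table, block), blocks))
    return table


if __name__ == "__main__":
    blocks = [[("ACGT", 12), ("TTTT", 3)], [("GGCA", 7)]]
    print(load_counts(["ACGT", "GGCA", "CCCC"], blocks))
```

Because the table is prepopulated, k-mers in the reads that we are not interested in never enter the map, and lookups afterwards are ordinary random-access hash queries.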

JosephLalli · 2024-01-25T18:52:37Z

Understood. I agree about the dependencies!

I'll copy your comment on the similar issue I created at Pangenie (eblerjana/pangenie#62). Maybe you and Jana can help each other get behind one kmer ecosystem for pangenome analysis.

Best,
Joe

JosephLalli mentioned this issue Jan 25, 2024

Feature request: support kff input files eblerjana/pangenie#62

Open
