vg giraffe extremely slow on Illumina PE data unless rescue is disabled (-A none): clarification on trade-offs #4802

@Sumit1331

Hello vg team,

Thank you for developing and maintaining vg and giraffe.

I observed that vg giraffe can be extremely slow when mapping Illumina paired-end WGS data to a minigraph-cactus pangenome graph on an HPC system. With default settings, some samples (from a fungal pathogen) took more than a day to finish, even after I tried common remedies such as colocating inputs and outputs and copying the index files (including .dist) to local scratch. In contrast, when I disabled rescue with -A none, the same samples completed in minutes and consistently produced .gam files.
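
For anyone who wants to reproduce the comparison, here is a minimal sketch that builds the two command lines (default rescue versus -A none). The file names are placeholders and the helper giraffe_cmd is hypothetical. The -Z, -m, -d, -f and -t options are assumptions that should be checked against `vg giraffe --help` for the installed version. The script only prints the commands and does not run vg.

```python
import shlex


def giraffe_cmd(gbz, minimizer, dist, fq1, fq2, threads=8, rescue=None):
    """Build a vg giraffe argument list (hypothetical helper).

    rescue=None leaves the rescue algorithm at its default;
    rescue="none" adds `-A none`, which disables rescue.
    """
    cmd = [
        "vg", "giraffe",
        "-Z", gbz,          # assumed: GBZ graph
        "-m", minimizer,    # assumed: minimizer index
        "-d", dist,         # assumed: distance index (.dist)
        "-f", fq1,          # paired-end reads, mate 1
        "-f", fq2,          # paired-end reads, mate 2
        "-t", str(threads),
    ]
    if rescue is not None:
        cmd += ["-A", rescue]
    return cmd


# Placeholder file names, not the actual inputs from this report.
inputs = ("graph.gbz", "graph.min", "graph.dist", "sample_R1.fq.gz", "sample_R2.fq.gz")

default_cmd = giraffe_cmd(*inputs)
no_rescue_cmd = giraffe_cmd(*inputs, rescue="none")

# Print shell-ready commands; giraffe writes GAM to stdout by default.
print(shlex.join(default_cmd), "> default.gam")
print(shlex.join(no_rescue_cmd), "> no_rescue.gam")
```

Running each printed command under `time` on the same sample and the same node should show whether rescue accounts for the difference.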

After disabling rescue, the .gam file looks like:

[Image: .gam output with rescue disabled]

The runtime difference suggests that rescue is the dominant performance bottleneck in my case. I would appreciate clarification on the trade-offs of disabling rescue, particularly how much sensitivity is lost and whether -A none is generally acceptable for high-coverage Illumina WGS and for downstream SNP or population-genetic analyses.

Or do you have any other suggestions on how to speed up the process?

Thank you very much for your time and guidance.

Best regards,
Sumit
