vg grep the non-reference sequence from the Minigraph-Cactus result? #4193

Closed
ld9866 opened this issue Dec 15, 2023 · 3 comments

Comments

@ld9866

ld9866 commented Dec 15, 2023

Dear developer:
We currently use Minigraph-Cactus to build a pangenome and convert it into vg-format files. How can we extract the non-reference sequences in the pangenome as FASTA? I did this with vg before, but we lost the code through our own negligence. Can you help?

@jeizenga
Contributor

You can extract paths as a FASTA using vg paths --extract-fasta. I think the interface requires a GBWT as input, so you may need to pull out the GBWT from your GBZ using vg gbwt.
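A rough sketch of those two steps, in case it helps. The file names are placeholders, and I am assuming the short flags `-Z`/`-o` for vg gbwt and `-x`/`-g`/`-F` for vg paths behave as in recent vg releases, so check `vg gbwt --help` and `vg paths --help` before relying on it. The FASTA will contain whole path sequences, so picking out only the non-reference ones is a separate step.

```shell
#!/usr/bin/env bash
# Sketch only: file names are hypothetical, and flag spellings should be
# confirmed against the --help output of your vg version.

# Pull the GBWT (haplotype path index) out of a GBZ file.
extract_gbwt() {
    local gbz="$1" gbwt="$2"
    vg gbwt -o "$gbwt" -Z "$gbz"
}

# Write the paths stored in the GBWT to a FASTA file.
# The graph (here an XG) supplies the node sequences the GBWT refers to.
paths_to_fasta() {
    local graph="$1" gbwt="$2" fasta="$3"
    vg paths -x "$graph" -g "$gbwt" -F > "$fasta"
}

# Example usage (hypothetical file names):
#   extract_gbwt pangenome.gbz pangenome.gbwt
#   paths_to_fasta pangenome.xg pangenome.gbwt pangenome.paths.fa
```

The two functions are kept separate so the GBWT extraction can be skipped if you already have a standalone GBWT.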

@ld9866
Author

ld9866 commented Dec 26, 2023

Dear developer:
We ran into a problem with vg index: it reports insufficient memory even though the machine has 1 TB of RAM. Are our commands and overall approach correct, and how should we solve this problem?
vg mod -X 256 test.full.vg > test.full.mod.vg
vg index -x test.full.xg -g test.full.gcsa -k 16 -t 8 test.full.mod.vg
error:
InputGraph::InputGraph(): Memory use of input kmers (1149.82 GB) exceeds memory limit (1024 GB)
vg gbwt -g test.full.gbwt -t 8 -x test.full.xg test.full.vg

@jeizenga
Contributor

jeizenga commented Jan 3, 2024

Is there a reason you are doing a manual indexing pipeline instead of using vg autoindex? For most users, vg autoindex is more robust to issues like this.

It's also unclear to me which mapping tool you're planning to use. The GCSA2 index is used by vg map, but the GBWT usually is not. The GBWT is typically used in vg giraffe. This is another reason to use vg autoindex: it can determine exactly which indexes you need based on the mapping tool you want to use.
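As a minimal sketch of what that could look like, assuming you start from the GFA that Minigraph-Cactus writes (the file name and prefix below are placeholders, and the options should be checked against `vg autoindex --help` for your version):

```shell
#!/usr/bin/env bash
# Sketch only: input and prefix names are hypothetical.

# Let vg autoindex decide which indexes to build for a given mapper.
build_indexes() {
    local workflow="$1"      # e.g. "giraffe" for vg giraffe, "map" for vg map
    local gfa="$2"           # GFA of the pangenome graph
    local prefix="$3"        # prefix for the index files that get written
    local threads="${4:-8}"  # defaults to 8, as in the commands above
    vg autoindex --workflow "$workflow" --gfa "$gfa" \
        --prefix "$prefix" --threads "$threads"
}

# Example usage (hypothetical file names):
#   build_indexes giraffe pangenome.gfa pangenome.idx 8
```

With the workflow set to the mapper you intend to run, only the indexes that mapper needs are produced.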
