vg sim error: [insert_gbwt_path()] path name already exists: #4209
Comments
I think I've (kind of) figured out what's going on here, or at least fixed my specific problem. When I created the .xg graph, I did not include the -H option in I think including the haplotype information in the graph doesn't play well with the .gbwt index while simulating reads from a specific sample in some way, though I'm still not sure how this resulted in the same simulated read file being produced from commands specifying different samples. When I run vg sim with the haplotype-dropped .xg graph and specify a non-reference sample, paths are inserted without errors. When I specify the reference sample, I receive the aforementioned errors, but I think this probably makes sense in this case because the sample is the reference used for the graph? I only have a fuzzy idea of how the algorithm is working and the structure of these files, so if anyone has a clearer explanation, please let me know!
Sorry to be so delayed responding to this issue. It's definitely understandable that aspects of the This particular confusion is because The hack that we use to also simulate from haplotype paths is to add the haplotype paths as embedded paths and then simulate from them with the standard algorithm (with the graph in XG format). Because the names of the embedded paths are expected to be unique, you have to ensure that you only add them to the XG graph once: either when converting from the GBZ or when starting One thing that I notice from looking at your commands is that you are simulating from the
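To make the uniqueness constraint concrete, here is a minimal toy sketch in Python. It is not vg's implementation: `ToyPathIndex`, `insert_path`, and the path name `sampleA#1#chrI` are hypothetical names invented for illustration. It only assumes what is described above, namely that embedded paths are keyed by name and that inserting a name a second time is an error.

```python
class ToyPathIndex:
    """Toy stand-in for a graph's table of embedded paths (hypothetical, not vg code)."""

    def __init__(self):
        # Maps path name -> list of node IDs visited by the path.
        self.paths = {}

    def insert_path(self, name, node_ids):
        # Embedded path names are expected to be unique, so a repeat is rejected.
        if name in self.paths:
            raise ValueError(f"path name already exists: {name}")
        self.paths[name] = list(node_ids)


# Hypothetical haplotype path for one sample.
haplotypes = {"sampleA#1#chrI": [1, 2, 4]}

index = ToyPathIndex()

# First pass: haplotypes are embedded once (e.g. at conversion time).
for name, nodes in haplotypes.items():
    index.insert_path(name, nodes)

# Second pass: the same haplotypes are added again (e.g. at simulation time).
collisions = []
for name, nodes in haplotypes.items():
    try:
        index.insert_path(name, nodes)
    except ValueError as err:
        collisions.append(str(err))

print(collisions)
```

In this toy model the second pass collides on every haplotype name that was already embedded, which would be consistent with an error repeating many times per chromosome.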
Thank you for the incredibly helpful response! This makes sense to me now, and gives me some extra confidence moving forward with my analysis. I'll keep your .d2 graph warning in mind as well!
Hello, I made a post asking about this on biostars when I thought the error was more innocuous than it actually is.
1. What were you trying to do?
Produce simulated reads from a sample within a graph.
2. What did you want to happen?
Produce a .gam file with simulated reads only from the sample specified.
3. What actually happened?
I received these errors, which differed slightly if the sample specified was the reference sample of the graph:
or not a reference sample (the error repeats many additional times per chromosome):
Despite the errors, a .gam file of the simulated reads is produced. However, I noticed that reads simulated from the same graph and seed but from different samples produced identically sized .gam files, and that the reads produced are identical but in a different order. It seems vg sim isn't constraining its simulation to just one sample in the graph due to this error.
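For anyone wanting to confirm the "identical reads, different order" observation, a rough way to check it is to compare the reads as multisets rather than line by line. The sketch below assumes each .gam file has first been converted to one JSON object per line (for example with vg view -a) and that each record carries a "sequence" field; the function names are hypothetical and the sample records are made up for illustration.

```python
import json
from collections import Counter


def read_multiset(json_lines):
    """Count read sequences from an iterable of JSON lines, ignoring order."""
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: the read sequence is stored under the "sequence" key.
        counts[record.get("sequence", "")] += 1
    return counts


def same_reads_any_order(lines_a, lines_b):
    """True if both inputs contain the same sequences with the same counts."""
    return read_multiset(lines_a) == read_multiset(lines_b)


# Made-up records standing in for two converted read sets.
sample_1 = ['{"name": "r1", "sequence": "ACGT"}', '{"name": "r2", "sequence": "GGCC"}']
sample_2 = ['{"name": "r1", "sequence": "GGCC"}', '{"name": "r2", "sequence": "ACGT"}']
sample_3 = ['{"name": "r1", "sequence": "ACGT"}', '{"name": "r2", "sequence": "TTTT"}']

print(same_reads_any_order(sample_1, sample_2))
print(same_reads_any_order(sample_1, sample_3))
```

With real data, the two inputs would be open file handles over the converted JSON. Comparing by sequence alone ignores read names and alignment paths, so it is only a quick sanity check.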
I created my graphs using minigraph-cactus to produce a .gbz, then created a .gbwt from the .gbz using vg gbwt. The error occurs no matter what sample I specify or if I use a .xg file produced from the .gbz file instead.
5. What data and command can the vg dev team use to make the problem happen?
EDIT: I was able to recreate this error using minigraph-cactus' yeast pangenome dataset from their tutorial. The required data is here.
I used the following sequence of commands to reproduce the error:
Both vg sim commands result in the same error.
6. What does running vg version say?
I'm relatively new to using graphs and vg, so I'm hoping this is a simple mistake on my part at some point in the pipeline.
Thank you for your help!