Make all the VG commands using VGSet know how to read a file of filenames as input #234
Comments
Hey Ali - that is a lot of files. Have you tried a subset of them as a smoke test?
Peeking at the source code, it looks like the
I'm not sure this will work, but I suspect it will be a step in the right direction. Hopefully @ekg or @adamnovak can chime in when they get some free time at their conference. Magic sed line: http://unix.stackexchange.com/questions/114943/can-sed-replace-new-line-characters
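For reference, a minimal sketch of the trick behind that link; the `graphs/` directory and the `graph-list.txt` / `args.txt` filenames are invented for illustration:

```bash
# Write one graph filename per line. printf is a shell builtin, so this
# never calls execve and cannot hit the ARG_MAX limit.
printf '%s\n' graphs/*.vg > graph-list.txt

# Flatten the newline-separated list into a single space-separated line,
# using the sed one-liner from the linked answer (tr '\n' ' ' also works).
sed ':a;N;$!ba;s/\n/ /g' graph-list.txt > args.txt
```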
Just concatenate those graphs together and try it again. You can loop over
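A minimal sketch of that approach, assuming the per-region graphs sit in a made-up `graphs/` directory:

```bash
# vg files can be merged by plain concatenation, so appending them with
# cat yields one valid combined graph. The glob expands inside the
# shell's for loop, so no single exec receives the full argument list.
rm -f combined.vg
for f in graphs/*.vg; do
    cat "$f" >> combined.vg
done
```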
The *.vg notation will not work around your argument list length problem, by the way. The * is expanded by the shell, so vg gets the list of all the matching files. If that list is too long, I don't know what exactly will happen, but it won't work correctly. It might just cancel the expansion and pass along the literal "*.vg". If that happens, or if you otherwise get the shell not to expand it (like by using quotes), vg will see "*.vg" literally, which is not a vg file that it can open, and so it won't work.
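To make the two failure modes concrete, a hedged illustration (the behavior described is the shell's, not vg's):

```bash
vg ids -j *.vg      # shell expands the glob: vg receives every matching filename
vg ids -j '*.vg'    # quoted: vg receives the literal string *.vg, which is not a file it can open
```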
Thanks @ekg, @edawson, and @adamnovak. It's good to know that vg files can be merged by simply concatenating them using `cat`.
Yes, I tried and it worked for a smaller number of vg files without problems. I'm not sure about the details, but it seems that the wildcard expansion is done successfully. The E2BIG error (which produces the "Argument list too long" message and is defined in `errno.h`) is what I get when I pass the filenames explicitly.
But when I use a wildcard, it works fine:

But for indexing, I did the same trick (using a wildcard) as a workaround for this issue:

I don't get the E2BIG error message, but vg fails to index. That's why I think there is some internal problem in this regard: maybe some external commands whose length exceeds the ARG_MAX limit are executed internally.
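If it helps to confirm that hypothesis, the limit in question can be inspected directly; `graphs/` is again an assumed directory name:

```bash
# ARG_MAX bounds the combined bytes of argv plus the environment passed
# to execve(); E2BIG is returned when a command line exceeds it.
getconf ARG_MAX

# Rough size the expanded glob would occupy on a command line.
# printf is a builtin, so measuring this way does not itself hit the limit.
printf '%s ' graphs/*.vg | wc -c
```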
Your assessment is right. vg index is running a concatenate command to put

I think the only solution for large numbers of files is to concatenate them

This is all pretty annoying and should be streamlined. We could implement
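A hedged sketch of that manual route: concatenate in batches via xargs so each invocation stays under ARG_MAX, then index the single combined file. Paths are invented, and the flag spellings are from memory of vg's CLI, so they may differ by version:

```bash
# xargs -0 invokes cat as many times as needed; the shell opens whole.vg
# once, so every cat invocation appends to the same output stream.
find graphs -name '*.vg' -print0 | xargs -0 cat > whole.vg

# Build the xg index from the single combined graph.
vg index -x whole.xg whole.vg
```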
As a rough idea,
It might be easier to teach vg ids to read a file list. Then you can do the
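Something along these lines, where `-L` is a purely hypothetical "read filenames from a file" option that vg does not currently have:

```bash
printf '%s\n' graphs/*.vg > graph-list.txt
vg ids -j -L graph-list.txt   # hypothetical flag: join id spaces across every file listed
```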
@cartoonist Have you managed to resolve this (even in a hacky way as described here)? The issue is open because this shouldn't need to be scripted out. |
@cartoonist have you tried building the graph off of the reference FASTA made of the 17200 contigs? It seems like it might just work if it's small. The tutorial focused on the problem of building the graph for a very large genome. |
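For concreteness, a hedged version of that suggestion; the filenames are invented, and `-r`/`-v` are vg construct's reference and VCF options:

```bash
# Build one whole-genome graph directly from the multi-contig FASTA,
# instead of constructing ~17200 per-region graphs and merging them.
vg construct -r all-contigs.fa -v variants.vcf.gz > whole.vg
```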
Hi @ekg, I was on vacation. Sorry for the late reply. I will check within a few days and let you know how I manage to create the graph.
I've been testing with the current HEAD and things are going pretty well.
Is this still a problem? And is the fact that people may want to operate on more graphs than they can fit on a command line still in scope for vg? Do we want to change the issue to something like "Make all the VG commands using VGSet know how to read a file of filenames as input"?
Makes sense to me.
I'm trying to construct and index a whole-genome variation graph of a relatively small genome containing ~17200 short regions. I constructed variation graphs for each region separately. I also generated a joint id space across the graphs by using `vg ids`. When I try to create the xg index, I get this error message:

In addition, when I try to explicitly list the file names of the variation graphs, it reaches the ARG_MAX limit and this error message appears:
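A hedged sketch of the workflow being described, with invented paths (`regions/`, `whole.xg`):

```bash
# Join the id spaces across all per-region graphs. As far as I recall,
# vg ids -j rewrites the supplied files in place. Handing the expanded
# glob (~17200 names) to vg via execve is where E2BIG can appear.
vg ids -j regions/*.vg

# Build the xg index over the same set of files; listing them all on
# one command line runs into the same ARG_MAX limit.
vg index -x whole.xg regions/*.vg
```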