parallel ctags #761
Hello, while a built-in parallel implementation could be interesting, it is already possible to parallelize updating a big codebase by launching separate ctags processes on different directories and then merging the generated files. The merge can be done simply by dropping the lines beginning with `!` from all files but one and running `sort --merge` over all the files afterwards. However, I am not convinced you'll get any speedup from a parallelized ctags, as I expect modern machines to be I/O bound. That would need to be profiled to make sure, though.
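A minimal sketch of the merge trick described above (file names are illustrative). In real use the two inputs would come from parallel runs such as `ctags -f tags.a -R src/ & ctags -f tags.b -R lib/ & wait`; here two tiny sorted tag files are fabricated so the merge step itself can run anywhere:

```shell
export LC_ALL=C   # byte-wise ordering, matching what ctags emits

# Fabricated stand-ins for two per-directory ctags outputs
# (a real tags file has the same shape: "!" headers, then sorted entries):
printf '!_TAG_FILE_SORTED\t1\t//\nalpha\tsrc/a.c\t/^int alpha/\ngamma\tsrc/c.c\t/^int gamma/\n' > tags.a
printf '!_TAG_FILE_SORTED\t1\t//\nbeta\tlib/b.c\t/^int beta/\n' > tags.b

# Keep the "!..." metadata header only from the first file, drop it from
# the rest, then merge the already-sorted files in one pass:
grep -v '^!' tags.b > tags.b.body
sort --merge -o tags tags.a tags.b.body
cat tags
```

Since `!` sorts before any identifier character in the C locale, the surviving header lines stay at the top of the merged file, where readers of the tags format expect them.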
@mawww I'm sure https://github.com/ggreer/the_silver_searcher would disagree. Running multiple ctags processes would be extremely difficult to coordinate from standard Emacs: https://github.com/bbatsov/projectile/blob/master/projectile.el#L180-L183
Good point.
A shell script wrapper could already go a long way, but yeah, it might be more efficient to integrate that directly into ctags.
@fommil That guy's blog post on the matter is not very clear about where it started from and where it ended up (well, you can read between the lines, but still), and anyway the gain is really not that large. I don't mean to disregard any of his work, but I'm not going to fully trust the results of someone who seemingly just learned about multithreading (especially given how badly, e.g., a misused mutex can destroy any performance that MT can give). I'm not saying he's wrong, but I'll need to be convinced :) Another reason it doesn't appeal to me so much is that not only do I not believe it'll gain us that much, it'll also be a large amount of error-prone work. Currently the CTags code base is in absolutely no shape to support parallel tag-parsing threads. All you might be able to split off relatively easily is the init/directory traversal plus one single parser thread. So sure, multithreading can probably have some benefits if used very well, but it's likely not the most interesting improvement.
BTW, I don't mean that improving this area of the code is not a good idea; I do think it is (especially for a possible future libctags). I just mean that if performance is the goal, it's probably not (currently) worth the effort, and that there are more important areas to focus on. Also, firing up a profiler and profiling a boatload of data in a gazillion ways would probably be interesting.
GNU parallel may help you. |
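For example, GNU Parallel could fan one ctags run out per top-level directory with something like `parallel -0 'ctags -f tags.{#} -R {}'`. The sketch below uses the more widely available `xargs -P` instead, with an `echo` stub standing in for ctags so it runs even where ctags is not installed; the directory names are illustrative:

```shell
# Create some stand-in directories to fan out over:
mkdir -p demo/src demo/lib demo/doc
cd demo

# One job per directory, up to 4 at a time (-P 4). In real use, replace
# the echo stub with:  ctags -f "tags.$0" -R "$0"
# and merge the resulting per-directory tag files afterwards.
out=$(find . -maxdepth 1 -mindepth 1 -type d -print0 |
  xargs -0 -P 4 -n 1 sh -c 'echo "would index: $0"' | sort)
echo "$out"
```

Note that with `-P` the jobs finish in nondeterministic order, which is exactly why the per-directory outputs need a merge step at the end rather than plain concatenation.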
As mentioned before, optimizing reading can speed up ctags quite a bit. |
Parallel execution of parsers could speed things up quite a bit if I/O comes from cache (and this is often the case the Nth time you run ctags on a directory from an editor).
@pragmaware IMO, a library should not fork.
If you read Japanese, look at this article: https://qiita.com/dalance/items/c76141a097e25fabefe8 . It describes a tool named ptags that the author developed. The tool is written in Rust and wraps ctags. The result is quite impressive: about 5 times faster than single-process execution. The number of CPUs is not given; the machine's memory may be large enough (128GB). The author runs ptags 10 times on the same input set to make the page cache hot. Though these things should arguably be done in wrappers like ptags, it is difficult to ignore such a great result. The number of worker processes, 8, is hard-coded. My laptop has 8 cores; the result there is mostly the same: 2~3 times faster.
(I compared both tags files. There is no difference.) Far from satisfying, but it is a good place to start.
I wonder whether the output of the workers must be gathered or not.
Hi @masatake, I'm trying to close all my open tickets that I don't plan to work on. If you're interested in working on this ticket, could you please copy the text into a new ticket?
I will work on this item in the future. I would like to keep this item open because the record of the discussion here will be valuable for me. |
@masatake you can still link to this ticket from a new one and retain the full history. This would really help me out, as I am trying to clean up my "Issues" tab for a new job and I don't want clutter like this ticket getting in the way.
@fommil, I don't see how you can override @masatake, who is the driving force behind Universal Ctags, with 2,700 commits versus your commit count of zero. Once you open a bug (or, in GitHub parlance, "issue"), this bug becomes the property of the project. I believe you can unwatch it and not receive any emails about it. Reopening.
@dtikhonov @masatake Please close this ticket. It is the only ticket in my https://github.com/issues view that is not relevant to my work, and it is not possible to remove a ticket from that view unless the ticket is closed, even if I unsubscribe. Indeed, I was not aware that repo owners would have this control when I created the ticket, otherwise I would not have done so. If you want to work on this, please create a new ticket and reference this one; all discussion is preserved. Or just copy-paste the contents of #761 (comment) into a fresh ticket. I do not think this is too much for me to ask.
Could you make a temporary GitHub account just for the copy-paste?
sure, if that's the only way to fix this I can do that. |
Done! Thanks for letting me close this ticket. It cleans up my TODO list significantly.
As far as I understand it, ctags is single-threaded. Are there any plans to support parallelisation? It might speed things up on huge codebases.