Too slow on large real world codebases #2668
Comments
I gave this a very quick test with my YCM including changes from #2657. It works much faster than before those changes, so could you give that PR a try? The PR implements async semantic completion.
Hi @bstaletic. Did you test it on case 1, case 2, or the output of Program 2?
I tried it on all three of those, but I couldn't get Program 1 to work as I don't have Windows installed at all.
I didn't try #2657 yet, but I did check it before opening this issue. I'll try it, but given the scope of the pull, I suspected it wouldn't touch possibly existing issues in the request/display process as a whole. Notice that Program 1 was compiled and tested on Linux; I didn't use Windows for anything. Also notice that some flags in it may require changes for your machine, like header locations, etc. They're based on
Actually, my bad. I have not used anything that contains

As for
@bstaletic my bad, I've updated the issue. I had posted the wrong case 1 snippet; it was supposed to be based on #777 (comment), but I ended up removing the
Still, I see almost no slowdown with the
@bstaletic I've tried #2657. Overall, completion is much improved, kudos to @micbou! Still, it behaves as I expected. What it does:
What it doesn't do:
@oblitum Your observations seem reasonable, but, on my laptop, I just can't agree with "Completion arrives, but late". The initial parse and completions took about two seconds, but to make it that slow I had to use

That said, I do believe it can still feel slow for you (and others). But frankly, I have no idea where I would start looking to find the source of the slowness. As for being resource expensive,
@bstaletic Here's a video demonstrating it: https://vimeo.com/219545420. I've updated Program 2 to 60k identifiers (doubling the original) to make the effect clearly visible. On case 2, the bare inclusion of

It's also worth noting that #2657 introduces a new semantic completion behavior: after forced semantic completion, if I backspace to erase some characters of the current incomplete identifier, I'm forced to force semantic completion again to get the original semantic identifier ordering. I don't know the full extent of the effects of this, but some people may not like it.

I'm on a 7700K PC. Sorry about the second part of the video for case 2: you'll be unable to tell when I hit ctrl-space, as I have no keyboard screencast software at the moment; I'll think about updating that later.
I knew there was something wrong when I read about your CPU. I'm on an i7 3610QM, so I definitely have less raw power than you. What was wrong about my results was that I was looking at completions in the code that generates the large number of completions. Unlike you, I'm still hitting a timeout unless I type five characters in the generated file. As for case 1, I can't actually say it's too fast, but considering how big the boost's
Ah OK, you were testing the program, not its output. I hope others don't fall into the same trap :)
I noticed one more thing.
This doesn't happen with a low number of identifiers. Also, I had to hit
[READY] Add benchmark infrastructure

This PR sets up the infrastructure for adding benchmarks through [the Google benchmark library](https://github.com/google/benchmark) and for automatically running them on Travis and AppVeyor. They can also be run locally with the `benchmark.py` script. The library is included in the repository for ease of compilation. Benchmarks are run on all platforms because optimizations may be platform-dependent.

For now, there is only one benchmark, based on the output of *Program 2* in ycm-core/YouCompleteMe#2668. It measures the filter and sort algorithm in a worst-case scenario: all identifiers share a common prefix and the query is part of the prefix. In that case, no identifiers are filtered and, since they all have the same weight, the algorithm falls back to lexicographic sorting. This scenario is not uncommon in practice. For instance, C libraries often use a common prefix for naming variables and functions to simulate namespaces.

Here's the output of the benchmark on my configuration:
```
------------------------------------------------------------------------------------------
Benchmark                                        Time           CPU Iterations
------------------------------------------------------------------------------------------
CandidatesWithCommonPrefix_bench/1            1955 ns       1898 ns     345165
CandidatesWithCommonPrefix_bench/2           11563 ns      11681 ns      64102
CandidatesWithCommonPrefix_bench/4           30761 ns      30594 ns      22436
CandidatesWithCommonPrefix_bench/8           69551 ns      69532 ns      11218
CandidatesWithCommonPrefix_bench/16         143963 ns     143924 ns       4986
CandidatesWithCommonPrefix_bench/32         292668 ns     290603 ns       2362
CandidatesWithCommonPrefix_bench/64         862766 ns     869571 ns        897
CandidatesWithCommonPrefix_bench/128       2205099 ns    2191318 ns        299
CandidatesWithCommonPrefix_bench/256       8895499 ns    8840057 ns         90
CandidatesWithCommonPrefix_bench/512      17704787 ns   17680113 ns         45
CandidatesWithCommonPrefix_bench/1024     45564517 ns   45760293 ns         15
CandidatesWithCommonPrefix_bench/2048     96960893 ns   98057771 ns          7
CandidatesWithCommonPrefix_bench/4096    217881085 ns  218401400 ns          3
CandidatesWithCommonPrefix_bench/8192    481444392 ns  483603100 ns          2
CandidatesWithCommonPrefix_bench/16384  1005462405 ns  982806300 ns          1
CandidatesWithCommonPrefix_bench/32768  1805209871 ns 1809611600 ns          1
CandidatesWithCommonPrefix_bench/65536  4215533125 ns 4212027000 ns          1
CandidatesWithCommonPrefix_bench_BigO      3979.06 NlgN  3974.50 NlgN
CandidatesWithCommonPrefix_bench_RMS            10 %          9 %
```

As you can see, performance becomes unacceptable starting from 16000 identifiers, which is not a lot. A great feature of Google benchmark is that it can calculate the algorithm's complexity. As expected, we have `O(n log n)` complexity, where `n` is the number of candidates (we are using `std::sort` to sort our candidates).

Thanks to this benchmark, I was able to improve the performance in this particular case by a factor of 60. I'll send the changes once this PR is merged.
PR ycm-core/ycmd#774 should improve responsiveness and reduce CPU usage in Case 1 and Case 2.
Confirmed! It's much improved.
@micbou IMO, it's so much more usable now, thanks. Your changes improved the semantic results a lot, but I'm also running with others applied: #2657 and this local one:

diff --git a/cpp/ycm/IdentifierDatabase.cpp b/cpp/ycm/IdentifierDatabase.cpp
index 4a44bc7b..5e4e349a 100644
--- a/cpp/ycm/IdentifierDatabase.cpp
+++ b/cpp/ycm/IdentifierDatabase.cpp
@@ -111,7 +111,10 @@ void IdentifierDatabase::ResultsForQueryAndType(
}
}
- std::sort( results.begin(), results.end() );
+ if (results.size() < 50)
+ std::sort( results.begin(), results.end() );
+ else
+ std::partial_sort( results.begin(), results.begin() + 50, results.end() );
}
diff --git a/cpp/ycm/PythonSupport.cpp b/cpp/ycm/PythonSupport.cpp
index 3f2d3f1d..0da9334d 100644
--- a/cpp/ycm/PythonSupport.cpp
+++ b/cpp/ycm/PythonSupport.cpp
@@ -101,11 +101,16 @@ boost::python::list FilterAndSortCandidates(
}
}
- std::sort( result_and_objects.begin(), result_and_objects.end() );
+ if (result_and_objects.size() < 50)
+ std::sort( result_and_objects.begin(), result_and_objects.end() );
+ else
+ std::partial_sort( result_and_objects.begin(),
+ result_and_objects.begin() + 50,
+ result_and_objects.end() );
}
- for ( const ResultAnd< int > &result_and_object : result_and_objects ) {
- filtered_candidates.append( candidates[ result_and_object.extra_object_ ] );
+ for ( size_t i = 0; i < result_and_objects.size() && i < 50; ++i ) {
+ filtered_candidates.append( candidates[ result_and_objects[i].extra_object_ ] );
@micbou I've noticed a large improvement in the Program 2 output case due to the previous patch, even before applying ycm-core/ycmd#774.
[READY] Add FilterAndSortCandidates benchmarks

[The `CandidatesForQuery` method](https://github.com/Valloric/ycmd/blob/2575bdb8c83dd5ace3d6f3d72a3425fdf2c18f5e/cpp/ycm/IdentifierCompleter.h#L69) is not the only place where performance is critical. There is also [the `FilterAndSortCandidates` function](https://github.com/Valloric/ycmd/blob/2575bdb8c83dd5ace3d6f3d72a3425fdf2c18f5e/cpp/ycm/PythonSupport.h#L29), which is used in the Python layer to filter and sort candidates returned by completers other than the identifier one (while `CandidatesForQuery` is specific to the identifier completer). We should add benchmarks for this function too.

It would be great for these benchmarks to be based on real usage (e.g. case 1 or 2 from ycm-core/YouCompleteMe#2668). This could be done by storing the candidates in a file that the benchmarks would read. However, this kind of file would be too big (> 1 MB) to be committed to the repository; we would have to download it from somewhere else to run the benchmarks. I didn't want to bother, so I went with the same candidates as the `CandidatesForQuery` benchmark. Those candidates represent a corner case but, as long as we don't specifically target them, they should be good enough.

Two benchmarks are added: one where the candidates are not already stored in the repository and another where they are. This is important because both situations arise in practice: when filtering and sorting the candidates for the first time, they are added to the repository (user pressing `<C-Space>`), then they are retrieved from the repository the next times (user typing characters to filter out candidates).
Here are the benchmark results on my configuration:
```
Run on (4 X 3504 MHz CPU s)
------------------------------------------------------------------------------------------------------------------
Benchmark                                                                        Time           CPU Iterations
------------------------------------------------------------------------------------------------------------------
IdentifierCompleterFixture/CandidatesWithCommonPrefix/1024                  512467 ns     514442 ns       1122
IdentifierCompleterFixture/CandidatesWithCommonPrefix/2048                 1144025 ns    1143845 ns        641
IdentifierCompleterFixture/CandidatesWithCommonPrefix/4096                 2643405 ns    2659108 ns        264
IdentifierCompleterFixture/CandidatesWithCommonPrefix/8192                 6263122 ns    6267897 ns        112
IdentifierCompleterFixture/CandidatesWithCommonPrefix/16384               13002049 ns   12535795 ns         56
IdentifierCompleterFixture/CandidatesWithCommonPrefix/32768               28741026 ns   28704184 ns         25
IdentifierCompleterFixture/CandidatesWithCommonPrefix/65536               60231116 ns   60840390 ns         10
IdentifierCompleterFixture_BigO                                              57.59 NlgN     57.96 NlgN
IdentifierCompleterFixture_RMS                                                   1 %          2 %
PythonSupportFixture/FilterAndSortUnstoredCandidatesWithCommonPrefix/1024  4280306 ns    4680030 ns        150
PythonSupportFixture/FilterAndSortUnstoredCandidatesWithCommonPrefix/2048  9249671 ns    9186726 ns         90
PythonSupportFixture/FilterAndSortUnstoredCandidatesWithCommonPrefix/4096 18832170 ns   18373451 ns         45
PythonSupportFixture/FilterAndSortUnstoredCandidatesWithCommonPrefix/8192 38904981 ns   37266906 ns         18
PythonSupportFixture/FilterAndSortUnstoredCandidatesWithCommonPrefix/16384 78318612 ns  78000500 ns          9
PythonSupportFixture/FilterAndSortUnstoredCandidatesWithCommonPrefix/32768 158404771 ns 163801050 ns         4
PythonSupportFixture/FilterAndSortUnstoredCandidatesWithCommonPrefix/65536 317453915 ns 319802050 ns         2
PythonSupportFixture_BigO                                                  4836.93 N     4891.16 N
PythonSupportFixture_RMS                                                         1 %          2 %
PythonSupportFixture/FilterAndSortStoredCandidatesWithCommonPrefix/1024     542299 ns     530403 ns       1000
PythonSupportFixture/FilterAndSortStoredCandidatesWithCommonPrefix/2048    1264767 ns    1279153 ns        561
PythonSupportFixture/FilterAndSortStoredCandidatesWithCommonPrefix/4096    2724379 ns    2718199 ns        264
PythonSupportFixture/FilterAndSortStoredCandidatesWithCommonPrefix/8192    6263308 ns    6267897 ns        112
PythonSupportFixture/FilterAndSortStoredCandidatesWithCommonPrefix/16384  12994143 ns   12792082 ns         50
PythonSupportFixture/FilterAndSortStoredCandidatesWithCommonPrefix/32768  27466364 ns   27300175 ns         28
PythonSupportFixture/FilterAndSortStoredCandidatesWithCommonPrefix/65536  57326742 ns   56727636 ns         11
PythonSupportFixture_BigO                                                    54.99 NlgN     54.45 NlgN
PythonSupportFixture_RMS                                                         2 %          2 %
```
We see that filtering and sorting is much slower when the candidates are not already stored, so this is something we need to consider when trying to improve performance.
Folks, feel free to close this if it has reached a good enough state. I've been stuck on improvements past my previous patch because of conflicting changes with #2657 regarding parameter hints on my fork. As far as I've tested, up to my previous patch I was already satisfied enough, despite not leveraging the new async system.
No time for looking into Vim completion at the moment at all.
@oblitum just to let you know, we've implemented partial sorting in ycmd ;)
@vheon that's great!
Completing at global scope is now almost instant in cases 1 and 2, so we can definitely close this.
Consider the following trivial C++ program:

Case 1

Automatic semantic completion just after the member access operator will work, but forced semantic completion just before it will not. This happens because YCM is unable to handle large numbers of identifiers.

Forced semantic completion before `.` makes libclang return global scope completion data. libclang is very fast in doing so, but YCM chokes. Two alternatives can happen:

Things that always happen:
Completion at global scope is a very important use case, especially in C codebases where nearly everything lives at the global scope. For example:

Case 2

It's a very common wish to have completion for the Windows API, which is all C functions. In this realm, the most useful and expected thing that YCM could offer is completion of the API provided by the included header; it simply can't handle it, despite how ubiquitous `windows.h` is.

Program 1 below was used to verify bare libclang timings for these use cases (with `LIBCLANG_TIMING=1`, the timings can be compared with the internal libclang timings).

For case 1 after `.` (`#define flags linux_flags`):

For case 1 before `.` (`#define flags linux_flags`):

For case 2 after `.` (`#define flags windows_flags`):

For case 2 before `.` (`#define flags windows_flags`):

Notice that reparsing has been added, as I suspect my experience conflicts with this comment:
Despite that ~100ms is reasonably unnoticeable (though alarming compared to ~3ms), it's not libclang's fault. The `windows.h` use case can be verified from Linux if the tips in this blog post are followed.

It's at around ~35k identifiers that YCM starts to choke a lot. I've experienced YCM slowness before on huge files, and I generally just use `vim -u NONE` to open huge files to avoid that: files that have no semantic completion at all, just identifier completion, like some `.sql` file.

With this information in mind, I've created Program 2 to generate output with 60k identifiers. Saving this output to `identifiers.sql`, opening it with YCM, and trying to edit at the end of the file, the same problems happen without any semantic completion:

- If `aaa` is typed, YCM will simply time out [fixed by [READY] Rewrite completion system #2657].
- If `aaaaaa` is typed, YCM may be able to complete, but it'll be slow.

Video demonstration:
System information:
Related issue:
Program 1
Program 2