[READY] Fix IndexError exception from C++ #453
3613041 to dc3ecd1
Reviewing the code a bit, I find a number of other places where the code assumes that each
For example in
std::list< LetterNode * > &LetterNodeListMap::operator[] ( char letter ) {
  int letter_index = IndexForChar( letter );
  std::list< LetterNode * > *list = letters_[ letter_index ];
  ...
}
...
std::list< LetterNode * > *LetterNodeListMap::ListPointerAt( char letter ) {
  return letters_[ IndexForChar( letter ) ];
}
...
bool LetterNodeListMap::HasLetter( char letter ) const {
  return letters_[ IndexForChar( letter ) ] != NULL;
}
according to the docs this could breach the requirement that
I suspect that what is happening is that this is always reading or writing past the end of the array, which could lead to
Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions.
cpp/ycm/Candidate.cpp, line 69 [r1] (raw file):
cpp/ycm/Candidate.cpp, line 72 [r1] (raw file):
There is a sort of old-school fallacy that in order to prevent accidental use of the assignment operator, programmers would (heinously IMO) obfuscate their code with
Anyway, I always feel like the variable (lvalue) should be on the left, because we normally say "is this apple red" not "is red this apple".
Comments from Reviewable
dc3ecd1 to 83b91d1
Good catch. I'll try to write tests for these functions.
Reviewed 1 of 2 files at r1.
cpp/ycm/Candidate.cpp, line 69 [r1] (raw file):
cpp/ycm/Candidate.cpp, line 72 [r1] (raw file):
83b91d1 to 3ca5e3c
Review status: 0 of 2 files reviewed at latest revision, 3 unresolved discussions.
cpp/ycm/Candidate.cpp, line 70 [r3] (raw file):

I think this fixes ycm-core/YouCompleteMe#278 too (along with my upcoming PR for unicode)

This fixes the
Reviewed 1 of 2 files at r1, 1 of 1 files at r3.

Right, the combination of the 2 should fix it, though I think :)
Reviewed 2 of 2 files at r1, 1 of 1 files at r3.

I mean, even with our fixes, completing filenames with non-ascii characters will not be possible and I think this is what issue ycm-core/YouCompleteMe#278 is about.
Review status: all files reviewed at latest revision, 3 unresolved discussions.

Well the report itself talks about fixing the
Review status: all files reviewed at latest revision, 1 unresolved discussion.

from me if @puremourning is fine with it too. I think he's now officially the unicode expert as it pertains to ycmd. :)
Review status: all files reviewed at latest revision, 2 unresolved discussions.
cpp/ycm/Candidate.cpp, line 70 [r3] (raw file): I'm fine with leaving it the way you have it too.
cpp/ycm/tests/LetterBitsetFromString_test.cpp, line 60 [r3] (raw file):
3ca5e3c to c07d4c5
c07d4c5 to db27875
So I added tests for the
As expected, the non-ascii test was failing by returning out of bounds pointers instead of a NULL pointer. Fixed by checking that the letter index is in range in
Reviewed 2 of 4 files at r5, 4 of 4 files at r6.
cpp/ycm/Candidate.cpp, line 70 [r3] (raw file):
cpp/ycm/LetterNodeListMap.cpp, line 30 [r6] (raw file):
cpp/ycm/LetterNodeListMap.h, line 35 [r6] (raw file):
cpp/ycm/LetterNodeListMap.h, line 42 [r6] (raw file):
cpp/ycm/LetterNodeListMap.h, line 48 [r6] (raw file):
cpp/ycm/tests/LetterBitsetFromString_test.cpp, line 60 [r3] (raw file):
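The range check described in the comment above could look roughly like the following. This is only a sketch under assumptions: `Node`, `ListMap`, `IsInRange` and `kNumLetters` are hypothetical stand-ins, not the actual ycmd declarations, and the `assert` in `operator[]` reflects the precondition idea raised in review rather than the merged code.

```cpp
#include <array>
#include <cassert>
#include <list>

// Hypothetical stand-ins for the real ycmd types; names are illustrative only.
struct Node {};
const int kNumLetters = 128;  // assumed: one slot per ASCII character

// With a signed char, bytes >= 0x80 become negative indices; with an
// unsigned char they become indices >= 128. Both are outside the array.
inline int IndexForChar( char letter ) {
  return static_cast< int >( letter );
}

inline bool IsInRange( int index ) {
  return 0 <= index && index < kNumLetters;
}

class ListMap {
public:
  ListMap() { letters_.fill( nullptr ); }
  ~ListMap() {
    for ( auto *list : letters_ )
      delete list;
  }
  ListMap( const ListMap & ) = delete;
  ListMap &operator=( const ListMap & ) = delete;

  // Creates the list on first access. Precondition: letter is ASCII.
  std::list< Node * > &operator[]( char letter ) {
    int index = IndexForChar( letter );
    assert( IsInRange( index ) );
    if ( !letters_[ index ] )
      letters_[ index ] = new std::list< Node * >();
    return *letters_[ index ];
  }

  // Returns a null pointer instead of reading past the end of the array.
  std::list< Node * > *ListPointerAt( char letter ) const {
    int index = IndexForChar( letter );
    return IsInRange( index ) ? letters_[ index ] : nullptr;
  }

  bool HasLetter( char letter ) const {
    return ListPointerAt( letter ) != nullptr;
  }

private:
  std::array< std::list< Node * > *, kNumLetters > letters_;
};
```

With this shape, a lookup for a byte of a multi-byte UTF-8 sequence reports "no such letter" instead of touching memory outside the array.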
Reviewed 2 of 4 files at r5, 2 of 4 files at r6, 1 of 1 files at r7.
cpp/ycm/LetterNodeListMap.cpp, line 71 [r7] (raw file): As it happens, I think it is reasonable to have a precondition that
cpp/ycm/LetterNodeListMap.h, line 35 [r6] (raw file):
cpp/ycm/LetterNodeListMap.h, line 42 [r6] (raw file):
cpp/ycm/LetterNodeListMap.h, line 48 [r6] (raw file):
cpp/ycm/tests/LetterNode_test.cpp, line 30 [r7] (raw file): OK a small memory leak in our test isn't a big deal, but it means the destructor code (which is still code we should test) isn't run.
cpp/ycm/tests/LetterNode_test.cpp, line 65 [r7] (raw file):

Reviewed 1 of 4 files at r6.

Review status: all files reviewed at latest revision, 10 unresolved discussions.
cpp/ycm/Candidate.cpp, line 70 [r3] (raw file):
cpp/ycm/LetterNode.cpp, line 30 [r7] (raw file):
cpp/ycm/LetterNodeListMap.cpp, line 30 [r6] (raw file): The formatting works here just as well.
cpp/ycm/LetterNodeListMap.cpp, line 71 [r7] (raw file): I'm not against having an assert.
cpp/ycm/LetterNodeListMap.h, line 35 [r6] (raw file):
cpp/ycm/LetterNodeListMap.h, line 42 [r6] (raw file):
cpp/ycm/LetterNodeListMap.h, line 48 [r6] (raw file):
cpp/ycm/tests/LetterBitsetFromString_test.cpp, line 60 [r3] (raw file):
cpp/ycm/tests/LetterNode_test.cpp, line 30 [r7] (raw file): And I agree, this shouldn't be on the heap.
Test ASCII boundaries and unicode characters in LetterBitsetFromString function.
db27875 to 4892b21
4892b21 to a281202
Reviewed 1 of 1 files at r7, 6 of 7 files at r8.
cpp/ycm/Candidate.cpp, line 70 [r3] (raw file):
cpp/ycm/LetterNodeListMap.cpp, line 71 [r7] (raw file):
cpp/ycm/LetterNodeListMap.h, line 42 [r6] (raw file):
cpp/ycm/LetterNodeListMap.h, line 48 [r6] (raw file):
cpp/ycm/tests/LetterNode_test.cpp, line 30 [r7] (raw file):
cpp/ycm/tests/LetterNode_test.cpp, line 65 [r7] (raw file):

Should we bump the core version for these changes?
Review status: 7 of 8 files reviewed at latest revision, 7 unresolved discussions.

Yes, core version should be bumped.
Review status: 7 of 8 files reviewed at latest revision, 6 unresolved discussions.
cpp/ycm/LetterNodeListMap.cpp, line 71 [r7] (raw file):

Version bumped.
Reviewed 1 of 1 files at r9, 1 of 1 files at r10.
cpp/ycm/LetterNodeListMap.cpp, line 71 [r7] (raw file): If we want to be extra cautious, we could add the

Awesome stuff!
Reviewed 1 of 4 files at r5, 1 of 1 files at r7, 6 of 7 files at r8, 1 of 1 files at r9, 1 of 1 files at r10.
cpp/ycm/LetterNodeListMap.cpp, line 71 [r7] (raw file):

Reviewed 1 of 4 files at r5, 1 of 1 files at r7, 6 of 7 files at r8, 1 of 1 files at r9.

Reviewed 6 of 7 files at r8, 1 of 1 files at r9, 1 of 1 files at r10.
cpp/ycm/LetterNodeListMap.h, line 35 [r6] (raw file):

Thanks for the PR! :) @homu r+
Review status: all files reviewed at latest revision, all discussions resolved.

📌 Commit f58c41c has been approved by
[READY] Fix IndexError exception from C++

This PR fixes the `IndexError` exception returned by the `FilterAndCandidatesWrap` function when a candidate or the query contains non-ascii characters. Its error message depends on the OS:

- on Windows: `invalid bitset<N> position`
- on Linux: `bitset::set: __position (which is xxx) >= _Nb (which is 128)`
- on OS X: `bitset set argument out of range`

It is caused by the `LetterBitsetFromString` function trying to set a bit to the `Bitset` object with a character index that is out of range. This is solved by only setting bits for character indices satisfying `0 ≤ i < 128`.

We add two tests for the `LetterBitsetFromString` function: one test to check the lower and upper character bounds and another to check that non-ascii characters are ignored. These tests are failing without this change. For example, on Windows:

```
[ RUN ] LetterBitsetFromStringTest.Boundaries
unknown file: error: C++ exception with description "invalid bitset<N> position" thrown in the test body.
[ FAILED ] LetterBitsetFromStringTest.Boundaries (1 ms)
[ RUN ] LetterBitsetFromStringTest.IgnoreNonAsciiCharacters
unknown file: error: C++ exception with description "invalid bitset<N> position" thrown in the test body.
[ FAILED ] LetterBitsetFromStringTest.IgnoreNonAsciiCharacters (0 ms)
```

Fixes #177 and ycm-core/YouCompleteMe#2103.

---

This change is [Reviewable](https://reviewable.io/reviews/valloric/ycmd/453)
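To make the fix described above concrete, here is a minimal sketch of a bitset builder that only sets bits for indices satisfying `0 ≤ i < 128`. It is a simplified illustration, not the actual ycmd function: the `IsInAsciiRange` helper and the `kNumLetters` constant are assumptions, and any additional handling the real `LetterBitsetFromString` performs is omitted. The underlying mechanism is standard: `std::bitset::set( pos )` throws `std::out_of_range` when `pos` is not a valid position, which is the exception seen in the failing tests.

```cpp
#include <bitset>
#include <cassert>
#include <string>

const int kNumLetters = 128;  // assumed size, matching "_Nb (which is 128)"
typedef std::bitset< kNumLetters > Bitset;

// Hypothetical helper: true only for indices that are valid bit positions.
inline bool IsInAsciiRange( int index ) {
  return 0 <= index && index < kNumLetters;
}

// Simplified sketch: bytes outside the ASCII range are skipped, so
// Bitset::set is never called with an out-of-range position.
Bitset LetterBitsetFromString( const std::string &text ) {
  Bitset letter_bits;
  for ( char letter : text ) {
    int index = static_cast< int >( letter );
    if ( IsInAsciiRange( index ) )
      letter_bits.set( index );
  }
  return letter_bits;
}
```

Without the range check, a byte such as `0xC3` becomes either a negative `int` (converted to a very large `size_t`) or a value of 195, depending on the signedness of `char`; both are invalid positions for a 128-bit bitset.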
☀️ Test successful - status
[READY] Definitely fix IndexError exception from C++

PR #453 did not completely fix the IndexError issue, but moved it to other parts of the C++ code. The exception is now raised by the [`NodeListForLetter` method in `QueryMatchResult`](https://github.com/Valloric/ycmd/blob/master/cpp/ycm/Candidate.cpp#L96) when the query contains non-ascii characters.

Since we don't support this kind of query, we want to deal with it as early as possible: in the `CandidatesForQueryAndType` and `FilterAndSortCandidates` functions. We don't check for non-ascii characters but for nonprintable ones as [we already do for candidates](https://github.com/Valloric/ycmd/blob/master/cpp/ycm/CandidateRepository.cpp#L136).

Added two tests for the `CandidatesForQuery` function: `EmptyCandidatesForUnicode` is failing without this PR but not `EmptyCandidatesForNonPrintable` because nonprintable candidates are ignored. No tests for `FilterAndSortCandidates` because it will be tested by the Python layer in PR #455.

Core version bumped.

---

This change is [Reviewable](https://reviewable.io/reviews/valloric/ycmd/463)
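The "reject the query as early as possible" idea from this description can be sketched as follows. Everything here is hypothetical: `IsPrintableAscii`, `IsSubsequence` and this `CandidatesForQuery` signature are illustrative names, the matching is a plain subsequence test rather than ycmd's real ranking, and the printable range is assumed to be `0x20` to `0x7E`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed definition of "printable": ASCII 0x20..0x7E. Any byte of a
// multi-byte UTF-8 sequence falls outside this range.
inline bool IsPrintableAscii( const std::string &text ) {
  for ( char c : text ) {
    unsigned char byte = static_cast< unsigned char >( c );
    if ( byte < 0x20 || byte > 0x7E )
      return false;
  }
  return true;
}

// Toy matcher: true if the query's characters appear in order in candidate.
inline bool IsSubsequence( const std::string &query,
                           const std::string &candidate ) {
  std::size_t pos = 0;
  for ( char c : candidate ) {
    if ( pos < query.size() && query[ pos ] == c )
      ++pos;
  }
  return pos == query.size();
}

// Unsupported queries are rejected up front, before any per-letter lookup
// could be reached with an out-of-range index.
std::vector< std::string > CandidatesForQuery(
    const std::vector< std::string > &candidates,
    const std::string &query ) {
  std::vector< std::string > results;
  if ( !IsPrintableAscii( query ) )
    return results;
  for ( const auto &candidate : candidates ) {
    if ( IsSubsequence( query, candidate ) )
      results.push_back( candidate );
  }
  return results;
}
```

Checking once at the entry point keeps the deeper matching code free of per-character range checks, at the cost of returning no completions at all for such queries.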
[READY] Fix issues with multi-byte characters

## Summary

This change introduces more general support for non-ASCII characters in buffers handled by YCMD.

In ycmd's public API, all offsets are byte offsets into the UTF-8 encoded buffers. We also assume (because, we have no other choice) that files stored on disk are also UTF-8 encoded.

Internally, almost all of ycmd's functionality operates on unicode strings (python 2 `unicode()` and python 3 `str()` objects, transparently via `future`). Many of the downstream completion engines expect unicode code points as the offsets in their APIs.

One special case is the `ycm_core` library (identifier completer and clang completer), which requires instances of the _native_ `str` type. All strings used within the c++ using `boost::python` require passing through `ToCppStringCompatible`.

Previously, we were largely just assuming that `code point == byte offset` - i.e. all buffers contained only ASCII characters. This worked up to a point, but more by luck than judgement in a number of places.
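To illustrate why `code point == byte offset` only holds for ASCII, here is a small sketch of converting between the two kinds of offset in a UTF-8 encoded line. The function names are hypothetical, the offsets are 0-based for simplicity, and the input is assumed to be valid UTF-8; the PR itself does this work on the Python side in `request_wrap.py`, so this is an illustration of the concept rather than that code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
inline bool IsContinuationByte( unsigned char byte ) {
  return ( byte & 0xC0 ) == 0x80;
}

// Hypothetical helper: byte offset at which the given code point starts.
// Offsets past the end of the line are clamped to line.size().
std::size_t CodepointOffsetToByteOffset( const std::string &line,
                                         std::size_t codepoint_offset ) {
  std::size_t byte_offset = 0;
  std::size_t seen = 0;
  while ( byte_offset < line.size() && seen < codepoint_offset ) {
    ++byte_offset;  // step over the lead byte
    while ( byte_offset < line.size() &&
            IsContinuationByte( line[ byte_offset ] ) )
      ++byte_offset;  // step over its continuation bytes
    ++seen;
  }
  return byte_offset;
}

// Hypothetical helper: number of code points that start before byte_offset.
std::size_t ByteOffsetToCodepointOffset( const std::string &line,
                                         std::size_t byte_offset ) {
  std::size_t count = 0;
  for ( std::size_t i = 0; i < byte_offset && i < line.size(); ++i ) {
    if ( !IsContinuationByte( line[ i ] ) )
      ++count;
  }
  return count;
}
```

For a pure ASCII line both functions are the identity, which is why the old assumption appeared to work; the offsets diverge as soon as a character such as `é` (two bytes in UTF-8) precedes the cursor.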

## References

In combination with a YCM change and PR #453, I hope this:

- fixes #109
- fixes ycm-core/YouCompleteMe#2096
- fixes ycm-core/YouCompleteMe#2088
- fixes ycm-core/YouCompleteMe#2069
- fixes ycm-core/YouCompleteMe#2066
- fixes ycm-core/YouCompleteMe#1378

## Overview of changes

The changes fall into the following areas:

- Providing access to and conversion to/from code points and byte offsets (`request_wrap.py`)
- Changing certain algorithms/features to work entirely in codepoint space when they are trying to operate on logical 'characters' within the buffer (see known issues for why this isn't perfect, but probably most of the way there)
- Changing the completers to convert between the external (on both sides) and internal representations by using the shortcuts provided in `request_wrap.py`
- Adding tests for each of the completers for both completions and subcommands

## Completer-specific notes

Pretty much all of the completers I tested required some changes:

- clang uses utf-8 and byte offsets, but had some bugs with the `GetDoc` parsing stuff
- OmniSharp speaks codepoint offsets
- Tern speaks codepoint offsets
- JediHTTP speaks codepoint offsets
- tsserver speaks codepoint offsets
- gocode speaks byte offsets
- racer i did not test

## Further work / Known issues

- we act blissfully ignorant of the case where a unicode character consumes multiple code points (such as where there is a modifier after the code point)
- when typing a unicode character, we still get an exception from `bitset` (see #453 for that fix)
- the filtering and sorting system is 100% designed for ASCII only, and it is not in the scope of this PR to change that. Currently after any filtering operation, words containing non-ASCII characters are excluded.
- I did not get round to testing rust using racer
- there are further changes required to YouCompleteMe client (a further PR is coming for that)

---

This change is [Reviewable](https://reviewable.io/reviews/valloric/ycmd/455)