IndexError when current line contains multibyte character? #1378
Comments
👍 |
@timfeirg can you post a little test case? Images are bad for copy and paste :P
@vheon yes, sorry, the test case:
# -*- coding: utf-8 -*-
def one_special_function():
    return 23
a = {}
a['中文'] =
a['ascii'] =
And for @simnalamburt's information: in my case, copy & paste would reproduce such an error, I wouldn't know if it's
@timfeirg From your test case, I can repro the problem of getting a whole bunch of mismatched completions, but I don't get an IndexError. WRT copy/pasting and IndexError, this might be related to whatever is set as your current file encoding in Vim.
I've added the shebang line to my test case, it's So it is because of my file encoding. Any suggestions on what I can do to make YCM work?
I'm getting IndexError and other errors related to unicode characters in the buffer too. This must have been introduced fairly recently, since I only noticed it after updating my fork, which had gone stale for a few weeks.
I'm having the same problems with the German symbols
Hi @timfeirg, I have tested your case and got no error. I have also tested my Java code, also with no error.
I just updated to the latest version of YCM, and here is my ycm . You just need to have a look at the autocomplete part.
Thanks @wsdjeg, your I've tried copying the lines that touch the YCM config and applying them to my own vimrc; that didn't work, and it seems they weren't related to character encoding anyway. In case you'd like to take a look, here's my vimrc
[READY] Fix issues with multi-byte characters

## Summary

This change introduces more general support for non-ASCII characters in buffers handled by YCMD.

In ycmd's public API, all offsets are byte offsets into the UTF-8 encoded buffers. We also assume (because, we have no other choice) that files stored on disk are also UTF-8 encoded. Internally, almost all of ycmd's functionality operates on unicode strings (python 2 `unicode()` and python 3 `str()` objects, transparently via `future`). Many of the downstream completion engines expect unicode code points as the offsets in their APIs.

One special case is the `ycm_core` library (identifier completer and clang completer), which requires instances of the _native_ `str` type. All strings used within the c++ using `boost::python` require passing through `ToCppStringCompatible`

Previously, we were largely just assuming that `code point == byte offset` - i.e. all buffers contained only ASCII characters. This worked up to a point, but more by luck than judgement in a number of places.

## References

In combination with a YCM change and PR #453, I hope this:

- fixes #109
- fixes ycm-core/YouCompleteMe#2096
- fixes ycm-core/YouCompleteMe#2088
- fixes ycm-core/YouCompleteMe#2069
- fixes ycm-core/YouCompleteMe#2066
- fixes ycm-core/YouCompleteMe#1378

## Overview of changes

The changes fall into the following areas:

- Providing access to and conversion to/from code points and byte offsets (`request_wrap.py`)
- Changing certain algorithms/features to work entirely in codepoint space when they are trying to operate on logical 'characters' within the buffer (see known issues for why this isn't perfect, but probably most of the way there)
- Changing the completers to convert between the external (on both sides) and internal representations by using the shortcuts provided in `request_wrap.py`
- Adding tests for each of the completers for both completions and subcommands

## Completer-specific notes

Pretty much all of the completers I tested required some changes:

- clang uses utf-8 and byte offsets, but had some bugs with the `GetDoc` parsing stuff
- OmniSharp speaks codepoint offsets
- Tern speaks codepoint offsets
- JediHTTP speaks codepoint offsets
- tsserver speaks codepoint offsets
- gocode speaks byte offsets
- racer i did not test

## Further work / Known issues

- we act blissfully ignorant of the case where a unicode character consumes multiple code points (such as where there is a modifier after the code point)
- when typing a unicode character, we still get an exception from `bitset` (see #453 for that fix)
- the filtering and sorting system is 100% designed for ASCII only, and it is not in the scope of this PR to change that. Currently after any filtering operation, words containing non-ASCII characters are excluded.
- I did not get round to testing rust using racer
- there are further changes required to YouCompleteMe client (a further PR is coming for that)
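
To illustrate the distinction between byte offsets and code point offsets described above, here is a minimal Python 3 sketch using the line from the test case in this thread. The helper names are hypothetical and this is not the code in `request_wrap.py`; it also assumes 1-based column numbers, which the description above does not state.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: hypothetical helpers, not ycmd's implementation.
# Assumption: offsets are 1-based columns within a single line.


def byte_offset_to_codepoint_offset(line, byte_offset):
    """Map a 1-based byte offset in the UTF-8 encoding of `line`
    to a 1-based code point offset."""
    prefix = line.encode("utf-8")[: byte_offset - 1]
    # Raises UnicodeDecodeError if byte_offset falls inside a
    # multi-byte character, since the prefix is then not valid UTF-8.
    return len(prefix.decode("utf-8")) + 1


def codepoint_offset_to_byte_offset(line, codepoint_offset):
    """Map a 1-based code point offset in `line` to a 1-based
    byte offset in its UTF-8 encoding."""
    prefix = line[: codepoint_offset - 1]
    return len(prefix.encode("utf-8")) + 1


# The line from the test case, with the cursor just past the end.
line = "a['中文'] = "
codepoint_column = len(line) + 1                 # 10 code points before the cursor
byte_column = len(line.encode("utf-8")) + 1      # 14 bytes before the cursor

# The known issue about modifiers: one visible character, two code points.
e_acute = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
```

On an ASCII-only line both helpers return their input unchanged, which is why treating the two kinds of offset as interchangeable only breaks once a multi-byte character appears before the cursor.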
When I try to get completion for the function `get_tot_click` in a line that contains Chinese characters, the following error arises:
And if I move the cursor out of that line, it works:
Am I posting a duplicate issue? I've read #278, #590, and #788; this issue is about UTF-8 characters causing autocomplete failure for ASCII names.