IndexError when current line contains multibyte character? #1378

timfeirg · 2015-02-13T17:07:02Z

When I try to get completion for the function `get_tot_click` on a line that contains Chinese characters, the following error arises:

IndexError: string index out of range

And if I move the cursor off that line, it works.

Am I posting a duplicate issue? I've read #278, #590 and #788; this issue is about UTF-8 characters causing autocompletion to fail for ASCII names.


simnalamburt · 2015-02-15T23:06:08Z

👍

vheon · 2015-02-15T23:28:56Z

@timfeirg can you post a little test case? Images are bad for copy and paste :P

simnalamburt · 2015-02-15T23:46:56Z

@vheon ~~Try below~~ Never mind; in my case, copy-pasting (^C^V) the text doesn't reproduce the error (it reproduces when I type it manually).

First you should write something
And then the multibyte chars will cause an error
이걸복사해서붙여넣어보시오 (Korean for "Try copying and pasting this")

timfeirg · 2015-02-16T16:50:53Z

Yes, sorry; here is the test case:

Create `test.py`, then copy and paste the following Python code into it (using `:set paste` before pasting and `:set nopaste` after):

```python
# -*- coding: utf-8 -*-
def one_special_function():
    return 23

a = {}
a['中文'] = 
a['ascii'] =
```

Now I open the file in Vim and try to call our special function on the `a['中文'] =` line, assigning its return value to `a['中文']`. YCM offers no suggestions when I type `onespe`; if I keep typing up to `onespeci`, the function name appears as the last candidate in the YCM completion menu, preceded by a whole bunch of mismatches.
When I do the same on the `a['ascii'] =` line, I type `one` and `one_special_function` pops up as the first candidate.

And for @simnalamburt's information: in my case, copy and paste does reproduce the error. I don't know whether `set paste` causes Vim or YCM to treat the text differently.

Valloric · 2015-03-27T23:59:21Z

@timfeirg From your test case, I can repro the problem of getting a whole bunch of mis-matched completions, but I don't get an IndexError.

WRT copy/pasting and IndexError, this might be related to whatever is set as your current file encoding in Vim.
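A rough illustration of why the encoding matters: Vim reports cursor columns as byte indices, and the number of bytes a character occupies depends on the encoding. The Python sketch below measures the same cursor position (the end of a line from the test case, after typing `onespe`) in UTF-8 and in GBK. The `byte_column` helper is hypothetical and is not part of YCM or ycmd.

```python
# Sketch: the byte column of one cursor position differs by encoding.
# `byte_column` is a hypothetical helper for illustration, not a YCM/ycmd API.


def byte_column(line, codepoint_index, encoding):
    """Return the 0-based byte offset of code point `codepoint_index` in `line`
    when the line is encoded with `encoding`."""
    return len(line[:codepoint_index].encode(encoding))


line = "a['中文'] = onespe"
cursor = len(line)  # cursor at the end of the line, counted in code points

print("code points", cursor)
for enc in ("utf-8", "gbk"):
    print(enc, byte_column(line, cursor, enc))
```

Both encodings put the cursor after 16 code points, but the byte column is 20 in UTF-8 and 18 in GBK, so code that mixes byte columns with character indices can be thrown off by different amounts depending on the encoding in use.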

timfeirg · 2015-03-28T06:58:42Z

I've added the encoding declaration (`# -*- coding: utf-8 -*-`) to my test case; the file is UTF-8. I think if you do the same, you'll reproduce the problem.

So if it's because of my file encoding, any suggestions on what I can do to make YCM work?

oblitum · 2015-07-15T02:19:19Z

I'm getting IndexError and other errors related to Unicode characters in the buffer too. This must have been introduced somewhat recently, since I only noticed it after updating my fork, which had gone stale for some weeks.

timokau · 2015-11-07T12:17:17Z

I'm having the same problems with the German characters ä, ü and ß.

wsdjeg · 2015-11-07T14:16:06Z

Hi @timfeirg, I have tested your case and got no error. I also tested my Java code: no error either.

timfeirg · 2015-11-08T04:35:45Z

I'm on 0352ed9 and can still reproduce it. @wsdjeg, can you post your YCM version as well as your vim dotfile?

wsdjeg · 2015-11-08T04:56:20Z

I just updated to the latest version of YCM, and here is my vimrc; you only need to look at the autocomplete part.

wsdjeg · 2015-11-08T04:56:28Z

https://github.com/wsdjeg/DotFiles/blob/develop/vimrc

timfeirg · 2015-11-08T14:12:29Z

Thanks @wsdjeg, your vimrc has been educational, but I'm having trouble identifying the lines that fix this particular problem.

I tried copying the lines that touch the YCM config and applying them to my own vimrc. It didn't work, and those lines don't seem to be related to character encoding anyway.

In case you'd like to take a look, here's my vimrc.

wsdjeg · 2015-11-08T14:49:31Z

I get the same issue.

wsdjeg · 2015-11-08T14:51:11Z

If I delete the Chinese words, it works well.

wsdjeg · 2015-11-08T14:56:25Z

But if there is only one Chinese character, it also works well.

[READY] Fix issues with multi-byte characters

## Summary

This change introduces more general support for non-ASCII characters in buffers handled by ycmd.

In ycmd's public API, all offsets are byte offsets into the UTF-8 encoded buffers. We also assume (because we have no other choice) that files stored on disk are UTF-8 encoded.

Internally, almost all of ycmd's functionality operates on unicode strings (Python 2 `unicode()` and Python 3 `str()` objects, transparently via `future`). Many of the downstream completion engines expect unicode code points as the offsets in their APIs.

One special case is the `ycm_core` library (identifier completer and clang completer), which requires instances of the _native_ `str` type. All strings passed into the C++ code via `boost::python` must go through `ToCppStringCompatible`.

Previously, we were largely just assuming that `code point == byte offset`, i.e. that all buffers contained only ASCII characters. This worked up to a point, but more by luck than judgement in a number of places.

## References

In combination with a YCM change and PR #453, I hope this:

- fixes #109
- fixes ycm-core/YouCompleteMe#2096
- fixes ycm-core/YouCompleteMe#2088
- fixes ycm-core/YouCompleteMe#2069
- fixes ycm-core/YouCompleteMe#2066
- fixes ycm-core/YouCompleteMe#1378

## Overview of changes

The changes fall into the following areas:

- Providing access to, and conversion between, code points and byte offsets (`request_wrap.py`)
- Changing certain algorithms/features to work entirely in code point space when they operate on logical 'characters' within the buffer (see known issues for why this isn't perfect, though it is probably most of the way there)
- Changing the completers to convert between the external (on both sides) and internal representations using the shortcuts provided in `request_wrap.py`
- Adding tests for each of the completers, covering both completions and subcommands

## Completer-specific notes

Pretty much all of the completers I tested required some changes:

- clang uses UTF-8 and byte offsets, but had some bugs in the `GetDoc` parsing code
- OmniSharp speaks code point offsets
- Tern speaks code point offsets
- JediHTTP speaks code point offsets
- tsserver speaks code point offsets
- gocode speaks byte offsets
- racer: I did not test

## Further work / Known issues

- We act blissfully ignorant of the case where a unicode character consumes multiple code points (such as where there is a modifier after the code point)
- When typing a unicode character, we still get an exception from `bitset` (see #453 for that fix)
- The filtering and sorting system is 100% designed for ASCII only, and changing that is not in the scope of this PR. Currently, after any filtering operation, words containing non-ASCII characters are excluded.
- I did not get round to testing Rust using racer
- Further changes are required in the YouCompleteMe client (a further PR is coming for that)
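The heart of this change is converting between the UTF-8 byte offsets used in ycmd's public API and the code point offsets many engines expect. Below is a minimal Python 3 sketch of that conversion, assuming UTF-8 buffers as the PR does. The helper names are hypothetical and are not the actual functions in `request_wrap.py`. The sketch also shows what goes wrong under the old `code point == byte offset` assumption.

```python
# Sketch of converting between UTF-8 byte offsets (ycmd's public API)
# and code point offsets (what many completion engines expect).
# Function names are hypothetical; they are not ycmd's actual helpers.


def byte_offset_to_codepoint(line, byte_offset):
    """Convert a 0-based UTF-8 byte offset into `line` to a 0-based code point offset."""
    encoded = line.encode("utf-8")
    # Decoding the prefix raises UnicodeDecodeError if byte_offset
    # lands in the middle of a multi-byte character.
    return len(encoded[:byte_offset].decode("utf-8"))


def codepoint_to_byte_offset(line, codepoint_offset):
    """Convert a 0-based code point offset into `line` to a 0-based UTF-8 byte offset."""
    return len(line[:codepoint_offset].encode("utf-8"))


line = "a['中文'] = one"
byte_col = len(line.encode("utf-8"))               # end of line in bytes: 17
cp_col = byte_offset_to_codepoint(line, byte_col)  # end of line in code points: 13

print(byte_col, cp_col)
print(line[cp_col - 1])  # 'e', the last typed character

# Treating the byte offset as a code point index overruns the string:
try:
    line[byte_col - 1]
except IndexError as error:
    print("IndexError:", error)
```

The last lines raise the same `IndexError: string index out of range` message reported at the top of this issue. This is only an illustration of the offset mismatch, not a trace of YCM's actual code path.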

Valloric mentioned this issue Apr 16, 2015

Incorrect completions with leading multi-byte chars in a C++ comment ycm-core/ycmd#109

Closed

timokau mentioned this issue Nov 7, 2015

Error inserting german symbols lervag/vimtex#267

Closed

oblitum mentioned this issue Jan 7, 2016

Error with non-ascii character in file name ycm-core/ycmd#177

Closed

micbou added a commit to micbou/YouCompleteMe that referenced this issue Mar 23, 2016

Add completion test on multibyteline

f24a245

See issue ycm-core#1378.

micbou added a commit to micbou/YouCompleteMe that referenced this issue Mar 23, 2016

Send column as codepoint index to ycmd

ed2cea9

Fix ycm-core#1378.

micbou added a commit to micbou/YouCompleteMe that referenced this issue Mar 23, 2016

Send column as codepoint index to ycmd

60a956b

Fix ycm-core#1378.

micbou added a commit to micbou/YouCompleteMe that referenced this issue Mar 23, 2016

Send column as codepoint index to ycmd

0f3fc8e

Fix ycm-core#1378.

micbou added a commit to micbou/YouCompleteMe that referenced this issue Mar 23, 2016

Add completion test on multibyteline

9b7f0c6

See issue ycm-core#1378.

micbou added a commit to micbou/YouCompleteMe that referenced this issue Mar 23, 2016

Send column as codepoint index to ycmd

15dae06

Fix ycm-core#1378.

micbou added a commit to micbou/YouCompleteMe that referenced this issue Mar 23, 2016

Send column as codepoint index to ycmd

1a04ad4

Fix ycm-core#1378.

micbou added a commit to micbou/YouCompleteMe that referenced this issue Mar 23, 2016

Add completion test on multibyte line

028508a

See issue ycm-core#1378.

micbou added a commit to micbou/YouCompleteMe that referenced this issue Mar 23, 2016

Send column as codepoint index to ycmd

0092603

Fix ycm-core#1378.

This was referenced Mar 23, 2016

[WIP] Send column as codepoint index to ycmd #2073

Closed

[WIP] Assume column number in request is codepoint indexed ycm-core/ycmd#439

Closed

puremourning mentioned this issue Apr 9, 2016

[READY] Fix issues with multi-byte characters ycm-core/ycmd#455

Merged

homu closed this as completed in ycm-core/ycmd#455 Apr 24, 2016

github-actions bot locked as resolved and limited conversation to collaborators May 29, 2021
