
IndexError when current line contains multibyte character? #1378

Closed
timfeirg opened this issue Feb 13, 2015 · 16 comments · Fixed by ycm-core/ycmd#455

Comments

@timfeirg

When I try to get completion for the function `get_tot_click` on a line that contains Chinese characters, the following error is raised:

IndexError: string index out of range

[screenshot: _mosh__vi_calc_sales_detail_py]

If I move the cursor off that line, completion works:

[screenshot: succ]


Am I posting a duplicate issue? I've read #278, #590 and #788; this issue is about UTF-8 characters causing autocomplete to fail for ASCII names.

@simnalamburt

👍

@vheon
Contributor

vheon commented Feb 15, 2015

@timfeirg can you post a little test case? Images are bad for copy and paste :P

@simnalamburt

@vheon ~~Try below~~ Never mind: in my case, copying and pasting (^C/^V) the text doesn't reproduce the error (it reproduced when I typed it manually).

First you should write something
And then multibyte chars will cause an error
이걸복사해서붙여넣어보시오

(The Korean line reads "Try copying and pasting this.")

@timfeirg
Author

Yes, sorry. Here's the test case:

  • Create `test.py`.
  • Copy and paste the following Python code (using `:set paste`, then `:set nopaste` afterwards):
```python
# -*- coding: utf-8 -*-
def one_special_function():
    return 23

a = {}
a['中文'] = 
a['ascii'] = 
```
  • Now open the file in Vim. On the `a['中文'] =` line, try to call the special function and assign its return value to `a['中文']`. YCM offers no suggestions after I type `onespe`; if I keep typing to `onespeci`, the function name appears as the last candidate in the YCM completion menu, preceded by a whole bunch of mismatches.
  • Then do the same on the `a['ascii'] =` line: I type `one` and `one_special_function` pops up as the first candidate.

And for @simnalamburt's information: in my case, copy and paste does reproduce the error. I don't know whether `set paste` causes Vim or YCM to treat the text differently.

@Valloric
Member

@timfeirg From your test case, I can repro the problem of getting a whole bunch of mismatched completions, but I don't get an IndexError.

WRT copy/pasting and IndexError, this might be related to whatever is set as your current file encoding in Vim.
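To see why the encoding matters: a byte offset only means something relative to a specific encoding. The snippet below is plain Python 3 standard library, nothing YCM-specific; it measures one line from the test case in code points and in bytes under UTF-8 and under GBK, a legacy Chinese encoding picked here purely for illustration.

```python
# Measure the same text in code points and in encoded bytes.
text = "a['中文'] = "

codepoints = len(text)                  # what Python str indexing counts
utf8_bytes = len(text.encode("utf-8"))  # each of these CJK chars is 3 bytes in UTF-8
gbk_bytes = len(text.encode("gbk"))     # each of these CJK chars is 2 bytes in GBK

print(codepoints, utf8_bytes, gbk_bytes)  # prints: 10 14 12
```

A UTF-8 byte column used as a code point index overshoots by 4 on this line, which is enough to index past the end of it.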

@timfeirg
Author

I've added the encoding line (`# -*- coding: utf-8 -*-`) to my test case; the file is UTF-8. I think if you do the same, you'll reproduce the problem.

So if it's caused by my file encoding, any suggestions on what I can do to make YCM work?

@oblitum
Contributor

oblitum commented Jul 15, 2015

I'm getting `IndexError` and other errors related to Unicode characters in the buffer too. This must have been introduced fairly recently, since I only noticed it after updating my fork, which had been stale for a few weeks.

@timokau

timokau commented Nov 7, 2015

I'm having the same problems with the German characters ä, ü and ß.

@wsdjeg

wsdjeg commented Nov 7, 2015

Hi @timfeirg, I tested your case and got no error. I also tested my Java code: no error either.
[screenshot: 2015-11-07 22-14-24]

@timfeirg
Author

timfeirg commented Nov 8, 2015

I'm on 0352ed9 and can still reproduce it. @wsdjeg, can you post your YCM version as well as your vimrc?

[screenshots: 1, 2]

@wsdjeg

wsdjeg commented Nov 8, 2015

I just updated to the latest version of YCM, and here is my YCM config. You only need to look at the autocomplete part.

@wsdjeg

wsdjeg commented Nov 8, 2015

@timfeirg
Author

timfeirg commented Nov 8, 2015

Thanks @wsdjeg, your vimrc has been educational, but I'm having trouble identifying the lines that fix this particular problem.

I tried copying the lines that touch the YCM config and applying them to my own vimrc. It didn't work, and they don't seem to be related to character encoding anyway.

In case you'd like to take a look, here's my vimrc.

@wsdjeg

wsdjeg commented Nov 8, 2015

I got the same issue.
[screenshots: 2015-11-08 22-48-30, 2015-11-08 22-48-13]

@wsdjeg

wsdjeg commented Nov 8, 2015

If I delete the Chinese words, it works.
[screenshot: 2015-11-08 22-50-12]

@wsdjeg

wsdjeg commented Nov 8, 2015

But with only one Chinese character, it also works.
[screenshot: 2015-11-08 22-54-09]

micbou added a commit to micbou/YouCompleteMe that referenced this issue Mar 23, 2016
homu added a commit to ycm-core/ycmd that referenced this issue Apr 24, 2016
[READY] Fix issues with multi-byte characters

## Summary

This change introduces more general support for non-ASCII characters in buffers handled by ycmd.

In ycmd's public API, all offsets are byte offsets into the UTF-8 encoded buffers. We also assume (because we have no other choice) that files stored on disk are UTF-8 encoded. Internally, almost all of ycmd's functionality operates on Unicode strings (Python 2 `unicode()` and Python 3 `str()` objects, transparently via `future`). Many of the downstream completion engines expect Unicode code point offsets in their APIs. One special case is the `ycm_core` library (identifier completer and clang completer), which requires instances of the _native_ `str` type. All strings passed into the C++ code via `boost::python` must go through `ToCppStringCompatible`.
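The two conversions this implies can be sketched in a few lines of Python 3. This is a minimal sketch assuming 1-based columns and a line already decoded to `str`; `byte_to_codepoint_offset` and `codepoint_to_byte_offset` are hypothetical names, not ycmd's own helpers.

```python
def byte_to_codepoint_offset(line: str, byte_offset: int) -> int:
    """Hypothetical helper: 1-based UTF-8 byte offset -> 1-based code point offset."""
    # Take the bytes before the offset, decode them and count code points.
    # Raises UnicodeDecodeError if byte_offset falls inside a multi-byte sequence.
    prefix = line.encode("utf-8")[: byte_offset - 1]
    return len(prefix.decode("utf-8")) + 1


def codepoint_to_byte_offset(line: str, codepoint_offset: int) -> int:
    """Hypothetical helper: 1-based code point offset -> 1-based UTF-8 byte offset."""
    # Encode the code points before the offset and count the bytes.
    return len(line[: codepoint_offset - 1].encode("utf-8")) + 1


line = "a['中文'] = one"
print(byte_to_codepoint_offset(line, 15))  # 11: the "o" of "one"
print(codepoint_to_byte_offset(line, 11))  # 15
```

On an ASCII-only line both functions return their input unchanged, which is why the `code point == byte offset` shortcut appeared to work.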

Previously, we largely assumed that `code point == byte offset`, i.e. that all buffers contained only ASCII characters. This worked up to a point, but in a number of places more by luck than by judgement.
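The failure mode of that assumption, which matches the `IndexError` in the original report, can be reproduced in plain Python 3. This illustrates the class of bug only; it is not the actual code path inside ycmd or YCM.

```python
# A line like the one in the original report, with the cursor after "onespe".
line = "a['中文'] = onespe"

# Vim's col() reports byte positions; take the 0-based byte index of the last
# character and misuse it as a str index (assuming code point == byte offset).
byte_index = len(line.encode("utf-8")) - 1

caught = None
try:
    last_char = line[byte_index]
except IndexError as error:
    caught = error
    print("IndexError:", error)  # IndexError: string index out of range

# Converting the byte index to a code point index first gives the right character.
codepoint_index = len(line.encode("utf-8")[:byte_index].decode("utf-8"))
print(line[codepoint_index])  # e
```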

## References

In combination with a YCM change and PR #453, I hope this:

- fixes #109
- fixes ycm-core/YouCompleteMe#2096
- fixes ycm-core/YouCompleteMe#2088
- fixes ycm-core/YouCompleteMe#2069
- fixes ycm-core/YouCompleteMe#2066
- fixes ycm-core/YouCompleteMe#1378

## Overview of changes

The changes fall into the following areas:

- Providing access to and conversion to/from code points and byte offsets (`request_wrap.py`)
- Changing certain algorithms/features to work entirely in code point space when they are trying to operate on logical 'characters' within the buffer (see known issues for why this isn't perfect, but it is probably most of the way there)
- Changing the completers to convert between the external (on both sides) and internal representations by using the shortcuts provided in `request_wrap.py`
- Adding tests for each of the completers for both completions and subcommands
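The "shortcuts" idea can be sketched as a small wrapper that converts the client's byte column once, locates the completion query in code point space and converts the query's start back to a byte column for the response. `CursorContext` and its properties are hypothetical and are not the contents of ycmd's `request_wrap.py`; treating `str.isalnum()` or `_` as identifier characters is a simplification. It needs Python 3.8+ for `functools.cached_property`.

```python
from dataclasses import dataclass
from functools import cached_property


def _is_identifier_char(char: str) -> bool:
    # Simplification: letters/digits (including non-ASCII ones) and underscore.
    return char.isalnum() or char == "_"


@dataclass
class CursorContext:
    """Hypothetical request wrapper: one buffer line plus a 1-based byte column."""

    line_value: str  # the current line, already decoded to str
    column_num: int  # 1-based UTF-8 byte offset of the cursor, as sent by the client

    @cached_property
    def column_codepoint(self) -> int:
        # 1-based code point offset of the cursor.
        prefix = self.line_value.encode("utf-8")[: self.column_num - 1]
        return len(prefix.decode("utf-8")) + 1

    @cached_property
    def start_codepoint(self) -> int:
        # Walk left from the cursor over identifier characters, in code point space.
        start = self.column_codepoint - 1  # 0-based index just past the query
        while start > 0 and _is_identifier_char(self.line_value[start - 1]):
            start -= 1
        return start + 1

    @cached_property
    def query(self) -> str:
        return self.line_value[self.start_codepoint - 1 : self.column_codepoint - 1]

    @cached_property
    def start_column(self) -> int:
        # Convert the query start back to a 1-based byte offset for the client.
        return len(self.line_value[: self.start_codepoint - 1].encode("utf-8")) + 1


ctx = CursorContext("a['中文'] = onespe", column_num=21)
print(ctx.query, ctx.start_column)  # onespe 15
```

With the line from the original report, the query comes out as `onespe` rather than a span shifted by the 4 extra bytes of the two CJK characters.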

## Completer-specific notes

Pretty much all of the completers I tested required some changes:
- clang uses UTF-8 and byte offsets, but had some bugs in the `GetDoc` parsing
- OmniSharp speaks codepoint offsets
- Tern speaks codepoint offsets
- JediHTTP speaks codepoint offsets
- tsserver speaks codepoint offsets
- gocode speaks byte offsets
- racer: I did not test it
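A rough sketch of what this list implies: a per-engine table decides whether ycmd's byte column is forwarded unchanged or converted to code points. The engine keys and `engine_column` are hypothetical; racer is left out because it was not tested. The sketch models only the unit and deliberately ignores the 0- vs 1-based and per-line vs whole-file differences that real engines also have.

```python
from typing import Dict

# Offset unit per engine, taken from the completer notes (racer omitted: untested).
OFFSET_UNITS: Dict[str, str] = {
    "clang": "byte",
    "omnisharp": "codepoint",
    "tern": "codepoint",
    "jedihttp": "codepoint",
    "tsserver": "codepoint",
    "gocode": "byte",
}


def engine_column(engine: str, line: str, byte_column: int) -> int:
    """Hypothetical: translate a 1-based UTF-8 byte column into the engine's unit."""
    unit = OFFSET_UNITS[engine]  # KeyError for engines not listed, e.g. racer
    if unit == "byte":
        return byte_column
    # Code point engines: count code points before the byte column.
    prefix = line.encode("utf-8")[: byte_column - 1]
    return len(prefix.decode("utf-8")) + 1


line = "a['中文'] = x"
print(engine_column("gocode", line, 15))  # 15
print(engine_column("tern", line, 15))    # 11
```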

## Further work / Known issues

- we act blissfully ignorant of the case where a single user-perceived character consumes multiple code points (such as a base character followed by a combining modifier)
- when typing a Unicode character, we still get an exception from `bitset` (see #453 for that fix)
- the filtering and sorting system is designed 100% for ASCII only, and changing that is out of scope for this PR. Currently, after any filtering operation, words containing non-ASCII characters are excluded.
- I did not get round to testing Rust using racer
- there are further changes required to YouCompleteMe client (a further PR is coming for that)
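The first known issue can be made concrete with the standard `unicodedata` module. Both strings below render as a single character, yet their code point and byte counts differ. NFC normalization merges some combining sequences, but not modifier sequences such as an emoji skin tone; counting user-perceived characters in general needs grapheme segmentation (Unicode UAX #29), which Python's standard library does not provide.

```python
import unicodedata

# "é" written as "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

print(len(decomposed), len(decomposed.encode("utf-8")))  # 2 3  (code points, bytes)
print(len(composed), len(composed.encode("utf-8")))      # 1 2

# Thumbs up + medium skin tone modifier: NFC has no composed form for this.
thumbs = "\U0001F44D\U0001F3FD"
print(len(unicodedata.normalize("NFC", thumbs)))  # 2
```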

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 29, 2021
7 participants