French localisation: wrong encoding in ~/.viminfo. #8075

Open
romainl opened this issue Apr 6, 2021 · 10 comments

@romainl

romainl commented Apr 6, 2021

Describe the bug
In Vim with the French localisation, the localised comments in ~/.viminfo are not properly encoded.

To Reproduce
Detailed steps to reproduce the behavior:

  1. Run $ LANG=fr_FR.UTF-8 vim /tmp/foo.

  2. Do some editing in that file and write it.

  3. Run vim --clean ~/.viminfo.

  4. Move the cursor to line 7.

  5. Behold this trainwreck:

    # 'encoding' dans lequel ce fichier a été écrit
    *encoding=utf-8
    

    Here are all the problematic lines:

    # Ce fichier viminfo a été généré par Vim 8.2.
    # Vous pouvez l'éditer, mais soyez prudent.
    # 'encoding' dans lequel ce fichier a été écrit
    # Dernières chaînes de substitution :
    # Historique ligne de commande (chronologie décroissante) :
    # Historique chaîne de recherche (chronologie décroissante) :
    # Historique expression (chronologie décroissante) :
    # Historique ligne de saisie (chronologie décroissante) :
    # Historique Ligne de débogage (chronologie décroissante) :
    # Liste de sauts (le plus récent en premier) :
    # Historique des marques dans les fichiers (les plus récentes en premier) :
    

Expected behavior
The comments should be encoded properly to look like this:

# 'encoding' dans lequel ce fichier a été écrit
*encoding=utf-8

Environment:

  • Vim version 8.2.2576
  • OS: macOS 10.14.6
  • Terminal: irrelevant
  • :echo &encoding prints utf-8.

Additional context
The maintainer of the French localisation seems unresponsive.

@tonymec

tonymec commented Apr 6, 2021

Does your 'viminfo' setting include the c flag?

If it doesn't, do you see a difference if you add

set vi+=c

in your vimrc?

Best regards,
Tony.

@romainl
Author

romainl commented Apr 6, 2021

No, there is no difference. The current encoding and the one used to write ~/.viminfo are the same, utf-8, so there shouldn't be any difference anyway.

@gdupras

gdupras commented Apr 6, 2021

If you start Vim with --clean, it implies -i NONE, which means the viminfo file is neither read nor written. You should modify your first step.

Anyway, I have the same issue with LANG=fr_CA.UTF-8 on Kubuntu 20.10. If I open .viminfo and do :set fileencoding?, it prints latin1. If I open the file with :edit ++enc=utf-8 .viminfo, the text looks fine. I then wrote the file, opened it again, and the encoding was utf-8, as it should be.

I also tried creating a new viminfo file with the following steps and the encoding of the file is correctly set to utf-8.

  1. mv .viminfo .viminfo_orig

  2. vim -Nu DEFAULTS /tmp/foo

  3. Edit and write that file. Quit.

  4. vim -Nu DEFAULTS .viminfo

  5. Encoding looks fine.

So how did the viminfo file end up with the latin1 encoding?
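For readers following along, here is a minimal Python sketch of what the misdetection looks like. It does not involve Vim at all: it only assumes that the comment line was written as UTF-8 and then read back as latin1, which is what `:set fileencoding?` printing latin1 suggests. The variable names are illustrative.

```python
# The French comment, as it is meant to appear.
line = "# 'encoding' dans lequel ce fichier a été écrit"

# Bytes on disk when the file is written as UTF-8.
utf8_bytes = line.encode("utf-8")

# Reading the same bytes as latin1 turns every two-byte UTF-8
# sequence into two separate characters.
misread = utf8_bytes.decode("latin-1")

# Reading them as UTF-8 (the effect of forcing ++enc=utf-8)
# gives the original text back.
reread = utf8_bytes.decode("utf-8")

print(misread)
print(reread == line)
```

The bytes on disk are the same in both cases; only the interpretation differs, which would explain why reopening with ++enc=utf-8 is enough to make the text look fine.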

@romainl
Author

romainl commented Apr 6, 2021

Related: why are so many files under src/po/ in ISO-8859-1 when Vim is supposed to use UTF-8 internally? This is rather confusing.

@brammool
Contributor

brammool commented Apr 6, 2021

Just historic reasons. French fits perfectly in latin1 and the file encoding should be recognized automatically.
It saves a few bytes but that is hardly relevant.

@romainl
Author

romainl commented Apr 6, 2021

Alright.

After converting ~/.viminfo to UTF-8 via ++enc=utf-8, the comments are no longer improperly encoded.

So I had a hunch that the file contained some latin1 characters that forced it to be recognised as latin1, and I ran a little bisection experiment on a backup of my original ~/.viminfo until I found the culprit:

[Screenshot: culprit]

The ý is what appears to be forcing the whole buffer to be encoded in latin1, which presumably caused the issue.

Except I never pressed that key during that recording.

I am somewhat used to seeing those mysterious <80> (0x80) littering my recordings, but that 0xfd following a <80> is new to me. It looks like one of my keystrokes produced that character.
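A rough Python sketch of why a single stray byte could be enough to change how the whole file is read. It assumes a detector that tries UTF-8 first and falls back to latin1, which is how I understand Vim's default 'fileencodings' to behave when 'encoding' is utf-8. The `detect` helper and the sample bytes are hypothetical and not taken from the actual ~/.viminfo.

```python
def detect(data: bytes, candidates=("utf-8", "latin-1")) -> str:
    """Return the first candidate encoding that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits")


# A viminfo-like comment line, written as UTF-8.
clean = "# Ce fichier viminfo a été généré par Vim 8.2.\n".encode("utf-8")

# Hypothetical recording contents: the raw bytes 0x80 0xfd.
# 0x80 is not a valid start of a UTF-8 sequence.
dirty = clean + b"\x80\xfd\n"

print(detect(clean))
print(detect(dirty))

# Read as latin1, the byte 0xfd is displayed as "ý".
print(b"\xfd".decode("latin-1"))
```

In this model, latin1 accepts any byte sequence, so it always wins once UTF-8 decoding fails anywhere in the file, and every accented comment is then misread along with the offending bytes.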

@gdupras

gdupras commented Apr 6, 2021

French fits perfectly in latin1

According to this Wikipedia page, it's missing œ, Œ, and Ÿ. The first two are common in French, in words like cœur, sœur, œuf, etc. I think we should switch to UTF-8 for French.
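A quick way to check this claim, sketched in Python. It assumes Python's `latin-1` codec as a stand-in for ISO-8859-1; the `fits_latin1` helper is made up for the illustration.

```python
def fits_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in ISO-8859-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Common French accented letters, followed by the three in question.
for ch in "éèàçùœŒŸ":
    print(ch, fits_latin1(ch))
```

The usual accented letters should pass, while œ, Œ, and Ÿ should not, because their code points lie above U+00FF.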

@romainl
Author

romainl commented Apr 6, 2021

@ProgMetalSlug you beat me to it.

@brammool
Contributor

brammool commented Apr 7, 2021 via email

@romainl
Author

romainl commented Apr 7, 2021

@brammool I am well aware of the risk, which is why I never edited it. I'm not sure why you bring this up.

There was no manual editing involved at all.
