Unicode file names aren't recognized #1837

Closed
vToMy opened this issue Aug 19, 2018 · 24 comments

@vToMy

vToMy commented Aug 19, 2018

Running ctags on files with Unicode names fails: ctags cannot open them.

Example
For a file called:
こんにちは世界.txt
Running:
ctags --options=NONE *
Will produce:

ctags: Notice: No options will be read from files or environment
ctags: Warning: cannot open input file "???????.txt" : No such file or directory
@hadrielk
Contributor

Works for me:

$ ls
こんにちは世界.adoc

$ ../ctags --options=NONE -f - *
ctags: Notice: No options will be read from files or environment
Chapter 1 (Level 0)	こんにちは世界.adoc	/^= Chapter 1 (Level 0)$/;"	c
Level 3 Section 1.1.1.1 Title	こんにちは世界.adoc	/^==== Level 3 Section 1.1.1.1 Title$/;"	t	subsection:Chapter 1 (Level 0).Section 1.1.Subsection 1.1.1
Level 4 Section 1.1.1.1.1 Title	こんにちは世界.adoc	/^===== Level 4 Section 1.1.1.1.1 Title$/;"	T	subsubsection:Chapter 1 (Level 0).Section 1.1.Subsection 1.1.1.Level 3 Section 1.1.1.1 Title
Section 1.1	こんにちは世界.adoc	/^== Section 1.1$/;"	s	chapter:Chapter 1 (Level 0)
Subsection 1.1.1	こんにちは世界.adoc	/^=== Subsection 1.1.1$/;"	S	section:Chapter 1 (Level 0).Section 1.1

What operating system are you running on, and what version of ctags?

@vToMy
Author

vToMy commented Aug 19, 2018

OS: Windows 10 (Version 10.0.17134 Build 17134)
ctags version:

>ctags --version
Universal Ctags 0.0.0(2258b24b), Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Aug 18 2018, 00:09:59
  URL: https://ctags.io/
  Optional compiled features: +win32, +wildcards, +regex, +internal-sort, +iconv, +option-directory, +xpath, +json, +interactive, +yaml, +case-insensitive-filenames

@k-takata
Member

This is a long-standing issue on Windows.
We currently use ANSI APIs, but we need to use Unicode APIs to handle Unicode file names.
This is a hard work, though.

@Lennon925

Do you want to support this or not?

@masatake
Member

I want to support it but I don't know how to do it.

@k-takata
Member

Basically, we need to modify everywhere we handle filenames.
For example, we need to use _wmain() instead of main() to get UTF-16 command line, and need to use _wfopen() instead of fopen() to open a file with UTF-16 filename.
It might be better to create a wrapper layer that converts between UTF-16 and UTF-8, and to always use UTF-8 in the core part of u-ctags.
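
The wrapper-layer idea can be sketched in a few lines. This is only an illustration in Python, not u-ctags code: the helper names `from_os` and `to_os` are hypothetical, and a real C implementation on Windows would more likely rely on `MultiByteToWideChar` and `WideCharToMultiByte` with `CP_UTF8`. It assumes well-formed UTF-16 input.

```python
# Hypothetical helper names, for illustration only. The core sees UTF-8;
# conversion happens only at the boundary with the operating system.

def from_os(utf16_name: bytes) -> bytes:
    """UTF-16LE name received from the OS -> UTF-8 for the core."""
    return utf16_name.decode("utf-16-le").encode("utf-8")


def to_os(utf8_name: bytes) -> bytes:
    """UTF-8 name used by the core -> UTF-16LE for a wide-character API."""
    return utf8_name.decode("utf-8").encode("utf-16-le")


# Stand-in for an argument that a wide-character entry point would deliver.
os_name = "こんにちは世界.txt".encode("utf-16-le")
core_name = from_os(os_name)          # what the core would store and print

assert to_os(core_name) == os_name    # the round trip loses nothing
print(core_name.decode("utf-8"))
```

Because both encodings cover all of Unicode, the round trip is lossless, which is what an ANSI code page cannot guarantee.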

@Lennon925

How long will you take to fix this issue?

@masatake
Member

@k-takata, thank you. Now I understand the meaning of "a hard work".

@masatake
Member

@Lennon925, I'm sorry but I have no plan to fix this.
We have to find a volunteer to fix this issue.

@masatake
Member

masatake commented Dec 28, 2018

As the first step, we have to add a test case to Tmain.
Unlike Units, Tmain has no way to record a test case for a known bug.
The Tmain test driver must be extended first.

@masatake
Member

@k-takata, I tried a file with Japanese characters in its name as input for ctags on MSYS2.
Unexpectedly, it works well. I think I'm doing something wrong. Could you give me more hints?

[Screenshot: ctags-jp-filename]

@k-takata
Member

On Japanese Windows, we can use Japanese characters. However, characters that cannot be represented in Shift_JIS (e.g. letters with diacritical marks, simplified Chinese characters, ...) cannot be used on Japanese Windows. Similarly, Japanese characters cannot be used on English Windows.

A workaround is to use the Cygwin (or MSYS2) version of u-ctags instead of the Win32 version. It handles file names in UTF-8.
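
The code page dependence can be reproduced outside ctags. The following is a rough sketch, assuming Python's `cp932` and `cp1252` codecs as stand-ins for the ANSI code pages of Japanese and Western European Windows. The function names are made up for this example, and Windows' own conversion may behave slightly differently (for instance by applying best-fit mappings).

```python
# Illustration only: Python codecs stand in for Windows ANSI code pages.

def representable(name: str, code_page: str) -> bool:
    """Return True if every character of name exists in code_page."""
    try:
        name.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


def lossy_ansi_name(name: str, code_page: str) -> str:
    """Mimic a lossy conversion: unmappable characters become '?'."""
    return name.encode(code_page, errors="replace").decode(code_page)


japanese = "こんにちは世界.txt"
accented = "café.txt"

print(representable(japanese, "cp932"))     # True: fine on Japanese Windows
print(representable(japanese, "cp1252"))    # False: not on English Windows
print(representable(accented, "cp932"))     # False: no 'é' in Shift_JIS
print(lossy_ansi_name(japanese, "cp1252"))  # ???????.txt
```

The last line yields the same `???????.txt` that appears in the warning at the top of this issue: seven unmappable characters, each replaced by a question mark.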

@Lennon925

> On Japanese Windows, we can use Japanese characters. However, characters that cannot be represented in Shift_JIS (e.g. letters with diacritical marks, simplified Chinese characters, ...) cannot be used on Japanese Windows. Similarly, Japanese characters cannot be used on English Windows.
>
> A workaround is to use the Cygwin (or MSYS2) version of u-ctags instead of the Win32 version. It handles file names in UTF-8.

Hi k-takata,
Is this issue fixed? If not, when will it be finished?

Regards,
Lennon

@k-takata
Member

As I already said, it's very difficult to fix, and I don't have a plan to fix it yet.
If you really need it, please use the Cygwin version of u-ctags for now.

@k-takata
Member

k-takata commented Dec 3, 2019

Starting from Windows 10 1903, the UTF-8 code page can be used by specifying it in the application manifest file.
https://docs.microsoft.com/en-us/windows/uwp/design/globalizing/use-utf8-code-page
This might be able to solve the problem.

@masatake
Member

masatake commented Dec 3, 2019

@k-takata, I think this is a kind of FAQ. What do you think?

@k-takata
Member

k-takata commented Dec 3, 2019

Ah, maybe.

@k-takata
Member

k-takata commented Dec 6, 2019

If #2360 is merged, will it be like this?

Q. Does Universal Ctags support Unicode file names?
A. Yes, Unicode file names are supported on Unix-like platforms (Linux, macOS, Cygwin, etc.).
However, on Windows, you need to use Windows 10 version 1903 or later to use Unicode file names. (This is an experimental feature, though.)
On older versions of Windows, Universal Ctags only supports file names that can be represented in the current code page.
If you still want to use Unicode file names on those versions, use the Cygwin or MSYS2 version of Universal Ctags as a workaround.

@masatake
Member

masatake commented Dec 6, 2019

YES! THANK YOU VERY MUCH.

Your comment made me realize how ctags-faq.7.rst should look.

About C/C++ parser
===================================================
...

About ctags running on Windows
========================================
Q. Does Universal Ctags support Unicode file names?
A. Yes, Unicode file names are supported on Unix-like platforms (Linux, macOS, Cygwin, etc.).
However, on Windows, you need to use Windows 10 version 1903 or later to use Unicode file names. (This is an experimental feature, though.)
On older versions of Windows, Universal Ctags only supports file names that can be represented in the current code page.
If you still want to use Unicode file names on those versions, use the Cygwin or MSYS2 version of Universal Ctags as a workaround.

@k-takata
Member

k-takata commented Dec 6, 2019

About ctags running on Windows

If the section is for Windows, the first and second sentences of the answer need to be adjusted.

(edited)
E.g.

A. Partly yes. If you use Windows 10 version 1903 or later, Universal Ctags can use Unicode file names. (This is an experimental feature, though.)

@k-takata
Member

k-takata commented Dec 9, 2019

This should be fixed by #2360 (on Windows 10 1903 or later).

k-takata closed this as completed Dec 9, 2019

@masatake
Member

masatake commented Dec 9, 2019

@k-takata, you wrote:

This is a hard work, though.

However, it seems that you wrote the code fixing this issue in a few days :-)
Maybe the correct sentence is:

This is a hard work for you, though (but not for me).

@k-takata
Member

k-takata commented Dec 9, 2019

Actually, Microsoft did the job, not me. ;-)
That's why this fix works only on Win10 1903 or later.

@k-takata
Member

k-takata commented Dec 9, 2019

BTW, this fix has a restriction.
If we used the Unicode APIs (as I suggested before), file names could be up to 255 UTF-16 characters long.
With this fix, however, the maximum file name length is limited to 255 bytes. (E.g. a typical Japanese character takes 3 bytes in UTF-8, so the limit is only 85 characters.)
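
The arithmetic behind that limit is easy to check. This is a small sketch, assuming the 255-unit limit stated above; the constant and function names are invented for the example.

```python
MAX_UNITS = 255  # limit discussed above: bytes (UTF-8) or UTF-16 code units


def utf8_bytes(name: str) -> int:
    """Length of name when stored as UTF-8 bytes."""
    return len(name.encode("utf-8"))


def utf16_units(name: str) -> int:
    """Length of name when stored as UTF-16 code units."""
    return len(name.encode("utf-16-le")) // 2


name_85 = "あ" * 85
name_86 = "あ" * 86

print(utf8_bytes(name_85), utf16_units(name_85))  # 255 85
print(utf8_bytes(name_86) <= MAX_UNITS)           # False: 258 bytes
print(utf16_units("あ" * 255) <= MAX_UNITS)       # True
```

So a name made of 86 hiragana characters already exceeds the byte-based limit, while the same limit counted in UTF-16 units would allow 255 of them.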
