Adhoc fix where waf cannot run under Japanese version of Windows. #2155

Merged
merged 9 commits into from May 8, 2018

2 participants
@Suzumizaki
Contributor

commented Apr 23, 2018

I don't know why, but 'txt' here contains a badly decoded string.
My environment is:

  • Windows 10 Pro Japanese Edition
  • Python 3.6.5, 64 bit version

The 'txt' seems to contain non-ASCII characters. Shift-JIS, or code page 932 (cp932), is a DBCS encoding, so the trailing byte of a double-byte character can be '\' (0x5C). When the text is decoded with the wrong encoding, these '\' bytes are often left standing alone, and the JSON decoder tries to read them as the start of a \-escape sequence and fails. That is why this ad hoc fix is required.
See 'mojibake' on Wikipedia.
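
A minimal sketch of the failure described above. It assumes, for illustration only, that the cp932 bytes end up decoded as UTF-8 with replacement characters; how waf actually decodes them is not shown in this thread. The key `displayName` and the sample value are hypothetical.

```python
import json

# "表" is one of the cp932 characters whose trail byte is 0x5C, the ASCII backslash.
sample = "表"
assert sample.encode("cp932") == b"\x95\\"

# Bytes as a cp932 tool might emit them (hypothetical JSON fragment).
raw = json.dumps({"displayName": sample}, ensure_ascii=False).encode("cp932")

# Assumed wrong decoding: the lead byte 0x95 is invalid UTF-8 and gets replaced,
# while the trail byte survives as a lone backslash that escapes the closing quote.
wrong = raw.decode("utf-8", errors="replace")

try:
    json.loads(wrong)
    wrong_ok = True
except json.JSONDecodeError:
    wrong_ok = False

# Decoding with the encoding the bytes were written in parses cleanly.
right = json.loads(raw.decode("cp932"))
```

Here `wrong_ok` ends up `False`, while `right` holds the original value.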

In fact, something in waf before this point is logically wrong about decoding/encoding, but I cannot investigate it now.

@ita1024

Member

commented Apr 24, 2018

Can you print the value of sys.stdout.encoding please?

@Suzumizaki

Contributor Author

commented Apr 24, 2018

Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
>>> import locale
>>> locale.getpreferredencoding(False)
'cp932'
>>> locale.getpreferredencoding(True)
'cp932'

This is from the PyCharm console; cmd.exe gives almost the same result (with 'utf-8' in lower case).

I suspect that somewhere we forgot to pass encoding='utf-8' when calling the built-in open function in text mode ('rt' or 'wt'). We cannot rely on 'utf-8' being the default, as the official documentation of the open function states.
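
As a hedged illustration of the point about `open`: a small sketch that passes the encoding explicitly instead of relying on the locale-dependent default. The file name `sample.txt` and the sample text are hypothetical, and nothing here is taken from waf itself.

```python
import locale
import os
import tempfile

text = "表示"  # sample non-ASCII text (hypothetical content)

# Without an explicit encoding, text-mode open() falls back to this
# locale-dependent value ('cp932' in the session above).
default_encoding = locale.getpreferredencoding(False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Explicit encoding on both the write and the read side.
    with open(path, "wt", encoding="utf-8") as f:
        f.write(text)
    with open(path, "rt", encoding="utf-8") as f:
        round_trip = f.read()
    # The bytes on disk are UTF-8 whatever the locale default is.
    with open(path, "rb") as f:
        raw = f.read()
```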

@ita1024

Member

commented Apr 24, 2018

There is no open function involved. Can you try setting your console to utf-8 mode, for example chcp 65001?

@Suzumizaki

Contributor Author

commented Apr 24, 2018

I tried chcp 65001, and waf appears to run fine, but all warning and error messages switched to English. MSVC is a fully localized application, and I want to see the compiler messages in my own language.

With or without chcp 65001, neither sys.stdout.encoding nor locale.getpreferredencoding() changes.

@ita1024

Member

commented Apr 24, 2018

This is progress. What happens if you set sys.stdout.encoding to the desired encoding value in the configuration section of your wscript file?

I hope you understand that this pull request cannot be applied as-is.

@Suzumizaki

Contributor Author

commented Apr 25, 2018

OK, I will try what you suggested later... But I found what's wrong.

"C:\Program Files (x86)\Microsoft Visual Studio\Installer\vswhere.exe" -legacy -products * -format json always seems to return a string encoded in cp932 (in my environment), no matter which code page we set with chcp beforehand. -format xml does not resolve the problem either. Neither output carries any encoding information.

That means, I think, some ad hoc workaround is needed anyway.

@ita1024

Member

commented Apr 25, 2018

We cannot apply such a workaround in this project.

If you are running the latest version of visual studio, have you considered reporting the vswhere.exe behavior to the vendor? https://github.com/Microsoft/vswhere/issues

@Suzumizaki

Contributor Author

commented Apr 26, 2018

Thanks for the pointer; I posted an issue just now. I didn't know vswhere was an open-source project, sorry.

This pull request can be closed for now. If they reject my proposal in that issue, please help me again.

Additionally, we should probably document the current behavior of vswhere near the top of the waf project documentation or in its FAQ. Developers who use waf may not know about this problem, and a note would tell them what to do when users on DBCS code pages report it.

Anyway, thanks a lot. cheers!

@Suzumizaki

Contributor Author

commented May 5, 2018

The current summary:

  1. The current behavior is consistent not only in vswhere.exe but throughout the MS build tools.
  2. However, the vswhere developers will add a -utf8 option or some other solution.

Suzumizaki added some commits Apr 23, 2018

Fix: Use the code page to decode after using subprocess.PIPE.
Required to run under non-English versions (like Japanese one) of Windows.
Fix: to enable building under non English version of Windows.
(Adhoc fix on msvc.py is reverted. And Context.py is fixed.)
Fix: to enable building under non English version of Windows.
(Adhoc fix on msvc.py is reverted. And Context.py is fixed.)
Fix: to enable building under non English version of Windows.
(Adhoc fix on msvc.py is reverted. And Context.py is fixed.)
@Suzumizaki

Contributor Author

commented May 5, 2018

Anyway, how about the updated pull request above?
It is logically more correct and has almost no overhead.

@ita1024

Member

commented May 5, 2018

While a special case should be made for vswhere.exe in msvc.py, many other applications (msys, ...) output utf-8 data by default (sys.stdout.encoding) so the changes that you are doing to Context.py are likely to cause even more problems.
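
One way to picture a per-tool special case is to choose the decoding per command rather than globally. The sketch below is not waf's code: `run_and_decode` is a hypothetical helper, and a small Python child process stands in for a tool that always writes cp932 bytes.

```python
import json
import subprocess
import sys


def run_and_decode(cmd, encoding):
    """Run cmd, capture stdout as bytes, and decode it with the given encoding."""
    raw = subprocess.check_output(cmd)  # bytes, because no text mode is requested
    return raw.decode(encoding)


# Stand-in for a tool that writes cp932 bytes regardless of the console setting.
# \u8868 is "表"; the raw string keeps the escape for the child to interpret.
CHILD_CODE = r'''
import sys
sys.stdout.buffer.write('{"displayName": "\u8868"}'.encode("cp932"))
'''

# The caller names the encoding for this particular command only.
text = run_and_decode([sys.executable, "-c", CHILD_CODE], "cp932")
info = json.loads(text)
```

Other commands could call the same helper with "utf-8", so tools that already emit UTF-8 are not affected.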

@Suzumizaki

Contributor Author

commented May 5, 2018

Hmmmm, Okay, I'll retry.

@ita1024

Member

commented May 5, 2018

Earlier in the thread it seemed that the data was not encoded properly. Can you confirm that the data output by vswhere is properly encoded with the value returned by locale.getpreferredencoding(False)? What is that value? Can you attach a data fragment that can be used as a testcase? (remove personal data if present)

@Suzumizaki

Contributor Author

commented May 6, 2018

locale.getpreferredencoding(False) returns cp932, but a few hours ago I found that it always returns the same value, even under chcp 65001.

I also found a way to get, from Python code, the code page value that reflects the chcp setting:

import ctypes
code_page_integer = ctypes.windll.kernel32.GetConsoleOutputCP()

This code stores 932 in code_page_integer under chcp 932, and 65001 under chcp 65001. (When run from the IDLE Python shell, however, it returns zero.) Likewise, I got 1251 under chcp 1251.
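
A hedged sketch of how the call above could be wrapped so that it degrades gracefully. `console_output_encoding` is a hypothetical helper name; falling back to `locale.getpreferredencoding(False)` when ctypes is missing, when there is no console (the zero case), or when Python has no codec for the code page is an assumption, not something decided in this thread.

```python
import codecs
import locale
import sys


def console_output_encoding():
    """Return a codec name for the console output code page, or a locale fallback."""
    fallback = locale.getpreferredencoding(False)
    if sys.platform != "win32":
        return fallback
    try:
        import ctypes
        code_page = ctypes.windll.kernel32.GetConsoleOutputCP()
    except (ImportError, AttributeError, OSError):
        return fallback  # ctypes or the console API is unavailable
    if code_page == 0:
        return fallback  # no console attached, as seen from IDLE
    # Code page 65001 is UTF-8; other code pages map to Python's "cpNNN" codecs.
    name = "utf-8" if code_page == 65001 else "cp%d" % code_page
    try:
        codecs.lookup(name)
    except LookupError:
        return fallback  # Python has no codec for this code page
    return name


encoding = console_output_encoding()
```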

And here is the output of vswhere:
vswhere.txt

@Suzumizaki

Contributor Author

commented May 7, 2018

GitHub shows some confusing messages, but a6d8d39 is my latest commit for this pull request. Please check again if needed.

@ita1024

Member

commented May 7, 2018

To change the messages, try git fetch && git rebase. Also, what happens when running PYTHONIOENCODING=cp932 python3 waf configure clean build?

@Suzumizaki

Contributor Author

commented May 8, 2018

Nothing seems to change, probably because I'm using Python 3.6.5.
When I run set PYTHONLEGACYWINDOWSSTDIO=anystring instead, it seems to succeed.
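
For context, the Python documentation notes that on Windows, since Python 3.6, PYTHONIOENCODING is ignored for interactive console buffers unless PYTHONLEGACYWINDOWSSTDIO is also set. A small diagnostic sketch for comparing runs under different settings follows; `encoding_report` is a hypothetical helper, not part of waf.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related values discussed in this thread."""
    return {
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "PYTHONLEGACYWINDOWSSTDIO": os.environ.get("PYTHONLEGACYWINDOWSSTDIO"),
    }


report = encoding_report()
for key, value in report.items():
    print("%s = %r" % (key, value))
```

Running it once with and once without the environment variables set shows which of them actually influence the standard streams.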

@ita1024

Member

commented May 8, 2018

The problem seems to be between the new Python 3 defaults and vswhere.exe. Unfortunately ctypes is not always available, so we will need to rework these changes.

@ita1024 ita1024 merged commit c2980e5 into waf-project:master May 8, 2018

@ita1024

Member

commented May 8, 2018

See 7bc3f78

@Suzumizaki

Contributor Author

commented May 9, 2018

It runs fine without stopping! Thank you!

I will raise one remaining problem as a separate issue later: the messages from cl.exe and other tools are not readable in my environment (Japanese version of Windows).

@ita1024

Member

commented May 9, 2018

I am afraid that Python 3 only supports Unicode. You may really want chcp 65001.
