Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: to show localized MSVC messages correctly. #2163

Closed
wants to merge 2 commits into from

Conversation

Projects
None yet
2 participants
@Suzumizaki
Copy link
Contributor

commented May 10, 2018

Otherwise, we cannot see the "localized" error/warning messages as human readable, which comes from cl.exe, link.exe, etc. under non-English versions of Microsoft Windows.
(For example, the commiter uses Japanese version of Windows 10. And we should know the messages from the cl.exe or link.exe are almost fully localized.)

Fix: to show localized MSVC messages correctly.
Otherwise, we cannot see the "localized" error/warning messages as human readable, which comes from cl.exe, link.exe, etc. under non-English versions of Microsoft Windows.
(For example, the commiter uses Japanese version of Windows 10. And we should know the messages from the cl.exe or link.exe are almost fully localized.)
@ita1024

This comment has been minimized.

Copy link
Member

commented May 11, 2018

Does this change the way the text appears in the config.log file?

@ita1024

This comment has been minimized.

Copy link
Member

commented May 11, 2018

This will cause problems:

  1. The change will apply to all applications run through the console and not just cl.exe or link.exe
  2. Console settings can change at any point, therefore storing console settings as configuration data is nonsense.
  3. if hasattr(self.env, 'encoding_of_STDOUT') is not necessarily present due to object nesting. Just verify if the value self.env.encoding_of_STDOUT is empty.
@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 11, 2018

Does this change the way the text appears in the config.log file?

I ran waf in the demos/c++ folder, and got test.log. These are different with each other.
Or, anything else do you mean?

test(not_pached).log
test(patched).log

@ita1024

This comment has been minimized.

Copy link
Member

commented May 11, 2018

Run waf configure on demos/c and open build/config.log. Are the program outputs correctly formatted when configuration errors occur?

@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 11, 2018

Here's the results.

The following lines are changed to readable strings after patched.
But the result of cl.exe /help is still unreadable.

out: ライブラリ test.lib とオブジェクト test.exp を作成中

out: LINK : fatal error LNK1181: 入力ファイル 'm.lib' を開けません。

d:\0手動作成\wafgithub\demos\c\build\conf_check_69bc1896fe4b32b3103a9954b82b3583\test.cpp(1): fatal error C1083: include ファイルを開けません。'xyztabcd.h':No such file or directory

d:\0手動作成\wafgithub\demos\c\build\conf_check_27f4238a702602011857e9c10e26181d\test.cpp(1): fatal error C1083: include ファイルを開けません。'unistd.h':No such file or directory

config(patched).log
config(not_patched).log

@ita1024

This comment has been minimized.

Copy link
Member

commented May 11, 2018

The config.log results also mean that the change suggested is ineffective.

Fix: to show localized MSVC messages correctly.
Otherwise, we cannot see the "localized" error/warning messages as human readable, which comes from cl.exe, link.exe, etc. under non-English versions of Microsoft Windows.
(For example, the commiter uses Japanese version of Windows 10. And we should know the messages from the cl.exe or link.exe are almost fully localized.)
@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 11, 2018

Then, in this case, please tell me what is the ideal result.
After I updated pull-request (just now), the results of both cl.exe /help and cl.exe /nologo /help are readable too.

config(re_patched).log

@ita1024

This comment has been minimized.

Copy link
Member

commented May 11, 2018

The previous points 1, 2 and 3 from above still apply. The console encoding can change between configuration and build so the changes are incomplete (A), and then the settings are applied to all tasks (not just cl.exe and link.exe), so the changes are also very broad (B).

If you really mean to do (B), then this goes back to assuming that all programs are using the console codepage settings. Are you completely certain that this assumption is correct? If yes, then the default value for decode_as should be taken from cmd_encoding (and that function should be moved to the Utils module).

@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 13, 2018

Thanks, I passed the first check. :)

Let's talk about (A), and I have a question. Should we support a situation like below?:

  1. run python ./waf configure.
  2. modify some of the environment variables, like INCLUDE, PATH, LIBS, PYTHONxxx or WAFLIB.
  3. run python ./waf build, without re-running configure.

If the answer is "No", what's the difference between modifying environment variables and worrying about whether the code page is changed or not? I think both of them are user matters, in other words there's nothing different, we don't have to be afraid about that.

If the answer is "Yes", ...I should ask why c4che\build.config.py has the extracted values of the environment variables, but I think that is nonsense, and because of that, I think the answer cannot be "Yes".

How do you think about that?

@ita1024

This comment has been minimized.

Copy link
Member

commented May 13, 2018

An important difference is that the code page seems to be an attribute of the current interpreter (you mentioned the "IDLE" python shell), but that is not the point. If the codepage is going to apply to all applications executed, then why store it at all?

@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 15, 2018

Do you mean we should re-detect the encoding at every time to call cl.exe, link.exe, winres.exe etc. on every time to run waf configure and waf build?

That's acceptable, but somewhat difficult for me to implement. Can you guide me where on the codes I should fix?

@ita1024

This comment has been minimized.

Copy link
Member

commented May 15, 2018

Not so fast, what I am asking you is to clarify your assumptions:

Once again, there is a discrepancy in the changes suggested: code page settings would be used by all tasks besides cl.exe, link.exe but not in other build phases such as the configuration.

Your assumption is that all programs are using the same console codepage settings from the windows console when present. Are you certain that this assumption is correct?

@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2018

Your assumption is that all programs are using the same console codepage settings from the windows console when present. Are you certain that this assumption is correct?

Which do you mean:

  1. My codes will decode the standard outputs using the console code page, even if the outputs come from non-MSVC related tools. --- In this case, that's NOT I want to. That's my failure.
  2. My codes will decode the outputs using the same code page, but for the outputs from all MSVC related tools ONLY. --- In this case, I want to do that. I cannot imagine the needs to switch code pages each call.

And you are pointing another problem, right? -- If my codes only care about waf build but NOT waf configure... that is (also) my failure.

@ita1024

This comment has been minimized.

Copy link
Member

commented May 16, 2018

Yes, so I will try to dig the question, but I need some more information. In the console, is the outputs of cmd_and_log('dir.exe') properly decoded? (in other words, what is the encoding of the outputs of common .exe utilities?).

@ita1024

This comment has been minimized.

Copy link
Member

commented May 18, 2018

Could you please answer the questions above?

@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 18, 2018

Sorry, I'm busy a several days. The thing I can say now is, the shell commands, dir cd or path etc., also refer the code page of the console. The help messages of them (by option /?) is all localized.

@ita1024

This comment has been minimized.

Copy link
Member

commented May 19, 2018

The reason for the problem comes entirely from Python 3.6:

C:\python35\python.exe -c "import sys;print(sys.stdout.encoding, sys.stdout.isatty())"
cp437 True
C:\python36\python.exe -c "import sys;print(sys.stdout.encoding, sys.stdout.isatty())"
utf-8 True

This means that the default decoding value should be changed for Python >= 3.6 (the assumption above is actually true).

ita1024 added a commit that referenced this pull request May 19, 2018

@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 19, 2018

It looks work fine, but this solution(6873a1b) can be blamed later.

There's no rule to force the any developers of any executables to use limited one of encoding.

We cannot force them to use UTF-8,
we cannot force them to use the encoding of the console,
and also we cannot force them to use the encoding we suggest.

We can only ask "What the encoding of the output that your tool emits?"

Please consider why sys.stdout.encoding WAS one of local encodings, and now Python 3.6, it IS utf-8.

  1. Old days, third party tools and 'end-developer's' tools "probably" used local encoding to resolve their local issues.
  2. Today, many third party tools and end-developers' ones "probably" uses utf-8 to be used world-wide, global purpose.

In fact, PEP 529 talks about the path problem only, but I believe it includes the real backgrounds shown above. Anyway, sys.stdout.encoding is the convenient suggestion just for end-developers. But the global tools, like our waf-project, cannot rely on that. Because some tools possibly emits utf-8 strings, but on the other hands, the other tools possibly emits local encoded string.

Even using ctypes.windll.kernel32.GetConsoleCP() ... why don't you use GetConsoleOutputCP()? ... the fact shown above doesn't change. We just know MSVC tools and shell commands uses the encoding GetConsoleOutputCP() shows. But we cannot know which encoding are used by the other tools, until we check them or the end-developer gives us as parameters.

Again, this solution "looks" solved my problem, but does not logically. If you want to continue, I will help you as possible as I can.

@ita1024

This comment has been minimized.

Copy link
Member

commented May 19, 2018

  1. The documentation for Python 3.6.5 is misleading or obsolete because the encoding is always set to utf-8 in Python 3.6: https://docs.python.org/3/library/sys.html#sys.stderr
  2. The change made to Python 3.6 has little to do with how data is actually communicated between applications. It is rather a way of ensuring that text is properly output in the windows console https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep528
  3. The change applied ensures that waf behaves in the same manner under Python 3.6 as under previous Python versions.
  4. GetConsoleCP is supposed to give the encoding for stdin, and this is probably best to read the outputs of other programs (exec_command/cmd_and_log). Do you believe that GetConsoleOutputCP would give distinct results?
@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 22, 2018

1, and 2. We should not apply the manner(, which you said at 3.,) to subprocess functions and classes. The manner only targets when one uses sys.stdin/stdout/stderr directly as end-developers, not includes subprocess call.

3, Of course! But please exclude the bug termination before the users work done.

4, I think we are not the console itself. We get the encoded text by calling out, err = (subprocess.Popen object).communicate(). out and err mean STDOUT and STDERR. Please see Utils.run_regular_process.

@ita1024 ita1024 closed this May 22, 2018

@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 23, 2018

I'm very sad about closing this issue without your new comment :(

@ita1024

This comment has been minimized.

Copy link
Member

commented May 25, 2018

Are there any questions left? Please reformulate if you believe that this is the case. I am unsure on how to interpret the last points 3 and 4.

@Suzumizaki

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2018

Thank you for your message again.

For 4, please try attached script(call_writeconsole.py included) from the console on Windows machine.. Oops, only if you have the time to make the code more logically. My issue seems resolved already.
call_writeconsole.zip

The result under my environment is following:

Codepage of STDIN=932, STDOUT=65001
W(Unicode):日本語の文字列
A(cp  932):{̕
A(cp65001):日本語の文字列

Codepage of STDIN=65001, STDOUT=932
W(Unicode):日本語の文字列
A(cp  932):日本語の文字列
A(cp65001):譌・譛ャ隱槭・譁・ュ怜・

Consider this call_writeconsole.py as external executable against waf-project. You said as if the output of call_writeconsole.py is connected to the STDIN of waf-project, but in fact, the given parameter to subprocess.xxx is stdout=subprocess.PIPE, not stdin=subprocess.PIPE.

As shown above, STDOUT links GetConsoleOutputCP(), not GetConsoleCP().

We may also say GetConsoleOutputCP() has 2 roles.

  1. get the codepage we should use to decode the strings which come from subprocess.
  2. get the codepage we should use to encode to write strings to STDOUT.

Anyway, I understand now you agree to 1 and 2, and not to 3 and 4, currently. Thanks a lot, I'll not blame to finish this discussion with or without your new comments :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.