Currently, the Windows API function `WideCharToMultiByte` behaves inconveniently when converting a UTF-16 wide string to Shift_JIS: it cannot convert some characters in Unicode strings produced on other OSes in a Japanese locale.
For example, WAVE DASH 〜 (U+301C) is widely used on other OSes, but Windows does not convert it to Shift_JIS.
Because of this problem, Windows users are forced to enter the nonstandard FULLWIDTH TILDE ～ (U+FF5E) instead of WAVE DASH through IMEs (Input Method Editors).
As a result, Windows lacks interoperability with other OSes for such text.
- https://ja.wikipedia.org/wiki/%E6%B3%A2%E3%83%80%E3%83%83%E3%82%B7%E3%83%A5#Windows%E3%81%AB%E3%81%8A%E3%81%84%E3%81%A6%E8%B5%B7%E3%81%8D%E3%82%8B%E5%95%8F%E9%A1%8C (Japanese)
- https://ja.wikipedia.org/wiki/Google_%E6%97%A5%E6%9C%AC%E8%AA%9E%E5%85%A5%E5%8A%9B#%E4%BB%95%E6%A7%98 (Japanese)
- https://www.tohoho-web.com/ex/dash-tilde.html (Japanese)
- https://x0213.org/wiki/wiki.cgi?page=%C7%C8%A5%C0%A5%C3%A5%B7%A5%E5%CC%E4%C2%EA (Japanese)
- https://github.com/google/mozc/blob/master/src/base/text_normalizer.cc#L57-L58
- https://ja.wikipedia.org/wiki/%E5%B9%B3%E8%A1%8C%E8%A8%98%E5%8F%B7#%E6%96%87%E5%AD%97%E3%82%B3%E3%83%BC%E3%83%89%E3%81%AB%E3%81%8A%E3%81%91%E3%82%8B%E5%95%8F%E9%A1%8C (Japanese)
This repository contains test code demonstrating that the Windows API function `WideCharToMultiByte`, which is used to convert Unicode strings to Shift_JIS, behaves incorrectly.
- https://aka.ms/AAqhm6u
- https://developercommunity.visualstudio.com/t/WideCharToMultiByte932-0--Unic/10638056
Shift_JIS | Other OSes (correct) | Windows (incorrect) |
---|---|---|
0x81 0x60 | U+301C WAVE DASH 〜 | U+FF5E FULLWIDTH TILDE ～ |
0x81 0x61 | U+2016 DOUBLE VERTICAL LINE ‖ | U+2225 PARALLEL TO ∥ |
0x81 0x7C | U+2212 MINUS SIGN − | U+FF0D FULLWIDTH HYPHEN-MINUS － |
0x81 0x5C | U+2014 EM DASH — | U+2015 HORIZONTAL BAR ― |
Windows must convert the characters in the middle column to Shift_JIS when the `WC_NO_BEST_FIT_CHARS` option is not specified.
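Until the conversion is fixed, an application can swap each standard code point for its Windows counterpart before calling `WideCharToMultiByte`, and swap back after decoding. The following is a minimal sketch of that idea under stated assumptions: `VariantPair`, `kVariantPairs`, `to_windows_variant`, `to_standard_variant`, and `check_variant_pairs` are hypothetical names that do not appear in this repository, and the table covers only the four double-byte characters listed above.

```cpp
#include <cassert>

// Hypothetical names throughout; a sketch, not code from this repository.
struct VariantPair {
    char16_t standard;  // code point other OSes map to the Shift_JIS code
    char16_t windows;   // code point Windows code page 932 maps to the same code
};

static const VariantPair kVariantPairs[] = {
    {0x301C, 0xFF5E},  // WAVE DASH            / FULLWIDTH TILDE         (0x81 0x60)
    {0x2016, 0x2225},  // DOUBLE VERTICAL LINE / PARALLEL TO             (0x81 0x61)
    {0x2212, 0xFF0D},  // MINUS SIGN           / FULLWIDTH HYPHEN-MINUS  (0x81 0x7C)
    {0x2014, 0x2015},  // EM DASH              / HORIZONTAL BAR          (0x81 0x5C)
};

// Returns the Windows variant of a standard code point, or the input unchanged.
inline char16_t to_windows_variant(char16_t c) {
    for (const VariantPair& p : kVariantPairs) {
        if (p.standard == c) return p.windows;
    }
    return c;
}

// Returns the standard variant of a Windows code point, or the input unchanged.
inline char16_t to_standard_variant(char16_t c) {
    for (const VariantPair& p : kVariantPairs) {
        if (p.windows == c) return p.standard;
    }
    return c;
}

// Sanity check: every pair maps forward and back.
inline void check_variant_pairs() {
    for (const VariantPair& p : kVariantPairs) {
        assert(to_windows_variant(p.standard) == p.windows);
        assert(to_standard_variant(p.windows) == p.standard);
    }
}
```

For example, `to_windows_variant(0x301C)` yields U+FF5E, while a character outside the table such as `u'A'` is returned unchanged. The swap is lossy in the reverse direction: a FULLWIDTH TILDE that was genuinely present in the text comes back as WAVE DASH.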
In addition, OVERLINE ‾ (U+203E) is assigned to 0x7E in JIS X 0201.
Therefore, U+203E must also be converted to Shift_JIS 0x7E when the `WC_NO_BEST_FIT_CHARS` option is not specified.
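The expectations stated so far can be written down as data that a test could compare against. This sketch is not taken from the repository: `expected_sjis_bytes` and `matches_expected` are hypothetical helpers, and the byte values are the ones given in the table and in the JIS X 0201 assignment described above. It assumes C++17 for `std::optional`.

```cpp
#include <cassert>   // only needed for the usage checks shown after the block
#include <optional>
#include <string>

// Hypothetical helper: the Shift_JIS bytes a loose conversion is expected to
// produce for the five characters discussed in this document.
inline std::optional<std::string> expected_sjis_bytes(char16_t c) {
    switch (c) {
        case 0x301C: return std::string("\x81\x60");  // WAVE DASH
        case 0x2016: return std::string("\x81\x61");  // DOUBLE VERTICAL LINE
        case 0x2212: return std::string("\x81\x7C");  // MINUS SIGN
        case 0x2014: return std::string("\x81\x5C");  // EM DASH
        case 0x203E: return std::string("\x7E");      // OVERLINE (JIS X 0201)
        default:     return std::nullopt;             // not covered by this sketch
    }
}

// Hypothetical helper: true when an actual conversion result equals the
// expected bytes for c. A missing result or an uncovered character is a mismatch.
inline bool matches_expected(char16_t c, const std::optional<std::string>& actual) {
    const std::optional<std::string> expected = expected_sjis_bytes(c);
    return expected.has_value() && actual.has_value() && *expected == *actual;
}
```

A check such as `assert(matches_expected(0x2014, std::string("\x81\x5C")));` passes, whereas `matches_expected(0x2014, std::nullopt)` returns `false`, which is the situation this document reports for the current Windows behavior.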
- Open the solution file
- Build
- Run the unit tests (only those in the namespace “MustBeFixed” are expected to fail)
For developers of Windows itself: once you fix the problem, run the unit tests again and all of them should pass.
Loose conversion (without the `WC_NO_BEST_FIT_CHARS` option; the characters above are tested with this function):
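static std::optional<std::string> try_convert_to_sjis_loosely(const wchar_t input) {
    BOOL failed = FALSE;
    // First call: query the required buffer size. `failed` is set when the
    // default character had to be used, i.e. the input has no mapping.
    int len = WideCharToMultiByte(932, 0, &input, 1, nullptr, 0, nullptr, &failed);
    // GetLastError() is only meaningful when the call fails (returns 0).
    assert(len != 0 || GetLastError() != ERROR_INVALID_PARAMETER);
    if (failed) {
        return std::nullopt;
    }
    std::string output(len, 0);
    // Second call: perform the actual conversion into the buffer.
    WideCharToMultiByte(932, 0, &input, 1, output.data(), len, nullptr, nullptr);
    return output;
}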
This function returns the corresponding multibyte string encoded in Shift_JIS on success, and `std::nullopt` on failure.
Currently, none of the characters above is converted to Shift_JIS by this function, although all of them must be.
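When inspecting results, it can help to print the returned bytes in the same `0x81 0x60` notation used in the table. The helper below is a small sketch for that purpose; `to_hex_bytes` is a hypothetical name and is not part of this repository.

```cpp
#include <cassert>   // only needed for the usage checks shown after the block
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats raw Shift_JIS bytes as "0x81 0x60".
inline std::string to_hex_bytes(const std::string& bytes) {
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        char buf[8];
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned value = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "0x%02X", value);
        if (i != 0) out += ' ';  // separate bytes with a single space
        out += buf;
    }
    return out;
}
```

With this, `to_hex_bytes("\x81\x60")` gives `"0x81 0x60"`, and an empty string is returned for empty input.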
For reference, strict conversion (with the `WC_NO_BEST_FIT_CHARS` option):
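static std::optional<std::string> try_convert_to_sjis_strictly(const wchar_t input) {
    BOOL failed = FALSE;
    // First call: query the required buffer size with best-fit mapping disabled.
    // `failed` is set when the default character had to be used.
    int len = WideCharToMultiByte(932, WC_NO_BEST_FIT_CHARS, &input, 1, nullptr, 0, nullptr, &failed);
    // GetLastError() is only meaningful when the call fails (returns 0).
    assert(len != 0 || GetLastError() != ERROR_INVALID_PARAMETER);
    if (failed) {
        return std::nullopt;
    }
    std::string output(len, 0);
    // Second call: perform the actual conversion into the buffer.
    WideCharToMultiByte(932, WC_NO_BEST_FIT_CHARS, &input, 1, output.data(), len, nullptr, nullptr);
    return output;
}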
The characters above do not have to be converted to Shift_JIS when `WC_NO_BEST_FIT_CHARS` is passed.
Visual C++ GoogleTest