tats-u/WideCharToMultiByteTest

WideCharToMultiByte Test

What is this?

Currently, the Windows API function WideCharToMultiByte behaves inconveniently when converting a UTF-16 wide string to Shift_JIS: it cannot convert some characters in Unicode strings produced on other OSes under a Japanese locale.

For example, WAVE DASH (U+301C) is widely used on other OSes, but Windows does not convert it to Shift_JIS. Because of this, Windows users are forced to enter the nonstandard FULLWIDTH TILDE (U+FF5E) through IMEs (Input Method Editors) instead of the WAVE DASH. As a result, Windows lacks interoperability with other OSes.

This repository contains test code demonstrating that the Windows API function WideCharToMultiByte, which is used to convert Unicode strings to Shift_JIS, behaves incorrectly.

Tracking issues

What characters must be converted properly?

| Shift_JIS | Other OSes (correct) | Windows (incorrect) |
| --- | --- | --- |
| 0x81 0x60 | U+301C WAVE DASH 〜 | U+FF5E FULLWIDTH TILDE ～ |
| 0x81 0x61 | U+2016 DOUBLE VERTICAL LINE ‖ | U+2225 PARALLEL TO ∥ |
| 0x81 0x7C | U+2212 MINUS SIGN − | U+FF0D FULLWIDTH HYPHEN-MINUS － |
| 0x81 0x5C | U+2014 EM DASH — | U+2015 HORIZONTAL BAR ― |

When the WC_NO_BEST_FIT_CHARS option is not specified, Windows must convert the characters in the middle column to the corresponding Shift_JIS bytes.

Also, OVERLINE (U+203E) is assigned to 0x7E in JIS X 0201. Therefore, U+203E must likewise be converted to Shift_JIS 0x7E when the WC_NO_BEST_FIT_CHARS option is not specified.
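Until the conversion itself is fixed, one possible application-side workaround is to replace the four "Other OSes" code points from the table with their Windows counterparts before calling WideCharToMultiByte. The sketch below is only an illustration under that assumption: the helper names `to_windows_variant` and `normalize_for_cp932` are hypothetical and not part of this repository, and OVERLINE (U+203E) is deliberately left out. The substitution is lossy, because text converted back from Shift_JIS on Windows will contain the Windows variants.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of this repository): maps a code point from
// the "Other OSes" column of the table to the one in the "Windows" column.
// All other code units are returned unchanged.
constexpr char16_t to_windows_variant(char16_t c) {
    switch (c) {
        case 0x301C: return 0xFF5E;  // WAVE DASH            -> FULLWIDTH TILDE
        case 0x2016: return 0x2225;  // DOUBLE VERTICAL LINE -> PARALLEL TO
        case 0x2212: return 0xFF0D;  // MINUS SIGN           -> FULLWIDTH HYPHEN-MINUS
        case 0x2014: return 0x2015;  // EM DASH              -> HORIZONTAL BAR
        default:     return c;
    }
}

// Hypothetical helper: applies the substitution to every UTF-16 code unit.
// The four affected characters are all in the BMP, so surrogate pairs are
// passed through untouched.
std::u16string normalize_for_cp932(std::u16string text) {
    for (char16_t& c : text) {
        c = to_windows_variant(c);
    }
    return text;
}
```

On Windows, the normalized string could then be passed to WideCharToMultiByte with code page 932. This only hides the problem inside one application; it does not restore interoperability.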

How to use

  1. Open the solution file
  2. Build
  3. Run unit tests (only those in the namespace “MustBeFixed” will intentionally fail)

For developers of Windows itself: once you fix the problem, run the unit tests again and all of them should pass.

Functions used to test

Loose conversion (without the WC_NO_BEST_FIT_CHARS option; the above characters are tested using this function):

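#include <windows.h>
#include <cassert>
#include <optional>
#include <string>

// Converts one UTF-16 code unit to Shift_JIS (code page 932), allowing best-fit mappings.
static std::optional<std::string> try_convert_to_sjis_loosely(const wchar_t input) {
    BOOL failed = FALSE;
    // First call: query the required size; `failed` is set if the default character was used.
    const int len = WideCharToMultiByte(932, 0, &input, 1, nullptr, 0, nullptr, &failed);
    // The last-error code is only meaningful when the call itself failed (returned 0).
    assert(len != 0 || GetLastError() != ERROR_INVALID_PARAMETER);
    if (len == 0 || failed) {
        return std::nullopt;
    }
    std::string output(len, '\0');
    // Second call: write the converted bytes into the sized buffer.
    WideCharToMultiByte(932, 0, &input, 1, output.data(), len, nullptr, nullptr);
    return output;
}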

This function returns the corresponding Shift_JIS-encoded multibyte string if the conversion succeeds, and std::nullopt if it fails. Currently, none of the above characters is converted to Shift_JIS by this function, although all of them must be.
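The requirement above can be written down as data. The following is a minimal sketch, not the repository's actual test code: `Expectation`, `Converter`, `loose_expectations`, and `count_mismatches` are hypothetical names. It takes the converter as a parameter so that the checking logic itself does not depend on the Windows API. The expected bytes are taken from the table and the OVERLINE paragraph.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record: one character and the Shift_JIS bytes it must yield.
struct Expectation {
    wchar_t input;
    std::string sjis;
};

// Same shape as the conversion functions in this README.
using Converter = std::function<std::optional<std::string>(wchar_t)>;

// Expected loose-conversion results, copied from the table above.
inline const std::vector<Expectation>& loose_expectations() {
    static const std::vector<Expectation> table = {
        {L'\x301C', "\x81\x60"},  // WAVE DASH
        {L'\x2016', "\x81\x61"},  // DOUBLE VERTICAL LINE
        {L'\x2212', "\x81\x7C"},  // MINUS SIGN
        {L'\x2014', "\x81\x5C"},  // EM DASH
        {L'\x203E', "\x7E"},      // OVERLINE (JIS X 0201)
    };
    return table;
}

// Counts the characters for which `convert` fails or returns other bytes.
std::size_t count_mismatches(const Converter& convert) {
    std::size_t mismatches = 0;
    for (const Expectation& e : loose_expectations()) {
        const std::optional<std::string> actual = convert(e.input);
        if (!actual || *actual != e.sjis) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

On Windows, `count_mismatches(try_convert_to_sjis_loosely)` would be expected to return 0 once the behavior described here is fixed.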

For reference, strict conversion (with the WC_NO_BEST_FIT_CHARS option):

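// Converts one UTF-16 code unit to Shift_JIS (code page 932), rejecting best-fit mappings.
static std::optional<std::string> try_convert_to_sjis_strictly(const wchar_t input) {
    BOOL failed = FALSE;
    // First call: query the required size; `failed` is set if the default character was used.
    const int len = WideCharToMultiByte(932, WC_NO_BEST_FIT_CHARS, &input, 1, nullptr, 0, nullptr, &failed);
    // The last-error code is only meaningful when the call itself failed (returned 0).
    assert(len != 0 || GetLastError() != ERROR_INVALID_PARAMETER);
    if (len == 0 || failed) {
        return std::nullopt;
    }
    std::string output(len, '\0');
    // Second call: write the converted bytes into the sized buffer.
    WideCharToMultiByte(932, WC_NO_BEST_FIT_CHARS, &input, 1, output.data(), len, nullptr, nullptr);
    return output;
}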

The above characters do not have to be converted to Shift_JIS if WC_NO_BEST_FIT_CHARS is passed.
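When comparing the result of either function with the byte values in the table, it can help to print the returned bytes in the same `0x81 0x60` notation. The helper below is a small sketch for that purpose; `describe_sjis` is a hypothetical name and is not part of this repository.

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical helper: renders a conversion result as space-separated hex
// bytes, or a placeholder when the conversion returned std::nullopt.
std::string describe_sjis(const std::optional<std::string>& result) {
    if (!result) {
        return "(not converted)";
    }
    std::string text;
    for (const unsigned char byte : *result) {
        char buf[8];
        // Two uppercase hex digits per byte, e.g. 0x81.
        std::snprintf(buf, sizeof buf, "0x%02X", static_cast<unsigned>(byte));
        if (!text.empty()) {
            text += ' ';
        }
        text += buf;
    }
    return text;
}
```

For example, `describe_sjis(try_convert_to_sjis_strictly(L'\x301C'))` could be written to a log on Windows to see whether the strict conversion produced bytes or was rejected.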

Used library

Visual C++ GoogleTest
