MIDI device with unicode character returns empty string for port name #115

Closed
funfunfunfunfunmike opened this issue Jun 29, 2017 · 5 comments

@funfunfunfunfunmike

I'm on macOS Sierra 10.12.3.

I have a MIDI device plugged in (a DJTechTools Midi Fighter 3D) that, in other software such as MIDI Monitor and Audio MIDI Setup (the app that ships with macOS), has the port name:

Midi Fighter 3D�̨w

Note the Unicode character '̨' (U+0328, combining ogonek): http://graphemica.com/0328

When I run midiprobe, however, it is reported as the empty string:

$ ./midiprobe

Compiled APIs:
OS-X CoreMidi

Current input API: OS-X CoreMidi

There are 2 MIDI input sources available.
Input Port #1: IAC Driver Plustype_IAC
Input Port #2:

Current output API: OS-X CoreMidi

There are 2 MIDI output ports available.
Output Port #1: IAC Driver Plustype_IAC
Output Port #2:

I originally reported this as a bug to the creator of the python- library, but he had me compile midiprobe, and I now see that this appears to be happening on the C++ side of things.

@garyscavone
Contributor

Please submit a pull request, as this would be a hard issue to test.

@radarsat1
Contributor

Amazing. Since the Unicode character is at the end of the string, I'm surprised this causes problems; I would have assumed UTF-8. But looking at the code, the string goes through a few hoops before making its way into std::string. Maybe it's coming in as UTF-16 and encountering a zero byte in the first character.
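To make the UTF-16 hypothesis concrete, here is a minimal portable sketch (no CoreFoundation involved). The helpers asciiToUtf16le and visibleLength are hypothetical names used only for illustration; they are not part of RtMidi. The sketch shows that ASCII text widened to little-endian UTF-16 has a zero as its second byte, so anything that treats the buffer as a C string stops after one character.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (illustration only): widen ASCII text to UTF-16LE bytes.
// Each ASCII character becomes two bytes: the character, then a zero high byte.
std::string asciiToUtf16le(const std::string& ascii) {
    std::string bytes;
    for (unsigned char c : ascii) {
        bytes.push_back(static_cast<char>(c)); // low byte carries the character
        bytes.push_back('\0');                 // high byte is zero for ASCII
    }
    return bytes;
}

// Length that a C-string consumer (strlen, printf("%s")) would see:
// it stops at the first zero byte.
std::size_t visibleLength(const std::string& bytes) {
    return std::strlen(bytes.c_str());
}
```

With this sketch, asciiToUtf16le("Midi") holds 8 bytes, but visibleLength reports 1. That would explain a name truncated to its first letter, though not a name that comes back completely empty.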

Anyway, my guess is to replace CFStringGetSystemEncoding() with a specific encoding, such as kCFStringEncodingUTF8 or kCFStringEncodingASCII or kCFStringEncodingISOLatin1. I'd have to test it, but probably the problem can be reproduced by playing with CFStringGetCString; I don't think it's specific to CoreMIDI.

@radarsat1
Contributor

In terms of string encoding issues, my inclination would be to keep the same semantics everywhere, i.e. always convert to ASCII, and provide a new interface that can return UTF-8, getPortNameUTF8 or something. The other option is to redefine the semantics to return UTF-8 (which is backwards compatible with ASCII), but existing applications may not expect this.
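As a rough illustration of the two semantics, the sketch below assumes the UTF-8 bytes of the name are already available. isAscii and utf8ToAsciiLossy are hypothetical helpers, not existing RtMidi functions. A pure-ASCII name is byte-for-byte identical in both encodings, which is the backwards compatibility mentioned above; a name with other characters either loses them (here replaced by '?', an arbitrary choice) or is passed through as UTF-8.

```cpp
#include <string>

// True when every byte is 7-bit ASCII; such a string is also valid UTF-8
// with exactly the same bytes.
bool isAscii(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}

// Hypothetical lossy conversion (illustration only): each non-ASCII code
// point in the UTF-8 input is replaced by a single '?'.
std::string utf8ToAsciiLossy(const std::string& utf8) {
    std::string out;
    for (unsigned char c : utf8) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c)); // plain ASCII byte
        } else if ((c & 0xC0) != 0x80) {
            out.push_back('?');                  // lead byte of a multi-byte sequence
        }
        // continuation bytes (10xxxxxx) are dropped
    }
    return out;
}
```

An application that only ever sees ASCII device names cannot tell the two interfaces apart; the difference shows up only for names like the one in this report.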

@radarsat1
Contributor

Here are the results of an experiment:

#include <stdio.h>
#include <CoreMIDI/CoreMIDI.h>
#include <CoreAudio/HostTime.h>
#include <CoreServices/CoreServices.h>

int main()
{
  char str[1024];

  // Build the test string from UTF-8 bytes, mirroring the reported port name.
  CFStringRef teststr = CFStringCreateWithCString(
    NULL,"Midi Fighter 3D�̨w",kCFStringEncodingUTF8);

  // Convert to each candidate encoding, then print the result and its first two bytes.
  // Note: the return value of CFStringGetCString (false on failure) is not checked,
  // so str is printed as-is after every call.
  CFStringGetCString(teststr, str, sizeof(str), CFStringGetSystemEncoding());
  printf("CFStringGetSystemEncoding(): \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingASCII);
  printf("kCFStringEncodingASCII: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingUTF8);
  printf("kCFStringEncodingUTF8: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingUTF16);
  printf("kCFStringEncodingUTF16: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingUnicode);
  printf("kCFStringEncodingUnicode: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingISOLatin1);
  printf("kCFStringEncodingISOLatin1: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingNonLossyASCII);
  printf("kCFStringEncodingNonLossyASCII: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);

  CFRelease(teststr);
  return 0;
}

prints:

CFStringGetSystemEncoding(): "" (0, 0x69)
kCFStringEncodingASCII: "" (0, 0x69)
kCFStringEncodingUTF8: "Midi Fighter 3D�̨w" (0x4d, 0x69)
kCFStringEncodingUTF16: "M" (0x4d, 0)
kCFStringEncodingUnicode: "M" (0x4d, 0)
kCFStringEncodingISOLatin1: "" (0, 0x69)
kCFStringEncodingNonLossyASCII: "Midi Fighter 3D\ufffd\u0328w" (0x4d, 0x69)

It's really unfortunate that the ASCII encoding puts a zero in the first byte of the string for some reason. NonLossyASCII does a good job of making the string displayable, but the result contains escape codes. UTF-8 is obviously the best, but getPortName() does not have UTF-8 semantics as far as I know. UTF-16 is out of the question. Ideas? Potentially we could return the default encoding and try NonLossyASCII when the first character is zero. I have no idea how "exceptional" a case this is, and I have no idea why the CFString functions are converting it so weirdly. Or we simply return UTF-8 and state that in the docs.
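The fallback idea can be sketched portably, assuming the UTF-8 bytes of the name are available. escapeNonAscii and portNameWithFallback are hypothetical helpers, not RtMidi or CoreFoundation functions. The escaping imitates the backslash-u output seen above for NonLossyASCII; writing code points above U+FFFF as a surrogate pair, and malformed input as fffd, are assumptions of this sketch rather than documented CoreFoundation behaviour.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Append one 16-bit unit as a backslash-u escape with four lowercase hex digits.
static void appendEscape(std::string& out, unsigned unit) {
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\u%04x", unit);
    out += buf;
}

// Hypothetical helper: copy ASCII bytes unchanged and escape every other
// code point found in the UTF-8 input. Malformed input is escaped as fffd.
std::string escapeNonAscii(const std::string& utf8) {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char c = static_cast<unsigned char>(utf8[i++]);
        if (c < 0x80) { out.push_back(static_cast<char>(c)); continue; }

        // The lead byte gives the number of continuation bytes and the top bits.
        int extra = 0;
        std::uint32_t cp = 0xFFFD; // default for an invalid lead byte
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }

        for (; extra > 0; --extra) {
            unsigned char next = i < utf8.size() ? static_cast<unsigned char>(utf8[i]) : 0;
            if ((next & 0xC0) != 0x80) { cp = 0xFFFD; break; } // truncated sequence
            cp = (cp << 6) | (next & 0x3F);
            ++i;
        }

        if (cp > 0xFFFF) { // split into a UTF-16 surrogate pair (sketch's choice)
            cp -= 0x10000;
            appendEscape(out, 0xD800 + static_cast<unsigned>(cp >> 10));
            appendEscape(out, 0xDC00 + static_cast<unsigned>(cp & 0x3FF));
        } else {
            appendEscape(out, static_cast<unsigned>(cp));
        }
    }
    return out;
}

// Keep the primary conversion unless it came back empty, then fall back to escaping.
std::string portNameWithFallback(const std::string& primary, const std::string& utf8) {
    return primary.empty() ? escapeNonAscii(utf8) : primary;
}
```

The drawback noted above still applies: callers receive literal escape sequences in the name and would have to decode them to display the original characters.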

@radarsat1
Contributor

After poking around the internet a bit and looking at current trends, it seems to me that it's generally considered okay to return UTF-8 in an std::string. For instance pybind11 (the Python binding generator) assumes this is the case.

So I simply changed the code to request UTF-8 and documented that fact. A device name featuring special unicode characters should be quite rare anyways, so I think this is fairly safe, and maybe better to err on the side of not modifying the provided string.
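For callers, the practical consequence of UTF-8 in a std::string is that size() counts bytes, not characters. A minimal sketch of the difference, where countCodePoints is a hypothetical helper and not part of RtMidi:

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 string by counting every byte that is
// not a continuation byte (10xxxxxx). Assumes the input is well-formed UTF-8.
std::size_t countCodePoints(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the port name in this issue the string holds 21 bytes but 18 code points, so code that pads or truncates names by size() may need adjusting; code that just stores, compares, or prints the name to a UTF-8 terminal needs no change.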


3 participants