MIDI device with unicode character returns empty string for port name #115

funfunfunfunfunmike · 2017-06-29T21:37:38Z

I'm on macOS Sierra 10.12.3.

I have a MIDI device plugged in (a DJTechTools Midi Fighter 3d), that in other software such as MIDI Monitor and Audio Midi Setup (the Mac shipped app), has the port name:

Midi Fighter 3D�̨w

Note the unicode char, '̨', http://graphemica.com/0328

When I run midiprobe, however, it is reported as the empty string:

$ ./midiprobe

Compiled APIs:
OS-X CoreMidi

Current input API: OS-X CoreMidi

There are 2 MIDI input sources available.
Input Port #1: IAC Driver Plustype_IAC
Input Port #2:

Current output API: OS-X CoreMidi

There are 2 MIDI output ports available.
Output Port #1: IAC Driver Plustype_IAC
Output Port #2:

I originally reported this as a bug to the creator of the python- library, but he had me compile midiprobe, and I now see that this appears to be happening on the C++ side of things.

garyscavone · 2017-08-21T19:54:01Z

Please submit a pull request, as this would be a hard issue to test.

radarsat1 · 2017-08-25T21:17:29Z

Amazing. Especially since the unicode character is at the end of the string, I'm surprised this really causes problems; I would have assumed UTF-8. But looking at the code, the string goes through a few hoops before making its way into std::string. Maybe it's coming in as UTF-16 and encountering a zero in the first character.
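
To illustrate the UTF-16 hypothesis, here is a standalone sketch (not RtMidi or CoreFoundation code; the helper names `encodeUtf16` and `visibleLength` are made up for this example). In UTF-16 every ASCII character carries a zero byte, so anything that treats the buffer as a C string stops at the first or second byte, depending on byte order.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: serialize UTF-16 code units to raw bytes,
// either little-endian or big-endian.
std::string encodeUtf16(const std::u16string& text, bool bigEndian) {
  std::string bytes;
  for (char16_t unit : text) {
    char lo = static_cast<char>(unit & 0xFF);
    char hi = static_cast<char>((unit >> 8) & 0xFF);
    bytes += bigEndian ? hi : lo;
    bytes += bigEndian ? lo : hi;
  }
  return bytes;
}

// Length that a C-string API (printf("%s"), strlen, ...) would see
// when handed those raw bytes.
std::size_t visibleLength(const std::string& bytes) {
  return std::strlen(bytes.c_str());
}
```

With `u"Midi"` as input, the little-endian bytes start with 0x4d, 0x00, so a C-string reader sees only one character. The big-endian bytes start with 0x00, so it sees an empty string.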

Anyway, my guess is to replace CFStringGetSystemEncoding() with a specific encoding, such as kCFStringEncodingUTF8 or kCFStringEncodingASCII or kCFStringEncodingISOLatin1. I'd have to test it, but probably the problem can be reproduced by playing with CFStringGetCString; I don't think it's specific to CoreMIDI.

radarsat1 · 2017-08-25T21:19:47Z

In terms of string encoding issues, my inclination would be to keep the same semantics everywhere, i.e. always convert to ASCII, and provide a new interface that can return UTF-8, getPortNameUTF8 or something. The other option is to redefine the semantics to return UTF-8 (which is backwards compatible with ASCII), but existing applications may not expect this.
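
A rough sketch of what that first option could look like, assuming the UTF-8 name is already available. Everything here is hypothetical: `PortNameSketch` and `utf8ToAsciiLossy` are illustration-only names, and replacing each non-ASCII code point with '?' is just one possible downgrade policy.

```cpp
#include <string>

// Hypothetical helper: downgrade UTF-8 to 7-bit ASCII, emitting one '?'
// for each non-ASCII code point.
std::string utf8ToAsciiLossy(const std::string& utf8) {
  std::string ascii;
  for (unsigned char byte : utf8) {
    if (byte < 0x80) {
      ascii += static_cast<char>(byte);  // plain ASCII passes through
    } else if ((byte & 0xC0) != 0x80) {
      ascii += '?';                      // lead byte of a multi-byte sequence
    }
    // Continuation bytes (10xxxxxx) are skipped.
  }
  return ascii;
}

// Hypothetical pair of accessors: the existing name keeps ASCII semantics,
// the new one exposes the unmodified UTF-8 string.
struct PortNameSketch {
  std::string utf8;

  std::string getPortName() const { return utf8ToAsciiLossy(utf8); }
  std::string getPortNameUTF8() const { return utf8; }
};
```

For the device in this report, `getPortName()` in this sketch would give "Midi Fighter 3D??w", while `getPortNameUTF8()` would return the bytes untouched.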

radarsat1 · 2017-08-28T15:14:51Z

Here are the results of an experiment:

#include <stdio.h>
#include <CoreMIDI/CoreMIDI.h>
#include <CoreAudio/HostTime.h>
#include <CoreServices/CoreServices.h>

int main()
{
  char str[1024];
  CFStringRef teststr = CFStringCreateWithCString(
    NULL,"Midi Fighter 3D�̨w",kCFStringEncodingUTF8);

  CFStringGetCString(teststr, str, sizeof(str), CFStringGetSystemEncoding());
  printf("CFStringGetSystemEncoding(): \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingASCII);
  printf("kCFStringEncodingASCII: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingUTF8);
  printf("kCFStringEncodingUTF8: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingUTF16);
  printf("kCFStringEncodingUTF16: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingUnicode);
  printf("kCFStringEncodingUnicode: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingISOLatin1);
  printf("kCFStringEncodingISOLatin1: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFStringGetCString(teststr, str, sizeof(str), kCFStringEncodingNonLossyASCII);
  printf("kCFStringEncodingNonLossyASCII: \"%s\" (%#x, %#x)\n", str, str[0], str[1]);
  CFRelease(teststr);
  return 0;
}

prints:

CFStringGetSystemEncoding(): "" (0, 0x69)
kCFStringEncodingASCII: "" (0, 0x69)
kCFStringEncodingUTF8: "Midi Fighter 3D�̨w" (0x4d, 0x69)
kCFStringEncodingUTF16: "M" (0x4d, 0)
kCFStringEncodingUnicode: "M" (0x4d, 0)
kCFStringEncodingISOLatin1: "" (0, 0x69)
kCFStringEncodingNonLossyASCII: "Midi Fighter 3D\ufffd\u0328w" (0x4d, 0x69)

It's really unfortunate that the ASCII encoding seems to put a zero in the first byte of the string for some reason. NonLossyASCII does a good job of making the string displayable, but the result contains escape codes. UTF-8 is obviously the best, but getPortName() does not have UTF-8 semantics as far as I know. UTF-16 is out of the question. Ideas? Potentially we could return the default encoding and fall back to NonLossyASCII when the first character is zero. I have no idea how "exceptional" a case this is, and I have no idea why the CFString functions are converting it so weirdly. Or we simply return UTF-8 and state it in the docs.
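
To make the escape-code point concrete: a caller that received the NonLossyASCII form would have to undo the escapes to recover the original name. A rough sketch, assuming only the `\uXXXX` form seen in the output above and code points outside the surrogate range. `unescapeNonLossyAscii` and `appendUtf8` are made-up names, not CoreFoundation or RtMidi functions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Append one code point in the range U+0000..U+FFFF to out as UTF-8.
// Surrogates are not handled in this sketch.
static void appendUtf8(std::string& out, unsigned long cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: turn "\uXXXX" escapes back into UTF-8 bytes and
// copy everything else through unchanged.
std::string unescapeNonLossyAscii(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    bool isEscape = in[i] == '\\' && i + 5 < in.size() && in[i + 1] == 'u';
    for (std::size_t k = 2; isEscape && k <= 5; ++k) {
      isEscape = std::isxdigit(static_cast<unsigned char>(in[i + k])) != 0;
    }
    if (isEscape) {
      appendUtf8(out, std::stoul(in.substr(i + 2, 4), nullptr, 16));
      i += 5;  // skip 'u' and the four hex digits; ++i then moves past them
    } else {
      out += in[i];
    }
  }
  return out;
}
```

Having to ship something like this in every client is a fair argument against returning the escaped form by default.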

radarsat1 · 2017-08-29T19:28:14Z

After poking around the internet a bit and looking at current trends, it seems to me that it's generally considered okay to return UTF-8 in an std::string. For instance pybind11 (the Python binding generator) assumes this is the case.

So I simply changed the code to request UTF-8 and documented that fact. A device name featuring special unicode characters should be quite rare anyways, so I think this is fairly safe, and maybe better to err on the side of not modifying the provided string.

funfunfunfunfunmike mentioned this issue Jun 29, 2017

Port name reported as "None" SpotlightKid/python-rtmidi#21

Closed

radarsat1 closed this as completed in 84ed63a Aug 29, 2017
