Serial monitor character encoding option #1728

Open
arduino/Arduino
#8660
@Ivan-Perez

Description

Describe the request

It would be great if the serial monitor had an option to change the encoding used.

In my case, my sketches use UTF-8, so print messages use that encoding. By default, the serial monitor uses ISO-8859 (probably the Windows 7 default), so those print messages are not displayed properly:

(Screenshot: serial-monitor-encoding-problems)

Additional context

I've found other issues discussing this problem. Instead of detecting the encoding used, it might be easier to let the user select the charset they want to use. The option (a selectable list of the most common character sets) could be placed in the bottom right corner, to the left of the two existing selects (baud rate and line feed).


Activity

changed the title [-]Change serial monitor encoding[/-] [+]Serial monitor encoding option[/+] on Jan 19, 2016
changed the title [-]Serial monitor encoding option[/-] [+]Serial monitor character encoding option[/+] on Jan 19, 2016
lmihalkovic commented on Jan 20, 2016

The problem was identified for the editor in 2014 (arduino/Arduino#2430), and a May 2015 comment there also refers to the limitation in the serial monitor's encoding support.

aknrdureegaesr commented on Feb 5, 2017

I did some research on UTF-8 behavior today. I found this code in arduino-core/src/processing/app/Serial.java which does the conversion from the incoming bytes to the strings displayed by the UI:

  public synchronized void serialEvent(SerialPortEvent serialEvent) {
    if (serialEvent.isRXCHAR()) {
      try {
        byte[] buf = port.readBytes(serialEvent.getEventValue());
        if (buf.length > 0) {
          String msg = new String(buf);
          char[] chars = msg.toCharArray();
          message(chars, chars.length);
        }
      } catch (SerialPortException e) {
        errorMessage("serialEvent", e);
      }
    }
  }

According to its documentation, the String constructor used here decodes the incoming bytes using the platform's default charset.

I think the default should be to always use UTF-8, so that a sketch developed and programmed on a Mac works with a Linux or Windows box and vice versa. If we later want to add HEX display, as proposed by #1727, this would be one place to do it.

There is a catch here: in our context, there is no guarantee whatsoever that the incoming bytes split neatly at character boundaries. A read may well contain only the first byte of a character that is encoded as several bytes, with the remaining bytes arriving later, in the next call.

In this case, the String-constructor is documented to exhibit undocumented behavior :-) .
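To illustrate the split-character case, here is a minimal sketch (not the IDE's code). The class and method names are hypothetical, and it deliberately uses the explicit-charset constructor `new String(bytes, StandardCharsets.UTF_8)` so that the result does not depend on the platform default. That constructor replaces malformed input with the replacement character, so a two-byte character delivered in two separate reads does not come out intact.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the Arduino code base.
final class SplitUtf8Demo {
    // Decodes every chunk on its own, the way a per-event "new String(buf)" does.
    // Assumption: UTF-8 is passed explicitly instead of relying on the platform default.
    static String decodeChunksIndependently(byte[][] chunks) {
        StringBuilder out = new StringBuilder();
        for (byte[] chunk : chunks) {
            out.append(new String(chunk, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
        byte[] whole = {(byte) 0xC3, (byte) 0xA9};
        byte[][] split = {{(byte) 0xC3}, {(byte) 0xA9}};

        String intact = decodeChunksIndependently(new byte[][] {whole});
        String broken = decodeChunksIndependently(split);

        System.out.println("one read decodes correctly:  " + intact.equals("\u00e9"));
        System.out.println("two reads decode correctly:  " + broken.equals("\u00e9"));
    }
}
```

With the default-charset constructor quoted above, the outcome additionally depends on the platform, which would fit the erratic output described in this comment.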

In actual tests, I indeed see erratic output whenever I use non-ASCII characters in my sketches. Sometimes it works, sometimes it doesn't. (This is Linux, and I'm almost certain the platform encoding is UTF-8.)

The String constructor documentation advises using a CharsetDecoder instead.

I think that's good advice. It would give control over the encoding used, and CharsetDecoder supports clean UTF-8 decoding even when a character is split across calls.
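A minimal sketch of what the CharsetDecoder approach could look like, assuming UTF-8 is always used. The class `StreamingUtf8Decoder` and its `feed` method are hypothetical names for illustration, not existing Arduino code. The idea is to keep one decoder plus a small buffer of not-yet-decoded bytes across calls and to call `decode` with `endOfInput` set to `false`, so that an incomplete trailing sequence stays in the buffer until the rest arrives.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Arduino code base.
final class StreamingUtf8Decoder {
    // Invalid bytes are replaced with U+FFFD instead of throwing.
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPLACE)
        .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Bytes received but not yet decoded; kept in "write mode" between calls.
    private ByteBuffer pending = ByteBuffer.allocate(64);

    // Feeds one chunk of incoming bytes and returns the characters completed so far.
    synchronized String feed(byte[] chunk) {
        // Grow the buffer if the leftover bytes plus the new chunk do not fit.
        if (pending.remaining() < chunk.length) {
            ByteBuffer bigger = ByteBuffer.allocate(pending.position() + chunk.length);
            pending.flip();
            bigger.put(pending);
            pending = bigger;
        }
        pending.put(chunk);
        pending.flip(); // switch to "read mode" for decoding

        // UTF-8 never yields more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(pending.remaining());
        // endOfInput = false: an incomplete trailing sequence is left in 'pending'.
        decoder.decode(pending, out, false);

        pending.compact(); // keep undecoded bytes, back to "write mode"
        out.flip();
        return out.toString();
    }
}
```

In `serialEvent`, a sketch like this would stand in for `new String(buf)`: the handler would pass `buf` to `feed` and forward whatever string comes back, which may be empty if only part of a character has arrived.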

AtosNicoS commented on Jul 24, 2018

Hi,
I also recently stumbled across that limitation. The Serial Monitor now supports reading in UTF-8, so writing should also default to UTF-8.
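For the writing direction, a minimal sketch of the idea, assuming the outgoing text is available as a Java String. The helper name `encodeForPort` is hypothetical and not taken from the IDE's code. It encodes explicitly as UTF-8 rather than calling `getBytes()` without arguments, which would use the platform default charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Arduino code base.
final class Utf8Writer {
    // Encodes outgoing text as UTF-8 regardless of the platform default charset.
    static byte[] encodeForPort(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```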

transferred this issue from arduino/Arduino on Dec 1, 2022
