Add support to interpret/display packet content as UTF-8 #1190

cofleury · 2024-05-27T16:47:26Z

It would be great to support displaying the content of a packet as UTF-8 in addition to ASCII.

guyharris · 2024-05-27T19:12:04Z

For ASCII (and other single-byte character encodings), there can be a one-to-one correspondence between offsets into the packet and positions in the display.

For multi-byte character encodings, a decision has to be made as to how to display a character that's split between rows in the text display. The best thing to do is probably to display it at the location of the first byte, and perhaps to display the next character, which does not begin at the beginning of the next row, with some filler characters before it, corresponding to the bytes in that row that are part of the character that begins in the previous row.

For variable-length multi-byte character encodings, such as UTF-8, there's not likely to be a correspondence between offsets in the packet and positions in the display. At best, what could be done is to display characters adjacent to one another, display characters that are split across rows at the location of the first byte, and show the aforementioned filler characters.
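A minimal sketch of that layout, not taken from tcpdump: `seq_len`, `render_text_rows`, the `'\n'` row separator, and the caller-chosen filler character are all hypothetical names and choices. The length check is deliberately simplified (lead byte plus continuation bytes only), and anything malformed is shown as a single ".".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified, hypothetical helper: length of the UTF-8 sequence at p, judged
 * only by the lead byte and the presence of continuation bytes. It does not
 * reject overlong forms or surrogates. Malformed input counts as 1 byte.
 * *ok is set to 1 for a plausible sequence and 0 otherwise.
 */
static size_t
seq_len(const unsigned char *p, size_t avail, int *ok)
{
    size_t n, i;

    if (p[0] < 0x80) n = 1;
    else if ((p[0] & 0xE0) == 0xC0) n = 2;
    else if ((p[0] & 0xF0) == 0xE0) n = 3;
    else if ((p[0] & 0xF8) == 0xF0) n = 4;
    else n = 0;                       /* stray continuation or 0xF8..0xFF */
    *ok = (n != 0 && n <= avail);
    for (i = 1; *ok && i < n; i++)
        if ((p[i] & 0xC0) != 0x80)
            *ok = 0;
    return *ok ? n : 1;
}

/*
 * Build the text column for buf: rows of row_bytes packet bytes, separated
 * by '\n'. A character is emitted in the row holding its first byte; if it
 * spills into the next row, that row starts with one filler per spilled byte.
 * Returns 0 on success, -1 if out is too small.
 */
static int
render_text_rows(const unsigned char *buf, size_t len, size_t row_bytes,
                 char filler, char *out, size_t outsz)
{
    size_t off = 0, o = 0, n, i, row;
    int ok;

    assert(buf != NULL && out != NULL && outsz > 0 && row_bytes >= 4);
    while (off < len) {
        row = off / row_bytes;
        n = seq_len(buf + off, len - off, &ok);
        /* Worst case per step: 4 bytes + '\n' + 3 fillers + final NUL. */
        if (o + 9 > outsz)
            return -1;
        if (!ok || (n == 1 && (buf[off] < 0x20 || buf[off] == 0x7F))) {
            out[o++] = '.';           /* malformed byte or ASCII control */
        } else {
            memcpy(out + o, buf + off, n);
            o += n;
        }
        off += n;
        if (off / row_bytes > row && (off < len || off % row_bytes != 0)) {
            out[o++] = '\n';
            for (i = 0; i < off % row_bytes; i++)
                out[o++] = filler;    /* bytes owned by the previous row's character */
        }
    }
    out[o] = '\0';
    return 0;
}
```

With 4-byte rows and `'_'` as the filler, a two-byte character whose first byte is the last byte of a row is printed at the end of that row, and the following row starts with one `_`.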

guyharris · 2024-05-27T19:14:11Z

Sequences of bytes that are valid UTF-8 characters but that are not printable characters should be displayed as ".", just as bytes that are not printable ASCII characters are displayed in the ASCII display.

Any sequence of bytes that is not part of a valid UTF-8 character should probably also be displayed as a sequence of "."s.
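A rough sketch of those two rules, assuming the caller has already decoded the bytes and knows whether they were well-formed. `cp_is_printable` and `dots_for` are hypothetical names, and the printability test is only a coarse approximation; a real implementation would probably need Unicode general-category data (or `iswprint()` in a UTF-8 locale). Showing one "." per malformed byte is one reading of "a sequence of '.'s".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, coarse approximation of "printable" for a decoded code
 * point. It filters only the obvious non-printable ranges.
 */
static int
cp_is_printable(unsigned long cp)
{
    if (cp < 0x20 || cp == 0x7F)        return 0;  /* C0 controls, DEL */
    if (cp >= 0x80 && cp <= 0x9F)       return 0;  /* C1 controls */
    if (cp >= 0xD800 && cp <= 0xDFFF)   return 0;  /* surrogates */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)   return 0;  /* noncharacters */
    if ((cp & 0xFFFE) == 0xFFFE)        return 0;  /* U+xxFFFE and U+xxFFFF */
    if (cp > 0x10FFFF)                  return 0;  /* outside Unicode */
    return 1;
}

/*
 * How many "." to show for a run of nbytes bytes:
 *   malformed run               -> one "." per byte (assumed reading)
 *   valid but non-printable     -> a single "."
 *   valid and printable         -> 0, i.e. show the character itself
 */
static size_t
dots_for(int valid, unsigned long cp, size_t nbytes)
{
    assert(nbytes >= 1);
    if (!valid)
        return nbytes;
    return cp_is_printable(cp) ? 0 : 1;
}
```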

infrastation · 2024-06-18T10:59:06Z

What would be the way to know where UTF-8 strings start and end in the packet data? UTF-8 bytes, whether perfectly valid or not, could be preceded or followed by pure binary bytes that could interfere with UTF-8 decoding. As far as I understand, the only way to do it reliably would be to know the packet structure when doing a hex dump.

fenner · 2024-07-20T22:06:41Z

There's a straightforward way to identify whether or not a sequence of bytes is valid UTF-8; https://www.cl.cam.ac.uk/~mgk25/ucs/utf8_check.c is an example.
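For illustration, an independent sketch of such a check; this is not the linked `utf8_check.c`, and `utf8_seq_len` and `utf8_is_valid` are hypothetical names. It rejects bad lead bytes, missing continuation bytes, overlong forms, surrogates, code points above U+10FFFF, and sequences truncated by the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length (1-4) of the well-formed UTF-8 sequence starting at p,
 * or 0 if the bytes there are not well-formed.
 */
static size_t
utf8_seq_len(const unsigned char *p, size_t avail)
{
    size_t len, i;
    unsigned long cp;

    assert(p != NULL);
    if (avail == 0)
        return 0;
    if (p[0] < 0x80)
        return 1;                                   /* plain ASCII */
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; }
    else
        return 0;                /* stray continuation byte or 0xF8..0xFF */

    if (len > avail)
        return 0;                /* truncated by the end of the buffer */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0;            /* not a continuation byte */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000))
        return 0;                /* overlong encoding */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                /* UTF-16 surrogate */
    if (cp > 0x10FFFF)
        return 0;                /* beyond the Unicode range */
    return len;
}

/* Return 1 if the whole buffer is well-formed UTF-8, 0 otherwise. */
static int
utf8_is_valid(const unsigned char *buf, size_t len)
{
    size_t off = 0, n;

    while (off < len) {
        n = utf8_seq_len(buf + off, len - off);
        if (n == 0)
            return 0;
        off += n;
    }
    return 1;
}
```

A check like this answers whether a given run of bytes is well-formed; it does not by itself say where a string begins or ends within the packet.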

cofleury changed the title ~~Add support to interpret/display packet content as UTF-8 in addition to ASCII~~ Add support to interpret/display packet content as UTF-8 May 27, 2024

guyharris added the improvement label May 27, 2024
