Buffer with valid ASCII characters classified as non-UTF-8 #6

Open
sovereignstack opened this issue Jan 22, 2017 · 3 comments

Comments

@sovereignstack

NUL-terminated strings (where the NUL byte is part of the buffer) should qualify as UTF-8 but don't.
In fact, only three characters below 0x20 are accepted (tab 0x09, LF 0x0A and CR 0x0D), and DEL (0x7F) is rejected as well.
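To make the problem concrete, here is a minimal sketch that reproduces only the ASCII branch of the is-utf8 check quoted in this issue and lists which of the 128 ASCII byte values it turns away. `isAcceptedByStrictCheck` and `rejectedAsciiBytes` are hypothetical names, not part of is-utf8.

```javascript
// Hypothetical helper reproducing only the ASCII branch of the is-utf8 check
// quoted in this issue: TAB, LF, CR and the printable range 0x20..0x7E.
function isAcceptedByStrictCheck(b) {
  return b === 0x09 || b === 0x0a || b === 0x0d || (0x20 <= b && b <= 0x7e);
}

// Collect every ASCII byte value (0x00..0x7F) that this branch rejects.
function rejectedAsciiBytes() {
  const rejected = [];
  for (let b = 0x00; b <= 0x7f; b++) {
    if (!isAcceptedByStrictCheck(b)) rejected.push(b);
  }
  return rejected;
}

const rejected = rejectedAsciiBytes();
console.log(rejected.length); // 30
console.log(rejected.includes(0x00)); // true: the NUL terminator is rejected
console.log(rejected.includes(0x7f)); // true: so is DEL
```

The 30 rejected values are 0x00 to 0x08, 0x0B, 0x0C, 0x0E to 0x1F and 0x7F; every one of them is a valid single-byte UTF-8 character.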

The check for ASCII should simply be:

```javascript
if (bytes[i] <= 0x7F)
```

And not:

```javascript
if(     (// ASCII
            bytes[i] == 0x09 ||
            bytes[i] == 0x0A ||
            bytes[i] == 0x0D ||
            (0x20 <= bytes[i] && bytes[i] <= 0x7E)
        )
  ) {
    // accept the single byte and move on to the next one
    i += 1;
    continue;
}
```
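A sketch of what a full validator with the proposed one-line fix might look like, assuming the input is a `Uint8Array` (a Node.js `Buffer` indexes the same way). This is not is-utf8's actual source: `isUtf8Permissive` is a hypothetical name, and the multi-byte branches follow the well-formed byte sequences in the Unicode Standard (Table 3-7), the same ranges the W3C regex uses.

```javascript
// Sketch of an is-utf8-style validator with the ASCII branch widened to
// 0x00..0x7F as proposed in this issue. Hypothetical name; not is-utf8's code.
function isUtf8Permissive(bytes) {
  // true when b exists and lies in [lo, hi]; out-of-range indexes give undefined
  const inRange = (b, lo, hi) => b !== undefined && lo <= b && b <= hi;
  const cont = (b) => inRange(b, 0x80, 0xbf); // continuation byte 10xxxxxx

  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    const b1 = bytes[i + 1];
    const b2 = bytes[i + 2];
    const b3 = bytes[i + 3];

    if (b0 <= 0x7f) {
      // ASCII, including NUL, the other control characters and DEL
      i += 1;
    } else if (inRange(b0, 0xc2, 0xdf) && cont(b1)) {
      // non-overlong 2-byte sequence
      i += 2;
    } else if (
      ((b0 === 0xe0 && inRange(b1, 0xa0, 0xbf)) || // E0: exclude overlongs
        (inRange(b0, 0xe1, 0xec) && cont(b1)) ||
        (b0 === 0xed && inRange(b1, 0x80, 0x9f)) || // ED: exclude surrogates
        (inRange(b0, 0xee, 0xef) && cont(b1))) &&
      cont(b2)
    ) {
      i += 3;
    } else if (
      ((b0 === 0xf0 && inRange(b1, 0x90, 0xbf)) || // F0: exclude overlongs
        (inRange(b0, 0xf1, 0xf3) && cont(b1)) ||
        (b0 === 0xf4 && inRange(b1, 0x80, 0x8f))) && // F4: stop at U+10FFFF
      cont(b2) &&
      cont(b3)
    ) {
      i += 4;
    } else {
      return false; // invalid lead byte, bad continuation, or truncated sequence
    }
  }
  return true;
}

// "hi" followed by a NUL terminator: rejected by the quoted check, accepted here
console.log(isUtf8Permissive(new Uint8Array([0x68, 0x69, 0x00]))); // true
// a lone continuation byte is still invalid
console.log(isUtf8Permissive(new Uint8Array([0x80]))); // false
```

Only the first branch differs from the quoted check. Overlong forms, surrogates and code points above U+10FFFF are still rejected, so widening the ASCII range does not make the validator accept anything that is not UTF-8.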

UTF-8 is designed to be backward compatible with ASCII.

From the Wikipedia article on UTF-8:

The salient features of this scheme are as follows:

Backward compatibility: One-byte codes are used only for the ASCII values 0 through 127. In this case the UTF-8 code has the same value as the ASCII code. The high-order bit of these codes is always 0. This means that ASCII text is valid UTF-8....
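The property quoted above can be checked directly with the standard `TextEncoder`, which always encodes to UTF-8. A minimal sketch, assuming a runtime where `TextEncoder` is a global (browsers, Node.js 11+); `asciiEncodesToItself` is a hypothetical helper.

```javascript
// Check the backward-compatibility property: every code point 0..127
// (NUL and the other control characters included) encodes to exactly one
// UTF-8 byte with the same value.
function asciiEncodesToItself() {
  const encoder = new TextEncoder(); // always encodes as UTF-8
  for (let cp = 0; cp <= 0x7f; cp++) {
    const bytes = encoder.encode(String.fromCharCode(cp));
    if (bytes.length !== 1 || bytes[0] !== cp) return false;
  }
  return true;
}

console.log(asciiEncodesToItself()); // true
// The first code point outside ASCII already needs two bytes (0xC2 0x80).
console.log(Array.from(new TextEncoder().encode("\u0080"))); // [ 194, 128 ]
```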

The string is sent by a device and received by third-party software (node-red), which uses is-utf8. I can't change either of them.

@ilkkao

ilkkao commented May 2, 2017

I'd like ASCII control characters to be considered valid UTF-8 as well.

@bkdotcom

bkdotcom commented May 23, 2017

It looks like this code was based on the regex at https://www.w3.org/International/questions/qa-forms-utf-8

I opened an issue about the w3.org regex:
w3c/i18n-drafts#90
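For comparison, a rough JavaScript port of the Perl regex published on that W3C page, with only the ASCII alternative widened from `[\x09\x0A\x0D\x20-\x7E]` to `[\x00-\x7F]`. This is a sketch, not code from is-utf8 or w3.org: mapping each byte to the character with the same value, and the names `UTF8_PERMISSIVE`, `bytesToBinaryString` and `isUtf8ByRegex`, are assumptions made for illustration.

```javascript
// Port of the W3C qa-forms-utf-8 pattern with the ASCII class widened to
// [\x00-\x7F]. The regex runs on a string whose code units equal the raw
// byte values (U+0000..U+00FF), so no "u" flag is used.
const UTF8_PERMISSIVE = new RegExp(
  "^(?:" +
    "[\\x00-\\x7F]" + // ASCII, including control characters
    "|[\\xC2-\\xDF][\\x80-\\xBF]" + // non-overlong 2-byte
    "|\\xE0[\\xA0-\\xBF][\\x80-\\xBF]" + // excluding overlongs
    "|[\\xE1-\\xEC\\xEE\\xEF][\\x80-\\xBF]{2}" + // straight 3-byte
    "|\\xED[\\x80-\\x9F][\\x80-\\xBF]" + // excluding surrogates
    "|\\xF0[\\x90-\\xBF][\\x80-\\xBF]{2}" + // planes 1-3
    "|[\\xF1-\\xF3][\\x80-\\xBF]{3}" + // planes 4-15
    "|\\xF4[\\x80-\\x8F][\\x80-\\xBF]{2}" + // plane 16
    ")*$"
);

// Map each byte 0..255 to the character with the same code unit value.
function bytesToBinaryString(bytes) {
  let s = "";
  for (const b of bytes) s += String.fromCharCode(b);
  return s;
}

function isUtf8ByRegex(bytes) {
  // no "g" flag, so test() keeps no lastIndex state between calls
  return UTF8_PERMISSIVE.test(bytesToBinaryString(bytes));
}

console.log(isUtf8ByRegex(new Uint8Array([0x68, 0x69, 0x00]))); // true
console.log(isUtf8ByRegex(new Uint8Array([0xed, 0xa0, 0x80]))); // false
```

Because every alternative is selected by its first byte, the pattern does not backtrack pathologically, but a hand-written loop is still the cheaper option for large buffers.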

@ilkkao

ilkkao commented May 23, 2017

By the way, I switched to https://github.com/websockets/utf-8-validate, which accepts all ASCII characters.
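If adding a dependency is not an option, a similar effect is available from the standard WHATWG `TextDecoder` in `fatal` mode, which throws on malformed UTF-8 but accepts every ASCII byte, NUL included. This is an alternative, not the utf-8-validate API; `isUtf8ViaTextDecoder` is a hypothetical name, and the sketch assumes `TextDecoder` is a global (browsers, Node.js 11+).

```javascript
// With fatal: true, decode() throws instead of inserting U+FFFD replacement
// characters, so a successful decode means the bytes are well-formed UTF-8.
const strictDecoder = new TextDecoder("utf-8", { fatal: true });

function isUtf8ViaTextDecoder(bytes) {
  try {
    // a non-streaming decode() resets the decoder, so reuse is safe
    strictDecoder.decode(bytes);
    return true;
  } catch {
    return false; // malformed input
  }
}

console.log(isUtf8ViaTextDecoder(new Uint8Array([0x68, 0x69, 0x00]))); // true
console.log(isUtf8ViaTextDecoder(new Uint8Array([0xff]))); // false
```

Recent Node.js releases also provide `isUtf8()` in the `node:buffer` module (v18.14.0 and v19.4.0 onward), which can be used directly where available.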

Rotzbua added a commit to Rotzbua/node-red that referenced this issue Jun 17, 2024
package already used by `ws`

`is-utf8` is abandoned and has some issues:
wayfind/is-utf8#6
Rotzbua added a commit to Rotzbua/node-red that referenced this issue Jun 17, 2024
package is maintained by the already used package `ws`

`is-utf8` is abandoned and has some issues:
wayfind/is-utf8#6