Check UTF encoding interactives #501

Closed
JackMorganNZ opened this Issue Oct 14, 2017 · 0 comments

JackMorganNZ commented Oct 14, 2017

This issue is user submitted and needs to be validated:

When the letters "abcd" are put into the Unicode length text box, both UTF-8 and UTF-16 are shown to use 32 bits. However, if UTF-16 uses 16 bits to encode a character, this should be 16*4=64 bits. Is there a problem with this or is this intentional?
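The expected figures can be checked independently of the interactive. The following is a minimal sketch in Python, not the interactive's own code; `encoded_bits` is a hypothetical helper name, and it assumes the size of interest is the encoded byte count times 8, with no byte order mark included.

```python
def encoded_bits(text: str, encoding: str) -> int:
    """Return the number of bits `text` occupies in the given encoding."""
    return len(text.encode(encoding)) * 8


# "utf-16-le" is used (an assumption of this sketch) so that Python does not
# prepend a 2-byte byte order mark, which plain "utf-16" would add.
utf8_bits = encoded_bits("abcd", "utf-8")       # 4 characters x 8 bits each
utf16_bits = encoded_bits("abcd", "utf-16-le")  # 4 characters x 16 bits each

print("UTF-8:", utf8_bits, "bits")    # UTF-8: 32 bits
print("UTF-16:", utf16_bits, "bits")  # UTF-16: 64 bits
```

Under these assumptions "abcd" takes 32 bits in UTF-8 and 64 bits in UTF-16, which matches the 16*4=64 calculation in the report.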

Also reported by another user:

In the data representation section, the Unicode Encoding Size interactive has a bug where the same number of bits is displayed for UTF-16 as for UTF-8, so values such as 8 or 24 bits are shown as the UTF-16 length of a piece of text. This is very misleading, and I hope it is fixed soon so that students do not put this error into their work. Thanks :)

Also reported by another user:

I think there is a problem with the interactive that lets you compare text samples to see how many bits they will use depending on which UTF encoding scheme you use. It appears to give the same results for both UTF-8 and UTF-16. It would be great if you could check this out as students will be keen to use it for their upcoming NCEA external assessments in Computer Science.
http://csfieldguide.org.nz/en/chapters/data-representation.html#comparison-of-text-representations

Also reported by another user:

I have come across what I think is a bug in the Unicode Encoding Size interactive at the bottom of the Unicode section.

It is always showing the same size for UTF-8 and UTF-16. For example, for the letter 'a' it says UTF-8 size = 8 (correct) and UTF-16 = 8, which is not possible; it should be 16.
Another example is the character '猫'. It says that both UTF-8 and UTF-16 are equal to 24, whereas it should be UTF-8 = 24 and UTF-16 = 16.
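
The two examples above follow from the code point ranges each encoding uses. As a rough sketch (the function names `utf8_char_bits` and `utf16_char_bits` are hypothetical and this is not the interactive's implementation), the per-character sizes can be derived directly from the code point, assuming the input is a single valid character and not a lone surrogate:

```python
def utf8_char_bits(ch: str) -> int:
    """Bits needed to encode one character in UTF-8 (1 to 4 bytes)."""
    cp = ord(ch)
    if cp < 0x80:       # U+0000..U+007F: 1 byte
        return 8
    if cp < 0x800:      # U+0080..U+07FF: 2 bytes
        return 16
    if cp < 0x10000:    # U+0800..U+FFFF: 3 bytes
        return 24
    return 32           # U+10000..U+10FFFF: 4 bytes


def utf16_char_bits(ch: str) -> int:
    """Bits needed to encode one character in UTF-16 (one or two 16-bit units)."""
    # Characters outside the Basic Multilingual Plane need a surrogate pair.
    return 16 if ord(ch) < 0x10000 else 32


for ch in ["a", "猫"]:
    print(ch, utf8_char_bits(ch), utf16_char_bits(ch))
```

This gives 8 and 16 bits for 'a', and 24 and 16 bits for '猫' (U+732B). A UTF-16 size is always a multiple of 16, so results of 8 or 24 bits cannot be correct for UTF-16.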

Can you please let me know if this is indeed a bug or if I have missed something?

JackMorganNZ added a commit that referenced this issue Oct 17, 2017
