Special Chars Corrupted due to Encoding #1

Open
5 tasks
tajmone opened this issue Oct 25, 2019 · 0 comments
Labels
💀 encoding Problems with characters encoding of source files 💀 invalid This doesn't seem right

tajmone commented Oct 25, 2019

Some special characters in the source files appear to have been corrupted by character-encoding mismatches.

  • hemisc.c — Fix broken chars (inside comments only)
  • iotest.c — Fix broken chars (inside comments only)
  • textfont.h — (corrupted char constants):
    • Test if code works as expected.
    • Fix them, if needed.

In some cases the problem only affects comments (i.e. the visual preview of a special character next to its hex value), but in other places it may have affected character constants in assignment expressions, which could break the Hugo functions that handle charset conversions or cause some characters to be misrepresented.

In some cases, I managed to fix the more obvious characters by comparing against the original sources in the Hugo SVN repository, but some files look broken there too.

Some of these characters are shown in the editor as hex entities because they are not valid UTF-8 sequences. Where I knew which character to expect, I tried switching the editor's encoding, but I couldn't work out which encoding was originally used.
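To locate such bytes without relying on how an editor renders them, a file's contents can be scanned for the first byte that does not begin a well-formed UTF-8 sequence. The following is only a minimal sketch: `first_invalid_utf8` is a hypothetical helper written for this issue, not something found in the Hugo sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the Hugo sources).
   Returns the offset of the first byte that does not start a well-formed
   UTF-8 sequence, or len if the whole buffer is valid UTF-8. */
size_t first_invalid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */
        size_t need, k;

        if (c < 0x80) {                      /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1;                        /* 2-byte sequence */
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2;                        /* 3-byte sequence */
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3;                        /* 4-byte sequence */
        } else {
            return i;                        /* stray continuation or bad lead */
        }

        /* Tighter bounds that exclude overlong forms, surrogates
           and code points above U+10FFFF. */
        if (c == 0xE0) lo = 0xA0;
        else if (c == 0xED) hi = 0x9F;
        else if (c == 0xF0) lo = 0x90;
        else if (c == 0xF4) hi = 0x8F;

        if (len - i <= need)
            return i;                        /* sequence is truncated */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return i;
        for (k = 2; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return i;
        }
        i += need + 1;
    }
    return len;
}
```

Running this over a buffer read from hemisc.c or textfont.h would give the offset of the first suspicious byte. A lone 0xE0 followed by a space, as a single-byte encoding would store it in a comment preview, is reported, whereas the UTF-8 pair 0xC3 0xA0 is accepted.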

Here's a list of affected files (possibly incomplete):

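| File       | Lines     | Notes                           | Status    |
|------------|-----------|---------------------------------|-----------|
| hemisc.c   | 2250–2410 | Character previews in comments. | fixed     |
| iotest.c   | 842–999   | Char constants in assignments.  | ???       |
| textfont.h | 111–      | Character previews in comments. | malformed |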

In hemisc.c and textfont.h, the problem only affects the character previews inside comments:

Example from hemisc.c:

         switch (s)
         {
            case 'a':  s = (char)0xe0; break;   /*   à   */
            case 'e':  s = (char)0xe8; break;   /*   è   */
            case 'i':  s = (char)0xec; break;   /*   ì   */

Example from textfont.h:

static unsigned char text_font[FONT_COUNT][FONT_HEIGHT] =
{
   { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },   /* ' ' */
   { 0x18, 0x3C, 0x3C, 0x18, 0x18, 0x00, 0x18, 0x00 },   /* '!' */
   { 0x6C, 0x6C, 0x6C, 0x00, 0x00, 0x00, 0x00, 0x00 },   /* '"' */
   ...
   { 0x76, 0xDC, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },   /* '~' */
   { 0x00, 0x10, 0x38, 0x6C, 0xC6, 0xC6, 0xFE, 0x00 },   /* '�' */
   { 0x78, 0xCC, 0xC0, 0xCC, 0x78, 0x18, 0x0C, 0x78 },   /* '€' */
   { 0x00, 0xCC, 0x00, 0xCC, 0xCC, 0xCC, 0x7E, 0x00 },   /* '�' */
   { 0x1C, 0x00, 0x78, 0xCC, 0xFC, 0xC0, 0x78, 0x00 },   /* '‚' */

As for iotest.c, there might be real problems, because its char constants may have been corrupted by UTF-8 conversion:

         switch (s)
         {
            case 'a':  s = 'ý'; break;
            case 'e':  s = 'Ë'; break;
            case 'i':  s = 'Ï'; break;
            case 'o':  s = 'Ú'; break;
            case 'u':  s = '˜'; break;
            case 'A':  s = '¿'; break;
            case 'E':  s = '»'; break;
            case 'I':  s = 'Ã'; break;
            case 'O':  s = '³'; break;
            case 'U':  s = '�'; break;
         }
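To illustrate why literal constants like these are at risk while the hex-style constants in hemisc.c are not, here is a small sketch. It assumes the original bytes were ISO-8859-1, which this issue has not established. Both `latin1_to_utf8` and `grave_accent_latin1` are hypothetical names, and the mapping covers only the three cases visible in the hemisc.c excerpt above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one ISO-8859-1 byte as UTF-8.
   Returns the number of bytes written to out (1 or 2). */
size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    assert(out != NULL);

    if (c < 0x80) {              /* ASCII is unchanged */
        out[0] = c;
        return 1;
    }
    /* Bytes 0x80..0xFF map to U+0080..U+00FF, which need two bytes. */
    out[0] = (unsigned char)(0xC0 | (c >> 6));
    out[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
}

/* Hypothetical helper in the style of the hemisc.c excerpt: the values
   are written as hex constants, so re-encoding the source file cannot
   change them. Only the three cases shown in that excerpt are covered. */
unsigned char grave_accent_latin1(unsigned char s)
{
    switch (s) {
        case 'a': return 0xE0;   /* a with grave accent */
        case 'e': return 0xE8;   /* e with grave accent */
        case 'i': return 0xEC;   /* i with grave accent */
        default:  return s;      /* anything else is left untouched */
    }
}
```

Under that assumption, a source byte 0xE0 inside single quotes becomes the two bytes 0xC3 0xA0 once the file is converted to UTF-8, so the constant no longer denotes a single `char` value. Constants written as hex values are unaffected by such a conversion.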

These issues need to be addressed.
