Special Chars Corrupted due to Encoding #1

tajmone · 2019-10-25T00:46:00Z

Some special characters in the source files seem to have been corrupted by conversions between character encodings.

hemisc.c: fix broken chars (inside comments only)
iotest.c: (corrupted char constants)
- Test if code works as expected.
- Fix them, if needed.
textfont.h: fix broken chars (inside comments only)

In some cases the problem only affects comments (i.e. the visual preview of the special character next to its hex value), but in other places it might have affected character constants in assignment expressions, which could break the Hugo functions that deal with charset conversions or make them represent some characters wrongly.
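
To illustrate how a re-encoding can change the bytes behind a character constant, here is a minimal sketch. It assumes, purely for illustration, that the original byte was Latin-1 (the actual original encoding of these files is not known), and `latin1_to_utf8` is a hypothetical helper, not a Hugo function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encodes one Latin-1 byte as UTF-8 into out[].
   Returns the number of bytes written (1 or 2).
   Assumption: the input really is Latin-1; nothing in the Hugo
   sources confirms that. */
size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    assert(out != NULL);
    if (c < 0x80) {              /* plain ASCII is unchanged */
        out[0] = c;
        return 1;
    }
    /* Code points 0x80..0xFF become a two-byte sequence. */
    out[0] = (unsigned char)(0xC0 | (c >> 6));
    out[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
}
```

Under that assumption the single byte 0xE0 becomes the two bytes 0xC3 0xA0. A constant written as a literal character between single quotes would then hold two bytes in the source, and the value of such a constant is implementation-defined in C, whereas a comment is only affected visually.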

I've managed to fix some of the more obvious characters by comparing against the original sources in the Hugo SVN repository, but some files look broken there too.

Some of these characters are shown in the editor as hex entities because they are not valid UTF-8 sequences. I tried switching the editor's encoding in places where I knew which character to expect, but I couldn't work out which encoding was originally used.
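
A small scanner can help locate the bytes an editor would show as hex entities. The following is only a sketch: `utf8_seq_len` and `first_invalid_utf8` are hypothetical names, and the check follows the general UTF-8 well-formedness rules rather than anything specific to these files.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length (1-4) of the well-formed UTF-8 sequence starting
   at p, or 0 if the bytes at p are not valid UTF-8.
   n is the number of bytes available from p. */
size_t utf8_seq_len(const unsigned char *p, size_t n)
{
    size_t len, i;
    unsigned long cp;

    assert(p != NULL);
    if (n == 0) return 0;
    if (p[0] < 0x80) return 1;                               /* ASCII */
    else if (p[0] >= 0xC2 && p[0] <= 0xDF) { len = 2; cp = p[0] & 0x1F; }
    else if (p[0] >= 0xE0 && p[0] <= 0xEF) { len = 3; cp = p[0] & 0x0F; }
    else if (p[0] >= 0xF0 && p[0] <= 0xF4) { len = 4; cp = p[0] & 0x07; }
    else return 0;          /* stray continuation or invalid lead byte */

    if (n < len) return 0;                       /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;     /* not a continuation */
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    if (len == 3 && cp < 0x800) return 0;                  /* overlong */
    if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;          /* surrogates */
    return len;
}

/* Returns the offset of the first byte that is not part of valid
   UTF-8, or n if the whole buffer is valid. */
size_t first_invalid_utf8(const unsigned char *buf, size_t n)
{
    size_t i = 0;
    while (i < n) {
        size_t len = utf8_seq_len(buf + i, n - i);
        if (len == 0) return i;
        i += len;
    }
    return n;
}
```

Running `first_invalid_utf8` over a file's contents gives the offset of the first suspicious byte. It does not reveal the original encoding, only where the file stops being valid UTF-8.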

Here's a list of affected files (possibly incomplete):

file	lines	notes	status
hemisc.c	2250–2410	Character previews in comments.	fixed
iotest.c	842–999	Char constants in assignments.	???
textfont.h	111–	Character previews in comments.	malformed

In hemisc.c and textfont.h the problem merely affects the character previews inside comments:

Example from hemisc.c:

         switch (s)
         {
            case 'a':  s = (char)0xe0; break;   /*   à   */
            case 'e':  s = (char)0xe8; break;   /*   è   */
            case 'i':  s = (char)0xec; break;   /*   ì   */

Example from textfont.h:

static unsigned char text_font[FONT_COUNT][FONT_HEIGHT] =
{
   { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },   /* ' ' */
   { 0x18, 0x3C, 0x3C, 0x18, 0x18, 0x00, 0x18, 0x00 },   /* '!' */
   { 0x6C, 0x6C, 0x6C, 0x00, 0x00, 0x00, 0x00, 0x00 },   /* '"' */
   ...
   { 0x76, 0xDC, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },   /* '~' */
   { 0x00, 0x10, 0x38, 0x6C, 0xC6, 0xC6, 0xFE, 0x00 },   /* '�' */
   { 0x78, 0xCC, 0xC0, 0xCC, 0x78, 0x18, 0x0C, 0x78 },   /* '€' */
   { 0x00, 0xCC, 0x00, 0xCC, 0xCC, 0xCC, 0x7E, 0x00 },   /* '�' */
   { 0x1C, 0x00, 0x78, 0xCC, 0xFC, 0xC0, 0x78, 0x00 },   /* '‚' */

As for iotest.c, there might be real problems, because its char constants may have been corrupted by the UTF-8 conversion:

         switch (s)
         {
            case 'a':  s = 'ý'; break;
            case 'e':  s = 'Ë'; break;
            case 'i':  s = 'Ï'; break;
            case 'o':  s = 'Ú'; break;
            case 'u':  s = '˜'; break;
            case 'A':  s = '¿'; break;
            case 'E':  s = '»'; break;
            case 'I':  s = 'Ã'; break;
            case 'O':  s = '³'; break;
            case 'U':  s = '�'; break;
         }
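
One way to make such constants immune to re-encoding is to spell them as hex values, as the hemisc.c excerpt above already does. The sketch below is hypothetical: `accent_grave` and `check_accent_grave` are illustrative names, and the byte values are borrowed from the hemisc.c excerpt only; whether they are the right values for iotest.c is not established here.

```c
#include <assert.h>

/* Hypothetical helper, not taken from iotest.c: maps a plain vowel to
   an accented byte using the hex values from the hemisc.c excerpt.
   The values are ASCII hex literals, so re-encoding this source file
   cannot change them. */
unsigned char accent_grave(char s)
{
    switch (s)
    {
        case 'a': return 0xe0;
        case 'e': return 0xe8;
        case 'i': return 0xec;
        default:  return (unsigned char)s;   /* leave other chars alone */
    }
}

/* Sanity check in the spirit of "test if code works as expected". */
void check_accent_grave(void)
{
    assert(accent_grave('a') == 0xe0);
    assert(accent_grave('e') == 0xe8);
    assert(accent_grave('i') == 0xec);
    assert(accent_grave('x') == 'x');
}
```

A check like `check_accent_grave` would fail loudly if a later conversion altered one of the constants, which is harder to notice when the constant is a literal non-ASCII character.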

These issues need to be addressed.

tajmone added the labels "💀 invalid" (This doesn't seem right) and "💀 encoding" (Problems with characters encoding of source files) on Oct 25, 2019