Avoid ANSI version Windows string functions #168

Closed
zufuliu opened this issue Mar 18, 2020 · 3 comments

Owner

zufuliu commented Mar 18, 2020

These functions are designed for Windows ANSI code pages and may not work for UTF-8 encoded text.
Example of a StrTrimA() failure on my system:

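const char *str = "\xe3\x81\x8c\xe7\x97\x9b\xe3\x81\x84\n"; // UTF-8 bytes of が痛い plus a trailing newline
char buf[100];
strcpy(buf, str);
size_t len1 = strlen(buf);
StrTrimA(buf, "\r\n"); // expected to strip the trailing "\n"
size_t len2 = strlen(buf);
assert(len2 < len1); // this assertion fails on my system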

Here the UTF-8 encoded str is the Japanese text が痛い from issue #127.

It seems StrTrimA() fails when the string contains byte sequences that are invalid in the current ANSI code page.
This failure causes line sorting to add extra newlines to the final result, for example when sorting the following lines on my system (also from issue #127):

が痛い
が咲く

There are other places where StrTrimA() or other ANSI string functions are used, and they may contain similar bugs.
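One way to sidestep the code page dependency is a byte-wise trim that never consults the ANSI code page. The following is only a minimal sketch, assuming the trim set contains ASCII characters only; `TrimBytes` is a hypothetical name, not a function from this project. It is safe for UTF-8 because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never match an ASCII character in the set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not from the project): remove leading and trailing
// bytes that appear in `set` from `s`, in place, and return the new length.
// Assumes `set` holds ASCII characters only.
static size_t TrimBytes(char *s, const char *set) {
    assert(s != NULL && set != NULL);
    size_t len = strlen(s);
    // Drop trailing bytes that appear in the set.
    while (len > 0 && strchr(set, s[len - 1]) != NULL) {
        --len;
    }
    // Skip leading bytes that appear in the set.
    size_t start = 0;
    while (start < len && strchr(set, s[start]) != NULL) {
        ++start;
    }
    // Shift the kept bytes to the front and terminate the string.
    memmove(s, s + start, len - start);
    s[len - start] = '\0';
    return len - start;
}
```

With the buffer from the example above, `TrimBytes(buf, "\r\n")` removes the trailing newline regardless of the active code page.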

Owner Author

zufuliu commented Mar 19, 2020

Usages of the ANSI versions of Windows string functions can be found with grep -nrE "\w+A\(".
Currently the following ANSI functions are used (StrDupA() is code page independent, so it is excluded from the list):

Owner Author

zufuliu commented Mar 20, 2020

A similar bug occurs when a CRT string function (like strchr()) is used to process text in a Chinese, Japanese or Korean DBCS encoding, because the tail byte (in range [0x31, 0xFE]) may coincide with an ASCII character:

>>> 'ァ'.encode('cp932')
b'\x83@'
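The same pitfall can be reproduced in C. The sketch below assumes code page 932 text, whose lead bytes are 0x81-0x9F and 0xE0-0xFC; `IsLeadByte932` and `FindAscii932` are hypothetical helpers, not project functions. On Windows, `IsDBCSLeadByteEx()` could take the place of the hand-written range check.

```c
#include <stddef.h>
#include <string.h>

// Lead-byte ranges of code page 932 (Shift-JIS): 0x81-0x9F and 0xE0-0xFC.
static int IsLeadByte932(unsigned char ch) {
    return (ch >= 0x81 && ch <= 0x9F) || (ch >= 0xE0 && ch <= 0xFC);
}

// Hypothetical helper: like strchr() for an ASCII `target`, but it steps
// over both bytes of every double-byte character, so a tail byte that
// happens to equal `target` is never reported as a match.
static const char *FindAscii932(const char *s, char target) {
    while (*s != '\0') {
        if (IsLeadByte932((unsigned char)*s)) {
            if (s[1] == '\0') {
                break; // truncated double-byte character at the end
            }
            s += 2;
        } else {
            if (*s == target) {
                return s;
            }
            ++s;
        }
    }
    return NULL;
}
```

For the two bytes of ァ (0x83 0x40), plain `strchr(text, '@')` reports a match on the tail byte, while `FindAscii932(text, '@')` returns NULL.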

Owner Author

zufuliu commented May 13, 2021

The remaining StrTrimA() and StrStrIA() calls are either used on ASCII strings, or their failure is acceptable.

@zufuliu zufuliu closed this as completed May 13, 2021
RaiKoHoff pushed a commit to RaiKoHoff/notepad2 that referenced this issue Jul 19, 2021