Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using strlen to detect the number of bytes breaks the usage of char arrays #165

Closed
S-Dafarra opened this issue Mar 22, 2021 · 6 comments
Closed
Labels

Comments

@S-Dafarra
Copy link

In 67a922f the following two lines have been added

matio/src/mat.c

Lines 1116 to 1118 in 67a922f

} else if ( matvar->class_type == MAT_C_CHAR && matvar->data_type == MAT_T_UTF8 ) {
matvar->nbytes = strlen((const char *)data);
} else {

The use of strlen needs the data to have a \0 char at the end. This is ok when the input data is a C string, but it causes segfaults when the input type is a char array.

@S-Dafarra
Copy link
Author

It also causes segfaults in case the input data pointer is nullptr. This is acceptable for every other data type, as long there is a zero dimension. With MAT_C_CHAR and MAT_T_UTF8 instead, it will try to access data anyway.

@tbeu tbeu added the bug label Mar 22, 2021
@tbeu
Copy link
Owner

tbeu commented Mar 22, 2021

The use of strlen needs the data to have a \0 char at the end. This is ok when the input data is a C string, but it causes segfaults when the input type is a char array.

That's why I set the trailing zero also for char array here:

matio/test/test_mat.c

Lines 1302 to 1303 in 84c8f3d

const mat_uint8_t str[] = {216, 168, 196, 145, 216, 172, 105, 217,
132, 225, 187, 135, 219, 140, 110, 0};

Do you see any other option how to get the string length (which can be different from actual dimension)?

It also causes segfaults in case the input data pointer is nullptr.

Sure, this needs to be avoided.

tbeu added a commit that referenced this issue Mar 22, 2021
@tbeu
Copy link
Owner

tbeu commented Mar 22, 2021

It also causes segfaults in case the input data pointer is nullptr.

Addressed by fa770b2.

@S-Dafarra
Copy link
Author

S-Dafarra commented Mar 23, 2021

Addressed by fa770b2.

Thank you for the quick fix!

Do you see any other option how to get the string length (which can be different from actual dimension)?

I have been thinking about it a bit. If I understood correctly the issue reported in #164, the problem is related to strings (and also arrays) containing UTF8 chars. Since the UTF8 chars require additional bytes to be stored, it is not true that the number of bytes is equal to the size of the string. In this case, the use of strlen fixed the issue since it counts the number of bytes correctly.

Ultimately, the problem seems to be on the correct counting of bytes then, both in the case of strings and arrays, but strlen requires the \0 to be present.

From #164 it also seems that the dimensions need to be set to the actual number of chars, not to the number of bytes. So, both in the case of strings and arrays, the user is supposed to provide the number of elements he/she wants to store.

I wonder if information this can be exploited in this case, i.e. mimicking how strlen works. In practice, it would be necessary to iterate over the number of elements, detect if the element is a plain char or a special UTF character, and count the bytes accordingly.

I was thinking maybe at something similar to https://stackoverflow.com/a/1031773, but with a different while loop condition (since while(*bytes) would require again the terminal \0) 🤔

tbeu added a commit that referenced this issue Mar 23, 2021
@tbeu
Copy link
Owner

tbeu commented Mar 23, 2021

Fixed by 4e6cc2e. I guess I need to a release a new version soon.

@tbeu tbeu closed this as completed Mar 23, 2021
@S-Dafarra
Copy link
Author

Awesome, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants