Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Undefined behavior: NULL + 0 #36

Closed
QrczakMK opened this issue Feb 19, 2023 · 2 comments
Closed

[Bug]: Undefined behavior: NULL + 0 #36

QrczakMK opened this issue Feb 19, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@QrczakMK
Copy link

QrczakMK commented Feb 19, 2023

Describe the bug

When given empty buffers with next_in == NULL and avail_in == 0 (or same for out), liblzma sometimes computes NULL + 0, which has undefined behavior in C (it is defined in C++).

This gets detected by ubsan.

The following places are affected (possibly more):

strm->next_in += in_pos;
strm->avail_in -= in_pos;
strm->total_in += in_pos;
strm->next_out += out_pos;
strm->avail_out -= out_pos;
strm->total_out += out_pos;

The first 3 lines can be guarded with if (in_pos != 0), the last 3 lines can be guarded with if (out_pos != 0).

in + in_start, in_used);

Write in_start != 0 ? in + in_start : in.

Version

5.4.1

Operating System

Linux

Relevant log output

No response

@QrczakMK QrczakMK added the bug Something isn't working label Feb 19, 2023
@Larhzu
Copy link
Member

Larhzu commented Feb 20, 2023

I have read about this a bit now. Sounds like it probably needs to be fixed. Quite a few functions have to be reviewed to spot all such cases as there definitely are more than those you already found. XZ Embedded needs to be reviewed too.

At least with a trivial test program, the method in in_start != 0 ? in + in_start : in; seems to be optimized to the same code as in + in_start, at least with modern GCC and Clang. So there won't be an extra branch in reality.

To me this seems like a bug in the standard that could have been fixed by adding an extra sentence to explicitly allow null-pointer + 0. Based on search engine results, it seems that it was decided that it's better to change hundreds of codebases instead, hopefully spotting every problematic case. Feels a bit similar to the memcpy(NULL, NULL, 0) issue that was (hopefully) fixed in XZ Utils in 2019.

Thanks for reporting this!

Larhzu added a commit that referenced this issue Feb 23, 2023
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):

    void bar(char *a, char *b);

    int foo(char *a, size_t n)
    {
        bar(a, a + n);
        return a == NULL;
    }

In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.

To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.

Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:

    https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html

    https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html

In XZ Utils null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.

A little less readable version would be replacing

    ptr + offset

with

    offset != 0 ? ptr + offset : ptr

or creating a macro for it:

    #define my_ptr_add(ptr, offset) \
            ((offset) != 0 ? ((ptr) + (offset)) : (ptr))

Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().

Thanks to Marcin Kowalczyk for reporting the problem:
#36
@Larhzu
Copy link
Member

Larhzu commented Feb 23, 2023

It should be fixed now. Thanks!

XZ Embedded has the same problem. The initial plan is to fix it by changing the API documentation to say that the input and output buffer pointers must not be NULL even for empty buffers. First the code in the Linux kernel has to be checked if NULLs are used in xz_dec_* calls.

@Larhzu Larhzu closed this as completed Feb 23, 2023
Larhzu added a commit that referenced this issue Mar 11, 2023
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):

    void bar(char *a, char *b);

    int foo(char *a, size_t n)
    {
        bar(a, a + n);
        return a == NULL;
    }

In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.

To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.

Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:

    https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html

    https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html

In XZ Utils null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.

A little less readable version would be replacing

    ptr + offset

with

    offset != 0 ? ptr + offset : ptr

or creating a macro for it:

    #define my_ptr_add(ptr, offset) \
            ((offset) != 0 ? ((ptr) + (offset)) : (ptr))

Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().

Thanks to Marcin Kowalczyk for reporting the problem:
#36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants