Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bluetooth: host stack is broken without CONFIG_ASSERT #73316

Open
jori-nordic opened this issue May 27, 2024 · 12 comments
Open

Bluetooth: host stack is broken without CONFIG_ASSERT #73316

jori-nordic opened this issue May 27, 2024 · 12 comments
Assignees
Labels
area: Bluetooth Host Enhancement Changes/Updates/Additions to existing features

Comments

@jori-nordic
Copy link
Collaborator

The Bluetooth host stack doesn't implement enough error handling.
A lot of parameter and state checking happens using __ASSERT() macros.

This is a good thing, as it reduces the mental load of parsing error handling. That "error handling" usually results in a broken state somewhere else, leaks, etc..

So we need some sort of ASSERT() macro that is always enabled.

Two solutions:

  • convert all __ASSERT() macros to BT_ASSERT()
  • do a select ASSERT when Bluetooth is enabled

NB: in the future, that macro could be redirected to allow graceful shutdown of the Bluetooth thread, if running in userspace. That way an error condition in the stack will no longer bring down the entire kernel.

@jori-nordic jori-nordic added bug The issue is a bug, or the PR is fixing a bug area: Bluetooth Host labels May 27, 2024
@jori-nordic jori-nordic self-assigned this May 27, 2024
@aescolar aescolar added Enhancement Changes/Updates/Additions to existing features and removed bug The issue is a bug, or the PR is fixing a bug labels May 28, 2024
@henryzhang62
Copy link

Would you please add steps to reproduce it?

@alwa-nordic
Copy link
Collaborator

alwa-nordic commented May 28, 2024

Would you please add steps to reproduce it?

I believe this refers to cases like bt_conn_unref(NULL) (in application code) evaluating to undefined behavior (UB) when asserts are off. The question is whether it's harmful to allow UB as a result of API misuse when asserts are off, in public API, or even internal API.

The argument is that developer time saved by catching these errors is well worth the extra bytes and cycles many times over. I agree. My opinion is that simple asserts should be on-by-default globally in Zephyr, and we maintainers should make sure assert expressions are not too expensive. Disabling asserts should be a this-train-needs-no-brakes move by those who are forced into that particular requirement and are ready to bear the cost of harder debugging.

jori-nordic added a commit to jori-nordic/zephyr that referenced this issue Jun 19, 2024
We depend on asserts a lot in the Bluetooth subsystem.

Not enabling them will (and has) lead to confusing errors down the line,
in unrelated parts of code.

Fixes zephyrproject-rtos#73316 (kinda)

Signed-off-by: Jonathan Rico <jonathan.rico@nordicsemi.no>
@pdgendt
Copy link
Collaborator

pdgendt commented Jun 19, 2024

Just some comments to take along:

  • Assertions in most languages are no-ops when disabled, and crash/abort if a predicate is false, so we shouldn't depend on it in production code.
  • If a public API has documented assumptions, developers should adhere to them, and add checks before calling the functions, assertions in the callee code are there to help developers point out invalid calls.
  • There is also CHECKIF that can be used as an assertion check or a runtime check (enabled by default with USERSPACE).

@alwa-nordic
Copy link
Collaborator

alwa-nordic commented Jun 19, 2024

Just some comments to take along:

  • Assertions in most languages are no-ops when disabled, and crash/abort if a predicate is false, so we shouldn't depend on it in production code.

  • If a public API has documented assumptions, developers should adhere to them, and add checks before calling the functions, assertions in the callee code are there to help developers point out invalid calls.

  • There is also CHECKIF that can be used as an assertion check or a runtime check (enabled by default with USERSPACE).

I agree with you on all points. I think the issue here is: What should we recommend to our users? Asserts always on? Or asserts off in production? A default config is a pretty strong recommendation, so we should give it some consideration.

I'm of the opinion that asserts enabled in production is a good thing by default. And those who have special needs can make the choice for themselves. So I wish we could turn the tide and enable asserts by default for all of Zephyr.

@jori-nordic
Copy link
Collaborator Author

Some data for @Thalley . I was expecting more asserts than that 😅

The point is, maybe we should have a IF_RETURN(condition, message, retval) macro for the common pattern of checking parameters/state, logging an error and returning some value. Would de-clutter the host a bit I think.

null de-references:

/* Check if attribute is a characteristic value */
if (bt_uuid_cmp(attr->uuid, BT_UUID_GATT_CCC) != 0) {
attr = bt_gatt_attr_next(attr);
__ASSERT(attr, "No more attributes\n");
}
/* Find the CCC Descriptor */
while (bt_uuid_cmp(attr->uuid, BT_UUID_GATT_CCC) &&
/* Also stop if we leave the current characteristic definition */
bt_uuid_cmp(attr->uuid, BT_UUID_GATT_CHRC) &&
bt_uuid_cmp(attr->uuid, BT_UUID_GATT_PRIMARY) &&
bt_uuid_cmp(attr->uuid, BT_UUID_GATT_SECONDARY)) {
attr = bt_gatt_attr_next(attr);
if (!attr) {
return false;
}
}

atomic_set_bit(bt_dev.flags, BT_DEV_HAS_PUB_KEY);
__ASSERT_NO_MSG(new_cb->func != NULL);
new_cb->func(debug_public_key);
return 0;

Broken logic:

__ASSERT_NO_MSG(buf->len >= sizeof(struct bt_att_hdr));
data->opcode = buf->data[0];
data->err = 0;

/*
* PDU must not be allocated from ISR as we block with 'K_FOREVER'
* during the allocation
*/
__ASSERT_NO_MSG(!k_is_in_isr());

ACL Control flow broken

if (!node) {
LOG_ERR("packets count mismatch");
__ASSERT_NO_MSG(0);
break;
}

Bonding that falls on the floor if there's an error

unpair_err = bt_unpair(conflict->id, &conflict->addr);
__ASSERT_NO_MSG(!unpair_err);
}

@alwa-nordic
Copy link
Collaborator

alwa-nordic commented Jun 25, 2024

Bonding that falls on the floor if there's an error

unpair_err = bt_unpair(conflict->id, &conflict->addr);
__ASSERT_NO_MSG(!unpair_err);
}

This is an actual assertion. I assert that it's impossible for it to fail. You can read 'assert' as 'theorem' here.

@Thalley
Copy link
Collaborator

Thalley commented Jun 25, 2024

Some data for @Thalley . I was expecting more asserts than that 😅

The point is, maybe we should have a IF_RETURN(condition, message, retval) macro for the common pattern of checking parameters/state, logging an error and returning some value. Would de-clutter the host a bit I think.

Hmm, we could possibly do that. However in https://docs.zephyrproject.org/latest/contribute/guidelines.html#coding-style we state

In general, follow the Linux kernel coding style

Which in return in https://www.kernel.org/doc/html/v4.10/process/coding-style.html#macros-enums-and-rtl states

Things to avoid when using macros:

macros that affect control flow:

#define FOO(x)
do {
if (blah(x) < 0)
return -EBUGGERED;
} while (0)

And I generally agree with that. Implicit return statements (via macros or other ways) really messes with the readers flow.

So if you want to propose a macro with a return, be prepared for some pushback on that (and not just from me ;) )

@jori-nordic
Copy link
Collaborator Author

DTS macrobatics are kosher though, definitely don't mess with people's heads, writing a driver is E-Z.
I would also argue that an ifdef soup like we seem to love qualifies as a "macro that messes with the control flow".
Sigh.. sorry for the deranged stream of thought, I'm just frustrated we're stuck with this language :|

In principle, I also agree that macros shouldn't mess with control flow.
Doing it correctly will end up with having to scroll through one page of error checking prologue for each API, all with the same if (bad) { log-something; return -EFIGURETHISOUT } copy-pasted all over. When I'll get to the fn logic, I'm sure I will already have forgotten why I was looking at that fn in the first place..

I really don't know what to do.

@Thalley
Copy link
Collaborator

Thalley commented Jun 25, 2024

Sigh.. sorry for the deranged stream of thought, I'm just frustrated we're stuck with this language :|

It was going so well until you and @alwa-nordic started coming here with all your rationale and logic. It was so much easier when no-one questioned why we do what we do ;)

@jori-nordic
Copy link
Collaborator Author

To add some perspective, we tried to write an H4 HCI packet decoder in rust with @alwa-nordic and it was even more verbose than the C version 😅 . So not sure the grass is greener on that side either.

@alwa-nordic
Copy link
Collaborator

alwa-nordic commented Jun 26, 2024

To add some perspective, we tried to write an H4 HCI packet decoder in rust with @alwa-nordic and it was even more verbose than the C version 😅 . So not sure the grass is greener on that side either.

This does not represent my opinion. /disclaimer 🙂

@alwa-nordic
Copy link
Collaborator

Doing it correctly will end up with having to scroll through one page of error checking prologue for each API, all with the same if (bad) { log-something; return -EFIGURETHISOUT } copy-pasted all over. When I'll get to the fn logic, I'm sure I will already have forgotten why I was looking at that fn in the first place..

Maybe this could be solved be adding folding marks around the obvious stuff. 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth Host Enhancement Changes/Updates/Additions to existing features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants