Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bt_rand() is called over HCI when BT_HOST_CRYPTO=y, even if BT_CTLR_LE_ENC=n #22202

Closed
alexandru-porosanu-nxp opened this issue Jan 25, 2020 · 10 comments · Fixed by #22339
Closed
Assignees
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@alexandru-porosanu-nxp
Copy link
Collaborator

Describe the bug
If LE encryption is not supported (like the case for VEGA controllers), and host encryption is enabled, there will still be calls to the controller for obtaining the necessary 8 bytes of random data.
This is done in two places in subsys/bluetooth/host/crypto.c, in prng_reseed() and in prng_init().

The problem is that the sequence of calls is actually a kind of infinite recursion, as per below

prng_reseed() -> BT_HCI_OP_LE_RAND -> le_rand() -> bt_rand() (host) -> prng_reseed()

The call to bt_rand() in host is done because only one implementation of the bt_rand() function (either from controller or from host) is being linked at a time.

Although I have not tested this, I suppose that this could be reproducible on Nordic as well, if setting BT_CTLR_LE_ENC_SUPPORT to n and enabling host crypto. Will test later on, when I can get access to a Nordic board.

To Reproduce
Steps to reproduce the behavior:

  1. mkdir build; cd build
  2. cmake -DBOARD=rv32m1_vega_ri5cy -DBT_HOST_CRYPTO=y
  3. make
  4. run on board and notice an error similar with the one below:
[00:00:00.120,000] <err> os: Exception cause Illegal instruction (2)
[00:00:00.120,000] <err> os: Faulting instruction address = 0x0d8fe860
[00:00:00.120,000] <err> os:   ra: 0x832f304a  gp: 0xec7e3d7e  tp: 0xe25f7e36  t0: 0x306eda80
[00:00:00.120,000] <err> os:   t1: 0x2c6fde6f  t2: 0x30a507e6  t3: 0x6a09e667  t4: 0xbb67ae85
[00:00:00.120,000] <err> os:   t5: 0x3c6ef372  t6: 0xa54ff53a  a0: 0x510e527f  a1: 0x9b05688c
[00:00:00.120,000] <err> os:   a2: 0xb78b2e79  a3: 0xb9a2e204  a4: 0x2fb44915  a5: 0xd90c5060
[00:00:00.120,000] <err> os:   a6: 0xd4839fe3  a7: 0x579a9afc

This is due to the fact that eventually the stack will run out.

Expected behavior
The controller random function should not be called when BT_CTLR_LE_ENC_SUPPORT is not enabled. This is according the the BLE spec, vol 2, part E, 3.1 LE Controller Requirements: where the HCI_LE_Rand command is noted as C4; per the note "C4: Mandatory if LE Feature (LE Encryption) is supported, otherwise excluded.".

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: VEGA Toolchain
  • Commit SHA or Version used: 7cb7465

Additional context
One quick-fix would be to re-implement prng_reseed() and prng_init() to call locally the entropy device. Another dirty fix would be to just use the kernel ticks (k_uptime_get) as the 8 bytes of 'entropy'.

@alexandru-porosanu-nxp alexandru-porosanu-nxp added the bug The issue is a bug, or the PR is fixing a bug label Jan 25, 2020
@jhedberg
Copy link
Member

The correct way for the host to test LE Encryption support in the controller is:

if (BT_FEAT_LE_ENCR(bt_dev.le.features))) {
    ...
}

The reasons why the host is avoiding to use a local entropy source are mainly historical: in the early days of Zephyr (3-4 years ago) most boards did not have an implementation for it, which resulted in them easily falling back to using a "dummy" PRNG which was something similar to what you suggested with k_uptime_get(). Since this is extremely insecure and we didn't want to risk anyone ending up with a product with such a security vulnerability we always just used the controller-side entropy source since, until now, all controllers we've ever used have had this available.

I'm fine with falling back to the local entropy support if the controller indicates lack of encryption support, but we have to ensure that it's not a dummy one. In any other scenario the Bluetooth init (bt_enable) should fail. In fact, if there's a reliable and secure local entropy source it's probably more efficient to use that instead of using HCI.

@George-Stefan
Copy link
Collaborator

On a combined build CONFIG_BT_CTLR_CRYPTO is selected by default and the corresponding crypto.c file that calls into ecb.c functions.

Switching to CONFIG_BT_HOST_CRYPTO (if there is no crypto hw acceleration in controller) on a combined build results in that deadlock

@joerchan
Copy link
Contributor

joerchan commented Jan 27, 2020

@George-Stefan The controller shouldn't do calls into to the host. And currently bt_rand is used by the controller. For example to create the access address of the connection (ull_master.c).
I think what we should do is to make sure that the controller never calls bt_rand itself, but provides bt_rand as a function that the host can call.

@cvinayak @jhedberg So I would rename the implementation and all uses of bt_rand in the controlelr to bt_ctlr_rand(), and then introduce bt_rand which calls this function. That would solve the recursion problem.
Also since it appears that the controller already has a dependency on having a bt_rand function available I don't see the problem with implement the LE Rand command even though encryption is disabled.

For the host we should also handle the case where the controller does not provide the LE Rand function. That would mean that the host would possibly do a call to the entropy device. But if there is no entropy device in the host it appears that we have a problem. The host would need these random numbers for the Privacy feature. Which relies on secure random numbers. Using k_uptime would not be good enough for this.

Is there a non-secure rand function implement somewhere? We appear to use bt_rand most places, not all of those need secure random numbers I guess?

@jhedberg
Copy link
Member

I don't see the problem with implement the LE Rand command even though encryption is disabled.

We could kind of do that.. I suppose. The reason I hesitate is that the core spec states LE Encryption as the feature which makes this command available. That said, instead of checking for supported features the host could be checking for supported commands (which is what we do when the core spec doesn't clearly specify what feature enables support for an optional command)

@jhedberg
Copy link
Member

...or then we just pretend ignorance and keep the host code presuming that this command is always available ;)

@joerchan
Copy link
Contributor

So the use case we need to consider is a privacy-enabled host build with a controller without encryption, and where the host does not have a access to secure random numbers. Using pseudo-random numbers with privacy would mean you could track it across RPA update simply because the rand part of the RPA would be known.

@jhedberg Yeah I meant that we could include the function and set the function as supported, in which case the host would have to read the supported commands instead of features. Is there any possible drawbacks to this?

@joerchan
Copy link
Contributor

Maybe the discussion I started should be taken to it's own issue?

In any case I think what should be done to solve this problem is what I already mentioned:

So I would rename the implementation and all uses of bt_rand in the controlelr to bt_ctlr_rand(), and then introduce bt_rand which calls this function.

@jhedberg
Copy link
Member

in which case the host would have to read the supported commands instead of features

The host is already reading the supported commands and storing them into bt_dev.supported_commands. The only "drawback" would then just be an additional branch in the code.

@joerchan
Copy link
Contributor

prng_init checks for supported command, and returns -ENOTSUP in case it is not supported. prng_reseed does not do this, in this case we should get -EIO. The caller of bt_rand will have to handle this error in that case. The host will check this error code and should not start any roles that needed privacy. Same for other users of bt_rand.

In my opinion this is good enough. We just need to fix the recursion problem.

@jhedberg jhedberg added the priority: medium Medium impact/importance bug label Jan 28, 2020
joerchan added a commit to joerchan/zephyr that referenced this issue Jan 29, 2020
Fix infinite recursion in host-based bt_rand function. This would call
HCI LE Random Number command, which would in turn call bt_rand, causing
an infinite recursion.

bt_rand -> prng_reseed -> BT_HCI_OP_LE_RAND -> le_rand -> bt_rand

To solve this issue the controller should avoid doing calls into the
host, so all calls to bt_rand in the controller should be replaced with
a call to a controller function.

Fixes zephyrproject-rtos#22202

Signed-off-by: Joakim Andersson <joakim.andersson@nordicsemi.no>
@joerchan
Copy link
Contributor

Although I have not tested this, I suppose that this could be reproducible on Nordic as well, if setting BT_CTLR_LE_ENC_SUPPORT to n and enabling host crypto. Will test later on, when I can get access to a Nordic board.

@alexandru-porosanu-nxp Although I don't get the exact same crash as you do, setting BT_CTLR_CRYPTO=n is definitively not working on Nordic either.

carlescufi pushed a commit that referenced this issue Jan 31, 2020
Fix infinite recursion in host-based bt_rand function. This would call
HCI LE Random Number command, which would in turn call bt_rand, causing
an infinite recursion.

bt_rand -> prng_reseed -> BT_HCI_OP_LE_RAND -> le_rand -> bt_rand

To solve this issue the controller should avoid doing calls into the
host, so all calls to bt_rand in the controller should be replaced with
a call to a controller function.

Fixes #22202

Signed-off-by: Joakim Andersson <joakim.andersson@nordicsemi.no>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants