-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Fixing the loophole in unallow & unsubscribe #1905
Comments
First, I kind of like this proposal. Allowing the core kernel (aka the kernel crate) to actually track app slices and callbacks seems like a more robust design, for a couple reasons:
That being said, I don't think there is anything wrong with what we currently have, it's just a different model. If the kernel is malicious, I don't see how userspace can expect any guarantees, so it's reasonable to expect that capsules do want to do the correct thing. And I also think it's a perfectly reasonable design that once an app shares memory with the kernel the app can't revoke it. I'm not saying it's the best model, necessarily. |
As for Tock 2.0, I worry that we won't be able to ship it because everything that we want to include. For this issue, I think we can reasonably use our current machinery in the 2.0 release, but with a different promised interface to userspace as a temporary solution. The implementation might not exactly match the promised interface, which wouldn't be ideal, but then we could freely implement it post 2.0 without any breaking changes. |
Regarding the fact that there is a loophole in what we currently have, this is discussed in tock/libtock-rs#143. In particular, quoting @jrvanwhy's comment (tock/libtock-rs#143 (comment)).
In particular, I don't see how one can possibly guarantee things like "all accesses to the buffers must be done in a single instruction" (how can one read or write a USB packet of 64 bytes in a single instruction?). I also encourage you to re-read the "So where are the loopholes in this setup?" part of my first comment here (#1905 (comment)). If you think that there is nothing wrong about any of these points, could you please be more specific about why?
It's totally unreasonable for capsules transmitting/receiving packets such as USB or UDP. When either of the userspace or the kernel reads/writes a packet, it must have exclusive access to the underlying slice for the complete duration of this operation (which necessarily takes multiple CPU instructions for packets of more than 4 bytes on 32-bit CPUs). |
I'm not entirely sure I understand the reasoning behind capsules not being allowed to decide whether to keep a Slice or Callback upon On the other hand, I can imagine an approach where the Rust ownership guarantees are used. Upon an This could potentially better encapsulate the idea of a capsule currently working on something, dependent on buffers being shared. Imagine a DMA operation: if this were to happen on an In addition to that, we would avoid evaluating the The current approach with a capsule working on something and a Slice or Callback being removed in the meantime is to continue the (asynchronous) operation, then notice the resource has vanished, throwing away the results - or worse any report of side effects that this operation has caused. Permitting a capsule to refuse giving up on shared Slices or Callbacks would not violate any of the safety concerns you mentioned, but introduces the risk of blocking applications on repeatedly trying to |
Capsules could indeed be allowed to reject the unallow/unsubscribe request, but this should be communicated to userspace via an error code such as I still see some problems with @lschuermann's proposal of "transferring back" the slice by returning an
So I'd rather suggest to give capsules freedom to accept or reject allow/subscribe requests, by returning either a boolean (surfaced back as |
First and foremost, I'm vouching for a kernel-held table of slices/callbacks as the kernel would need to ensure that no region is shared twice, etc. I think this issue can be easily circumvented by giving each shared Slice a (counted) ID and storing that in the table as well. In case of mismatch (would be a pretty bad error in the capsule), that's an unrecoverable situation.
When the kernel holds a table anyways, an AppSlice could be made as thin as a pointer to the memory, the length, and a (kernel internal / hidden) pointer to the entry in the kernel-held table. We're talking about 3 I'm vouching for this approach (which should be from a user perspective quite similar to your approach @gendx) because of
|
This gets into the distinction between a data race (defined in the Rustonomicon) and other race conditions. The exact meaning of volatile hasn't been set in stone yet, but from what I see of the conversation, read_volatile and write_volatile will probably be defined to compile to a sequence of 1 or more read/write operations that each execute in a single instruction. Interrupts are synchronized to occur between instructions of the userspace thread, which should(*) provide the synchronization we need between userspace and the kernel. That allows us to avoid data races in the kernelspace/userspace boundary, even with shared memory. Volatile reads and writes can still tear across the userspace/kernel boundary, however. That represents a different race condition that doesn't cause undefined behavior if handled appropriately (i.e. only read types where arbitrary bit patterns are acceptable). (*) If volatile accesses don't have this property, it would be impossible for kernel code to share data between an interrupt handler and the main thread on systems without atomic instructions. That seems like a necessary property for Rust to have given its embedded use cases. |
I don't see how volatile has anything to do with this. Consider the following example (in userspace with libtock-rs). // Handler invoke upon a callback from the kernel that a packet was received.
fn handle_receive(allowed_slice: &[u8; 64]) {
let mut packet = [0; 64];
packet.copy_from_slice(&allowed_slice);
// Do something with the received packet.
} Even with volatile semantics, it's clear that The following sequence of events can happen, volatile or not:
Now, the I hope that we can all agree that this is a problem. And that the only way to prevent this from happening is to have a way for the kernel to tell userspace that it guarantees it exclusive access to
Maybe the problem I outlined is not a "data race" in the rustnomicon sense, but it's still a race condition that causes memory corruption. In the logical sense, a 64-byte packet is a type of its own. Even if reading a corrupted packet doesn't directly trigger a buffer overflow or diversion of the control flow (via corruption of the stack, or of pointers in an object's vtable), I disagree with the statement "it's fine, all bit patterns are acceptable in a [u8; 64]". Because such a statement is equivalent to "the contents of an allowed slice are always arbitrary", and that is contrary to the semantics that we want for a driver. For example, for a USB driver, we want userspace semantics meaning something like "after allowing a receive slice, sending a receive command, followed by the receive callback, and unallowing the receive slice, the receive slice now contains the next USB packet received by the USB controller". If we replace these semantics by "after allowing a receive slice, sending a receive command, followed by the receive callback, and unallowing the receive slice, the receive slice now contains arbitrary contents", what can we use such a driver for?? As such, it's as much rendering the whole system unusable as another form of UB. And it's as much a potential security vulnerability depending on what the application does with the now-corrupted packet. |
No matter what, an application has to trust that a capsule it uses is written correctly. Even if the core kernel were responsible for managing allowed buffers, the USB driver could write arbitrary contents into the receive slice before the slice is unallowed by the application. So is the current reliance on trusting a capsule to respect an unallow request really any different than how apps must always trust a capsule to correctly populate a packet with the received information? (with the caveat that we are more likely to catch one in code review than the other) (updated for clarity in response to brads comment) |
That is correct. However, when the core kernel can guarantee that a capsule no longer has access to a slice of app memory, the app can reasonably assume the content in that slice never unexpectedly changes, or else the core kernel would have a bug. I side with @gendx here, this would be an improvement over the guarantees which the kernel can provide with app/capsule interaction. It may not be UB in the ISA / Rust sense, but significantly reduces the damages that poorly written / malicious capsules can cause. Buffer handling is hard after all 😄; no one wants a capsule holding on to all Changed terminology as per @bradjc's request. |
I think so we can have a coherent discussion, we have to use the same terminology. From https://github.com/tock/tock/blob/master/doc/Design.md, everything under the syscall interface is the kernel. That is, a malicious capsule means the kernel is malicious. An easy way to separate out capsules is to refer to the "core kernel". |
In general, capsules and apps can work together to achieve the desired behavior. Maybe this doesn't map well to a Rust userpsace, and maybe userspace wants a less "trusting" interface to the capsules. A shared buffer isn't inherently a problem. But it's important, when discussing designs, to be open to alternative designs that prioritize different objectives. I think you are arguing that first-class support for a Rust userland is a key objective for the Tock kernel. Someone else with a different goal might argue for a very different design. That doesn't mean one is right or wrong. Once we agree on the objectives, then we can evaluate how well a design achieves those objectives. In general, I think Tock wants to support a robust libtock-rs. But, it's not the only objective. |
I understand the trust boundaries as follows.
This is, I think, the spirit of https://github.com/tock/tock/blob/master/doc/Design.md#capsules. In particular, the following sentence doesn't agree with @bradjc's assessment that "a malicious capsule means the kernel is malicious".
Then, the design mentions resources.
To me, the issue being discussed here deals exactly with this latter point. The loophole originally described in tock/libtock-rs#143 is exactly that: capsules that hold onto resources (callbacks, application slices), when they shouldn't anymore (due to an unallow/unsubscribe request). If there is a way to leverage the "Rust type system" (under which I encompass
Well, in practice I interact more with the "kernel" as in everything in the |
A shared buffer is inherently a problem, as shown by the example I've given in #1905 (comment) (unless there's a flaw in this example). And this has nothing to do with the userspace being written in Rust, the use of Rust was just illustrative. The same problem (copying a 64 byte packet from an allowed slice to a local buffer, and being interrupted by the kernel in the middle of it, with a malicious/buggy capsule writing the next packet in the slice) will also exist in a userspace written in C or any language.
I have only given illustrative examples of a Rust userland, but the problem discussed here is by no means Rust-specific. The fact that the problem was first discovered in the Of course, we can also ignore this potential loophole and just trust capsules to do the right thing (e.g. destroying handles to app slices upon But I also think that fixing this loophole is worth it because:
|
This is great. This is an important problem and I agree we need to solve it for Tock 2.0. However, I'd like to separate this discussion into 3 steps:
What do you think? |
I definitely agree. I tried to sketch a solution, to get an understanding of whether this problem could be reasonably solved, but the most important for now is indeed to agree on our understanding of the problem, and on the properties we want to achieve. Referring back to the threat model (https://github.com/tock/tock/tree/master/doc/threat_model#isolation-provided-to-processes), I think it would be relevant to define the following parts more precisely.
My understanding of "explicitly permitted by the process" is that
The same remark applies here, we need to precisely define the starting and ending points of "when allowed". |
If the capsule is well-behaved (doesn't touch the buffer after an allow(null) call), then this race condition cannot happen. Corollary: this race condition only happens when a capsule is misbehaving. If a capsule is malicious, then using volatile operations changes the outcome from "the capsule triggers undefined behavior" to "the app may read corrupt data from the buffer". If the capsule is malicious, then inserting corrupt data into the buffer was always a possibility anyway.
I want to distinguish bugs (such as non-compliance with the threat model) from undesirable design decisions (such as requiring all buffers to be static). I agree that both of them are problems, but the level of importance and timelines for solving them are likely different. |
To clarify: the blast radius of "trusting a capsule to not return corrupt data" and "trusting the capsule to not invoke undefined behavior" are very different. A malicious capsule that was trusted (incorrectly) to not invoke undefined behavior can potentially compromise any app data or behavior. A malicious capsule that was trusted to behave correctly can break the parts of the app that rely on that behavior. E.g. a malicious capsule that exposes a logging API cannot compromise an app's cryptographic keys because that app won't share its keys with the logging capsule anyway. It could totally destroy logging and lie about what the app did, of course. |
Another possible issues comes up when thinking about app restarts. In the current implementation (as far as I can tell), As long as that's the case, we don't only have the issue of when an process shares a buffer/callback once, it must forever trust the capsule that this memory will not be modified unexpectedly, but instead a restarted process must assume that all of its memory can be modified by any capsule at any time, as the process does not necessarily know what slices have been shared in the previous instance of that same app. This is clearly undesirable and shows that even with regards to process restarts, we cannot continue to ignore resources previously shared by the process into the kernel. To avoid this, the kernel must keep track of associated resources and run a cleanup routine prior to starting a new instance. This is a step in the right direction. |
Let's keep in mind that there is a whole spectrum between a "perfect capsule" and a "totally malicious capsule". In practice, we likely won't face a "totally malicious capsule", because a basic code review will rule that out. But we will rather face more subtle bugs due to forgetting one line of code or complex concurrency logic. See the UDP example where one of the slices is never redeemed to the app. So saying that malicious capsules can do anything and stopping at that fact is a wrong approach IMO. Because this misses the realistic bugs, such as a packet buffer being corrupted due to a subtle bug leading to concurrent access by the capsule and userspace. I also don't fully agree on "inserting corrupt data into the buffer was always a possibility anyway". Because with clearly defined boundaries on ownership between In any case, a minimum of code review of the capsules one uses is necessary (as a sanity check to make sure that each capsule provides the functionally it advertises). An obviously malicious capsule won't pass such review, but more subtle bugs could slip in. And this is where IMO the threat model and the safeguards provided by the core kernel can help, because they limit the scope of what a capsule can possibly do, and therefore limit the scope of the possible behaviors one has to account for when reviewing the code. It's much easier to reason about resources exclusively owned by userspace/capsules, rather than to manually go through the logic to rule out all possible data races.
I'd like to point out that a loophole is not in itself a bug. It can lead to a bug if some capsule exploits this loophole, but until a capsule actually exploits it there is no bug. So even if in principle a capsule could violate the threat model (although as I mentioned in #1905 (comment) the threat model is not precise enough yet to know for sure if there's really a violation) by retaining access to a slice after disallow, the threat model is not violated if none of the capsules violates it. And apart from the UDP example (which I personally don't rely on so far), I haven't seen any violation in practice. So in terms of timeline, I don't see an urgent need to fix any bug. But I see an opportunity to improve the threat model's wording as well as the design of allowed slices (and callbacks) in the kernel, to reduce the risk of potential future bugs. Whether that should be a priority is up to the core group to decide. Regarding userspace, unless there is a specific capsule that indeed keeps a hold on a slice forever, I don't see a need to require all capsules to require static buffers. That would be a way too strong limitation that is neither pragmatic nor necessary (there's currently no capsule exploiting the potential loophole). It's much more pragmatic to instead focus on how to get rid of the loophole so that the loophole never becomes a bug in the future. |
We discussed this in depth on the core call today (see #1929). The conclusion was that there is general consensus that having the core kernel track callbacks and slices including support for unallow, rather than relying on capsules to "do the right thing", is a positive change, and that a PR that accomplishes this would be welcomed. We did not discuss exactly what such a design would look like, but there was agreement that a design which makes it impossible for unallow to fail would be vastly preferable to one where unallow can fail, from the perspective of reducing the complexity of userspace error handling. There was also some discussion of whether it should be allowed for userspace buffers to be passed to hardware, and the conclusion was that this should be possible if there is a guarantee that an app will not run until a DMA call succeeds or aborts. |
For now, this issue has been addressed by #2639 (upcalls) and #2632 (process buffers). A capsule can still deny returning a process buffer to userspace and there is no mechanism to allow userspace to ask the kernel to force the return of the process buffer. But, this may be a sufficient design for our intended use cases. If we find out that the current guarantees are not strong enough we should open a new issue. |
Some months ago, an issue was open on libtock-rs regarding a potential loophole when unsubscribing from a callback or unallowing a slice (tock/libtock-rs#143). This led to some documentation changes in #1831.
The discussion was revived last week, and this loophole is a deeper problem than I previously thought. I'd like to expose what the problem is in the current model, and propose a way to fix it.
I also think this is a major issue that should be addressed before releasing Tock 2.0.
TLDR: The current grant infrastructure gives too much freedom to capsules, and doesn't fit in the threat model of untrusted capsules, because capsules can keep access to callbacks and slices forever.
Observations
Allow and the
AppSlice
typeWhen a userspace application uses the
allow
syscall to allow or disallow a slice, the kernel scheduler creates anAppSlice
object via theProcess::allow()
function and then dispatches it to the relevant capsules withDriver::allow()
.tock/kernel/src/sched.rs
Lines 495 to 515 in c814364
tock/kernel/src/driver.rs
Lines 106 to 113 in c814364
A typical capsule will take this
AppSlice
object and transfer it into the grant object corresponding to this (capsule, process) pair.For example, the RNG driver has an
App
object (put into the grant), which stores a handle to the allowed buffer in itsbuffer
field.tock/capsules/src/rng.rs
Lines 33 to 38 in c814364
tock/capsules/src/rng.rs
Lines 135 to 152 in c814364
This
AppSlice
object directly contains a raw pointer + length. These are private fields, and one would expect that onlyunsafe
operations would allow to access the slice, and because capsules are forbidden to useunsafe
code (this restriction is the basis of "untrusted capsules" in the threat model), we may think that this isn't problematic.tock/kernel/src/mem.rs
Lines 64 to 67 in c814364
However,
AppSlice
implementsAsRef<[T]>
andAsMut<[T]>
, which completely bypasses all checks for unsafety!tock/kernel/src/mem.rs
Lines 124 to 128 in c814364
So where are the loopholes in this setup?
AppSlice
. This means that:'static
), because a capsule can potentially access any allowed buffer forever. This is mentioned in Subscriptions and shared memory have a loophole which can lead to UB libtock-rs#143 (comment).AppSlice
handle to it forever. In Subscriptions and shared memory have a loophole which can lead to UB libtock-rs#143 (comment), it's proposed to work-around that by using a combination ofCell
and/or clobbering memory upon context switches and/or using volatile memory accesses. However, none of these work-arounds fix the underlying issue (data race between userspace and the kernel).Indeed, from the userspace's point of view, the kernel runs as a separate thread that can be scheduled after any userspace instruction (due to an interrupt happening at any time). So if we consider the example of a driver sending/receiving packets, and passing them to/from userspace via allowed slices, there will always be a risk that userspace reads half a packet, is then preempted by the capsule which writes a completely new packet, and then userspace resumes to read the second half of a different packet. And vice-versa with userspace writing a packet and the capsule reading it.
It's also mentioned that the driver cannot rely on the content of an allowed slice to remain unchanged.
tock/kernel/src/driver.rs
Lines 102 to 104 in c814364
tock/kernel/src/process.rs
Lines 1074 to 1078 in c814364
What can we do about it (short version)?
allow
syscall.allow(ptr)
transfers exclusive access from userspace to the kernel.allow(null)
must transfer back exclusive access from the kernel to userspace. We can imagine thatallow(null)
could fail (although I don't have any example of such a capsule), but that should result in an error code so that userspace can e.g. retry later. Ifallow(null)
can silently fail to get back exclusive access, then slices allowed to the kernel are effectively thrown from userspace into a black hole, from which information can never be read back.The current
AppSlice
infrastructure fails to offer such guarantees.Subscribe and the
Callback
typeA similar problem happens with callback subscriptions.
tock/kernel/src/callback.rs
Lines 148 to 153 in c814364
schedule
at any time.tock/kernel/src/callback.rs
Line 178 in c814364
subscribe(null)
), pending callbacks are flushed (since When subscribing to a callback, remove all pending function calls for this callback ID #1376)tock/kernel/src/sched.rs
Line 425 in c814364
Callback
handle around to invoke an old callback at any time.'static
lifetime (i.e. static functions rather than lambdas allocated on the stack). There is a safeguard regarding data races because the scheduler will only invoke the callbacks upon ayield
point, but requiring static functions for callbacks is an unnecessary limitation nonetheless (and we're back to the problem fixed in When subscribing to a callback, remove all pending function calls for this callback ID #1376).What do capsules do?
Most capsules don't actually exploit this loophole. Instead, there is a large amount of code duplication, as pretty much all capsules have the following code.
An interesting exception is the UDP driver, for which slice 1 is never disallowed. I assume this is a bug in this driver rather than intentional.
tock/capsules/src/net/udp/driver.rs
Lines 372 to 385 in c814364
Question: Is there any legitimate use case where a capsule denies revocation of an allowed slice or of a subscribed callback?
Another note is that all capsules that I looked at allocate allow/callback numbers sequentially, i.e. allow numbers are
0..N
and subscribe numbers are0..M
, wereN, M
vary by capsule.About the memory footprint of grants
As I previously mentioned in #1761 (comment), there is quite some amount of redundancy in the storage of
AppSlice
andCallback
. For each (capsule, process, allow/subscribe number) tuple there is a reference to anAppId
which is completely redundant.How to fix it? [Currently a rough draft]
In this section, I draft a proposal of how this loophole could be fixed. This makes sure that capsules cannot decide to keep a handle to slices and callbacks. This also optimizes the memory layout of grants, to avoid the pitfalls noted in #1761 (comment).
Memory layout of grants
I first propose to revisit the memory layout of grants, taking into account the fact that each driver uses allow numbers
0..N
and subscribe numbers0..M
.Taking as basis the current memory layout
tock/kernel/src/grant.rs
Lines 180 to 195 in c814364
I propose to replace the opaque memory layout of
GrantRegionI
by:N
(pointer, usize) pairs (representing the allowed slices),M
pointers (representing the callbacks),There needs to be a way for each capsule to declare the values
N
andM
upon initialization. In the long run, this should be possible to integrate in theGrant
andAppliedGrant
objects at compile time with const generics (rust-lang/rust#44580). The const generics feature might even be stable enough today if we want to go that route.If we take the example of the RNG driver, then
N = 1
(one slice),M = 1
(one callback). TheApp
andRngDriver
structurestock/capsules/src/rng.rs
Lines 32 to 44 in c814364
become the following. Note that the
callback
andbuffer
fields are not in theApp
struct anymore.Kernel scheduler
Upon
allow
andsubscribe
syscalls, the kernel scheduler directly modifies the tables containing the allow ptr/len or callback ptr, without asking the driver. The only failure modes worth returning an error to userspace are:Capsules could be given some amount of control in what can be allowed. For example, the UDP driver may reject too large slices
tock/capsules/src/net/udp/driver.rs
Lines 376 to 380 in c814364
In that case, the
Driver::allow
andDriver::subscribe
functions have to change, to avoid giving access to aCallback
orAppSlice
object. I guess something like the following is reasonable:Capsule access to slices and callbacks
When a capsule want to access an allowed slice or callback, it queries it from the
AppliedGrant
object (which has a handle to the necessary AppId). In particular, theenter
function would provide handles to newAllowedTable
andCallbackTable
objects.tock/kernel/src/grant.rs
Lines 25 to 35 in c814364
Details are still TBD, but the tables could essentially provide the following APIs.
Kernel structures
The
AppSlice
andCallback
structures will have to be revisited accordingly and/or won't be needed anymore.The text was updated successfully, but these errors were encountered: