Add strace feature to trace syscalls in the kernel. #1443
Implementation-wise, this seems good to me. This also seems like a useful feature to have for debugging or auditing. I think the biggest concern is the use of features, which is currently a persistent point of conflict. Perhaps we should discuss whether there are other alternatives that might make sense (e.g. moving the decision of whether to include strace to the board crate rather than making it a feature in the kernel)? Or whether, rather, this is really a case where features are the best option.

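For reference, a Cargo-feature gate of the kind debated here might look roughly like the sketch below. The feature name `trace_syscalls` and the functions `trace`, `tracing_compiled_in`, and `handle_syscall` are hypothetical, not the PR's actual code. The disabled stub takes `fmt::Arguments`, so when the feature is off no string is ever formatted and the call can compile away.

```rust
use std::fmt;

// Hypothetical feature, enabled with `cargo build --features trace_syscalls`.
// A real kernel would write to its debug port rather than use `println!`.
#[cfg(feature = "trace_syscalls")]
fn trace(args: fmt::Arguments) {
    println!("strace: {}", args);
}

// Without the feature, this empty stub is compiled instead. `format_args!`
// only captures its arguments, so nothing is formatted.
#[cfg(not(feature = "trace_syscalls"))]
fn trace(_args: fmt::Arguments) {}

/// Reports whether tracing was compiled into this build.
pub fn tracing_compiled_in() -> bool {
    cfg!(feature = "trace_syscalls")
}

/// Stand-in for a syscall dispatcher (hypothetical).
pub fn handle_syscall(driver: usize, subdriver: usize) -> usize {
    trace(format_args!("driver={} subdriver={}", driver, subdriver));
    driver
}
```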
For some context: a number of folks have been burned by In some ways, Rust features are better than C-style (I'm summarizing my sense of other people's thoughts. Those people should chime in themselves and express their opinions more accurately. Reminder to please address this particular PR in the comments; in other words, please address how we can best include this proposed feature).

For me, a service like strace is either something that Tock supports or something that Tock does not support, as opposed to something that a particular compilation of Tock might support. One of the advantages of an operating system is a somewhat standard suite of services offered to applications; e.g., we have (in principle at least) a remote application update service.[1] While features are fine for on-the-desk, quick in-loop debugging, they make life an absolute misery once things are deployed. If I have a misbehaving node in the wild and I want to inspect application behavior, I want to be able to turn on strace; it can't require re-flashing the kernel to turn on debugging options for deployed nodes. This matters all the more because re-flashing necessarily restarts the application, which may be in a rarely seen state that I want to introspect. We put together a nice demo last year showing how to remotely perform compute performance tracking. I'd like to see strace look more like that, with a design that looks towards all debugging scenarios.

[1] In practice, one major concern with this is that remote update is quite heavyweight in size / burden, and could be quite wasteful for nodes that don't use it. The solution was to have a privileged userspace application that does the majority of the work (download the new binary, validate it, write it to flash, etc.) and requires only a minimal set of hooks in the kernel (networking, flash access, the ability to stop an app and start an app). These hooks are always present in the kernel. (n.b. a decent chunk of this code exists in mostly working / proof-of-concept form in the Signpost code.)

Here, I think strace is light enough that it can simply be directly integrated in the short term, probably as a bit in the process struct and the

I'm definitely very strongly opposed to something like this being a feature that only some versions of compiled Tock kernels support. Keeping track of that in practice is impossible.
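To make the "bit in the process struct" idea concrete, here is a minimal host-side sketch, assuming a per-process flag that some privileged service could flip at runtime without re-flashing or restarting the app. `Process`, `set_strace`, and `trace_syscall` are hypothetical names, not Tock's kernel API, and the sketch uses `std` (`String`, `format!`) for brevity where the kernel would write to its debug port.

```rust
use std::cell::Cell;

/// Hypothetical, heavily trimmed process struct.
pub struct Process {
    pub name: &'static str,
    /// One bit of per-process state: when set, this process's syscalls are traced.
    strace_enabled: Cell<bool>,
}

impl Process {
    pub fn new(name: &'static str) -> Self {
        Process {
            name,
            strace_enabled: Cell::new(false),
        }
    }

    /// Could be toggled at runtime, e.g. by a privileged debugging app.
    pub fn set_strace(&self, enabled: bool) {
        self.strace_enabled.set(enabled);
    }

    /// Returns the trace line if tracing is on for this process, `None` otherwise.
    pub fn trace_syscall(&self, driver: usize, subdriver: usize) -> Option<String> {
        if self.strace_enabled.get() {
            Some(format!(
                "[{}] syscall driver={} subdriver={}",
                self.name, driver, subdriver
            ))
        } else {
            None
        }
    }
}
```

The trade-off is that the check and the tracing code are always present in the binary, which is the cost of supporting this on every deployed node.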
Written quickly on a plane before landing, so sorry if things are unclear, but I wanted to get something written down before the call today. I'll try to join later if I can.
Personally, I really like the fact that Tock doesn't have I've had a few patches rejected in the past that we really needed and that I think would have been useful, but the core team felt otherwise (PR 1264 as an example). Something that we could potentially consider doing is to have something like
|
Thanks for your feedback! A separate I was mostly seeing this strace/ptrace as a "print to debug" feature (as it is implemented right now), not so much as a "hypervisor" available to other apps (given the elevated privileges that this would grant).

I definitely agree that ifdefs/features add complexity and maintenance burden. On the other hand, the major advantage of conditional compilation is that it avoids the overhead of debug code, especially in terms of flash (due to the extra code) and RAM (due to memory used by this extra code), both of which are limited resources on embedded platforms. There can also be CPU overhead from formatting debug strings that are then thrown away when no debug output (UART, RTT) is connected. Removing debugging code also reduces the attack surface of "production-grade" firmware. Instead of defining one feature per debugging part, I see another possible trade-off to configure conditional code:
|
|
@gendx can you report on how much this increases code size? I agree that the code size overhead seems small. Having a mechanism that will allow us to cut out debug output (for production/deployment) is increasingly important, but that shouldn't block this. That being said, I agree with Pat that allowing this to be a runtime configuration (i.e., turning it on and off dynamically) is important. What would code with runtime switching look like?
|
As an experiment, I replaced the feature with a Here are the numbers with
|
|
So overall, I don't think features are needed for this kind of configuration. A Regarding having this as a dynamic configuration, I'm not sure how you would want that to work, i.e. what should toggle the stracing on and off? If it's an application via some syscall, as you mentioned above, this requires defining a suitable syscall API, and it will be important to define a threat model. There could be something heavier that sends stracing events back to an application (although can an app strace itself?), but again that's much more work. For now, I mostly see this stracing as a manual, kernel-side debugging tool, e.g. to help implement things like a USB stack, and a If you want to go ahead with the
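The `const` configuration approach can be sketched as follows. The struct, field, and function names are illustrative, loosely following the PR description's `kernel/src/config.rs` file holding a const configuration object. Because `CONFIG` is a `const`, the optimizer can drop the disabled branch, yet that branch is still parsed and type-checked in every build.

```rust
/// Compile-time kernel configuration (illustrative; not the PR's exact definition).
pub struct Config {
    /// Print a trace of every syscall to the debug output.
    pub trace_syscalls: bool,
}

/// Flip this to `true` during development to enable tracing.
pub const CONFIG: Config = Config {
    trace_syscalls: false,
};

/// Builds the trace line for a syscall, or `None` when tracing is disabled.
/// Takes the config as a parameter so both settings can be exercised.
pub fn trace_line(config: &Config, driver: usize, subdriver: usize) -> Option<String> {
    if config.trace_syscalls {
        Some(format!("syscall driver={} subdriver={}", driver, subdriver))
    } else {
        None
    }
}

/// Stand-in for the kernel's syscall path (hypothetical).
pub fn handle_syscall(driver: usize, subdriver: usize) {
    // `CONFIG.trace_syscalls` is a constant, so with `false` this branch is dead
    // code the optimizer can remove, but it still has to type-check.
    if CONFIG.trace_syscalls {
        println!("syscall driver={} subdriver={}", driver, subdriver);
    }
}
```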
|
Quick reaction: I think this looks fantastic. I agree that there are a bunch of other pieces that need to come together before dynamic configuration would make sense, but I like that this design is still quite amenable to eventually moving in that direction if/when we are ready (at the obvious cost of permanent code-size growth).
|
I don't think we need dynamic tracing right now. My thought is that this could be a significant overhead, so it's something you don't want always on. But as Pat pointed out, you might want to be able to turn it on in the field in order to see what's happening. We don't have such a use case right now, so we don't need to be able to do it. But it's clear architecturally how we would do it if such a need comes up. I like the const config approach very much. Is the PR comment/text still correct?
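One way the const design could later grow a runtime switch is to keep the compile-time gate and add a runtime flag behind it. This is only a sketch under assumptions: `TRACE_ENABLED` and `should_trace` are hypothetical, and an atomic is used simply so the `static` is shareable in this host-side example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Compile-time configuration (illustrative).
pub struct Config {
    pub trace_syscalls: bool,
}

/// Tracing is compiled into this build.
pub const CONFIG: Config = Config {
    trace_syscalls: true,
};

/// Hypothetical runtime switch, e.g. set by a privileged debugging service.
pub static TRACE_ENABLED: AtomicBool = AtomicBool::new(false);

/// Trace only if tracing is compiled in *and* switched on at runtime.
/// With `trace_syscalls: false`, `&&` short-circuits on a constant, so the
/// atomic load and everything guarded by this check can be optimized out.
pub fn should_trace() -> bool {
    CONFIG.trace_syscalls && TRACE_ENABLED.load(Ordering::Relaxed)
}
```

Once `trace_syscalls` is `true`, the runtime check and the tracing code stay in the binary permanently, which is the code-size cost mentioned in the thread.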
I updated the PR text and added some documentation.
|
Hum, weird that Travis complains about Is there anything broken with the pinning of a specific toolchain?
OK, looks like Travis runs the tests on top of some automatic but hidden merge commit. I could reproduce and fix the bug with a manual merge commit.
|
@alevy Does that look good to you?
|
While I'm intrigued by the idea of a config module, this has the same net effect as a feature, doesn't it?
Features can be more powerful in that they allow conditionally compiling arbitrary syntax. For example, one could define some type But I think that whenever possible, a
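As an illustration of something features can do that a plain `const` bool cannot, the sketch below uses `#[cfg]` to make a struct field exist only in some builds, so its RAM cost disappears entirely when the feature is off. The feature name `trace_history` and the `ProcessStats` type are hypothetical.

```rust
/// Hypothetical per-process statistics.
pub struct ProcessStats {
    pub syscall_count: usize,
    /// Present only when built with `--features trace_history`.
    #[cfg(feature = "trace_history")]
    pub last_driver: usize,
}

impl ProcessStats {
    pub fn new() -> Self {
        ProcessStats {
            syscall_count: 0,
            #[cfg(feature = "trace_history")]
            last_driver: 0,
        }
    }

    /// Counts a syscall; also remembers the driver number if the field exists.
    pub fn record(&mut self, _driver: usize) {
        self.syscall_count += 1;
        #[cfg(feature = "trace_history")]
        {
            self.last_driver = _driver;
        }
    }
}
```

The flip side, as the PR description notes for the `const` approach, is that code behind `#[cfg]` is not type-checked in builds that omit the feature, so it can break unnoticed.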
I'm not sure whether strace or ptrace would be the most appropriate, so a more explicit name may be clearer for the reader.
|
bors r+
Done.
Consistent with the rest of the kernel crate.
Avoid using the internal `idx()` function. This may not be the correct way to reference a process in the future. Plus we have formatting defined for AppId already.
Rather than leave comments I just added commits. See what you think.
- Added a link in the doc README to the new file.
- Print `{:?}, appid` rather than use the `.idx()` function. This is looking ahead, as it will be better to think of `AppId` as an opaque type rather than just a holder of a number. The display trait is already implemented for it.
- Change `pub(crate)` to `crate` to match the rest of the kernel.
- And change the print format a bit. I think it is important to have `:#x`, as otherwise it is a pain to know what value something like "56" is. But I also changed some numbers to decimal, as that is how they are specified in the Tock source (in particular driver numbers and subnumbers), and I think it is easier not to make readers do the conversion.
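The formatting conventions in this list might look roughly like the sketch below: the process is printed through `AppId`'s own formatting rather than a raw index, driver and subdriver numbers in decimal, and argument values in hex with `{:#x}`. The `AppId` here is a stand-in (the real type lives in the Tock kernel), and the `command(...)` line layout is illustrative, not the PR's exact output.

```rust
/// Stand-in for the kernel's opaque process identifier (hypothetical).
/// Callers print it via `Debug` instead of reaching into an index.
#[derive(Debug)]
pub struct AppId(usize);

/// Formats a command syscall: driver numbers in decimal (as written in the
/// Tock source), argument values in hex with a `0x` prefix.
pub fn format_command(
    appid: &AppId,
    driver: usize,
    subdriver: usize,
    arg0: usize,
    arg1: usize,
) -> String {
    format!(
        "[{:?}] command({}, {}, {:#x}, {:#x})",
        appid, driver, subdriver, arg0, arg1
    )
}
```

For example, `format_command(&AppId(0), 2, 1, 0, 56)` yields `[AppId(0)] command(2, 1, 0x0, 0x38)`, where `0x38` makes it obvious that 56 is a raw value.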
46d3977
You mention the
Today I learned. Did this syntax appear in a recent Rust?
+1
Updated the code again on my side.
Also, note that in

bors r+

Hum, don't know why bors canceled this merge.
Pull Request Overview
This pull request adds an "strace" configuration to the kernel, which causes a trace of all syscalls and callbacks to be displayed on the debug port.
This also adds a new way of statically configuring the kernel, via a `kernel/src/config.rs` file containing a static const configuration object. This allows quickly turning configuration values on and off during development, while making sure that un-configured code is still checked for syntax and types, and allowing the compiler to optimize away dead code (due to the configuration being `const`).

Testing Strategy
This pull request was tested by:
`make ci-travis`.

TODO or Help Wanted
N/A
Documentation Updated
`/docs`, or no updates are required.

Formatting
`make formatall`.