New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: consider using an atomic linked list for the callsite registry #860
Comments
@LucioFranco i imagine you might be interested in hacking on this? |
Yes! |
Great! Some notes for starting out on this:
|
Since 0.2 is coming up, we could potentially make changes to the callsite trait etc to support an intrusive, non-allocating list... |
The intrusive list would also be a major step in getting It would require adding a new method to the callsite API to get a next callsite pointer, and that would have to be a breaking change (since a default impl returning The other issue is that |
I'm imagining something like pub struct Registration {
callsite: &'static dyn Callsite + 'static),
next: AtomicPtr<Registration>,
}
impl Registration {
// == internal APIs ===
// called when traversing the registry (re-registering)
fn next(&self) -> Option<&Registration> { todo!() }
// called when registering
fn set_next(&self, next:: &'static Self) { todo!() }
// called by callsite implementations
pub const fn new(&'static (dyn Callsite + 'static) -> Self { todo!() }
}
pub trait Callsite {
// implemented by `Callsite` impls, basically they have to store a registration in a static
fn registration(&'static self) -> &'static Registration;
// ... other trait methods ...
} |
This change adds an intrusive `LinkedList` for the callsite registry. The linked list is lock-free and can be written to from multiple threads. This list also does not require any allocations due to its intrusive nature. This though does require a breaking change to the `Callsite` trait to allow callsites to store the pointer to the next item in the intrusive list. Properoties of the intrusive atomically linked-list: - Only supports `LinkedList::push` and `LinkedList::for_each`. - The items in the list can only be added and not removed. Closes #860
This change adds an intrusive `LinkedList` for the callsite registry. The linked list is lock-free and can be written to from multiple threads. This list also does not require any allocations due to its intrusive nature. This though does require a breaking change to the `Callsite` trait to allow callsites to store the pointer to the next item in the intrusive list. Properoties of the intrusive atomically linked-list: - Only supports `LinkedList::push` and `LinkedList::for_each`. - The items in the list can only be added and not removed. Closes #860
This change adds an intrusive `LinkedList` for the callsite registry. The linked list is lock-free and can be written to from multiple threads. This list also does not require any allocations due to its intrusive nature. This though does require a breaking change to the `Callsite` trait to allow callsites to store the pointer to the next item in the intrusive list. Properoties of the intrusive atomically linked-list: - Only supports `LinkedList::push` and `LinkedList::for_each`. - The items in the list can only be added and not removed. Closes #860
This change adds an intrusive `LinkedList` for the callsite registry. The linked list is lock-free and can be written to from multiple threads. This list also does not require any allocations due to its intrusive nature. This though does require a breaking change to the `Callsite` trait to allow callsites to store the pointer to the next item in the intrusive list. Properoties of the intrusive atomically linked-list: - Only supports `LinkedList::push` and `LinkedList::for_each`. - The items in the list can only be added and not removed. Closes #860
This change adds an intrusive `LinkedList` for the callsite registry. The linked list is lock-free and can be written to from multiple threads. This list also does not require any allocations due to its intrusive nature. This though does require a breaking change to the `Callsite` trait to allow callsites to store the pointer to the next item in the intrusive list. Properoties of the intrusive atomically linked-list: - Only supports `LinkedList::push` and `LinkedList::for_each`. - The items in the list can only be added and not removed. Closes #860
This change adds an intrusive `LinkedList` for the callsite registry. The linked list is lock-free and can be written to from multiple threads. This list also does not require any allocations due to its intrusive nature. This though does require a breaking change to the `Callsite` trait to allow callsites to store the pointer to the next item in the intrusive list. Properoties of the intrusive atomically linked-list: - Only supports `LinkedList::push` and `LinkedList::for_each`. - The items in the list can only be added and not removed. Closes #860
Feature Request
Crates
tracing-core
Motivation
tracing-core
's callsite registry stores every active dispatcher, and a&'static
reference to every callsite, so that the callsite can be re-registered when theInterest
cache is invalidated.Currently, this is represented as a pair of
Vec
s inside aMutex
:tracing/tracing-core/src/callsite.rs
Lines 16 to 19 in aa5c523
The callsite registry is accessed in a few ways:
Dispatch
is added or removed from the registry. This happens whenever aDispatch
is created or dropped, which is infrequent — if a program only uses the global default dispatcher, aDispatch
is created once in the process' lifetime, and is never dropped. This will iterate over every registered callsite to register it with the newDispatch
.Of these, adding new callsites to the registry is the hottest path. Although it is not terribly frequent, it still happens several orders of magnitude than iterating over the registered callsites. Because the callsite registry is behind a
Mutex
, every registration is forced to synchronize. This means that if multiple threads hit different callsites for the first time, they must all contend the lock.In most programs, registrations will happen more frequently at the beginning of the process' lifetime, and decrease over time as more and more callsites are hit for the first time. In long-running programs, the impact of contending the registry lock is highly amortized over the programs lifetime, and has a fairly small performance impact. However, in shorter-running programs, like CLI tools, it may be more meaningful. Also, even in longer-running programs, decreased performance at startup may be an issue.
Proposal
We may want to consider replacing the
Mutex<Vec<...>>
representation for the callsite registry with an atomic singly-linked list. With the current implementation, iterating over the registry is fast (because it's aVec
) but inserting must always contend the lock. A linked list is slower to iterate, but there is noMutex
in the insertion path. This probably makes more sense in this use case, since (as we discussed above) adding new callsites to the registry occurs much more frequently than iterating over currently registered callsites.Since we only need to iterate in one direction, and never remove from the middle of the list (or from anywhere in the list, in point of fact), a singly-linked list is fine. This means that we can use an atomic linked list without too much pain. We don't even need to worry about the ABA problem, because (as I mentioned) callsites are never removed, and have unique addresses. We would treat the list somewhat like a stack (except without pops), storing the pointer to the most recently appended callsite, and use this as the head for both appends and iterations. This means that we don't have to link-hop in appends.
If we want to be fancy, we could even make this an intrusive list, by adding a next pointer to the callsites themselves. That way, the static callsite would store the next pointer's address, rather than putting it on the heap. Then, we wouldn't need to make any allocations. I don't think the performance impact of allocations in this code path is terribly significant — getting rid of the lock is probably good enough for the callsite registration path. However, it does show up as a "leak" in tools like
valgrind
, since any allocations we make to register a callsite are never freed, and this can look scary to users who aren't aware of what this code is doing. It could be nice for this to not show up in valgrind. A side benefit is that this would also movetracing-core
one step closer to compiling onno_std
withoutliballoc
. I don't know if there's really any current demand for this (and it would take some additional work to pull off, such as making the global dispatcher work withoutArc
) but it would be cool to be able to say we can do it.Obviously, this change should be guided by benchmarking. It's possible my intuition is just wrong, and the
Mutex
is faster. Of course, we would want to benchmark this under concurrency, since that's when theMutex
contention might actually matter.Alternatives
#[link_section]
attribute to get the linker to put a reference to each callsite in a special section, which we would then treat as an array at runtime. This is similar to what @mystor did here: https://github.com/hawkw/mycelium/blob/03ef7cdefb2563d04c2c22219d24da32e8db387f/util/src/testing.rs#L38-L69.#[link_section]
is not specific to a particular linker likelld
or GNUld
, but I don't know if this would work at all on exotic targets like WASM, or if someone tries to test code usingtracing
in Miri. We could try to guard the whole system behind conditional compilation flags, and fall back to runtime registration if we can't guarantee that Cool Linker Tricks will work on the target platform...but that is a whole new can of worms in and of itself...The text was updated successfully, but these errors were encountered: