Optimize AddCaller+AddStacktrace by sharing stacks #1052
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
If we use the `AddCaller(true)` option for a logger with `AddStacktrace(level)`, which captures a stack trace for the specified level or higher, Zap currently captures inspects the stack twice: - `getCallerFrame` retrieves information about the immediate caller of the log entry - `StackSkip` calls `takeStacktrace` to capture the entire stack trace and build a string from it For the case where both caller and stack trace are requested, we can avoid redundant work by sharing information about the call stack between the logic for the two. To accomplish this, the following high-level pieces were added: type stacktrace captureStacktrace(skip int, depth stacktraceDepth) *stacktrace func (*stacktrace) Next() (runtime.Frame, bool) type stackFormatter func newStackFormatter(*buffer.Buffer) func (*stackFormatter) FormatStack(*stacktrace) func (*stackFormatter) FormatFrame(runtime.Frame) `captureStacktrace` captures stack traces (either just one frame or the entire stack) that can then be inspected manually (to extract caller information) and formatted into a string representation either one-frame-at-a-time or wholesale with `stackFormatter`. This allows us reorganize logic that applies the AddCaller and AddStacktrace options so that it can: - capture as much of the stack as it needs - extract caller information from the first frame if we need it - format that frame and the remaining frames into a stack trace --- Benchmarking: I ran the included benchmark before and after the patch with the following flags and compared the results with benchstat. ``` % go test -run '^$' -bench AddCallerAndStack -benchmem -count 10 [...] % benchstat before.txt after.txt name old time/op new time/op delta AddCallerAndStacktrace-2 4.68µs ± 7% 3.63µs ± 7% -22.35% (p=0.000 n=10+10) name old alloc/op new alloc/op delta AddCallerAndStacktrace-2 632B ± 0% 416B ± 0% -34.18% (p=0.000 n=10+10) name old allocs/op new allocs/op delta AddCallerAndStacktrace-2 5.00 ± 0% 3.00 ± 0% -40.00% (p=0.000 n=10+10) ``` Allocations for this code path are down from 5 to 2, and CPU down by ~20%. To check for regressions, I also ran all existing benchmarks with "Caller" or "Stack" in their names: ``` % go test -run '^$' -bench 'Caller|Stack' -benchmem -count 10 [...] % benchstat before.txt after.txt name old time/op new time/op delta StackField-2 3.28µs ± 2% 3.49µs ± 2% +6.38% (p=0.000 n=9+9) AddCallerHook-2 1.89µs ± 2% 1.92µs ± 3% ~ (p=0.055 n=10+9) TakeStacktrace-2 3.17µs ± 1% 3.60µs ± 8% +13.63% (p=0.000 n=8+9) name old alloc/op new alloc/op delta StackField-2 528B ± 0% 528B ± 0% ~ (all equal) AddCallerHook-2 248B ± 0% 240B ± 0% -3.23% (p=0.000 n=10+10) TakeStacktrace-2 416B ± 0% 416B ± 0% ~ (all equal) name old allocs/op new allocs/op delta StackField-2 3.00 ± 0% 3.00 ± 0% ~ (all equal) AddCallerHook-2 3.00 ± 0% 2.00 ± 0% -33.33% (p=0.000 n=10+10) TakeStacktrace-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) ``` AddCaller costs one less allocation, but the cost of capturing a stack trace is *slightly* higher. I think that may be worth it for the 20%+ improvement on Caller+Stack and reduced allocations.
@@ Coverage Diff @@ ## master #1052 +/- ## ========================================== + Coverage 98.22% 98.29% +0.07% ========================================== Files 48 48 Lines 2080 2111 +31 ========================================== + Hits 2043 2075 +32 + Misses 29 28 -1 Partials 8 8
The `runtime.CallersFrames` method doesn't do much other than returning a struct pointer wrapping the passed in `pcs`. Calling this method twice ends up simplifying some of the logic (not having to format the first stack separately from the remaining), and ends up brining the performance of `TakeStacktrace` similar to `master`. ``` # benchstat comparing 3 runs of master vs this branch, each run using benchtime=10s name old time/op new time/op delta TakeStacktrace 1.47µs ± 2% 1.48µs ± 2% ~ (p=1.000 n=3+3) name old alloc/op new alloc/op delta TakeStacktrace 496B ± 0% 496B ± 0% ~ (all equal) name old allocs/op new allocs/op delta TakeStacktrace 2.00 ± 0% 2.00 ± 0% ~ (all equal) ```
And on my dev server with
Add this suggestion to a batch that can be applied as a single commit. This suggestion is invalid because no changes were made to the code. Suggestions cannot be applied while the pull request is closed. Suggestions cannot be applied while viewing a subset of changes. Only one suggestion per line can be applied in a batch. Add this suggestion to a batch that can be applied as a single commit. Applying suggestions on deleted lines is not supported. You must change the existing code in this line in order to create a valid suggestion. Outdated suggestions cannot be applied. This suggestion has been applied or marked resolved. Suggestions cannot be applied from pending reviews. Suggestions cannot be applied on multi-line comments. Suggestions cannot be applied while the pull request is queued to merge.