Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize zap.Any to allocate less memory on stack #1303

Closed
wants to merge 1 commit into from

Commits on Aug 1, 2023

  1. Optimize zap.Any to allocate less mamory on stack

    This is an alternative to #1301 and #1302. It's not as fast as these two
    options, but it still gives us half the stack reduction without the
    `unsafe` usage.
    
    Interestingly it seems that on both arm64 and amd64 the new code, with
    the closure, is faster than the plain old switch.
    We do see a ~5-10ns delay on `Any` creation if it's used without
    `logger`, but that's minimal and not realistic.
    
    Bunch of credit for this goes to @cdvr1993, we started independently,
    I was about to give up but the conversations pushed me forward. In the
    end he ended up going into a more advanced land where I dare not to enter.
    
    Longer version:
    
    We have identified an issue where zap.Any can cause performance
    degradation due to stack size.
    
    This is apparently cased by the compiler assigning 4.8kb (a zap.Field
    per arm of the switch statement) for zap.Any on stack. This can result
    in an unnecessary runtime.newstack+runtime.copystack.
    A github issue against Go language is pending.
    
    This can be particularly bad if `zap.Any` was to be used in a new
    goroutine, since the default goroutine sizes can be as low as 2kb (it can
    vary depending on average stack size - see golang/go#18138).
    
    This is an alternative to #1301, @cdvr and me were talking about this,
    and he inspired this idea with the closure.
    
    By using a function and a closure we're able to reduce the size and
    remove the degradation.
    At least on my laptop, this change result in a new performance gain,
    as all benchmarks show reduced time.
    
    10 runs.
    ```
    ❯ benchstat ~/before2.txt ~/after2.txt
    goos: darwin
    goarch: arm64
    pkg: go.uber.org/zap
                                │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                                │          sec/op          │    sec/op     vs base                │
    Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
    Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
    Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
    Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
    Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
    Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
    Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
    Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
    geomean                                   167.8n         160.7n         -4.23%
    
                                │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                                │           B/op           │    B/op     vs base                 │
    Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
    Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
    Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
    Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
    Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
    Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
    Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
    Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
    geomean                                              ²               +0.00%                ²
    ¹ all samples are equal
    ² summaries must be >0 to compute geomean
    
                                │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                                │        allocs/op         │ allocs/op   vs base                 │
    Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
    Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
    Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
    Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
    Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
    Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
    Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
    Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
    geomean
    ```
    
    On absolute terms:
    
    Before, on arm64:
    ```
    goos: darwin
    goarch: arm64
    pkg: go.uber.org/zap
    BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
    BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
    PASS
    ok      go.uber.org/zap 50.238s
    ```
    
    After:
    ```
    goos: darwin
    goarch: arm64
    pkg: go.uber.org/zap
    BenchmarkAny/str-with-logger             3191725               382.2 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-2           3159882               367.4 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-4           2998960               373.0 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-8           3264657               361.7 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger             3168627               386.2 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-2           3169394               364.4 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-4           3271981               368.9 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-8           3293463               362.5 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-in-go                    793905              1388 ns/op             143 B/op          2 allocs/op
    BenchmarkAny/str-in-go-2                 1724048               748.4 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-4                 2536380               444.9 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-8                 2177941               586.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go                    890155              1237 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-2                 1836302               719.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-4                 3671503               322.2 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-8                 2257405               540.8 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack         811408              1457 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-2      1384990               729.6 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-4      3228151               381.8 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-8      2678596               450.8 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack         821092              1386 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-2      1747638               662.3 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-4      3747934               341.9 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-8      2678191               463.6 ns/op            88 B/op          2 allocs/op
    PASS
    ok      go.uber.org/zap 53.238s
    ```
    
    And amd64, before:
    ```
     % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
    goos: linux
    goarch: amd64
    pkg: go.uber.org/zap
    cpu: AMD EPYC 7B13
    BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
    BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
    PASS
    ok      go.uber.org/zap 51.671s
    ```
    
    After:
    ```
    goos: linux
    goarch: amd64
    pkg: go.uber.org/zap
    cpu: AMD EPYC 7B13
    BenchmarkAny/str-no-logger              100000000               11.65 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-2            100000000               11.64 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-4            100000000               11.63 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-no-logger-8            100000000               11.65 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger              27779637                44.20 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-2            27881986                42.96 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-4            27587953                43.39 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/any-no-logger-8            26861058                43.43 ns/op            0 B/op          0 allocs/op
    BenchmarkAny/str-with-logger             1749990               690.4 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-2           1807341               660.3 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-4           1821039               654.3 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-with-logger-8           1865083               650.8 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger             1677643               741.3 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-2           1905400               689.4 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-4           1843364               646.2 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/any-with-logger-8           1899883               645.8 ns/op            64 B/op          1 allocs/op
    BenchmarkAny/str-in-go                    453326              2479 ns/op              92 B/op          2 allocs/op
    BenchmarkAny/str-in-go-2                  724555              1580 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-4                 1358790               953.3 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-8                 1805985               585.6 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go                    466447              2395 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-2                  874053              1487 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-4                 1457768               834.7 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-8                 1795317               632.5 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack         407620              2749 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-2       725614              1597 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-4      1303908               863.7 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/str-in-go-with-stack-8      1957864               609.1 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack         497640              2401 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-2       648355              1549 ns/op              88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-4      1486416               869.9 ns/op            88 B/op          2 allocs/op
    BenchmarkAny/any-in-go-with-stack-8      2116040               568.8 ns/op            88 B/op          2 allocs/op
    PASS
    ```
    rabbbit committed Aug 1, 2023
    Configuration menu
    Copy the full SHA
    11b6f7b View commit details
    Browse the repository at this point in the history