Improve Any stack usage #1301

Closed
wants to merge 1 commit into from

Contributor

@cdvr1993 cdvr1993 commented Jul 24, 2023

By taking this unsafe approach, we are able to save one call to morestack.

The benchmark results are as follows:

goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/normal             19687398                61.96 ns/op
BenchmarkAny/normal-2           19890138                58.83 ns/op
BenchmarkAny/normal-4           20615840                58.00 ns/op
BenchmarkAny/normal-8           19891179                60.70 ns/op
BenchmarkAny/optimized          12379216                98.05 ns/op
BenchmarkAny/optimized-2        13090182                90.70 ns/op
BenchmarkAny/optimized-4        13424733                90.75 ns/op
BenchmarkAny/optimized-8        12764323                94.82 ns/op
BenchmarkAny/normal_with_logger                  7346602               154.2 ns/op
BenchmarkAny/normal_with_logger-2                8581615               145.1 ns/op
BenchmarkAny/normal_with_logger-4                8807738               137.8 ns/op
BenchmarkAny/normal_with_logger-8                8484144               146.7 ns/op
BenchmarkAny/optimized_with_logger               7519690               158.1 ns/op
BenchmarkAny/optimized_with_logger-2             7529078               154.9 ns/op
BenchmarkAny/optimized_with_logger-4             8006461               153.5 ns/op
BenchmarkAny/optimized_with_logger-8             7665606               150.2 ns/op
BenchmarkAny/normal_new_goroutine                 275497              3982 ns/op
BenchmarkAny/normal_new_goroutine-2               564081              2177 ns/op
BenchmarkAny/normal_new_goroutine-4               973549              1107 ns/op
BenchmarkAny/normal_new_goroutine-8              1856415               671.3 ns/op
BenchmarkAny/optimized_new_goroutine              550318              2073 ns/op
BenchmarkAny/optimized_new_goroutine-2           1000000              1275 ns/op
BenchmarkAny/optimized_new_goroutine-4           1690797               704.9 ns/op
BenchmarkAny/optimized_new_goroutine-8           1815640               642.1 ns/op
PASS
ok      go.uber.org/zap 34.181s

  • The first set of benchmarks, where we only use runtime.KeepAlive, is slower.
  • In the second set, where we use a logger, the cost is roughly the same.
  • The third set (new goroutine) is 1-2x faster because the smaller frame avoids stack growth.

The assembly shows the following change in stack usage:

# from
SUBQ $0x12f8, SP   // 4856 bytes

# to
SUBQ $0x3d8, SP  // 984 bytes

So this is a ~5x reduction in stack usage.
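
The arithmetic behind these numbers can be checked with a small sketch. The 2048-byte figure is an assumption taken from the "as low as 2kb" starting stack size mentioned later in this thread, and `fitsInStack` is a hypothetical helper that only compares one frame against the whole stack. It ignores every other frame already on the stack, so it is a lower bound rather than a model of the runtime.

```go
package main

import "fmt"

const (
	oldFrame = 0x12f8 // SUBQ immediate before the change
	newFrame = 0x3d8  // SUBQ immediate after the change
	minStack = 2048   // assumed starting goroutine stack size in bytes
)

// reduction reports how many times smaller the new frame is.
func reduction(before, after int) float64 {
	return float64(before) / float64(after)
}

// fitsInStack reports whether a single frame could fit in a stack of the
// given size. A frame larger than the whole stack always forces growth.
func fitsInStack(frame, stack int) bool {
	return frame <= stack
}

func main() {
	fmt.Printf("old frame: %d bytes, fits in %d: %v\n", oldFrame, minStack, fitsInStack(oldFrame, minStack))
	fmt.Printf("new frame: %d bytes, fits in %d: %v\n", newFrame, minStack, fitsInStack(newFrame, minStack))
	fmt.Printf("reduction: %.2fx\n", reduction(oldFrame, newFrame))
}
```

Under that assumption, the old frame alone exceeds a fresh goroutine's stack, which is consistent with the extra morestack call described above.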

Additionally, I took a CPU profile, and it showed much more CPU usage from the old approach at a high core count (32 cores):

[Screenshot: CPU profile, 2023-07-24]

goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/normal_new_goroutine-32             4000000               664.5 ns/op
BenchmarkAny/optimized_new_goroutine-32          4000000               645.2 ns/op
PASS
ok      go.uber.org/zap 5.418s

I'm not sure why the time is the same but the profile isn't. My only guess is that the goroutine scheduler is adding that latency.

Alternatives

Just FYI: I tried storing each switch case in a function variable so that the compiler would not allocate stack space for every function call. With that change, however, escape analysis decides that the variables need to escape to the heap, and we got the following results:

goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/normal             19453904                64.28 ns/op
BenchmarkAny/normal-2           19794348                60.95 ns/op
BenchmarkAny/normal-4           20221657                60.11 ns/op
BenchmarkAny/normal-8           19663383                61.51 ns/op
BenchmarkAny/optimized           6851958               176.9 ns/op
BenchmarkAny/optimized-2         7244982               165.2 ns/op
BenchmarkAny/optimized-4         7224718               160.4 ns/op
BenchmarkAny/optimized-8         6786295               171.4 ns/op
BenchmarkAny/normal_with_logger                  7592905               159.1 ns/op
BenchmarkAny/normal_with_logger-2                8350288               148.6 ns/op
BenchmarkAny/normal_with_logger-4                7812310               152.3 ns/op
BenchmarkAny/normal_with_logger-8                7676191               150.7 ns/op
BenchmarkAny/optimized_with_logger               5043325               239.7 ns/op
BenchmarkAny/optimized_with_logger-2             5618706               226.6 ns/op
BenchmarkAny/optimized_with_logger-4             5382121               215.1 ns/op
BenchmarkAny/optimized_with_logger-8             5612380               222.6 ns/op
BenchmarkAny/normal_new_goroutine                 262658              3889 ns/op
BenchmarkAny/normal_new_goroutine-2               649510              2127 ns/op
BenchmarkAny/normal_new_goroutine-4               869673              1229 ns/op
BenchmarkAny/normal_new_goroutine-8              1740733               699.0 ns/op
BenchmarkAny/optimized_new_goroutine              911757              1183 ns/op
BenchmarkAny/optimized_new_goroutine-2           1370961               929.5 ns/op
BenchmarkAny/optimized_new_goroutine-4           3625549               388.7 ns/op
BenchmarkAny/optimized_new_goroutine-8           2873199               389.7 ns/op
PASS
ok      go.uber.org/zap 37.317s

As you can see:

  • optimized is almost 3x slower (compared to ~50% slower before).
  • optimized with logger is ~50% slower (before, the cost was the same).
  • optimized on a new goroutine is 2-3x faster (before, it was only 1-2x faster).

Stack usage became:

SUBQ $0x58, SP // 88 bytes

This would be a reduction of >50x, but at the cost of 2 more heap allocations.
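
For readers who want to picture the function-variable alternative, here is a minimal sketch. It is not the code from this PR: `Field`, `Int`, `String`, `Bool`, `Reflect`, and `anyField` are simplified, hypothetical stand-ins for the zap equivalents. Each switch case only stores a closure, and the single call at the end is the only place a `Field` result is materialized in this function.

```go
package main

import "fmt"

// Field is a hypothetical, much smaller stand-in for zap.Field.
type Field struct {
	Key  string
	Kind string
	Int  int64
	Str  string
}

func Int(key string, v int) Field    { return Field{Key: key, Kind: "int", Int: int64(v)} }
func String(key, v string) Field     { return Field{Key: key, Kind: "string", Str: v} }
func Bool(key string, v bool) Field {
	f := Field{Key: key, Kind: "bool"}
	if v {
		f.Int = 1
	}
	return f
}
func Reflect(key string, v interface{}) Field {
	return Field{Key: key, Kind: "reflect", Str: fmt.Sprint(v)}
}

// anyField picks a constructor per type but defers the call, so the
// switch arms hold only function values instead of Field temporaries.
func anyField(key string, value interface{}) Field {
	var build func() Field
	switch v := value.(type) {
	case int:
		build = func() Field { return Int(key, v) }
	case string:
		build = func() Field { return String(key, v) }
	case bool:
		build = func() Field { return Bool(key, v) }
	default:
		build = func() Field { return Reflect(key, v) }
	}
	return build() // the only call site that produces a Field
}

func main() {
	fmt.Printf("%+v\n", anyField("n", 42))
	fmt.Printf("%+v\n", anyField("s", "hello"))
	fmt.Printf("%+v\n", anyField("f", 1.5))
}
```

Whether the captured variables end up on the heap depends on the compiler version and its escape analysis. Building with `go build -gcflags=-m` prints those decisions, which is one way to check a sketch like this against the allocation counts reported above.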

rabbbit added a commit that referenced this pull request Jul 25, 2023
We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on the stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` is used in a new
goroutine, since the default goroutine stack size can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301. @cdvr and I were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the stack size and
remove the degradation.
At least on my laptop, this change results in a net performance gain,
as all benchmarks show reduced time.

Before:
```
❯  go test -bench BenchmarkAny -benchmem -run Any

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAnyInGoroutine/any-12           3977013               316.1 ns/op            64 B/op          1 allocs/op
BenchmarkAnyInGoroutine/int-12           4172178               289.6 ns/op            64 B/op          1 allocs/op
BenchmarkAnyInGoroutine/goroutine-12     5018606               253.1 ns/op             0 B/op          0 allocs/op
BenchmarkAnyInGoroutine/int-in-go-12     2167634               561.5 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/any-in-go-12     1875784               637.6 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/int-in-go-with-stack-12                  2500544               446.7 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/any-in-go-with-stack-12                  1769258               653.0 ns/op            88 B/op          2 allocs/op
PASS
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run Any

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAnyInGoroutine/any-12           3967261               288.2 ns/op            64 B/op          1 allocs/op
BenchmarkAnyInGoroutine/int-12           4571604               260.6 ns/op            64 B/op          1 allocs/op
BenchmarkAnyInGoroutine/goroutine-12     5481904               216.4 ns/op             0 B/op          0 allocs/op
BenchmarkAnyInGoroutine/int-in-go-12     2097002               583.1 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/any-in-go-12     2164167               551.6 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/int-in-go-with-stack-12                  2989617               378.2 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/any-in-go-with-stack-12                  3123987               381.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 11.486s
```

10 runs.
```
❯ benchstat before.txt after.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                                       │ before.txt  │              after.txt              │
                                       │   sec/op    │   sec/op     vs base                │
AnyInGoroutine/any-12                    305.2n ± 3%   290.0n ± 1%   -5.00% (p=0.000 n=10)
AnyInGoroutine/int-12                    288.0n ± 0%   265.5n ± 1%   -7.85% (p=0.000 n=10)
AnyInGoroutine/goroutine-12              218.3n ± 6%   216.9n ± 2%        ~ (p=0.469 n=10)
AnyInGoroutine/int-in-go-12              592.7n ± 2%   578.5n ± 0%   -2.40% (p=0.001 n=10)
AnyInGoroutine/any-in-go-12              666.5n ± 4%   557.8n ± 1%  -16.30% (p=0.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12   474.4n ± 4%   470.3n ± 6%        ~ (p=0.631 n=10)
AnyInGoroutine/any-in-go-with-stack-12   617.6n ± 4%   475.9n ± 3%  -22.95% (p=0.000 n=10)
geomean                                  417.8n        382.8n        -8.36%

                                       │  before.txt  │              after.txt              │
                                       │     B/op     │    B/op     vs base                 │
AnyInGoroutine/any-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/goroutine-12              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-in-go-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/any-in-go-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-in-go-with-stack-12   88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/any-in-go-with-stack-12   88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                             ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                       │  before.txt  │              after.txt              │
                                       │  allocs/op   │ allocs/op   vs base                 │
AnyInGoroutine/any-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/goroutine-12              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-in-go-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/any-in-go-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-in-go-with-stack-12   2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/any-in-go-with-stack-12   2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                             ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
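
The `*-in-go` cases above time a call made on a freshly started goroutine. The benchmark source is not part of this page, so the following is only a sketch of that measurement style under stated assumptions: `bigFrame` is a hypothetical function meant to need a frame larger than a 2 KiB starting stack, and `inNewGoroutine` is a hypothetical helper. Absolute numbers will differ by machine and Go version.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// bigFrame is a hypothetical stand-in for a function with a large frame.
// The 4096-byte local array is intended to stay on the stack.
//
//go:noinline
func bigFrame() byte {
	var buf [4096]byte
	for i := range buf {
		buf[i] = byte(i)
	}
	return buf[len(buf)-1]
}

// inNewGoroutine runs f on a fresh goroutine and waits for it to finish.
func inNewGoroutine(f func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		f()
	}()
	wg.Wait()
}

var sink byte // keeps the compiler from discarding the result

func main() {
	// Same goroutine: after the first call the stack is already large enough.
	same := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = bigFrame()
		}
	})
	// Fresh goroutine per call: each one starts with a small stack.
	fresh := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			inNewGoroutine(func() { sink = bigFrame() })
		}
	})
	fmt.Println("same goroutine:", same)
	fmt.Println("new goroutine: ", fresh)
}
```

The second number also includes goroutine creation and synchronization, so it is only meaningful when compared against the same loop with a small-frame function, which is what the `int-in-go` versus `any-in-go` rows do.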
rabbbit added a commit that referenced this pull request Jul 25, 2023
10 runs.
```
❯ benchstat ~/before.txt ~/after-after.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │         sec/op          │   sec/op     vs base                  │
AnyInGoroutine/any-12                                305.2n ± 3%   297.0n ± 0%        ~ (p=0.085 n=10)
AnyInGoroutine/int-12                                288.0n ± 0%   270.1n ± 1%   -6.25% (p=0.000 n=10)
AnyInGoroutine/goroutine-12                          218.3n ± 6%   209.5n ± 5%   -4.05% (p=0.015 n=10)
AnyInGoroutine/int-in-go-12                          592.7n ± 2%   573.9n ± 2%   -3.17% (p=0.000 n=10)
AnyInGoroutine/any-in-go-12                          666.5n ± 4%   552.3n ± 1%  -17.13% (p=0.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12               474.4n ± 4%   459.4n ± 6%        ~ (p=0.447 n=10+9)
AnyInGoroutine/any-in-go-with-stack-12               617.6n ± 4%   468.8n ± 4%  -24.09% (p=0.000 n=10)
geomean                                              417.8n        380.1n        -9.01%

                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │          B/op           │    B/op     vs base                   │
AnyInGoroutine/any-12                               64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-12                               64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/goroutine-12                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-12                         88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/any-in-go-12                         88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10+9) ¹
AnyInGoroutine/any-in-go-with-stack-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)   ¹
geomean                                                        ²               +0.00%                  ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │        allocs/op        │ allocs/op   vs base                   │
AnyInGoroutine/any-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/goroutine-12                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/any-in-go-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-with-stack-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10+9) ¹
AnyInGoroutine/any-in-go-with-stack-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
geomean                                                        ²               +0.00%                  ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
codecov bot commented Jul 25, 2023

Codecov Report

Merging #1301 (347f847) into master (24b7977) will increase coverage by 0.07%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1301      +/-   ##
==========================================
+ Coverage   98.08%   98.15%   +0.07%     
==========================================
  Files          50       50              
  Lines        3242     3369     +127     
==========================================
+ Hits         3180     3307     +127     
  Misses         53       53              
  Partials        9        9              
Files Changed Coverage Δ
field.go 100.00% <100.00%> (ø)


@cdvr1993
Contributor Author

> I believe the tradeoff here is not worth it. While this approach's improvement in the new-goroutine stack allocation scenario is certainly impressive, the more frequent case is when we already have a stable goroutine whose stack size isn't growing anymore and we're logging happily. (the normal_with_logger case)
>
> Let's discuss options in https://github.com/uber-go/zap/pull/1301.

@sywhang Is that really true?

Imagine you have 2 endpoints: one uses zap.Any and the other doesn't, so one needs a 2KB stack and the other an 8KB stack. Imagine we have a 75/25 traffic split (25% to zap.Any). Then the average stack size is going to be roughly 4KB (0.75 × 2KB + 0.25 × 8KB = 3.5KB), so the 25% of traffic that uses zap.Any would still need to allocate more stack.
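
The arithmetic in this example can be written out as a short sketch. The names are hypothetical, and rounding the average up to a power of two is an assumption about how the starting stack size is chosen; it is used here only because it reproduces the 4KB figure.

```go
package main

import "fmt"

// endpoint describes a hypothetical handler: its share of traffic and
// the stack size its goroutines end up needing.
type endpoint struct {
	share      float64
	stackBytes int
}

// averageStack returns the traffic-weighted average stack size in bytes.
func averageStack(eps []endpoint) float64 {
	var avg float64
	for _, e := range eps {
		avg += e.share * float64(e.stackBytes)
	}
	return avg
}

// roundUpPow2 rounds n up to the next power of two (assumed sizing rule).
func roundUpPow2(n int) int {
	p := 1
	for p < n {
		p <<= 1
	}
	return p
}

// shareNeedingGrowth returns the fraction of traffic whose handlers need
// more stack than the given starting size.
func shareNeedingGrowth(eps []endpoint, start int) float64 {
	var s float64
	for _, e := range eps {
		if e.stackBytes > start {
			s += e.share
		}
	}
	return s
}

func main() {
	eps := []endpoint{
		{share: 0.75, stackBytes: 2 * 1024}, // endpoint without zap.Any
		{share: 0.25, stackBytes: 8 * 1024}, // endpoint with zap.Any
	}
	avg := averageStack(eps)
	start := roundUpPow2(int(avg))
	fmt.Printf("average: %.0f bytes, starting size: %d bytes\n", avg, start)
	fmt.Printf("traffic needing growth: %.0f%%\n", 100*shareNeedingGrowth(eps, start))
}
```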

@sywhang
Contributor

sywhang commented Jul 25, 2023

@cdvr1993 That's assuming each request is served by a fresh new goroutine though, right?

@cdvr1993
Contributor Author

cdvr1993 commented Jul 25, 2023

> @cdvr1993 That's assuming each request is served by a fresh new goroutine though, right?

Yes, that is how some RPC frameworks work. For instance, gRPC:

https://github.com/grpc/grpc-go/blob/master/server.go#L996

```go
func (s *Server) serveStreams(st transport.ServerTransport) {
	defer st.Close(errors.New("finished serving streams for the server transport"))
	var wg sync.WaitGroup

	st.HandleStreams(func(stream *transport.Stream) {
		wg.Add(1)
		if s.opts.numServerWorkers > 0 {
			data := &serverWorkerData{st: st, wg: &wg, stream: stream}
			select {
			case s.serverWorkerChannel <- data:
				return
			default:
				// If all stream workers are busy, fallback to the default code path.
			}
		}
		go func() {
			defer wg.Done()
			s.handleStream(st, stream, s.traceInfo(st, stream))
		}()
	}, func(ctx context.Context, method string) context.Context {
		if !EnableTracing {
			return ctx
		}
		tr := trace.New("grpc.Recv."+methodFamily(method), method)
		return trace.NewContext(ctx, tr)
	})
	wg.Wait()
}
```

rabbbit added a commit that referenced this pull request Jul 26, 2023
We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler assigning 4.8 KB (a zap.Field
per arm of the switch statement) for zap.Any on the stack. This can
result in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` is used in a new goroutine,
since the default goroutine stack size can be as low as 2 KB (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301. @cdvr and I were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change results in a net performance gain,
as most benchmarks show reduced time (any-no-logger is the exception).
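
The function-plus-closure idea can be sketched roughly as follows. This is only an illustration, not the code from this PR or from the referenced commit: `Field`, the constructors, `anySwitch`, and `anyClosure` are hypothetical stand-ins. Whether the second form actually gets a smaller frame depends on the compiler version and is not verified here.

```go
package main

import "fmt"

// Field is a hypothetical stand-in for zap.Field; it is not the real type.
type Field struct {
	Key     string
	Kind    string
	Integer int64
	Str     string
	Iface   any
}

func Int(key string, v int) Field     { return Field{Key: key, Kind: "int", Integer: int64(v)} }
func String(key, v string) Field      { return Field{Key: key, Kind: "string", Str: v} }
func Reflect(key string, v any) Field { return Field{Key: key, Kind: "reflect", Iface: v} }

func Bool(key string, v bool) Field {
	f := Field{Key: key, Kind: "bool"}
	if v {
		f.Integer = 1
	}
	return f
}

// anySwitch builds the Field inside every arm. The thread reports that
// this shape made the compiler reserve stack space per arm.
func anySwitch(key string, value any) Field {
	switch v := value.(type) {
	case int:
		return Int(key, v)
	case string:
		return String(key, v)
	case bool:
		return Bool(key, v)
	default:
		return Reflect(key, v)
	}
}

// anyClosure only selects a small constructor in the switch and builds
// the Field at a single call site afterwards.
func anyClosure(key string, value any) Field {
	var build func(string, any) Field
	switch value.(type) {
	case int:
		build = func(k string, v any) Field { return Int(k, v.(int)) }
	case string:
		build = func(k string, v any) Field { return String(k, v.(string)) }
	case bool:
		build = func(k string, v any) Field { return Bool(k, v.(bool)) }
	default:
		build = Reflect
	}
	return build(key, value)
}

func main() {
	// Both variants should produce identical fields for every input.
	for _, v := range []any{42, "hello", true, 3.5} {
		a, b := anySwitch("k", v), anyClosure("k", v)
		fmt.Println(a.Kind, a == b)
	}
}
```

The extra indirect call is a plausible source of the small slowdown seen in the no-logger case, although that is a guess rather than something measured with this sketch.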

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```
@abhinav
Collaborator

abhinav commented Jul 26, 2023

I know that @rabbbit's version is still WIP, but looking at b8b64dc, if that works, I think that version may be more digestible. I'm not a fan of adding this much unsafe and //go:* by hand. If we end up needing to go the unsafe route, we can discuss options around maintainability.

@cdvr1993
Contributor Author

I cleaned up the usage of go directives and unsafe a little bit:

cdvr1993@9b0f077

The stack is now 280 bytes, versus the 4856 bytes we had initially.

Unsafe usage only remains for the interface parameter because we need a common type for it.

rabbbit added a commit that referenced this pull request Jul 27, 2023
We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler assigning 4.8 KB (a zap.Field
per arm of the switch statement) for zap.Any on the stack. This can
result in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` is used in a new goroutine,
since the default goroutine stack size can be as low as 2 KB (it can
vary depending on average stack size - see golang/go#18138).
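
A rough way to observe the new-goroutine effect in isolation is sketched below. It does not use zap at all: `bigFrame`, `smallFrame`, and `inNewGoroutine` are hypothetical helpers, and the 4096-byte array is just an assumed way to force a frame larger than a small initial stack. Because the runtime may adapt initial stack sizes, the difference can shrink or vanish on some Go versions, so treat the numbers as indicative only.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// bigFrame keeps a 4 KiB array in its frame, so calling it on a fresh
// goroutine with a small stack is expected to trigger stack growth.
//
//go:noinline
func bigFrame(i int) byte {
	var buf [4096]byte
	idx := i % len(buf)
	buf[idx] = byte(i)
	return buf[idx]
}

// smallFrame computes the same result without the large local.
//
//go:noinline
func smallFrame(i int) byte {
	return byte(i)
}

// inNewGoroutine returns a benchmark that calls f once per iteration,
// each time on a newly started goroutine.
func inNewGoroutine(f func(int) byte) func(*testing.B) {
	return func(b *testing.B) {
		var wg sync.WaitGroup
		var sink byte
		for i := 0; i < b.N; i++ {
			wg.Add(1)
			go func(n int) {
				defer wg.Done()
				sink = f(n)
			}(i)
			wg.Wait() // one goroutine at a time, so sink is not raced on
		}
		_ = sink
	}
}

func main() {
	small := testing.Benchmark(inNewGoroutine(smallFrame))
	big := testing.Benchmark(inNewGoroutine(bigFrame))
	fmt.Println("small frame:", small)
	fmt.Println("big frame:  ", big)
}
```

The `*-in-go` benchmarks in this thread follow the same general shape, with `zap.Any` in place of the hypothetical helpers.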

This is an alternative to #1301. @cdvr and I were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change results in a net performance gain,
as most benchmarks show reduced time (any-no-logger is the exception).

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```

In absolute terms:

Before:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3191725               382.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3159882               367.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           2998960               373.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3264657               361.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3168627               386.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3169394               364.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3271981               368.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3293463               362.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    793905              1388 ns/op             143 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1724048               748.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 2536380               444.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2177941               586.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    890155              1237 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1836302               719.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 3671503               322.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2257405               540.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         811408              1457 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1384990               729.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      3228151               381.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2678596               450.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         821092              1386 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1747638               662.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3747934               341.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2678191               463.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.238s
```
rabbbit added a commit that referenced this pull request Jul 27, 2023
This is an alternative to #1301 and #1302. It's not as fast as these two
options, but it still gives us half the stack reduction without the
`unsafe` usage.

Interestingly it seems that on both arm64 and amd64 the new code, with
the closure, is faster than the plain old switch.
We do see a ~5-10ns delay on `Any` creation if it's used without
`logger`, but that's minimal and not realistic.

A bunch of credit for this goes to @cdvr1993. We started independently,
and I was about to give up, but the conversations pushed me forward. In
the end he went into a more advanced land where I dare not enter.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler assigning 4.8 KB (a zap.Field
per arm of the switch statement) for zap.Any on the stack. This can
result in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` is used in a new goroutine,
since the default goroutine stack size can be as low as 2 KB (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301. @cdvr and I were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change results in a net performance gain,
as most benchmarks show reduced time (any-no-logger is the exception).

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```

In absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3191725               382.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3159882               367.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           2998960               373.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3264657               361.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3168627               386.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3169394               364.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3271981               368.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3293463               362.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    793905              1388 ns/op             143 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1724048               748.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 2536380               444.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2177941               586.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    890155              1237 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1836302               719.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 3671503               322.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2257405               540.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         811408              1457 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1384990               729.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      3228151               381.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2678596               450.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         821092              1386 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1747638               662.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3747934               341.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2678191               463.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.238s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.64 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              27779637                44.20 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            27881986                42.96 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            27587953                43.39 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            26861058                43.43 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1749990               690.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1807341               660.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1821039               654.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1865083               650.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1677643               741.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1905400               689.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1843364               646.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1899883               645.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    453326              2479 ns/op              92 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  724555              1580 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1358790               953.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1805985               585.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    466447              2395 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  874053              1487 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1457768               834.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1795317               632.5 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         407620              2749 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       725614              1597 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1303908               863.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1957864               609.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         497640              2401 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       648355              1549 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1486416               869.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2116040               568.8 ns/op            88 B/op          2 allocs/op
PASS
```
@prashantv
Collaborator

prashantv commented Jul 27, 2023

> I know that @rabbbit's version is still WIP, but looking at b8b64dc, if that works, I think maybe that version may be more digestible. I'm not a fan of adding this much unsafe and //go:* by hand. If we end up needing to go to the unsafe route, we can discuss options around maintainability.

Big +1, this is a huge amount of unsafe. If we end up needing this sort of unsafe usage for performance reasons, I don't think it should be in the default build. It should be under a separate build tag like fast_unsafe or something, so users can opt in to the faster but unsafe version while the standard build uses safe constructs.

@cdvr1993
Contributor Author

@prashantv @abhinav what do you think of creating a second Any function, like AnyUnsafe()? That way it is up to users to opt in to the unsafe version.

@abhinav
Collaborator

abhinav commented Jul 27, 2023

First, let me say @cdvr1993, this is both impressive debugging and a nice workaround. I think we neglected to mention that in the initial reaction to the unsafe code.

The idea of "UnsafeAny" is more digestible, if a bit distasteful. I would still prefer to avoid it.

This is all a lot of debt to work around an (AFAIK) language issue. Please understand that part of the reason for my pushback is that, besides the inherent risk of unsafe code, this makes it risky and difficult for anyone else to touch this code. The level of scrutiny necessary for every change here will have to be a lot higher, which is an additional burden on maintainers.

If the unsafe option was the only way (and that's not yet settled), there are some mitigations that, in my opinion, would make the unsafe code more digestible:

  • To address readability and maintainability, it would be preferable if the function and the accompanying helpers were all code-generated from a simpler, more readable source-of-truth. (What form that source of truth takes is up for discussion.)
  • I think it would still be preferable to use build tags (//go:build !unsafe_zap and //go:build unsafe_zap) to select between the safe and unsafe implementations. My preference is unsafe-opt-in, but that's up for discussion if others feel differently.
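To make the build-tag suggestion concrete, here is a minimal sketch of what the safe half of such a split could look like. Only the tag name `unsafe_zap` comes from the comment above; the file names, `usingUnsafe`, and `anyImpl` are hypothetical and not part of zap.

```go
//go:build !unsafe_zap

// Hypothetical file any_safe.go: compiled by default, and excluded when the
// build is run with `go build -tags unsafe_zap`.
//
// A sibling file (say any_unsafe.go) would start with `//go:build unsafe_zap`
// and declare the same identifiers with the unsafe implementation, so exactly
// one of the two files is ever part of a given build.
package main

import "fmt"

// usingUnsafe reports which implementation was compiled in (illustrative only).
const usingUnsafe = false

// anyImpl is a placeholder for the safe implementation.
func anyImpl(key string, value any) string {
	return fmt.Sprintf("%s=%v", key, value)
}

func main() {
	fmt.Println("unsafe build:", usingUnsafe)
	fmt.Println(anyImpl("answer", 42))
}
```

With this layout the unsafe variant is opt-in: a plain `go build` picks the safe file, and only users who pass the tag get the other one.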

Again, this is just my opinion. Other maintainers, feel free to chime in.
I don't recommend making the changes I just suggested until there's some agreement here, which I suspect is still contingent on the results of #1303, which avoids unsafe code entirely.

@cdvr1993
Contributor Author

Ok, just for the sake of showing how this can be cleaned up a bit, I created: #1304

rabbbit added a commit that referenced this pull request Jul 28, 2023
This is an alternative to #1301 and #1302. It's not as fast as these two
options, but it still gives us half the stack reduction without the
`unsafe` usage.

Interestingly it seems that on both arm64 and amd64 the new code, with
the closure, is faster than the plain old switch.
We do see a roughly 5-14ns slowdown on `Any` creation if it's used without
`logger`, but that's small and not a realistic use case.

A bunch of credit for this goes to @cdvr1993. We started independently, and
I was about to give up, but the conversations pushed me forward. In the
end he went into a more advanced land where I dare not enter.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler reserving 4.8kb of stack (a zap.Field
per arm of the switch statement) for zap.Any. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` were to be used in a new
goroutine, since the default goroutine stack size can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).
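To illustrate the new-goroutine concern, here is a rough sketch that times a call made in a freshly started goroutine, once with a small frame and once with a large one. `bigFrame`, `smallFrame`, and `inNewGoroutine` are hypothetical stand-ins and not zap's benchmark code. The 4800-byte buffer only mimics the ~4.8kb figure above, and results will vary by Go version and machine.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// bigFrame is a hypothetical function whose local buffer is intended to give
// it a stack frame larger than a fresh goroutine's initial stack.
//
//go:noinline
func bigFrame(seed byte) byte {
	var buf [4800]byte
	for i := range buf {
		buf[i] = seed + byte(i)
	}
	return buf[len(buf)-1]
}

// smallFrame returns the same value as bigFrame without the large buffer.
//
//go:noinline
func smallFrame(seed byte) byte {
	return seed + 191
}

// inNewGoroutine builds a benchmark that calls f once per iteration from a
// newly started goroutine and waits for it to finish.
func inNewGoroutine(f func(byte) byte) func(*testing.B) {
	return func(b *testing.B) {
		var wg sync.WaitGroup
		for i := 0; i < b.N; i++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				_ = f(byte(i))
			}()
			wg.Wait()
		}
	}
}

func main() {
	fmt.Println("small frame:", testing.Benchmark(inNewGoroutine(smallFrame)))
	fmt.Println("big frame:  ", testing.Benchmark(inNewGoroutine(bigFrame)))
}
```

Any gap between the two numbers would be consistent with the stack-growth explanation, but this sketch does not prove that stack growth is the cause.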

This is an alternative to #1301. @cdvr and I were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change results in a net performance gain,
as most benchmarks show reduced time (`any-no-logger` is the exception).
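The general shape of the "function and a closure" idea can be sketched as follows. This is not the code from the referenced commit: `Field`, its constructors, `anyDirect`, and `anyViaClosure` are simplified, hypothetical stand-ins. The intent is that each switch arm only records a small closure, so the wide `Field` result is produced at a single call site rather than once per arm.

```go
package main

import "fmt"

// Field is a hypothetical wide struct standing in for a log field.
type Field struct {
	Key       string
	Type      uint8
	Integer   int64
	String    string
	Interface any
}

func Int(key string, v int) Field     { return Field{Key: key, Type: 1, Integer: int64(v)} }
func String(key, v string) Field      { return Field{Key: key, Type: 2, String: v} }
func Reflect(key string, v any) Field { return Field{Key: key, Type: 3, Interface: v} }

// anyDirect is the plain switch: every arm calls a constructor and returns
// its Field result directly.
func anyDirect(key string, value any) Field {
	switch v := value.(type) {
	case int:
		return Int(key, v)
	case string:
		return String(key, v)
	default:
		return Reflect(key, value)
	}
}

// anyViaClosure has each arm pick a closure; the Field is built by the single
// build() call after the switch.
func anyViaClosure(key string, value any) Field {
	var build func() Field
	switch v := value.(type) {
	case int:
		build = func() Field { return Int(key, v) }
	case string:
		build = func() Field { return String(key, v) }
	default:
		build = func() Field { return Reflect(key, value) }
	}
	return build()
}

func main() {
	fmt.Println(anyDirect("n", 42) == anyViaClosure("n", 42))
	fmt.Println(anyDirect("s", "x") == anyViaClosure("s", "x"))
}
```

Both functions return the same values. Whether the second form actually shrinks the frame depends on the compiler version, so it is worth confirming with benchmarks like the ones that follow.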

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```

In absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3191725               382.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3159882               367.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           2998960               373.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3264657               361.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3168627               386.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3169394               364.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3271981               368.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3293463               362.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    793905              1388 ns/op             143 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1724048               748.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 2536380               444.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2177941               586.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    890155              1237 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1836302               719.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 3671503               322.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2257405               540.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         811408              1457 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1384990               729.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      3228151               381.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2678596               450.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         821092              1386 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1747638               662.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3747934               341.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2678191               463.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.238s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.64 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              27779637                44.20 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            27881986                42.96 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            27587953                43.39 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            26861058                43.43 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1749990               690.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1807341               660.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1821039               654.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1865083               650.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1677643               741.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1905400               689.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1843364               646.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1899883               645.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    453326              2479 ns/op              92 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  724555              1580 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1358790               953.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1805985               585.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    466447              2395 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  874053              1487 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1457768               834.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1795317               632.5 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         407620              2749 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       725614              1597 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1303908               863.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1957864               609.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         497640              2401 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       648355              1549 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1486416               869.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2116040               568.8 ns/op            88 B/op          2 allocs/op
PASS
```
And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.64 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              27779637                44.20 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            27881986                42.96 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            27587953                43.39 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            26861058                43.43 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1749990               690.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1807341               660.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1821039               654.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1865083               650.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1677643               741.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1905400               689.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1843364               646.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1899883               645.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    453326              2479 ns/op              92 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  724555              1580 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1358790               953.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1805985               585.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    466447              2395 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  874053              1487 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1457768               834.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1795317               632.5 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         407620              2749 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       725614              1597 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1303908               863.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1957864               609.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         497640              2401 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       648355              1549 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1486416               869.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2116040               568.8 ns/op            88 B/op          2 allocs/op
PASS
```
rabbbit added a commit that referenced this pull request Jul 28, 2023
This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.

The downside is a (slight) increase in code maintenance: we unroll
as much as we can and rely on the compiler correctly reusing small
variable sizes. While this is not pretty, it feels safe, since the changes
were purely mechanical. Future changes and extensions should be easy to
review.

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304,
where I realized that @cdvr1993 was building a dispatching mechanism similar
to the one zap already has in `zapcore`, which suggested a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler assigning 4.8 KB (a zap.Field
per arm of the switch statement) for zap.Any on the stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` is used in a new
goroutine, since the default goroutine stack size can be as low as 2 KB (it can
vary depending on average stack size - see golang/go#18138).

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

In absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
rabbbit added a commit that referenced this pull request Jul 28, 2023
This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.

The downside is a (slight) increase in code maintenance: we unroll
as much as we can and rely on the compiler correctly reusing small
variable sizes. While this is not pretty, it feels safe, since the changes
were purely mechanical. Future changes and extensions should be easy to
review.

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304,
where I realized that @cdvr1993 was building a dispatching mechanism similar
to the one zap already has in `zapcore`, which suggested a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler assigning 4.8 KB (a zap.Field
per arm of the switch statement) for zap.Any on the stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` is used in a new
goroutine, since the default goroutine stack size can be as low as 2 KB (it can
vary depending on average stack size - see golang/go#18138).

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

In absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
rabbbit added a commit that referenced this pull request Jul 28, 2023
rabbbit added a commit that referenced this pull request Jul 29, 2023
This is an alternative to:
- #1301, #1302, and #1304 - a series of PRs that are faster than this
  one but rely on `unsafe`.
- #1303 - my own PR that uses closures to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.

The downside is the (slight) increase in code maintenance burden - we unroll
as much as we can and rely on the compiler correctly re-using small
variable sizes. While this is not pretty, it feels safe - the changes
were purely mechanical. Future changes and extensions should be easy to
review.

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304 -
where I realized that @cdvr1993 was building a dispatching mechanism
similar to the one zap already has in `zapcore` - a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on the stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` were to be used in a new
goroutine, since the default goroutine stack size can be as low as 2kb (it
can vary depending on average stack size - see golang/go#18138).
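
As a rough illustration of the pattern described above, here is a minimal sketch of a type-switch dispatcher. The `Field` layout and the four constructors are simplified stand-ins (assumptions for illustration), not zap's actual code; the real `zap.Any` has many more arms, which is where the per-arm stack slots add up.

```go
package main

import "fmt"

// Field is a simplified stand-in for zap.Field (assumed layout, illustration only).
type Field struct {
	Key       string
	Type      uint8
	Integer   int64
	String    string
	Interface any
}

// Typed constructors, analogous in spirit to zap.String, zap.Int, and so on.
func String(key, val string) Field  { return Field{Key: key, Type: 1, String: val} }
func Int(key string, val int) Field { return Field{Key: key, Type: 2, Integer: int64(val)} }
func Bool(key string, val bool) Field {
	f := Field{Key: key, Type: 3}
	if val {
		f.Integer = 1
	}
	return f
}
func Reflect(key string, val any) Field { return Field{Key: key, Type: 4, Interface: val} }

// Any picks a constructor from the dynamic type of value. Every arm yields
// its own Field result; per the commit message, the compiler reserved a
// separate stack slot for each arm instead of sharing one.
func Any(key string, value any) Field {
	switch val := value.(type) {
	case string:
		return String(key, val)
	case int:
		return Int(key, val)
	case bool:
		return Bool(key, val)
	default:
		return Reflect(key, val)
	}
}

func main() {
	done := make(chan Field)
	// A fresh goroutine starts with a small stack, so a large frame in Any
	// may force the runtime to grow and copy the stack before Any can run.
	go func() { done <- Any("user", "alice") }()
	fmt.Printf("%+v\n", <-done)
}
```

With only four arms this frame stays small; the sketch shows the shape of the problem rather than reproducing the 4.8kb frame.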

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
prashantv added a commit that referenced this pull request Jul 29, 2023
Alternative to #1301 and #1303, prebuild a map by type
to determine which function to call.

This only works for concrete types, so we also need a list for
interfaces, ordered in precedence order (since multiple interfaces
may match a single type).
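
A minimal sketch of the idea, assuming a simplified `Field` type and hypothetical names (`byType`, `byInterface`, `kind`); this is not the code from the commit. Exact dynamic types are resolved through a prebuilt map, and anything else falls through to an ordered list of interface checks:

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Field is a simplified stand-in for zap.Field (illustration only).
type Field struct {
	Key   string
	Kind  string
	Value any
}

// ctor builds a Field from an untyped value.
type ctor func(key string, value any) Field

// kind returns a constructor that tags the Field with the given name.
func kind(name string) ctor {
	return func(key string, value any) Field {
		return Field{Key: key, Kind: name, Value: value}
	}
}

// byType is built once and maps exact dynamic types to constructors.
var byType = map[reflect.Type]ctor{
	reflect.TypeOf(""):    kind("string"),
	reflect.TypeOf(0):     kind("int"),
	reflect.TypeOf(false): kind("bool"),
}

// ifaceRule pairs an interface check with a constructor.
type ifaceRule struct {
	matches func(any) bool
	build   ctor
}

// byInterface is ordered by precedence: the first matching rule wins.
var byInterface = []ifaceRule{
	{func(v any) bool { _, ok := v.(error); return ok }, kind("error")},
	{func(v any) bool { _, ok := v.(fmt.Stringer); return ok }, kind("stringer")},
}

// Any tries the concrete-type map first, then the interface list.
func Any(key string, value any) Field {
	if build, ok := byType[reflect.TypeOf(value)]; ok {
		return build(key, value)
	}
	for _, r := range byInterface {
		if r.matches(value) {
			return r.build(key, value)
		}
	}
	return kind("reflect")(key, value)
}

// loud implements both error and fmt.Stringer, so list order decides.
type loud struct{}

func (loud) Error() string  { return "boom" }
func (loud) String() string { return "loud" }

func main() {
	fmt.Println(Any("a", "text").Kind)
	fmt.Println(Any("b", errors.New("x")).Kind)
	fmt.Println(Any("c", loud{}).Kind)
}
```

The `loud` type shows why the interface list needs a fixed order: it satisfies both rules, and only the position in `byInterface` decides which constructor runs.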
rabbbit added a commit that referenced this pull request Jul 30, 2023
…ion)

This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.
- #1305 - my own PR that inlines a bunch of loops
- https://github.com/uber-go/zap/compare/pawel/any-int5 that does the
  same as above, but is slightly easier to parse
- #1307 - a reflect.TypeOf lookup version

THIS PR IS INCOMPLETE - it shows a possible approach, but I wanted to
get reviewers' thoughts before typing everything in.

I originally thought we could use an intermediary `type props struct`
to store the data, but that hits the same problem: every `props`
would get its own slot on the stack. This PR avoids that by returning
the raw data.

Pros:
- the implementation is shared between `Any` and strongly typed Fields
- no reflect or unsafe
- reduces the stack significantly - we should be able to get to the same
  ~180 bytes as #1305.
- no perf penalty for strongly typed versions; at least on ARM64 it's
  compiled away.

Cons:
- the code gets a bit harder to maintain. It's significantly better than
  #1305 I would say though.
abhinav added a commit to abhinav/zap that referenced this pull request Jul 30, 2023
Yet another attempt at reducing the stack size of zap.Any,
borrowing from uber-go#1301, uber-go#1303, uber-go#1304, uber-go#1305, uber-go#1307, and uber-go#1308.

This approach defines a generic data type for field constructors
of a specific type. This is similar to the lookup map in uber-go#1307,
minus the map lookup, the interface match, or reflection.

    type anyFieldC[T any] func(string, T) Field

The generic data type provides a non-generic method
matching the interface:

    interface{ Any(string, any) Field }
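
A minimal sketch of how the two pieces can fit together, assuming Go 1.18+ and a simplified `Field` type. Only the `anyFieldC` declaration and the interface shape come from the text above; the method body, the three constructors, and the reduced switch are assumptions for illustration:

```go
package main

import "fmt"

// Field is a simplified stand-in for zap.Field (illustration only).
type Field struct {
	Key   string
	Kind  string
	Value any
}

// Typed constructors standing in for zap.String, zap.Int, and zap.Reflect.
func String(key, val string) Field      { return Field{Key: key, Kind: "string", Value: val} }
func Int(key string, val int) Field     { return Field{Key: key, Kind: "int", Value: val} }
func Reflect(key string, val any) Field { return Field{Key: key, Kind: "reflect", Value: val} }

// anyFieldC is the generic constructor type from the commit message.
type anyFieldC[T any] func(string, T) Field

// Any is the non-generic method: it asserts val to T and calls the typed
// constructor. If the assertion fails, v is the zero value of T.
func (f anyFieldC[T]) Any(key string, val any) Field {
	v, _ := val.(T)
	return f(key, v)
}

// Any selects a constructor per arm and makes a single call at the end,
// so the arms only differ in which small interface value they store.
func Any(key string, value any) Field {
	var c interface{ Any(string, any) Field }
	switch value.(type) {
	case string:
		c = anyFieldC[string](String)
	case int:
		c = anyFieldC[int](Int)
	default:
		c = anyFieldC[any](Reflect)
	}
	return c.Any(key, value)
}

func main() {
	fmt.Println(Any("name", "zap").Kind)
	fmt.Println(Any("count", 3).Kind)
	fmt.Println(Any("ratio", 0.5).Kind)
}
```

Adding a new supported type in this shape means adding one arm that converts an existing typed constructor, with no per-type wrapper function.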

Stack size:
The stack size of zap.Any following this change is 0xc0.

    % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any
    go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0

This is just 8 bytes more than uber-go#1305,
which is the smallest stack size of all other attempts.

Allocations:
Everything appears to get inlined with no heap escapes:

    go build -gcflags -m 2>&1 |
      grep field.go |
      perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' |
      grep 'escapes'

(Line 413 declares anyFieldC)

Besides that, the output of `-m` for the relevant section of code
consists of almost entirely:

    ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any
    ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any
    ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any
    [...]
    ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any
    ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any
    ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any

Followed by:

    ./field.go:428:10: leaking param: key
    ./field.go:428:22: leaking param: value

Maintainability:
Unlike some of the other approaches, this variant is more maintainable.
The `zap.Any` function looks roughly the same.
Adding new branches there is obvious, and requires no duplication.

Performance:
This is a net improvement on all BenchmarkAny calls
except "any-no-logger" which calls `zap.Any` and discards the result.

```
name                        old time/op    new time/op    delta
Any/str-no-logger-2           8.77ns ± 0%    8.75ns ± 1%     ~     (p=0.159 n=4+5)
Any/any-no-logger-2           54.1ns ± 0%    81.6ns ± 0%  +50.71%  (p=0.016 n=5+4)
Any/str-with-logger-2         1.38µs ± 3%    1.38µs ± 4%     ~     (p=0.841 n=5+5)
Any/any-with-logger-2         1.60µs ±22%    1.37µs ± 1%     ~     (p=0.151 n=5+5)
Any/str-in-go-2               3.41µs ± 1%    3.42µs ± 5%     ~     (p=0.905 n=4+5)
Any/any-in-go-2               5.98µs ± 1%    3.68µs ± 6%  -38.44%  (p=0.008 n=5+5)
Any/str-in-go-with-stack-2    3.42µs ± 2%    3.46µs ± 3%     ~     (p=0.421 n=5+5)
Any/any-in-go-with-stack-2    5.98µs ± 3%    3.65µs ± 3%  -38.95%  (p=0.008 n=5+5)

name                        old alloc/op   new alloc/op   delta
Any/str-no-logger-2            0.00B          0.00B          ~     (all equal)
Any/any-no-logger-2            0.00B          0.00B          ~     (all equal)
Any/str-with-logger-2          64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/any-with-logger-2          64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/str-in-go-2                88.5B ± 1%     88.0B ± 0%     ~     (p=0.429 n=4+4)
Any/any-in-go-2                88.0B ± 0%     88.0B ± 0%     ~     (all equal)
Any/str-in-go-with-stack-2     88.0B ± 0%     88.0B ± 0%     ~     (all equal)
Any/any-in-go-with-stack-2     88.0B ± 0%     88.0B ± 0%     ~     (all equal)

name                        old allocs/op  new allocs/op  delta
Any/str-no-logger-2             0.00           0.00          ~     (all equal)
Any/any-no-logger-2             0.00           0.00          ~     (all equal)
Any/str-with-logger-2           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/any-with-logger-2           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/str-in-go-2                 2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Any/any-in-go-2                 2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Any/str-in-go-with-stack-2      2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Any/any-in-go-with-stack-2      2.00 ± 0%      2.00 ± 0%     ~     (all equal)
```

I believe this is acceptable because that's not a real use case;
we expect the result to be used with a logger.
rabbbit added a commit that referenced this pull request Jul 30, 2023
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310.

We're writing various approaches to reduce the stack size and it's
painful to keep copy-pasting the tests between PRs. This was suggested
by @prashantv in #1307.

The tests are mostly based on tests in #1303, but made "more generic",
since for #1307 we might want to test across more than just a single type.
It does make the tests a bit harder to set up. Some of the setup is
inconvenient (we duplicate the value in both `typed` and `any` version
of the tests) but hopefully okay to understand. A fully non-duplicated
alternative would likely require something like #1310 itself.

For #1307 in particular a test against interface type would likely be
needed, so adding it here too.

The tests compare two code paths, with the same arguments, one using a
strongly typed method and the second using `zap.Any`. We have:
- a simple "create field" case for a baseline
- a "create and log" case for a realistic case (we typically log the fields)
- a "create and log in a goroutine" case for the pathological case
  we're trying to solve for.
- a "create and log in a goroutine in a pre-warmed system" case that does
  the above, but first tries to affect the starting goroutine stack size
  to provide a realistic example.
  Without this, for any tests with 2+ goroutines, the cost of `zap.Any`
  is not visible, as we always end up expanding the stack even in the
  strongly typed methods.

The test results are:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/string-typ-no-logger               166879518                6.988 ns/op           0 B/op          0 allocs/op
BenchmarkAny/string-typ-no-logger-12            167398297                6.973 ns/op           0 B/op          0 allocs/op
BenchmarkAny/string-any-no-logger               87669631                13.97 ns/op            0 B/op          0 allocs/op
BenchmarkAny/string-any-no-logger-12            86760837                14.11 ns/op            0 B/op          0 allocs/op
BenchmarkAny/string-typ-logger                   3059485               395.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-typ-logger-12                3141176               379.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-any-logger                   2995699               401.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-any-logger-12                3071046               391.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-typ-logger-go                 784323              1351 ns/op             146 B/op          2 allocs/op
BenchmarkAny/string-typ-logger-go-12             2000835               613.9 ns/op            96 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go                 477486              2479 ns/op             117 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go-12             1830955               680.0 ns/op           112 B/op          2 allocs/op
BenchmarkAny/string-typ-logger-go-stack           841566              1328 ns/op              96 B/op          2 allocs/op
BenchmarkAny/string-typ-logger-go-stack-12       2625226               479.6 ns/op            96 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go-stack           486084              2493 ns/op             112 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go-stack-12       2658640               667.9 ns/op           112 B/op          2 allocs/op
BenchmarkAny/stringer-typ-no-logger             147314238                8.034 ns/op           0 B/op          0 allocs/op
BenchmarkAny/stringer-typ-no-logger-12          157857937                7.436 ns/op           0 B/op          0 allocs/op
BenchmarkAny/stringer-any-no-logger             58872349                20.19 ns/op            0 B/op          0 allocs/op
BenchmarkAny/stringer-any-no-logger-12          60532305                20.27 ns/op            0 B/op          0 allocs/op
BenchmarkAny/stringer-typ-logger                 3094204               411.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-typ-logger-12              3163489               383.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-any-logger                 2981940               427.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-any-logger-12              2777792               394.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-typ-logger-go               911761              1335 ns/op              96 B/op          2 allocs/op
BenchmarkAny/stringer-typ-logger-go-12           2006440               605.2 ns/op            96 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go               467934              2518 ns/op             112 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go-12           1786076               683.1 ns/op           112 B/op          2 allocs/op
BenchmarkAny/stringer-typ-logger-go-stack         855794              1316 ns/op              96 B/op          2 allocs/op
BenchmarkAny/stringer-typ-logger-go-stack-12     2598783               434.5 ns/op            96 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go-stack         473282              2474 ns/op             112 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go-stack-12     2020183               651.9 ns/op           112 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.516s
```
@rabbbit rabbbit mentioned this pull request Jul 30, 2023
prashantv pushed a commit that referenced this pull request Jul 31, 2023
This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.

The downside is the (slight) increase in code maintenance burden - we unroll
as much as we can and rely on the compiler correctly re-using small
variable sizes. While this is not pretty, it feels safe - the changes
were purely mechanical. Future changes and extensions should be easy to
review.
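
To illustrate what "unrolling" could mean here, below is a minimal sketch, not the code from this PR. `Field`, `FieldType` and `anyField` are simplified stand-ins I made up. The assumed idea is that every arm of the switch writes only to a few small locals, and the `Field` value is constructed once at the end instead of once per arm.

```go
package main

import (
	"fmt"
	"math"
)

// FieldType and Field are simplified stand-ins, not zap's real definitions.
type FieldType uint8

const (
	UnknownType FieldType = iota
	StringType
	Int64Type
	BoolType
	Float64Type
	ReflectType
)

type Field struct {
	Key       string
	Type      FieldType
	Integer   int64
	String    string
	Interface interface{}
}

// anyField is a hypothetical, unrolled dispatcher: each arm only assigns
// to small locals, and a single Field is built after the switch.
func anyField(key string, value interface{}) Field {
	var (
		t     FieldType
		i     int64
		s     string
		iface interface{}
	)
	switch v := value.(type) {
	case string:
		t, s = StringType, v
	case int:
		t, i = Int64Type, int64(v)
	case int64:
		t, i = Int64Type, v
	case bool:
		t = BoolType
		if v {
			i = 1
		}
	case float64:
		// Store the raw bits so the value fits in the integer slot.
		t, i = Float64Type, int64(math.Float64bits(v))
	default:
		t, iface = ReflectType, v
	}
	return Field{Key: key, Type: t, Integer: i, String: s, Interface: iface}
}

func main() {
	fmt.Printf("%+v\n", anyField("name", "zap"))
	fmt.Printf("%+v\n", anyField("count", 42))
}
```

Whether the compiler actually shares stack slots between the arms is something to confirm by inspecting the generated assembly, as was done for `field.go:420` above.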

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304 -
where I realized that @cdvr1993 is doing a similar dispatching mechanism
that zap is already doing via `zapcore` - a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on the stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` is used in a new
goroutine, since the default goroutine stack size can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).
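
The pathological case can be sketched without zap at all. In the example below, `bigFrame` is a hypothetical stand-in for a function with a multi-kilobyte frame (the 4096-byte local array is my assumption, chosen to exceed a small initial stack). Calling it on a fresh goroutine is the situation where the runtime may have to grow and copy the stack first.

```go
package main

import (
	"fmt"
	"sync"
)

// bigFrame is a hypothetical stand-in for a function with a large stack
// frame. It is not zap code; the array size is an assumption.
//
//go:noinline
func bigFrame(seed byte) int {
	var buf [4096]byte
	for i := range buf {
		buf[i] = seed
	}
	sum := 0
	for _, b := range buf {
		sum += int(b)
	}
	return sum
}

// inNewGoroutine calls bigFrame on a fresh goroutine, whose stack starts
// small and may need to grow before the large frame fits.
func inNewGoroutine(seed byte) int {
	var (
		wg  sync.WaitGroup
		out int
	)
	wg.Add(1)
	go func() {
		defer wg.Done()
		out = bigFrame(seed)
	}()
	wg.Wait()
	return out
}

func main() {
	// Same result either way; only the stack the call runs on differs.
	fmt.Println(bigFrame(1), inNewGoroutine(1))
}
```

Timing the two call paths with `go test -bench` would be the way to see the cost; the sketch itself only shows the shape of the problem.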

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

In absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
prashantv added a commit that referenced this pull request Jul 31, 2023
Alternative to #1301 and #1303: prebuild a map by type
to determine which function to call.

This only works for concrete types, so we also need a list for
interfaces, ordered in precedence order (since multiple interfaces
may match a single type).
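
A rough sketch of that lookup scheme, under my own assumptions: `Field`, `builder`, `byType`, `byInterface` and `lookup` are illustrative names, not the ones used in the commit, and the interface order shown is only an example of a precedence list.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Field is a simplified stand-in for zap's field type.
type Field struct {
	Key  string
	Kind string
	Val  interface{}
}

// builder turns a key and an already-classified value into a Field.
type builder func(key string, v interface{}) Field

// byType is built once and maps exact dynamic types to a builder.
// It can only match concrete types, never interfaces.
var byType = map[reflect.Type]builder{
	reflect.TypeOf(""):    func(k string, v interface{}) Field { return Field{k, "string", v} },
	reflect.TypeOf(0):     func(k string, v interface{}) Field { return Field{k, "int", v} },
	reflect.TypeOf(false): func(k string, v interface{}) Field { return Field{k, "bool", v} },
}

// ifaceRule pairs an interface check with the builder to use on a match.
type ifaceRule struct {
	matches func(v interface{}) bool
	build   builder
}

// byInterface is scanned in order, so earlier entries take precedence
// when a value implements several interfaces.
var byInterface = []ifaceRule{
	{
		matches: func(v interface{}) bool { _, ok := v.(error); return ok },
		build:   func(k string, v interface{}) Field { return Field{k, "error", v.(error).Error()} },
	},
	{
		matches: func(v interface{}) bool { _, ok := v.(fmt.Stringer); return ok },
		build:   func(k string, v interface{}) Field { return Field{k, "stringer", v.(fmt.Stringer).String()} },
	},
}

// lookup tries the concrete-type map first, then the ordered interface
// list, and finally falls back to a generic field.
func lookup(key string, v interface{}) Field {
	if b, ok := byType[reflect.TypeOf(v)]; ok {
		return b(key, v)
	}
	for _, r := range byInterface {
		if r.matches(v) {
			return r.build(key, v)
		}
	}
	return Field{key, "reflect", v}
}

// noisyErr implements both error and fmt.Stringer to exercise precedence.
type noisyErr struct{}

func (noisyErr) Error() string  { return "as error" }
func (noisyErr) String() string { return "as stringer" }

func main() {
	fmt.Println(lookup("a", "hello").Kind)
	fmt.Println(lookup("b", errors.New("boom")).Kind)
	fmt.Println(lookup("c", noisyErr{}).Val)
}
```

Because `noisyErr` satisfies both interfaces, the position of the `error` rule ahead of the `fmt.Stringer` rule decides which builder runs.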
@cdvr1993 cdvr1993 closed this Jul 31, 2023
rabbbit added a commit that referenced this pull request Aug 1, 2023
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and
#1310.

We're writing various approaches to reduce the stack size and it's
painful to keep copy-pasting the tests between PRs. This was suggested
by @prashantv in #1307.

The tests are mostly based on tests in #1303, but made "more generic",
as in #1307 we might want to test across more than just a single type. It
does make the tests a bit harder to set up. Some of the setup is
inconvenient (we duplicate the value in both the `typed` and `any` versions
of the tests) but hopefully okay to understand. A fully non-duplicated
alternative would likely require something like #1310 itself.

For #1307 in particular a test against interface type would likely be
needed, so adding it here too.

The tests compare two code paths, with the same arguments: one using a
strongly typed method and the second using `zap.Any`. We have:
- a simple "create field" case for a baseline
- a "create and log" case for a realistic case (we typically log the
fields)
- a "create and log in a goroutine" case for the pathological case we're
trying to solve for.
- a "create and log in a goroutine in a pre-warmed system" case that does the
above was dropped: we decided it's not worth the complication.
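
The shape of these cases can be sketched roughly as follows. Everything here is a hypothetical stand-in (`Field`, `typedString`, `anyValue`, `logFields`, `benchAny`); the real tests call zap's constructors and a real logger. In a `_test.go` file the function would be named `BenchmarkAny` so that `go test -bench` picks it up.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// Field, typedString, anyValue and logFields are hypothetical stand-ins
// for zap's field type, a typed constructor, zap.Any and a logger call.
type Field struct{ Key, Val string }

func typedString(key, val string) Field { return Field{Key: key, Val: val} }

func anyValue(key string, val interface{}) Field {
	if s, ok := val.(string); ok {
		return Field{Key: key, Val: s}
	}
	return Field{Key: key, Val: fmt.Sprint(val)}
}

func logFields(fields ...Field) int { return len(fields) }

// benchAny runs the three cases for one value type, once through the
// typed path and once through the "any" path, with identical arguments.
func benchAny(b *testing.B) {
	variants := []struct {
		name  string
		build func() Field
	}{
		{"typed", func() Field { return typedString("key", "value") }},
		{"any", func() Field { return anyValue("key", "value") }},
	}
	for _, v := range variants {
		v := v
		// Baseline: only create the field.
		b.Run("string/field-only/"+v.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_ = v.build()
			}
		})
		// Realistic: create the field and hand it to the logger.
		b.Run("string/log/"+v.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				logFields(v.build())
			}
		})
		// Pathological: do the same on a fresh goroutine each time.
		b.Run("string/log-go/"+v.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				var wg sync.WaitGroup
				wg.Add(1)
				go func() {
					defer wg.Done()
					logFields(v.build())
				}()
				wg.Wait()
			}
		})
	}
}

func main() {
	// Sanity check only: both code paths should build the same field.
	fmt.Println(typedString("key", "value") == anyValue("key", "value"))
}
```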

The test results are:
```
❯  go test -bench BenchmarkAny -benchmem -cpu 1
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/string/field-only/typed    161981473                7.374 ns/op           0 B/op          0 allocs/op
BenchmarkAny/string/field-only/any      82343354                14.67 ns/op            0 B/op          0 allocs/op
BenchmarkAny/string/log/typed            2965648               416.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string/log/any              2920292               418.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string/log-go/typed         1000000              1158 ns/op             112 B/op          3 allocs/op
BenchmarkAny/string/log-go/any            553144              2152 ns/op             128 B/op          3 allocs/op
BenchmarkAny/stringer/field-only/typed  160509367                7.548 ns/op           0 B/op          0 allocs/op
BenchmarkAny/stringer/field-only/any    51330402                23.45 ns/op            0 B/op          0 allocs/op
BenchmarkAny/stringer/log/typed          3221404               377.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer/log/any            2726443               393.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer/log-go/typed       1000000              1129 ns/op             112 B/op          3 allocs/op
BenchmarkAny/stringer/log-go/any          558602              2147 ns/op             128 B/op          3 allocs/op
PASS
ok      go.uber.org/zap 19.426s
```

On gotip:
```
❯  gotip test -bench BenchmarkAny -benchmem -cpu 1

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/string/field-only/typed    155084869                7.603 ns/op           0 B/op          0 allocs/op
BenchmarkAny/string/field-only/any      82740788                14.55 ns/op            0 B/op          0 allocs/op
BenchmarkAny/string/log/typed            2800495               411.6 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string/log/any              2896258               411.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string/log-go/typed         1000000              1155 ns/op             112 B/op          3 allocs/op
BenchmarkAny/string/log-go/any            551599              2168 ns/op             128 B/op          3 allocs/op
BenchmarkAny/stringer/field-only/typed  159505488                7.578 ns/op           0 B/op          0 allocs/op
BenchmarkAny/stringer/field-only/any    51406354                23.78 ns/op            0 B/op          0 allocs/op
BenchmarkAny/stringer/log/typed          3011210               388.6 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer/log/any            3010370               395.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer/log-go/typed       1000000              1161 ns/op             112 B/op          3 allocs/op
BenchmarkAny/stringer/log-go/any          553860              2161 ns/op             128 B/op          3 allocs/op
PASS
ok      go.uber.org/zap 19.391s
```

On amd64 (similar, 2x worse stack growth impact):
```
 % go test -bench BenchmarkAny -benchmem -cpu 1
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/string/field-only/typed    47534053                25.23 ns/op            0 B/op          0 allocs/op
BenchmarkAny/string/field-only/any      36913526                32.57 ns/op            0 B/op          0 allocs/op
BenchmarkAny/string/log/typed            1693508               725.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string/log/any              1576172               765.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string/log-go/typed          516832              2343 ns/op             112 B/op          3 allocs/op
BenchmarkAny/string/log-go/any            243692              4404 ns/op             128 B/op          3 allocs/op
BenchmarkAny/stringer/field-only/typed  48735537                24.73 ns/op            0 B/op          0 allocs/op
BenchmarkAny/stringer/field-only/any    26115684                47.24 ns/op            0 B/op          0 allocs/op
BenchmarkAny/stringer/log/typed          1761630               677.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer/log/any            1646913               705.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer/log-go/typed        534187              2275 ns/op             112 B/op          3 allocs/op
BenchmarkAny/stringer/log-go/any          273787              4348 ns/op             128 B/op          3 allocs/op
PASS
ok      go.uber.org/zap 18.890s
```
rabbbit added a commit that referenced this pull request Aug 1, 2023
This is an alternative to #1301 and #1302. It's not as fast as these two
options, but it still gives us half the stack reduction without the
`unsafe` usage.

Interestingly it seems that on both arm64 and amd64 the new code, with
the closure, is faster than the plain old switch.
We do see a ~5-10ns delay on `Any` creation if it's used without
`logger`, but that's minimal and not realistic.

A bunch of credit for this goes to @cdvr1993: we started independently, and
I was about to give up, but the conversations pushed me forward. In the
end he went into a more advanced land where I dare not enter.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on the stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` is used in a new
goroutine, since the default goroutine stack size can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301. @cdvr and I were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change results in a net performance gain,
as all benchmarks show reduced time.
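
One way a function-plus-closure dispatch could look is sketched below. This is my guess at the general shape, not the code from the PR: `Field`, the constructors, `pick` and `Any` are all stand-ins. The switch only selects a closure, and the outer function makes a single call that returns a `Field`.

```go
package main

import "fmt"

// Field and the constructors below are simplified stand-ins, not zap's API.
type Field struct {
	Key  string
	Kind string
	Val  interface{}
}

func String(key, v string) Field   { return Field{Key: key, Kind: "string", Val: v} }
func Int(key string, v int) Field  { return Field{Key: key, Kind: "int", Val: v} }
func Bool(key string, v bool) Field { return Field{Key: key, Kind: "bool", Val: v} }

func Reflect(key string, v interface{}) Field {
	return Field{Key: key, Kind: "reflect", Val: v}
}

// pick is a hypothetical helper: the type switch only chooses which
// closure to return and does not build any Field itself.
func pick(key string, value interface{}) func() Field {
	switch v := value.(type) {
	case string:
		return func() Field { return String(key, v) }
	case int:
		return func() Field { return Int(key, v) }
	case bool:
		return func() Field { return Bool(key, v) }
	default:
		return func() Field { return Reflect(key, v) }
	}
}

// Any selects a closure and invokes it, so only one call in this function
// produces a Field.
func Any(key string, value interface{}) Field {
	return pick(key, value)()
}

func main() {
	fmt.Printf("%+v\n", Any("name", "zap"))
	fmt.Printf("%+v\n", Any("ok", true))
}
```

Whether this layout shrinks the frame on a given Go version is something the disassembly and the benchmarks have to confirm.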

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```

In absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3191725               382.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3159882               367.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           2998960               373.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3264657               361.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3168627               386.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3169394               364.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3271981               368.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3293463               362.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    793905              1388 ns/op             143 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1724048               748.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 2536380               444.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2177941               586.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    890155              1237 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1836302               719.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 3671503               322.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2257405               540.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         811408              1457 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1384990               729.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      3228151               381.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2678596               450.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         821092              1386 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1747638               662.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3747934               341.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2678191               463.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.238s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.64 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              27779637                44.20 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            27881986                42.96 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            27587953                43.39 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            26861058                43.43 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1749990               690.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1807341               660.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1821039               654.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1865083               650.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1677643               741.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1905400               689.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1843364               646.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1899883               645.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    453326              2479 ns/op              92 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  724555              1580 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1358790               953.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1805985               585.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    466447              2395 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  874053              1487 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1457768               834.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1795317               632.5 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         407620              2749 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       725614              1597 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1303908               863.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1957864               609.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         497640              2401 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       648355              1549 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1486416               869.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2116040               568.8 ns/op            88 B/op          2 allocs/op
PASS
```
rabbbit added a commit that referenced this pull request Aug 1, 2023
This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.
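
The ~96% figure can be checked from the two `SUBQ` immediates above. A small sketch (the helper name `reductionPercent` is made up for this illustration) that works out to roughly 96.2%:

```go
package main

import "fmt"

// reductionPercent reports how much smaller `after` is than `before`,
// as a percentage of `before`.
func reductionPercent(before, after int) float64 {
	return 100 * float64(before-after) / float64(before)
}

func main() {
	// 0x12f8 (4856) and 0xb8 (184) are the frame sizes taken from the
	// SUBQ instructions quoted above.
	before, after := 0x12f8, 0xb8
	fmt.Printf("%d -> %d bytes: %.1f%% smaller\n",
		before, after, reductionPercent(before, after))
}
```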

The downside is a (slight) increase in code maintenance - we unroll
as much as we can and rely on the compiler correctly re-using small
variable sizes. While this is not pretty, it feels safe - the changes
were purely mechanical. Future changes and extensions should be easy to
review.
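
A minimal sketch of what such unrolling could look like. This is not the code from this PR: `field`, `fieldType` and `anyField` are hypothetical, simplified stand-ins. Each arm of the type switch only assigns to a few shared scalar variables and a single struct is built at the end, the intent being that the compiler has no reason to reserve one full struct per arm.

```go
package main

import "fmt"

// fieldType and field are simplified, hypothetical stand-ins for the
// zapcore types; they exist only for this illustration.
type fieldType uint8

const (
	unknownType fieldType = iota
	stringType
	int64Type
	boolType
)

type field struct {
	Key     string
	Type    fieldType
	Integer int64
	String  string
	Iface   any
}

// anyField assigns to a handful of small variables in each arm and
// constructs exactly one field value at the end.
func anyField(key string, value any) field {
	var (
		typ   fieldType
		i     int64
		s     string
		iface any
	)
	switch v := value.(type) {
	case string:
		typ, s = stringType, v
	case int:
		typ, i = int64Type, int64(v)
	case int64:
		typ, i = int64Type, v
	case bool:
		typ = boolType
		if v {
			i = 1
		}
	default:
		// Fallback arm: keep the value as-is, type stays unknownType.
		iface = v
	}
	return field{Key: key, Type: typ, Integer: i, String: s, Iface: iface}
}

func main() {
	fmt.Printf("%+v\n", anyField("name", "zap"))
	fmt.Printf("%+v\n", anyField("count", 42))
}
```

Adding a new type in this style means adding one arm that sets the same variables, which is what makes such changes mechanical to review.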

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304 -
where I realized that @cdvr1993 is doing a similar dispatching mechanism
that zap is already doing via `zapcore` - a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently caused by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on the stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A GitHub issue against the Go language is pending.

This can be particularly bad if `zap.Any` is used in a new goroutine,
since the initial goroutine stack can be as small as 2kb (it can
vary depending on average stack size - see golang/go#18138).
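
A rough illustration of that scenario, not zap code: `bigFrame` is a hypothetical function whose locals are larger than a 2kb starting stack, so calling it first thing on a fresh goroutine should force the runtime to grow the stack before the body runs. The exact frame size, and whether growth actually happens, depend on the compiler and runtime version.

```go
package main

import (
	"fmt"
	"sync"
)

// bigFrame is a hypothetical stand-in for a function with a large
// stack frame. The 4096-byte array is bigger than a 2kb initial stack.
//
//go:noinline
func bigFrame(seed byte) int {
	var buf [4096]byte
	for i := range buf {
		buf[i] = seed + byte(i) // byte arithmetic wraps around at 256
	}
	sum := 0
	for _, b := range buf {
		sum += int(b)
	}
	return sum
}

// runInNewGoroutine calls fn on a fresh goroutine, which starts with a
// small stack, and returns its result.
func runInNewGoroutine(fn func() int) int {
	var (
		wg  sync.WaitGroup
		out int
	)
	wg.Add(1)
	go func() {
		defer wg.Done()
		out = fn()
	}()
	wg.Wait()
	return out
}

func main() {
	got := runInNewGoroutine(func() int { return bigFrame(0) })
	fmt.Println(got)
}
```

Wrapping calls like this in a benchmark and comparing them with the same call on a long-lived goroutine is one way to make the cost of the extra stack growth visible.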

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

In absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
rabbbit added a commit that referenced this pull request Aug 1, 2023
…ion)

This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.
- #1305 - my own PR that inlines a bunch of loops
- https://github.com/uber-go/zap/compare/pawel/any-int5 that does the
  same as above, but is slightly easier to parse
- #1307 - a reflect.TypeOf lookup version

THIS PR IS INCOMPLETE - it shows a possible approach, but I wanted to
get reviewers' thoughts before typing everything in.

I originally thought we could use an intermediary `type props struct`
to store the data, but that hits the same problem: every `props`
would get its own slot on the stack. This PR avoids that by returning
the raw data.

Pros:
- the implementation is shared between `Any` and strongly typed Fields
- no reflect or unsafe
- reduced the stack significantly - we should be able to get to the same
  ~180 bytes as #1305.
- no perf penalty for strongly typed versions, at least on ARM64 it's
  compiled away.

Cons:
- the code gets a bit harder to maintain. It's significantly better than
  #1305 I would say though.
abhinav added a commit to abhinav/zap that referenced this pull request Aug 1, 2023
Yet another attempt at reducing the stack size of zap.Any,
borrowing from uber-go#1301, uber-go#1303, uber-go#1304, uber-go#1305, uber-go#1307, and uber-go#1308.

This approach defines a generic data type for field constructors
of a specific type. This is similar to the lookup map in uber-go#1307,
minus the map lookup, the interface match, or reflection.

    type anyFieldC[T any] func(string, T) Field

The generic data type provides a non-generic method
matching the interface:

    interface{ Any(string, any) Field }
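
A self-contained sketch of how these two pieces could fit together, assuming Go 1.18+ generics. `Field`, `String`, `Int` and `Reflect` below are simplified stand-ins written for this example, not zap's real definitions, and the `Any` body is one plausible shape rather than the exact code in this change.

```go
package main

import "fmt"

// Field is a simplified, hypothetical stand-in for zap.Field.
type Field struct {
	Key   string
	Value string
}

// Stand-ins for the strongly typed constructors.
func String(key, val string) Field      { return Field{key, "string:" + val} }
func Int(key string, val int) Field     { return Field{key, fmt.Sprintf("int:%d", val)} }
func Reflect(key string, val any) Field { return Field{key, fmt.Sprintf("reflect:%v", val)} }

// anyFieldC is a typed field constructor, as declared above.
type anyFieldC[T any] func(string, T) Field

// Any is the non-generic method: it asserts val to T and calls the
// wrapped constructor. If the assertion fails, v is T's zero value.
func (f anyFieldC[T]) Any(key string, val any) Field {
	v, _ := val.(T)
	return f(key, v)
}

// Any picks a constructor in the switch and makes a single call at the
// end, so only one Field result is produced per invocation.
func Any(key string, value any) Field {
	var c interface{ Any(string, any) Field }
	switch value.(type) {
	case string:
		c = anyFieldC[string](String)
	case int:
		c = anyFieldC[int](Int)
	default:
		c = anyFieldC[any](Reflect)
	}
	return c.Any(key, value)
}

func main() {
	fmt.Println(Any("name", "zap"))
	fmt.Println(Any("count", 42))
	fmt.Println(Any("ratio", 1.5))
}
```

In this shape, supporting another type is one more `case` that wraps an existing constructor.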

Stack size:
The stack size of zap.Any following this change is 0xc0.

    % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any
    go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0

This is just 8 bytes more than uber-go#1305,
which is the smallest stack size of all other attempts.

Allocations:
Everything appears to get inlined with no heap escapes:

    go build -gcflags -m 2>&1 |
      grep field.go |
      perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' |
      grep 'escapes'

(Line 413 declares anyFieldC)

Besides that, the output of `-m` for the relevant section of code
consists almost entirely of:

    ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any
    ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any
    ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any
    [...]
    ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any
    ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any
    ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any

Followed by:

    ./field.go:428:10: leaking param: key
    ./field.go:428:22: leaking param: value

Maintainability:
This variant is more maintainable than some of the other approaches.
The `zap.Any` function looks roughly the same.
Adding new branches there is obvious, and requires no duplication.

Performance:
This is a net improvement on all BenchmarkAny calls
except "any-no-logger" which calls `zap.Any` and discards the result.

```
name                        old time/op    new time/op    delta
Any/str-no-logger-2           8.77ns ± 0%    8.75ns ± 1%     ~     (p=0.159 n=4+5)
Any/any-no-logger-2           54.1ns ± 0%    81.6ns ± 0%  +50.71%  (p=0.016 n=5+4)
Any/str-with-logger-2         1.38µs ± 3%    1.38µs ± 4%     ~     (p=0.841 n=5+5)
Any/any-with-logger-2         1.60µs ±22%    1.37µs ± 1%     ~     (p=0.151 n=5+5)
Any/str-in-go-2               3.41µs ± 1%    3.42µs ± 5%     ~     (p=0.905 n=4+5)
Any/any-in-go-2               5.98µs ± 1%    3.68µs ± 6%  -38.44%  (p=0.008 n=5+5)
Any/str-in-go-with-stack-2    3.42µs ± 2%    3.46µs ± 3%     ~     (p=0.421 n=5+5)
Any/any-in-go-with-stack-2    5.98µs ± 3%    3.65µs ± 3%  -38.95%  (p=0.008 n=5+5)

name                        old alloc/op   new alloc/op   delta
Any/str-no-logger-2            0.00B          0.00B          ~     (all equal)
Any/any-no-logger-2            0.00B          0.00B          ~     (all equal)
Any/str-with-logger-2          64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/any-with-logger-2          64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/str-in-go-2                88.5B ± 1%     88.0B ± 0%     ~     (p=0.429 n=4+4)
Any/any-in-go-2                88.0B ± 0%     88.0B ± 0%     ~     (all equal)
Any/str-in-go-with-stack-2     88.0B ± 0%     88.0B ± 0%     ~     (all equal)
Any/any-in-go-with-stack-2     88.0B ± 0%     88.0B ± 0%     ~     (all equal)

name                        old allocs/op  new allocs/op  delta
Any/str-no-logger-2             0.00           0.00          ~     (all equal)
Any/any-no-logger-2             0.00           0.00          ~     (all equal)
Any/str-with-logger-2           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/any-with-logger-2           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/str-in-go-2                 2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Any/any-in-go-2                 2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Any/str-in-go-with-stack-2      2.00 ± 0%      2.00 ± 0%     ~     (all equal)
Any/any-in-go-with-stack-2      2.00 ± 0%      2.00 ± 0%     ~     (all equal)
```

I believe this is acceptable because that's not a real use case;
we expect the result to be used with a logger.
abhinav added a commit that referenced this pull request Aug 1, 2023
Yet another attempt at reducing the stack size of zap.Any,
borrowing from #1301, #1303, #1304, #1305, #1307, and #1308.

This approach defines a generic data type for field constructors
of a specific type. This is similar to the lookup map in #1307,
minus the map lookup, the interface match, or reflection.

    type anyFieldC[T any] func(string, T) Field

The generic data type provides a non-generic method
matching the interface:

    interface{ Any(string, any) Field }

**Stack size**:
The stack size of zap.Any following this change is 0xc0 (192 bytes).

    % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any
    go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0

This is just 8 bytes more than #1305,
which is the smallest stack size of all other attempts.

**Allocations**:
Everything appears to get inlined with no heap escapes:

    % go build -gcflags -m 2>&1 |
      grep field.go |
      perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' |
      grep 'escapes'
    [no output]

(Line 413 declares anyFieldC)

Besides that, the output of `-m` for the relevant section of code
consists almost entirely of:

    ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any
    ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any
    ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any
    [...]
    ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any
    ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any
    ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any

Followed by:

    ./field.go:428:10: leaking param: key
    ./field.go:428:22: leaking param: value

**Maintainability**:
This variant is more maintainable than some of the other approaches.
The `zap.Any` function looks roughly the same.
Adding new branches there is obvious, and requires no duplication.

**Performance**:
This is a net improvement against master on BenchmarkAny's
log-go checks that log inside a new goroutine.

```
name                           old time/op    new time/op    delta
Any/string/field-only/typed      25.2ns ± 1%    25.6ns ± 2%     ~     (p=0.460 n=5+5)
Any/string/field-only/any        56.9ns ± 3%    79.4ns ± 0%  +39.55%  (p=0.008 n=5+5)
Any/string/log/typed             1.47µs ± 0%    1.49µs ± 4%   +1.58%  (p=0.016 n=4+5)
Any/string/log/any               1.53µs ± 2%    1.55µs ± 1%   +1.37%  (p=0.016 n=5+5)
Any/string/log-go/typed          5.97µs ± 6%    5.99µs ± 1%     ~     (p=0.151 n=5+5)
Any/string/log-go/any            10.9µs ± 0%     6.2µs ± 0%  -43.32%  (p=0.008 n=5+5)
Any/stringer/field-only/typed    25.3ns ± 1%    25.5ns ± 1%   +1.09%  (p=0.008 n=5+5)
Any/stringer/field-only/any      85.5ns ± 1%   124.5ns ± 0%  +45.66%  (p=0.008 n=5+5)
Any/stringer/log/typed           1.43µs ± 1%    1.42µs ± 2%     ~     (p=0.175 n=4+5)
Any/stringer/log/any             1.50µs ± 1%    1.56µs ± 6%   +4.20%  (p=0.008 n=5+5)
Any/stringer/log-go/typed        5.94µs ± 0%    5.92µs ± 0%   -0.40%  (p=0.032 n=5+5)
Any/stringer/log-go/any          11.1µs ± 2%     6.3µs ± 0%  -42.93%  (p=0.008 n=5+5)

name                           old alloc/op   new alloc/op   delta
Any/string/field-only/typed       0.00B          0.00B          ~     (all equal)
Any/string/field-only/any         0.00B          0.00B          ~     (all equal)
Any/string/log/typed              64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/string/log/any                64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/string/log-go/typed            112B ± 0%      112B ± 0%     ~     (all equal)
Any/string/log-go/any              128B ± 0%      128B ± 0%     ~     (all equal)
Any/stringer/field-only/typed     0.00B          0.00B          ~     (all equal)
Any/stringer/field-only/any       0.00B          0.00B          ~     (all equal)
Any/stringer/log/typed            64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/stringer/log/any              64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/stringer/log-go/typed          112B ± 0%      112B ± 0%     ~     (all equal)
Any/stringer/log-go/any            128B ± 0%      128B ± 0%     ~     (all equal)

name                           old allocs/op  new allocs/op  delta
Any/string/field-only/typed        0.00           0.00          ~     (all equal)
Any/string/field-only/any          0.00           0.00          ~     (all equal)
Any/string/log/typed               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/string/log/any                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/string/log-go/typed            3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Any/string/log-go/any              3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Any/stringer/field-only/typed      0.00           0.00          ~     (all equal)
Any/stringer/field-only/any        0.00           0.00          ~     (all equal)
Any/stringer/log/typed             1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/stringer/log/any               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/stringer/log-go/typed          3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Any/stringer/log-go/any            3.00 ± 0%      3.00 ± 0%     ~     (all equal)
```

It causes a regression in "field-only"
which calls the field constructor and discards the result
without using it in a logger.
I believe this is acceptable because that's not a real use case;
we expect the result to be used with a logger.
rabbbit added a commit that referenced this pull request Aug 2, 2023
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310.

We're writing various approaches to reduce the stack size and it's
painful to keep copy-pasting the tests between PRs. This was suggested
by @prashantv in #1307.

The tests are mostly based on tests in #1303, but made "more generic",
as in #1307 we might want to test across more than just a single type.
It does make the tests a bit harder to set up. Some of the setup is
inconvenient (we duplicate the value in both the `typed` and `any` versions
of the tests) but hopefully okay to understand. A fully non-duplicated
alternative would likely require something like #1310 itself.

For #1307 in particular a test against an interface type would likely be
needed, so adding it here too.

The tests compare two code paths, with the same arguments, one using a
strongly typed method and the second using `zap.Any`. We have:
- a simple "create field" case for a baseline
- a "create and log" case for a realistic case (we typically log the fields)
- a "create and log in a goroutine" case for the pathological case
  we're trying to solve for.
- a "create and log in a goroutine in a pre-warmed system" case that does
  the above, but first tries to affect the starting goroutine stack size
  to provide a realistic example.
  Without this, for any tests with 2+ goroutines, the cost of `zap.Any`
  is not visible, as we always end up expanding the stack even in the
  strongly typed methods.
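
The first three cases can be sketched roughly as follows. This is an illustration only: `Field`, `typedString`, `anyValue`, and `fakeLogger` are hypothetical stand-ins rather than zap's API, a plain timing loop replaces the `testing.B` harness, and the pre-warmed case is left out because its setup is not described here.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Field and the two constructors are hypothetical stand-ins for zap.Field,
// a strongly typed constructor, and zap.Any.
type Field struct {
	Key   string
	Value any
}

func typedString(key, val string) Field  { return Field{Key: key, Value: val} }
func anyValue(key string, val any) Field { return Field{Key: key, Value: val} }

// fakeLogger counts fields instead of encoding them; it stands in for a logger.
type fakeLogger struct {
	mu sync.Mutex
	n  int
}

func (l *fakeLogger) Info(f Field) {
	l.mu.Lock()
	l.n++
	l.mu.Unlock()
}

// inNewGoroutine runs fn on a fresh goroutine and waits for it, so every
// call starts from a new goroutine stack.
func inNewGoroutine(fn func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		fn()
	}()
	wg.Wait()
}

// perOp times n calls of fn and returns the average duration per call.
func perOp(n int, fn func()) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		fn()
	}
	return time.Since(start) / time.Duration(n)
}

var sink Field // keeps the field-only results alive

func main() {
	logger := &fakeLogger{}
	cases := []struct {
		name string
		run  func()
	}{
		{"field-only/typed", func() { sink = typedString("k", "v") }},
		{"field-only/any", func() { sink = anyValue("k", "v") }},
		{"log/typed", func() { logger.Info(typedString("k", "v")) }},
		{"log/any", func() { logger.Info(anyValue("k", "v")) }},
		{"log-go/typed", func() { inNewGoroutine(func() { logger.Info(typedString("k", "v")) }) }},
		{"log-go/any", func() { inNewGoroutine(func() { logger.Info(anyValue("k", "v")) }) }},
	}
	for _, c := range cases {
		fmt.Printf("%-18s %v/op\n", c.name, perOp(100000, c.run))
	}
}
```

With these trivial stand-ins the typed and `any` paths should cost about the same; the sketch shows the shape of the comparison, not the stack-growth effect itself.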

The test results are:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/string-typ-no-logger               166879518                6.988 ns/op           0 B/op          0 allocs/op
BenchmarkAny/string-typ-no-logger-12            167398297                6.973 ns/op           0 B/op          0 allocs/op
BenchmarkAny/string-any-no-logger               87669631                13.97 ns/op            0 B/op          0 allocs/op
BenchmarkAny/string-any-no-logger-12            86760837                14.11 ns/op            0 B/op          0 allocs/op
BenchmarkAny/string-typ-logger                   3059485               395.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-typ-logger-12                3141176               379.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-any-logger                   2995699               401.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-any-logger-12                3071046               391.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-typ-logger-go                 784323              1351 ns/op             146 B/op          2 allocs/op
BenchmarkAny/string-typ-logger-go-12             2000835               613.9 ns/op            96 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go                 477486              2479 ns/op             117 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go-12             1830955               680.0 ns/op           112 B/op          2 allocs/op
BenchmarkAny/string-typ-logger-go-stack           841566              1328 ns/op              96 B/op          2 allocs/op
BenchmarkAny/string-typ-logger-go-stack-12       2625226               479.6 ns/op            96 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go-stack           486084              2493 ns/op             112 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go-stack-12       2658640               667.9 ns/op           112 B/op          2 allocs/op
BenchmarkAny/stringer-typ-no-logger             147314238                8.034 ns/op           0 B/op          0 allocs/op
BenchmarkAny/stringer-typ-no-logger-12          157857937                7.436 ns/op           0 B/op          0 allocs/op
BenchmarkAny/stringer-any-no-logger             58872349                20.19 ns/op            0 B/op          0 allocs/op
BenchmarkAny/stringer-any-no-logger-12          60532305                20.27 ns/op            0 B/op          0 allocs/op
BenchmarkAny/stringer-typ-logger                 3094204               411.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-typ-logger-12              3163489               383.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-any-logger                 2981940               427.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-any-logger-12              2777792               394.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-typ-logger-go               911761              1335 ns/op              96 B/op          2 allocs/op
BenchmarkAny/stringer-typ-logger-go-12           2006440               605.2 ns/op            96 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go               467934              2518 ns/op             112 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go-12           1786076               683.1 ns/op           112 B/op          2 allocs/op
BenchmarkAny/stringer-typ-logger-go-stack         855794              1316 ns/op              96 B/op          2 allocs/op
BenchmarkAny/stringer-typ-logger-go-stack-12     2598783               434.5 ns/op            96 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go-stack         473282              2474 ns/op             112 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go-stack-12     2020183               651.9 ns/op           112 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.516s
```
rabbbit pushed a commit that referenced this pull request Aug 2, 2023
Yet another attempt at reducing the stack size of zap.Any,
borrowing from #1301, #1303, #1304, #1305, #1307, and #1308.

This approach defines a generic data type for field constructors
of a specific type. This is similar to the lookup map in #1307,
minus the map lookup, the interface match, or reflection.

    type anyFieldC[T any] func(string, T) Field

The generic data type provides a non-generic method
matching the interface:

    interface{ Any(string, any) Field }

**Stack size**:
The stack size of zap.Any following this change is 0xc0 (192 bytes).

    % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any
    go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0

This is just 8 bytes more than #1305,
which is the smallest stack size of all other attempts.

**Allocations**:
Everything appears to get inlined with no heap escapes:

    % go build -gcflags -m 2>&1 |
      grep field.go |
      perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' |
      grep 'escapes'
    [no output]

(Line 413 declares anyFieldC)

Besides that, the output of `-m` for the relevant section of code
consists almost entirely of:

    ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any
    ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any
    ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any
    [...]
    ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any
    ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any
    ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any

Followed by:

    ./field.go:428:10: leaking param: key
    ./field.go:428:22: leaking param: value

**Maintainability**:
This variant is more maintainable than some of the other approaches.
The `zap.Any` function looks roughly the same.
Adding new branches there is obvious, and requires no duplication.

**Performance**:
This is a net improvement against master on BenchmarkAny's
log-go checks that log inside a new goroutine.

```
name                           old time/op    new time/op    delta
Any/string/field-only/typed      25.2ns ± 1%    25.6ns ± 2%     ~     (p=0.460 n=5+5)
Any/string/field-only/any        56.9ns ± 3%    79.4ns ± 0%  +39.55%  (p=0.008 n=5+5)
Any/string/log/typed             1.47µs ± 0%    1.49µs ± 4%   +1.58%  (p=0.016 n=4+5)
Any/string/log/any               1.53µs ± 2%    1.55µs ± 1%   +1.37%  (p=0.016 n=5+5)
Any/string/log-go/typed          5.97µs ± 6%    5.99µs ± 1%     ~     (p=0.151 n=5+5)
Any/string/log-go/any            10.9µs ± 0%     6.2µs ± 0%  -43.32%  (p=0.008 n=5+5)
Any/stringer/field-only/typed    25.3ns ± 1%    25.5ns ± 1%   +1.09%  (p=0.008 n=5+5)
Any/stringer/field-only/any      85.5ns ± 1%   124.5ns ± 0%  +45.66%  (p=0.008 n=5+5)
Any/stringer/log/typed           1.43µs ± 1%    1.42µs ± 2%     ~     (p=0.175 n=4+5)
Any/stringer/log/any             1.50µs ± 1%    1.56µs ± 6%   +4.20%  (p=0.008 n=5+5)
Any/stringer/log-go/typed        5.94µs ± 0%    5.92µs ± 0%   -0.40%  (p=0.032 n=5+5)
Any/stringer/log-go/any          11.1µs ± 2%     6.3µs ± 0%  -42.93%  (p=0.008 n=5+5)

name                           old alloc/op   new alloc/op   delta
Any/string/field-only/typed       0.00B          0.00B          ~     (all equal)
Any/string/field-only/any         0.00B          0.00B          ~     (all equal)
Any/string/log/typed              64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/string/log/any                64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/string/log-go/typed            112B ± 0%      112B ± 0%     ~     (all equal)
Any/string/log-go/any              128B ± 0%      128B ± 0%     ~     (all equal)
Any/stringer/field-only/typed     0.00B          0.00B          ~     (all equal)
Any/stringer/field-only/any       0.00B          0.00B          ~     (all equal)
Any/stringer/log/typed            64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/stringer/log/any              64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/stringer/log-go/typed          112B ± 0%      112B ± 0%     ~     (all equal)
Any/stringer/log-go/any            128B ± 0%      128B ± 0%     ~     (all equal)

name                           old allocs/op  new allocs/op  delta
Any/string/field-only/typed        0.00           0.00          ~     (all equal)
Any/string/field-only/any          0.00           0.00          ~     (all equal)
Any/string/log/typed               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/string/log/any                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/string/log-go/typed            3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Any/string/log-go/any              3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Any/stringer/field-only/typed      0.00           0.00          ~     (all equal)
Any/stringer/field-only/any        0.00           0.00          ~     (all equal)
Any/stringer/log/typed             1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/stringer/log/any               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/stringer/log-go/typed          3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Any/stringer/log-go/any            3.00 ± 0%      3.00 ± 0%     ~     (all equal)
```

It causes a regression in "field-only"
which calls the field constructor and discards the result
without using it in a logger.
I believe this is acceptable because that's not a real use case;
we expect the result to be used with a logger.
sywhang pushed a commit that referenced this pull request Aug 2, 2023
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310.

We're writing various approaches to reduce the stack size and it's
painful to keep copy-pasting the tests between PRs. This was suggested
by @prashantv in #1307.

The tests are mostly based on tests in #1303, but made "more generic",
as in #1307 we might want to test across more than just a single type.
It does make the tests a bit harder to set up. Some of the setup is
inconvenient (we duplicate the value in both the `typed` and `any` versions
of the tests) but hopefully okay to understand. A fully non-duplicated
alternative would likely require something like #1310 itself.

For #1307 in particular a test against an interface type would likely be
needed, so adding it here too.

The tests compare two code paths, with the same arguments, one using a
strongly typed method and the second using `zap.Any`. We have:
- a simple "create field" case for a baseline
- a "create and log" case for a realistic case (we typically log the fields)
- a "create and log in a goroutine" case for the pathological case
  we're trying to solve for.
- a "create and log in a goroutine in a pre-warmed system" case that does
  the above, but first tries to affect the starting goroutine stack size
  to provide a realistic example.
  Without this, for any tests with 2+ goroutines, the cost of `zap.Any`
  is not visible, as we always end up expanding the stack even in the
  strongly typed methods.

The test results are:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/string-typ-no-logger               166879518                6.988 ns/op           0 B/op          0 allocs/op
BenchmarkAny/string-typ-no-logger-12            167398297                6.973 ns/op           0 B/op          0 allocs/op
BenchmarkAny/string-any-no-logger               87669631                13.97 ns/op            0 B/op          0 allocs/op
BenchmarkAny/string-any-no-logger-12            86760837                14.11 ns/op            0 B/op          0 allocs/op
BenchmarkAny/string-typ-logger                   3059485               395.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-typ-logger-12                3141176               379.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-any-logger                   2995699               401.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-any-logger-12                3071046               391.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/string-typ-logger-go                 784323              1351 ns/op             146 B/op          2 allocs/op
BenchmarkAny/string-typ-logger-go-12             2000835               613.9 ns/op            96 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go                 477486              2479 ns/op             117 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go-12             1830955               680.0 ns/op           112 B/op          2 allocs/op
BenchmarkAny/string-typ-logger-go-stack           841566              1328 ns/op              96 B/op          2 allocs/op
BenchmarkAny/string-typ-logger-go-stack-12       2625226               479.6 ns/op            96 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go-stack           486084              2493 ns/op             112 B/op          2 allocs/op
BenchmarkAny/string-any-logger-go-stack-12       2658640               667.9 ns/op           112 B/op          2 allocs/op
BenchmarkAny/stringer-typ-no-logger             147314238                8.034 ns/op           0 B/op          0 allocs/op
BenchmarkAny/stringer-typ-no-logger-12          157857937                7.436 ns/op           0 B/op          0 allocs/op
BenchmarkAny/stringer-any-no-logger             58872349                20.19 ns/op            0 B/op          0 allocs/op
BenchmarkAny/stringer-any-no-logger-12          60532305                20.27 ns/op            0 B/op          0 allocs/op
BenchmarkAny/stringer-typ-logger                 3094204               411.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-typ-logger-12              3163489               383.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-any-logger                 2981940               427.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-any-logger-12              2777792               394.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/stringer-typ-logger-go               911761              1335 ns/op              96 B/op          2 allocs/op
BenchmarkAny/stringer-typ-logger-go-12           2006440               605.2 ns/op            96 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go               467934              2518 ns/op             112 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go-12           1786076               683.1 ns/op           112 B/op          2 allocs/op
BenchmarkAny/stringer-typ-logger-go-stack         855794              1316 ns/op              96 B/op          2 allocs/op
BenchmarkAny/stringer-typ-logger-go-stack-12     2598783               434.5 ns/op            96 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go-stack         473282              2474 ns/op             112 B/op          2 allocs/op
BenchmarkAny/stringer-any-logger-go-stack-12     2020183               651.9 ns/op           112 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.516s
```

sywhang pushed a commit that referenced this pull request Aug 2, 2023
Yet another attempt at reducing the stack size of zap.Any,
borrowing from #1301, #1303, #1304, #1305, #1307, and #1308.

This approach defines a generic data type for field constructors
of a specific type. This is similar to the lookup map in #1307,
minus the map lookup, the interface match, or reflection.

    type anyFieldC[T any] func(string, T) Field

The generic data type provides a non-generic method
matching the interface:

    interface{ Any(string, any) Field }

**Stack size**:
The stack size of zap.Any following this change is 0xc0 (192 bytes).

    % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any
    go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0

This is just 8 bytes more than #1305,
which is the smallest stack size of all other attempts.

**Allocations**:
Everything appears to get inlined with no heap escapes:

    % go build -gcflags -m 2>&1 |
      grep field.go |
      perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' |
      grep 'escapes'
    [no output]

(Line 413 declares anyFieldC)

Besides that, the output of `-m` for the relevant section of code
consists almost entirely of:

    ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any
    ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any
    ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any
    [...]
    ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any
    ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any
    ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any

Followed by:

    ./field.go:428:10: leaking param: key
    ./field.go:428:22: leaking param: value

**Maintainability**:
This variant is more maintainable than some of the other approaches.
The `zap.Any` function looks roughly the same.
Adding new branches there is obvious, and requires no duplication.

**Performance**:
This is a net improvement against master on BenchmarkAny's
log-go checks that log inside a new goroutine.

```
name                           old time/op    new time/op    delta
Any/string/field-only/typed      25.2ns ± 1%    25.6ns ± 2%     ~     (p=0.460 n=5+5)
Any/string/field-only/any        56.9ns ± 3%    79.4ns ± 0%  +39.55%  (p=0.008 n=5+5)
Any/string/log/typed             1.47µs ± 0%    1.49µs ± 4%   +1.58%  (p=0.016 n=4+5)
Any/string/log/any               1.53µs ± 2%    1.55µs ± 1%   +1.37%  (p=0.016 n=5+5)
Any/string/log-go/typed          5.97µs ± 6%    5.99µs ± 1%     ~     (p=0.151 n=5+5)
Any/string/log-go/any            10.9µs ± 0%     6.2µs ± 0%  -43.32%  (p=0.008 n=5+5)
Any/stringer/field-only/typed    25.3ns ± 1%    25.5ns ± 1%   +1.09%  (p=0.008 n=5+5)
Any/stringer/field-only/any      85.5ns ± 1%   124.5ns ± 0%  +45.66%  (p=0.008 n=5+5)
Any/stringer/log/typed           1.43µs ± 1%    1.42µs ± 2%     ~     (p=0.175 n=4+5)
Any/stringer/log/any             1.50µs ± 1%    1.56µs ± 6%   +4.20%  (p=0.008 n=5+5)
Any/stringer/log-go/typed        5.94µs ± 0%    5.92µs ± 0%   -0.40%  (p=0.032 n=5+5)
Any/stringer/log-go/any          11.1µs ± 2%     6.3µs ± 0%  -42.93%  (p=0.008 n=5+5)

name                           old alloc/op   new alloc/op   delta
Any/string/field-only/typed       0.00B          0.00B          ~     (all equal)
Any/string/field-only/any         0.00B          0.00B          ~     (all equal)
Any/string/log/typed              64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/string/log/any                64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/string/log-go/typed            112B ± 0%      112B ± 0%     ~     (all equal)
Any/string/log-go/any              128B ± 0%      128B ± 0%     ~     (all equal)
Any/stringer/field-only/typed     0.00B          0.00B          ~     (all equal)
Any/stringer/field-only/any       0.00B          0.00B          ~     (all equal)
Any/stringer/log/typed            64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/stringer/log/any              64.0B ± 0%     64.0B ± 0%     ~     (all equal)
Any/stringer/log-go/typed          112B ± 0%      112B ± 0%     ~     (all equal)
Any/stringer/log-go/any            128B ± 0%      128B ± 0%     ~     (all equal)

name                           old allocs/op  new allocs/op  delta
Any/string/field-only/typed        0.00           0.00          ~     (all equal)
Any/string/field-only/any          0.00           0.00          ~     (all equal)
Any/string/log/typed               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/string/log/any                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/string/log-go/typed            3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Any/string/log-go/any              3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Any/stringer/field-only/typed      0.00           0.00          ~     (all equal)
Any/stringer/field-only/any        0.00           0.00          ~     (all equal)
Any/stringer/log/typed             1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/stringer/log/any               1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Any/stringer/log-go/typed          3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Any/stringer/log-go/any            3.00 ± 0%      3.00 ± 0%     ~     (all equal)
```

It causes a regression in the "field-only" `any` cases (roughly 40% to 46% slower),
which call the field constructor and discard the result
without using it in a logger.
I believe this is acceptable because that's not a real use case;
we expect the result to be used with a logger.
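
To make the three benchmark shapes concrete, here is a rough sketch of "field-only", "log", and "log-go". It is not the BenchmarkAny code from this repository: `Field`, `Any`, and `logger` are hypothetical stand-ins that use only the standard library, so the timings it prints say nothing about zap itself. It only illustrates where the field is built and on which goroutine it is consumed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"
)

// Field and Any are simplified stand-ins for zap.Field and zap.Any.
type Field struct {
	Key string
	Val any
}

func Any(key string, val any) Field { return Field{Key: key, Val: val} }

// logger is a hypothetical stand-in that only counts the fields it receives.
type logger struct{ fields atomic.Int64 }

func (l *logger) Info(msg string, fs ...Field) { l.fields.Add(int64(len(fs))) }

// sink stops the compiler from eliminating the field-only work.
var sink Field

// "field-only": build the field and never pass it to a logger.
func benchFieldOnly(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = Any("key", "value")
	}
}

// "log": build the field and log it on the benchmark goroutine.
func benchLog(b *testing.B) {
	log := &logger{}
	for i := 0; i < b.N; i++ {
		log.Info("msg", Any("key", "value"))
	}
}

// "log-go": log from a fresh goroutine on every iteration,
// so each call starts on a newly created goroutine stack.
func benchLogGo(b *testing.B) {
	log := &logger{}
	var wg sync.WaitGroup
	for i := 0; i < b.N; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			log.Info("msg", Any("key", "value"))
		}()
		wg.Wait()
	}
}

func main() {
	for _, bm := range []struct {
		name string
		fn   func(*testing.B)
	}{
		{"field-only", benchFieldOnly},
		{"log", benchLog},
		{"log-go", benchLogGo},
	} {
		// testing.Benchmark runs a benchmark function outside of "go test".
		fmt.Println(bm.name, testing.Benchmark(bm.fn))
	}
}
```

The log-go shape is the one where stack usage matters most, since a new goroutine starts with a small stack that has to grow if a call needs more room.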