Make sampler counts faster, and also binned by level #252
Conversation
@jcorbin, thanks for your PR! By analyzing the history of the files in this pull request, we identified @akshayjshah, @prashantv and @creiht to be potential reviewers.
zapcore/sampler.go
Outdated
func (c *counters) Reset(key string) {
	c.Lock()
	count := c.counts[key]
	c.Unlock()
My fault - I think that this should be an RLock/RUnlock, right? (We're getting rid of this counters implementation anyways, but it's possible that this'll affect some of the benchmarks.)
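For illustration, the suggested change might look like this (a sketch only; the map-based counters struct isn't fully shown in this diff, so its shape here is assumed):

```go
package zapcore

import "sync"

// Hypothetical shape of the old map-based counters, with the embedded
// sync.Mutex swapped for a sync.RWMutex per the review suggestion.
type counters struct {
	sync.RWMutex
	counts map[string]uint64
}

func (c *counters) Reset(key string) {
	c.RLock() // reading the current count needs only the shared lock
	count := c.counts[key]
	c.RUnlock()
	_ = count // the rest of Reset is elided in the diff above
}
```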
zapcore/sampler.go
Outdated
type bucket2 struct {
	sync.Mutex
	counts [_counters2PerLock]uint64
}
Now that we're not keeping track of the unhashed messages, what's the benefit of using counts + mutexes instead of a flat array of atomics?
zapcore/sampler.go
Outdated
// xorstring converts a string into a uint64 by xoring together its
// codepoints. It works by accumulating into a 64-bit "ring" which gets
// rotated by the apparent "byte width" of each codepoint.
Comment's no longer accurate (in case you forget when un-WIP-ing this PR.)
zapcore/sampler.go
Outdated
// xshrr computes a "randomly" rotated xorshift; this is the "XSH RR"
// transformation borrowed from the PCG family of random generators. It
// returns a 32-bit output from a 64-bit state.
I'm probably a reasonable stand-in for a user of this library, and I hadn't heard of "PCG random generators" until talking to you. If we're going to name-check this paper, let's include a link :)
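For reference: PCG is M.E. O'Neill's "PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation" (https://www.pcg-random.org/). The XSH RR output step it describes looks roughly like this (a sketch, not necessarily the PR's exact code):

```go
// xshrr applies PCG's "XSH RR" output function to a 64-bit state:
// xorshift the high bits down into 32 bits, then rotate that result
// right by the top 5 bits of the state.
func xshrr(state uint64) uint32 {
	xorshifted := uint32(((state >> 18) ^ state) >> 27)
	rot := uint32(state >> 59)
	return (xorshifted >> rot) | (xorshifted << ((32 - rot) & 31))
}
```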
"conjecture", | ||
"Arctomys", | ||
"guapilla", | ||
"Sassenach", |
Perhaps overly sensitive, but let's not include vaguely racist epithets (even if they're randomly selected from the system dictionary). In my understanding, "Sassenach" is in the same ballpark as "honky" and "bouzhoi."
"raphidiferous", | ||
"pooh", | ||
"Tyndallize", | ||
}, |
Can we use this same test data to generate a much larger set of n-grams, and use that to sanity check the hash function? Just want to make sure that we don't mess up and end up with way more hash collisions than expected.
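Such a sanity check might look like the following (a hypothetical sketch; words, hash.XSHRR, and the bucket count are assumed names from this PR):

```go
// TestBigramCollisions feeds all two-word combinations of the corpus
// through the hash and verifies that no bucket is wildly over-represented
// relative to the uniform expectation.
func TestBigramCollisions(t *testing.T) {
	const numBuckets = 4096 // assumed to match the sampler's bucket count
	counts := make([]int, numBuckets)
	for _, a := range words {
		for _, b := range words {
			counts[hash.XSHRR(a+" "+b)%numBuckets]++
		}
	}
	expected := float64(len(words)*len(words)) / numBuckets
	for bucket, n := range counts {
		// Loose bound; a real test might use a chi-squared statistic.
		if float64(n) > 4*expected {
			t.Errorf("bucket %d has %d entries, expected ~%.1f", bucket, n, expected)
		}
	}
}
```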
👏 for taking this idea much further than I would have. Dunno about Prashant, but I'm satisfied that XSHRR is a reasonable option.
Force-pushed from ed47415 to d94151c.
WIP: updated, broke out, benchmarked, and compared distributions on hash function alternatives; the only clear win to me is taking over the implementation:
I'm going to continue on then and actually implement the level bucketing work, and reduce locking in counters2 itself. Future directions:
Force-pushed from 54b7c28 to a338d78.
Updated:
zapcore/sampler.go
Outdated
	return &counters{}
}

type counters [_numLevels][_countersPerLevel]uint64
if we're using these atomically, can we use atomic.Uint64 here? we already have dependencies on the atomic package.
Well:
- I suspect that it will be less memory efficient, if not actually slower, due to atomic.Uint64 being a struct with a pointer field, rather than the actual uint64 values living in this array
- the only use that we're getting from go.uber.org/atomic is for the AtomicLevel type, which is already a struct, and arguably would be simpler if it used standard sync/atomic functions directly
I'll benchmark to test any supposed difference, but more towards point 2, I really don't think that using go.uber.org/atomic is adding much value for zap at this point.
It's not a struct with a pointer field; the field is a value.
https://github.com/uber-go/atomic/blob/master/atomic.go#L175
The memory layout will be exactly the same, there should be no indirection either way.
The only performance difference will be because of the extra function call, which Go should be inlining but doesn't yet (functions that call other functions are not inlined). Unfortunately that will probably add a handful of nanoseconds.
We should probably have benchmarks in the atomic package itself to measure the overhead.
The atomic package only really provides a couple of things:
- safety: since you can't access the underlying value directly, you have to use the atomic accessors. This avoids accidentally doing a non-atomic read of an atomic value, which causes a data race.
- readability: rather than atomic.AddUint64(&c[i][j], 1), the code becomes c[i][j].Inc()
Didn't realize this would be a point for debate. I'd also strongly prefer that we use the atomic package.
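Putting the thread's suggestions together, the shape under discussion is roughly this (a sketch; the hash function was still an open question at this point, so hashKey is a stand-in):

```go
import "go.uber.org/atomic"

// counters is a fixed two-level table: one row of buckets per level, each
// bucket an atomic counter, so no locks and no allocation after startup.
type counters [_numLevels][_countersPerLevel]atomic.Uint64

func (c *counters) Inc(lvl Level, key string) uint64 {
	i := lvl - _minLevel                  // row: the level
	j := hashKey(key) % _countersPerLevel // column: hashed message bucket
	return c[i][j].Inc()
}
```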
zapcore/sampler.go
Outdated
func (c *counters) Inc(lvl Level, key string) uint64 {
	i := lvl - minLevel
	j := c.hash(key) % _countersPerLevel
	return atomic.AddUint64(&c[i][j], 1)
can we have a method to return the specific counter?
func (c *counters) counterFor(lvl Level, key string) *atomic.Uint64
Sure, makes sense (modulo return type per my above reservations on atomic packages)
zapcore/sampler.go
Outdated
func (c *counters) Inc(lvl Level, key string) uint64 {
	i := lvl - minLevel
	j := c.hash(key) % _countersPerLevel
We're picking the _countersPerLevel, and we've picked a nice power of 2, so we can use bit masking here rather than a mod.

const _countersPerLevelMask = _countersPerLevel - 1
[...]
j = c.hash(key) & _countersPerLevelMask
Good point, I was hoping that the Go compiler knew this trick, but I've not checked or verified; will do ;-)
zapcore/sampler.go
Outdated
}

func (c *counters) hash(key string) uint32 {
	return hash.XSHRR(key) % _countersPerLevel
We're modding here, and we're modding in Reset and Inc. I think hash should not do any hashing and just return a uint32, and the counterFor method should be the only place we do any hashing.
Assuming you mean that hash shouldn't do any modding, and counterFor should be the only place we mod.
Yup, this is a legacy holdover from prior code form, will fix. Once we have counterFor, it can just call hash.Foo directly, rather than have any one-line func (*counters) hash, imo.
	Facility

	tick   time.Duration
	counts *counters
any advantage to using a pointer here rather than just a value, and then being able to remove newCounters?
We want the counts to be shared between parent and child loggers (so that sampling imposes a predictable limit on global logging load), so a pointer makes sense to me.
Ah of course, makes sense.
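Concretely, the sharing would look something like this in the sampler's With method (a sketch; field names assumed from this PR):

```go
// With clones the sampler around a new Facility but keeps the same
// *counters, so child loggers draw down the same sampling budget.
func (s *sampler) With(fields []Field) Facility {
	return &sampler{
		Facility:   s.Facility.With(fields),
		tick:       s.tick,
		counts:     s.counts, // shared: parent and child sample together
		first:      s.first,
		thereafter: s.thereafter,
	}
}
```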
internal/hash/fnv.go
Outdated
// FNV32 computes the 32-bit FNV-1 hash of s.
func FNV32(s string) uint32 {
	const (
		offset32 = 2166136261
since the constants are the same, should we make them top-level constants?
Sure, but we'll probably remove the generality of the hash module, and just stick with either XSHRR or FNV32a to taste (and probably inlined in zapcore/sampler.go).
This internal/hash was a demonstrative step for this PR, unless you'd rather we keep it?
Ahh, I see, makes sense. This analysis is great, so I'd love to keep it somehow (either in this PR as is, and we use a separate PR to merge just the hash function we want), or in the internal/hash folder.
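For reference, 32-bit FNV-1a is small enough to quote whole (a sketch from the standard constants, close to what gets inlined into zapcore/sampler.go later in this PR):

```go
// fnv32a computes the 32-bit FNV-1a hash of s: start from the offset
// basis, then for each byte XOR it in before multiplying by the prime.
// (FNV-1, by contrast, multiplies first and XORs second.)
func fnv32a(s string) uint32 {
	const (
		offset32 = 2166136261
		prime32  = 16777619
	)
	hash := uint32(offset32)
	for i := 0; i < len(s); i++ {
		hash ^= uint32(s[i])
		hash *= prime32
	}
	return hash
}
```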
internal/hash/bench_test.go
Outdated
"go.uber.org/zap/internal/hash" | ||
) | ||
|
||
// XXX limit to go 1.7+, or stop using sub-benchmarks... |
At this point, I'm comfortable with a separate pre-PR to drop Go 1.6 support. 1.8 is just around the corner, so we'll then support latest release and one older.
internal/hash/bench_test.go
Outdated
	rand100s4 = randCross(rand100s3, rand100, 100000)
)

var testCorpii = []struct {
Latin nit: "testCorpora"
Sure, I didn't check or verify my pluralism, I'll borg yours ;-)
Or render it into English, "corpses". Or "bodies" in the Anglish.
zapcore/level.go
Outdated
@@ -49,6 +49,9 @@ const (
	PanicLevel
	// FatalLevel logs a message, then calls os.Exit(1).
	FatalLevel

	minLevel = DebugLevel
	maxLevel = FatalLevel
I'm not always good about remembering, but we should underscore-prefix unexported globals.
internal/hash/xshrr.go
Outdated
package hash

// XSHRR hashes a string using the by threading each byte of it thru the PCG32
missing word here.
"raphidiferous", | ||
"pooh", | ||
"Tyndallize", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏 for taking this idea much further than I would have. Dunno about Prashant, but I'm satisfied that XSHRR is a reasonable option.
	}
	return s.Facility.Check(ent, ce)
}
Can we also move the unit tests for the sampling facility to sampler_test.go?
Love the IPython notebook - I had no idea GitHub rendered them so well. Some small nits aside, I'm satisfied with using either an internal FNV hash or XSHRR.
Okay, so unless @prashantv disagrees, I'll just roll with xshrr then; I don't feel strongly, other than in theory:
Thanks for the awesome IPython notebook @jcorbin! I don't have any strong opinions on hashes, I've seen fnv32 used more, which was why I originally suggested that. Looking at the output, it seems to me like:
The data seems to point towards fnv32a -- do you believe that xshrr will be better in more real-world log scenarios, Josh?
Yup, the ordering of fnv32a and xshrr varies per experiment run (recall the notebook randomly creates a new shuffling of /usr/share/dict/words every time you run it); so on distribution I'd call them "competitive". Micro perf certainly is a ding on xshrr for sure. While the larger constant and extra hardening step help my "sleep better at night", they've not really proven to be material for the (arguably unrealistic) test corpora. I'm more than happy to just use fnv32a for now, writing down the pcg32-xsh-rr trick for later consideration if we decide that a min-sketch is better (since I'd be more confident parameterizing pcg32-xsh-rr to yield a family of hash functions, than fnv).
Force-pushed from a338d78 to 949c73d.
Force-pushed from 30032d2 to f59ce58.
Updated:
func (c *counters) get(lvl Level, key string) *atomic.Uint64 {
	i := lvl - _minLevel
	j := fnv32a(key) % _countersPerLevel
@prashantv confirmed that the 1.7 compiler is sufficiently clever here.
The modulo compiles to & 4096-1:
. . 8a701: ANDL $0xfff, AX
As seen in:
(pprof) disasm counters..get
Total: 34.89s
ROUTINE ======================== go.uber.org/zap/zapcore.(*counters).get
1.33s 3.81s (flat, cum) 10.92% of Total
120ms 120ms 8a6a0: GS MOVQ GS:0x8a0, CX
10ms 10ms 8a6a9: CMPQ 0x10(CX), SP
. . 8a6ad: JBE 0x8a728
80ms 80ms 8a6af: SUBQ $0x28, SP
40ms 40ms 8a6b3: MOVQ BP, 0x20(SP)
10ms 10ms 8a6b8: LEAQ 0x20(SP), BP
. . 8a6bd: MOVQ 0x40(SP), AX
60ms 60ms 8a6c2: MOVQ AX, 0(SP)
50ms 50ms 8a6c6: MOVQ 0x48(SP), AX
60ms 60ms 8a6cb: MOVQ AX, 0x8(SP)
70ms 70ms 8a6d0: MOVZX 0x38(SP), AX
30ms 30ms 8a6d5: INCL AX
40ms 40ms 8a6d7: MOVB AL, 0x1f(SP)
60ms 2.54s 8a6db: CALL go.uber.org/zap/zapcore.fnv32a(SB)
50ms 50ms 8a6e0: MOVL 0x10(SP), AX
240ms 240ms 8a6e4: MOVQ 0x30(SP), CX
20ms 20ms 8a6e9: TESTB AL, 0(CX)
250ms 250ms 8a6eb: MOVZX 0x1f(SP), DX
. . 8a6f0: MOVSX DL, DX
. . 8a6f4: CMPQ $0x6, DX
. . 8a6f8: JAE 0x8a721
. . 8a6fa: SHLQ $0xf, DX
40ms 40ms 8a6fe: ADDQ CX, DX
. . 8a701: ANDL $0xfff, AX
. . 8a706: CMPQ $0x1000, AX
. . 8a70c: JAE 0x8a721
. . 8a70e: LEAQ 0(DX)(AX*8), AX
100ms 100ms 8a712: MOVQ AX, 0x50(SP)
. . 8a717: MOVQ 0x20(SP), BP
. . 8a71c: ADDQ $0x28, SP
. . 8a720: RET
. . 8a721: CALL runtime.panicindex(SB)
. . 8a726: UD2
. . 8a728: CALL runtime.morestack_noctxt(SB)
. . 8a72d: JMP go.uber.org/zap/zapcore.(*counters).get(SB)
. . 8a732: INT $0x3
. . 8a733: INT $0x3
. . 8a734: INT $0x3
. . 8a735: INT $0x3
. . 8a736: INT $0x3
. . 8a737: INT $0x3
. . 8a738: INT $0x3
. . 8a739: INT $0x3
. . 8a73a: INT $0x3
. . 8a73b: INT $0x3
. . 8a73c: INT $0x3
. . 8a73d: INT $0x3
. . 8a73e: INT $0x3
Thanks for confirming Josh!
Force-pushed from f59ce58 to fa59069.
generally looks great, just import order and that commented-out test.
"lithotint", | ||
"bargander", | ||
"opisthographical", | ||
"cacography", |
I'm adding "cacography" to my day-to-day vocabulary.
zapcore/sampler_test.go
Outdated
"github.com/stretchr/testify/assert" | ||
"go.uber.org/zap/internal/observer" | ||
"go.uber.org/zap/testutils" | ||
. "go.uber.org/zap/zapcore" |
nit: import ordering. assert should be in a separate group.
zapcore/sampler_test.go
Outdated
// assert.Equal(t, []spy.Log(nil), sink.Logs(), "Unexpected output when sampling with Check.")
// }
// }
Can we restore this test now? We should be able to use the internal exit package to make sure that we're actually sampling messages at panic and fatal too.
Turns out that the commented-out test is just wholly obsolete:
func TestSampler(t *testing.T) {
	for _, lvl := range []Level{DebugLevel, InfoLevel, WarnLevel, ErrorLevel, DPanicLevel} {
		sampler, logs := fakeSampler(DebugLevel, time.Minute, 2, 3)
I don't think we have any tests that ensure that separate levels use separate counters. I was able to check out the code and make the following patch:
--- a/zapcore/sampler.go
+++ b/zapcore/sampler.go
@@ -48,6 +48,7 @@ func (c *counters) Reset(lvl Level, key string) {
func (c *counters) get(lvl Level, key string) *atomic.Uint64 {
i := lvl - _minLevel
j := fnv32a(key) % _countersPerLevel
+ j = 0
return &c[i][j]
}
And all the tests still passed. Can we add a test for this case?
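A hypothetical regression test for that gap, reusing helpers from sampler_test.go (names assumed), could look like:

```go
// TestSamplerUsesDistinctCounters would fail under the j = 0 patch above:
// with first=1, a second distinct message (or level) must not be dropped
// just because another bucket's counter has already fired.
func TestSamplerUsesDistinctCounters(t *testing.T) {
	sampler, logs := fakeSampler(DebugLevel, time.Minute, 1, 1000)
	sampler.Check(Entry{Level: DebugLevel, Message: "foo"}, nil).Write()
	sampler.Check(Entry{Level: DebugLevel, Message: "bar"}, nil).Write()
	sampler.Check(Entry{Level: InfoLevel, Message: "foo"}, nil).Write()
	assert.Equal(t, 3, len(logs.All()),
		"distinct messages and levels should use distinct counters")
}
```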
}

if n := s.counts.Inc(ent.Level, ent.Message); n > s.first {
	if n == s.first+1 {
Changing this s.first+1 to s.first doesn't affect any tests, which means:
- we should add more test coverage
- we should better define what the behaviour should be for "first". I assume "first log every second" means if we log once a second, all logs would be output. However that's not the behaviour I'm seeing:
func TestSamplerReset(t *testing.T) {
sampler, logs := fakeSampler(DebugLevel, time.Second, 1, 10000)
writeIter(sampler, 1, DebugLevel)
time.Sleep(2 * time.Second)
writeIter(sampler, 2, DebugLevel)
expected := buildExpectation(DebugLevel, 1, 2)
assert.Equal(t, expected, logs.All())
}
For this PR, I think we should just add a bit more test coverage on the statement (so we know exactly what behaviour we currently have), and then open an issue with the behaviour we want. The ideal behaviour may be expensive with the current implementation since we'd have to start the timer as soon as we see the first message, which is definitely not ideal.
Good eyes - this is wrong.
In practice, I don't think that the correct behavior will be so bad, since logging only the first message each second is pretty extreme. I'm planning to make the default 100 first and 1000 thereafter.
Depends on which correct behaviour we're talking about I think. We have a few options:
1. the current option of s.first+1, which I don't think has any benefits
2. s.first, which will start the timer as soon as we're about to start dropping and will handle my above example better
3. n == 1

Let's consider another example: we're logging every 0.5 seconds, and the parameters are s.first = 2, s.tick = time.Second, s.thereafter = 100 (or some high number).
Option 3 would log every message (which seems like the ideal behaviour), option 2 would drop every 3rd message, option 1 would drop every 3rd and 4th message.
Option 3 will also start a timer every s.tick for every log message, which seems really bad. I think Option 2 is a good compromise between performance and sampling.
I don't think that the test is obsolete, since the other tests aren't exercising
The more I think about this, the less I like these options. Imagine sampling the first 100 messages and every 1000 thereafter every second. If I'm only logging 5 messages per sec, options 1 & 2 are occasionally going to drop all messages for a second. That's pretty bad.
I think we can get around this by leveraging the timestamp on the Entry. I'll try to play with some ideas this weekend.
Option 3 is definitely my preferred solution, but making it efficient will require rethinking the sampling logic (using timestamps). However, I don't think that change should be made as part of this PR. The refactoring to use bins + hash is unrelated to the sampling tick logic, so we should fix that separately.
Force-pushed from e8c1be0 to f729bd1.
Updated.
Force-pushed from ffd4ad9 to 7202086.
See discussion on how the current sampler is wrong: #252 (comment) Instead of using `time.AfterFunc` and resetting timers, use timestamps on the entries, and maintain a sliding window for the current "tick" with atomics to track when the counters should be reset. The alternate approach is to start the timer as soon as the first message for any bucket is logged, but that would cause more timers to be started, which costs an allocation as well as requiring a global lock on the timer heap.
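In sketch form, the timestamp-based window described above comes down to one atomic compare-and-swap per expired tick (close to what the commit implements, using go.uber.org/atomic):

```go
type counter struct {
	resetAt atomic.Int64  // unix-nano deadline of the current tick window
	counter atomic.Uint64 // messages seen in the current window
}

// IncCheckReset counts a message against the window containing t,
// opening a fresh window when the previous one has expired.
func (c *counter) IncCheckReset(t time.Time, tick time.Duration) uint64 {
	tn := t.UnixNano()
	resetAfter := c.resetAt.Load()
	if resetAfter > tn {
		return c.counter.Inc() // still inside the current window
	}
	c.counter.Store(1)
	newResetAfter := tn + tick.Nanoseconds()
	if !c.resetAt.CAS(resetAfter, newResetAfter) {
		// Lost a race with another goroutine opening the new window;
		// count against theirs instead.
		return c.counter.Inc()
	}
	return 1
}
```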
* break out zapcore/sampler module
* Benchmark sampling
* Re-implement counts table to be lockless and static memory
* Use two-level count binning to separate levels
Towards #176: