Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/sampler: Support decision hook #813

Merged
merged 17 commits into from Apr 21, 2020
Merged

Conversation

shirchen
Copy link
Contributor

@shirchen shirchen commented Apr 4, 2020

This adds support for monitoring sampling decisions made by the Sampler
core with a func(Entry, SamplingDecision) where SamplingDecision is
a bit field.

To allow plumbing the hook to the sampler, this additionally deprecates
the NewSampler constructor in favor of NewSamplerWithOptions.

type SamplerOption

func SamplerHook(func(Entry, SamplingDecision))

func NewSamplerWithOptions(/* ... */, opts ...SamplerOption)

This functionality is usable from the zap package via the new Hook
field of zap.SamplingConfig.

Refs T5056227

@codecov
Copy link

@codecov codecov bot commented Apr 4, 2020

Codecov Report

Merging #813 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #813      +/-   ##
==========================================
+ Coverage   98.31%   98.33%   +0.02%     
==========================================
  Files          43       43              
  Lines        2315     2347      +32     
==========================================
+ Hits         2276     2308      +32     
  Misses         32       32              
  Partials        7        7              
Impacted Files Coverage Δ
config.go 100.00% <100.00%> (ø)
zapcore/sampler.go 100.00% <100.00%> (ø)
options.go 100.00% <0.00%> (ø)
zapcore/increase_level.go 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ec2dc45...ae3e230. Read the comment docs.

@shirchen shirchen requested review from abhinav and prashantv Apr 6, 2020
config.go Outdated
@@ -207,10 +212,20 @@ func (cfg Config) buildOptions(errSink zapcore.WriteSyncer) []Option {
if !cfg.DisableStacktrace {
opts = append(opts, AddStacktrace(stackLevel))
}
if cfg.Sampling != nil && cfg.Sampling.Hook == nil {
// Assign a default nop sampling hook.
cfg.Sampling.Hook = zapcore.NopSamplingHook
Copy link
Contributor

@abhinav abhinav Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should apply the default in NewSamplerWithOptions and just not pass the SamplerHook argument if nil here.

if scfg := cfg.Sampling; scfg != nil {
  opts = append(opts, WrapCore(func(core zapcore.Core) zapcore.Core {
    var samplerOpts []zapcore.SamplerOption
    if scfg.Hook != nil {
      samplerOpts = append(samplerOpts, zapcore.SamplerHook(..))
    }
    return ...
  }))


const (
// Dropped means that a log was dropped.
Dropped SamplingDecision = iota
Copy link
Contributor

@abhinav abhinav Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea behind this but the presence of this type suggests that there are at least two values: dropped and sampled, and I would then expect my hook to be called for all cases: sampled and dropped.

So question for you and @prashantv is: Should we call the hook for both, sampled and dropped cases? If not, then the hook is just a "drop hook" or similar. (Name TBD)

Naming-wise: Sampled and Dropped works for me. To be more specific, we can also use LogSampled and LogDropped.

return nil
}

// SamplerHook registers a which will be called when Sampler makes a decision.
Copy link
Contributor

@abhinav abhinav Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: missing "a function"

// logs. Note, there is no access to Fields in this hook. In the future, this
// hook can be expanded to emit whether this is first entry that was dropped,
// first after a period, etc.
func SamplerHook(hook func(entry Entry, dec SamplingDecision) error) SamplerOption {
Copy link
Contributor

@abhinav abhinav Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if hooks should be able to return errors.
I don't think we'll want the sampler to react upon errors in the hook.

zapcore/sampler.go Show resolved Hide resolved
@shirchen shirchen requested a review from abhinav Apr 7, 2020
Copy link
Contributor

@abhinav abhinav left a comment

Looks fine; no major comments.
Will give documentation another pass tomorrow.


RE: performance concerns raised offline: if that matters, we can drop
hook-for-all-events in favor of hook-per-event, with only one event supported
right now:

type SamplerHook struct {
    Dropped func(Entry)  // no hook if nil
}

func nopHook(Entry) {} // unexported

func WithSamplerHook(SamplerHook)

If sampled hook ever becomes necessary, it'll be another field on SamplerHook.

 type SamplerHook struct {
    Dropped func(Entry)
+   Sampled func(Entry)
 }

But again, that's only if we think the perf impact is worth it (which I
suspect it isn't).

config_test.go Outdated
droppedCount := &atomic.Int64{}
sampledCount := &atomic.Int64{}
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use new for pointers to zero value.

Suggested change
droppedCount := &atomic.Int64{}
sampledCount := &atomic.Int64{}
droppedCount := new(atomic.Int64)
sampledCount := new(atomic.Int64)

The other convention is var and & before returning:

Suggested change
droppedCount := &atomic.Int64{}
sampledCount := &atomic.Int64{}
var dropped, sampled atomic.Int64
  return h, &dropped, &sampled

@@ -81,17 +81,87 @@ func (c *counter) IncCheckReset(t time.Time, tick time.Duration) uint64 {
return 1
}

// SamplingDecision represents a decision made by sampler.
type SamplingDecision uint8
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is so that it takes one byte instead of 8?

Copy link
Contributor Author

@shirchen shirchen Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, as byte is an alias to uint8.

}

// NopSamplingHook is the default hook used by sampler.
func NopSamplingHook(_ Entry, _ SamplingDecision) {}
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can drop _ when they're all _

Suggested change
func NopSamplingHook(_ Entry, _ SamplingDecision) {}
func NopSamplingHook(Entry, SamplingDecision) {}

return &sampler{
Core: core,
tick: tick,
counts: newCounters(),
first: uint64(first),
thereafter: uint64(thereafter),
hook: NopSamplingHook,
}
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make it so that the old implementation calls the new one so that if we make changes to sample initialization, they don't have to go in two places.

Suggested change
return &sampler{
Core: core,
tick: tick,
counts: newCounters(),
first: uint64(first),
thereafter: uint64(thereafter),
hook: NopSamplingHook,
}
return NewSamplerWithOptions(core, tick, first, thereafter)

Copy link
Contributor Author

@shirchen shirchen Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

return h, droppedCount, sampledCount
}

func BenchmarkSampler_CheckWithHook(b *testing.B) {
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: count dropped messages only to more accurately measure the use case we're trying to satisfy here.

Copy link
Contributor

@abhinav abhinav left a comment

gave docs a pass

config.go Outdated
type SamplingConfig struct {
Initial int `json:"initial" yaml:"initial"`
Thereafter int `json:"thereafter" yaml:"thereafter"`
Hook func(zapcore.Entry, zapcore.SamplingDecision)
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add json:"-" yaml:"-" so the JSON/YAML moduels do not attempt to serialize this.

Suggested change
Hook func(zapcore.Entry, zapcore.SamplingDecision)
Hook func(zapcore.Entry, zapcore.SamplingDecision) `json:"-" yaml:"-"`

config.go Outdated
// Hook is called whenever a Sampler makes a decision. Currently, whenever a
// log is dropped.
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second part no longer true. How about,

Suggested change
// Hook is called whenever a Sampler makes a decision. Currently, whenever a
// log is dropped.
// If specified, the Sampler will invoke the Hook after each decision.

@@ -81,17 +81,87 @@ func (c *counter) IncCheckReset(t time.Time, tick time.Duration) uint64 {
return 1
}

// SamplingDecision represents a decision made by sampler.
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: all types represent something. can do:

Suggested change
// SamplingDecision represents a decision made by sampler.
// SamplingDecision is a decision made by sampler.

type SamplingDecision uint8

const (
// LogDropped means that a log was dropped.
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// LogDropped means that a log was dropped.
// LogDropped indicates that the Sampler dropped a log entry.

const (
// LogDropped means that a log was dropped.
LogDropped SamplingDecision = iota
// LogSampled means that a log was successfully sampled.
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// LogSampled means that a log was successfully sampled.
// LogSampled indicates that the Sampler sampled a log entry.

f(s)
}

// SamplerOption configures a Sampler option.
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// SamplerOption configures a Sampler option.
// SamplerOption configures a Sampler.

apply(*sampler)
}

// NopSamplingHook is the default hook used by sampler.
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need to export this? We'll use this if unset.

Copy link
Contributor Author

@shirchen shirchen Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe if someone wants to test this as NewSampler* constructors are public?

Copy link
Contributor

@abhinav abhinav Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewSampler can be public but I mean the implementation of a no-op hook. Users don't really need access to that; they can write their own empty function, or just not provide the SamplingHook option.

Copy link
Contributor

@prashantv prashantv Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, not even sure if we need this in the production code, we could keep it in the test file?

Copy link
Contributor

@abhinav abhinav Apr 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's production. The sampler defaults the hook field to NopSamplingHook.

Copy link
Contributor

@prashantv prashantv Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, in that case, can we unexport and keep it here?

// decision. Currently a hook is called when a log is dropped and
// zapcore.LogDropped decision is emitted.
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second part no longer true.

Suggested change
// decision. Currently a hook is called when a log is dropped and
// zapcore.LogDropped decision is emitted.
// decision.

// This hook is useful for side effects, for example emitting number of dropped
// logs. Note, there is no access to Fields in this hook. In the future, this
// hook can be expanded to emit whether this is first entry that was dropped,
// first after a period, etc.
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid mentioning future plans in documentation. That's a comment-level thing, possibly referencing tickets.

Suggested change
// This hook is useful for side effects, for example emitting number of dropped
// logs. Note, there is no access to Fields in this hook. In the future, this
// hook can be expanded to emit whether this is first entry that was dropped,
// first after a period, etc.
// This hook may be used to get visibility into the performance of the sampler.
// For example, use it to track metrics of dropped versus sampled logs.

// Sampler also accepts an optional hook that can be used to count number of
// dropped logs.
Copy link
Contributor

@abhinav abhinav Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Sampler also accepts an optional hook that can be used to count number of
// dropped logs.
// Sampler can be configured to report sampling decisions with the SamplerHook
// option.

// Keep in mind that zap's sampling implementation is optimized for speed over
// absolute precision; under load, each tick may be slightly over- or
// under-sampled.
func NewSampler(core Core, tick time.Duration, first, thereafter int) Core {
return &sampler{
func NewSamplerWithOptions(core Core, tick time.Duration, first, thereafter int, opts ...SamplerOption) Core {
Copy link
Contributor

@prashantv prashantv Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to reduce the added API surface (E.g., new function that looks similar + new types), we could add a Build method on the SamplingConfig struct, similar to the Build method on the Configuration struct:
https://pkg.go.dev/go.uber.org/zap?tab=doc#Config.Build

Copy link
Contributor

@abhinav abhinav Apr 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't we still want a means of building the sampler directly?
Config.Build is porcelain for zap.New. Anything you can do with Config.Build,
you can do with zap.New.
IMO SamplingConfig would need to match that: the underlying constructor would
need to provide the same functionality—unless we decide to deprecate direct
constructors and switch to Config structs.

Copy link
Contributor

@prashantv prashantv Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, I'm not sure how valuable the direct structs have been vs configs, but don't have to make that change here. let's keep this as-is

config_test.go Outdated Show resolved Hide resolved
config_test.go Outdated
OutputPaths: []string{"stderr"},
ErrorOutputPaths: []string{"stderr"},
}
expectN := 2 + 100 + 1 // 2 from initial logs, 100 initial sampled logs, 1 from off-by-one in sampler
Copy link
Contributor

@prashantv prashantv Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we fix the off-by-one, seems a little odd, haven't looked into where it's coming from

Copy link
Contributor Author

@shirchen shirchen Apr 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I didn't verify the comment and copied from another test. Fixed explanation.

// LogDropped indicates that the Sampler dropped a log entry.
LogDropped SamplingDecision = iota
// LogSampled indicates that the Sampler sampled a log entry.
LogSampled
Copy link
Contributor

@prashantv prashantv Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something to discuss

  • it would be useful to know when a log was first dropped, and then first log after the sample again. e.g., if you want to add a special log that said "message X is now being dropped, no further logs"?
  • doing this in the hook requires doing the same thing the sampler does, probably more inefficiently (a map by message)

Copy link
Contributor

@abhinav abhinav Apr 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea: If we make SamplingDecision a bitfield, we could do that separately and re-use sampler logic

LogDropped SamplingDecision = 1 << iota
LogDroppedInitial
LogSampled
LogSampledThereafter

Copy link
Contributor

@prashantv prashantv Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, I like the idea of a bit field, so users can just check decision & LogSampled > 0 to see if it was sampled.

we might want to make that clear in the comments that users should check this like a bit field (or maybe we provide a helper on the type to do that).

We could expand this to have:

  • log sampled because it's within the initial
  • log sampled because it's the thereafter
  • log dropped (first log after initial sampled logs)
  • log dropped (first log after thereafter sampled log)
  • log dropped (other)

apply(*sampler)
}

// NopSamplingHook is the default hook used by sampler.
Copy link
Contributor

@prashantv prashantv Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, not even sure if we need this in the production code, we could keep it in the test file?

expectRe := `{"level":"info","caller":"zap/config_test.go:\d+","msg":"info","k":"v","z":"zz"}` + "\n" +
`{"level":"warn","caller":"zap/config_test.go:\d+","msg":"warn","k":"v","z":"zz"}` + "\n"
expectDropped := 99 // 200 - 100 initial - 1 thereafter
expectSampled := 103 // 2 from initial + 100 + 1 thereafter
Copy link
Contributor

@prashantv prashantv Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this the same as expectN? we expect the hook to be called on every log, and then expectDropped + expectSampled == expectN?

Copy link
Contributor Author

@shirchen shirchen Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expectN is not a count from samplerHook, but from a zap hook which only measures what is sent. i regret carrying over a hook from another test now as it seems to cause confusion. :)

@@ -81,33 +81,96 @@ func (c *counter) IncCheckReset(t time.Time, tick time.Duration) uint64 {
return 1
}

type sampler struct {
Core
// SamplingDecision is a decision made by sampler.
Copy link
Contributor

@prashantv prashantv Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should comment that more decision values may be added in future, so it's not safe to assume these are the only values.

// LogDropped indicates that the Sampler dropped a log entry.
LogDropped SamplingDecision = iota
// LogSampled indicates that the Sampler sampled a log entry.
LogSampled
Copy link
Contributor

@prashantv prashantv Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, I like the idea of a bit field, so users can just check decision & LogSampled > 0 to see if it was sampled.

we might want to make that clear in the comments that users should check this like a bit field (or maybe we provide a helper on the type to do that).

We could expand this to have:

  • log sampled because it's within the initial
  • log sampled because it's the thereafter
  • log dropped (first log after initial sampled logs)
  • log dropped (first log after thereafter sampled log)
  • log dropped (other)

apply(*sampler)
}

// NopSamplingHook is the default hook used by sampler.
Copy link
Contributor

@prashantv prashantv Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, in that case, can we unexport and keep it here?

// Keep in mind that zap's sampling implementation is optimized for speed over
// absolute precision; under load, each tick may be slightly over- or
// under-sampled.
func NewSampler(core Core, tick time.Duration, first, thereafter int) Core {
return &sampler{
func NewSamplerWithOptions(core Core, tick time.Duration, first, thereafter int, opts ...SamplerOption) Core {
Copy link
Contributor

@prashantv prashantv Apr 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, I'm not sure how valuable the direct structs have been vs configs, but don't have to make that change here. let's keep this as-is

Copy link
Contributor

@abhinav abhinav left a comment

Mostly LGTM. TestConfigWithSamplingHook isn't quite clear/obvious to me.

// For example, use it to track metrics of dropped versus sampled logs.
func SamplerHook(hook func(entry Entry, dec SamplingDecision)) SamplerOption {
Copy link
Contributor

@abhinav abhinav Apr 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a sample of how to use the sampling decision?

Suggested change
// For example, use it to track metrics of dropped versus sampled logs.
func SamplerHook(hook func(entry Entry, dec SamplingDecision)) SamplerOption {
// For example, use it to track metrics of dropped versus sampled logs.
//
// var dropped atomic.Int64
// zapcore.SamplerHook(func(ent zapcore.Entry, dec zapcore.SamplingDecision) {
// if dec&zapcore.LogDropped > 0 {
// dropped.Inc()
// }
// })
func SamplerHook(hook func(entry Entry, dec SamplingDecision)) SamplerOption {

config_test.go Outdated
OutputPaths: []string{"stderr"},
ErrorOutputPaths: []string{"stderr"},
}
expectN := 2 + 101 // out of 203 logs: 3 types (debug, info, warn), 2 (debug + warn) + 101 (100 initial + 200th thereafter)
Copy link
Contributor

@abhinav abhinav Apr 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by these numbers. What exactly is expectN?

Copy link
Contributor Author

@shirchen shirchen Apr 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -37,6 +37,7 @@ import (

func fakeSampler(lvl LevelEnabler, tick time.Duration, first, thereafter int) (Core, *observer.ObservedLogs) {
core, logs := observer.New(lvl)
// Keep using deprecated constructor for cc.
Copy link
Contributor

@prashantv prashantv Apr 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is cc and why can't this be updated?

Copy link
Contributor Author

@shirchen shirchen Apr 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code coverage

@abhinav abhinav changed the title Add an optional SamplerHook to Sampler core/sampler: Support decision hook Apr 21, 2020
@abhinav abhinav merged commit 46939fc into uber-go:master Apr 21, 2020
4 checks passed
uvfive pushed a commit to bxsmart/zap that referenced this issue May 21, 2020
This adds support for monitoring sampling decisions made by the Sampler
core with a `func(Entry, SamplingDecision)` where `SamplingDecision` is
a bit field.

To allow plumbing the hook to the sampler, this additionally deprecates
the `NewSampler` constructor in favor of `NewSamplerWithOptions`.

    type SamplerOption

    func SamplerHook(func(Entry, SamplingDecision))

    func NewSamplerWithOptions(/* ... */, opts ...SamplerOption)

This functionality is usable from the `zap` package via the new `Hook`
field of `zap.SamplingConfig`.

Refs T5056227
abhinav added a commit that referenced this issue Sep 10, 2021
The TestSamplerConcurrent test frequently fails with the following error
in CI:

    --- FAIL: TestSamplerConcurrent (0.25s)
        sampler_test.go:198:
        	    Error Trace:	sampler_test.go:198
        	    Error:      	Max difference between 1250 and 1004 allowed is 125, but difference was 246
        	    Test:       	TestSamplerConcurrent
        	    Messages:   	Unexpected number of logs
    FAIL

The test is intended to verify that
despite an onsalught of messages from multiple goroutines,
we only allow at most `logsPerTick` messages per `tick`.

This was accompilshed by spin-looping 10 goroutines for `numTicks`,
each logging one of `numMessages` different messages,
and then verifying the final log count.

The source of flakiness here was the non-determinism in
how far a goroutine would get in logging enough messages such that
the sampler would be engaged.

In #948, we added a `zapcore.Clock` interface with a ticker and
a mock implementation.
Move that to `ztest` for use here.

To unflake the test, use the mock clock to control time and
for each goroutine, log `logsPerTick*2` messages `numTicks` times.
This gives us,

    for numGoroutines (10)
        for numTicks (25)
            log logsPerTick * 2 (50) messages

We end up attempting to log a total of,

    (numGoroutines * numTicks * logsPerTick * 2) messages
    = (10 * 25 * 50) messages
    = 12500 messages

Of these, the following should be sampled:

    numMessages * numTicks * logsPerTick
    = 5 * 10 * 25
    = 1250

Everything else should be dropped.

For extra confidence, use a SamplerHook (added in #813) to verify that
the number of sampled and dropped messages meet expectations.
abhinav added a commit that referenced this issue Sep 10, 2021
The TestSamplerConcurrent test frequently fails with the following error
in CI:

    --- FAIL: TestSamplerConcurrent (0.25s)
        sampler_test.go:198:
        	    Error Trace:	sampler_test.go:198
        	    Error:      	Max difference between 1250 and 1004 allowed is 125, but difference was 246
        	    Test:       	TestSamplerConcurrent
        	    Messages:   	Unexpected number of logs
    FAIL

The test is intended to verify that
despite an onsalught of messages from multiple goroutines,
we only allow at most `logsPerTick` messages per `tick`.

This was accompilshed by spin-looping 10 goroutines for `numTicks`,
each logging one of `numMessages` different messages,
and then verifying the final log count.

The source of flakiness here was the non-determinism in
how far a goroutine would get in logging enough messages such that
the sampler would be engaged.

In #948, we added a `zapcore.Clock` interface with a ticker and
a mock implementation.
Move that to `ztest` for use here.

To unflake the test, use the mock clock to control time and
for each goroutine, log `logsPerTick*2` messages `numTicks` times.
This gives us,

    for numGoroutines (10)
        for numTicks (25)
            log logsPerTick * 2 (50) messages

We end up attempting to log a total of,

    (numGoroutines * numTicks * logsPerTick * 2) messages
    = (10 * 25 * 50) messages
    = 12500 messages

Of these, the following should be sampled:

    numMessages * numTicks * logsPerTick
    = 5 * 10 * 25
    = 1250

Everything else should be dropped.

For extra confidence, use a SamplerHook (added in #813) to verify that
the number of sampled and dropped messages meet expectations.
abhinav added a commit that referenced this issue Sep 10, 2021
The TestSamplerConcurrent test frequently fails with the following error
in CI:

    --- FAIL: TestSamplerConcurrent (0.25s)
        sampler_test.go:198:
        	    Error Trace:	sampler_test.go:198
        	    Error:      	Max difference between 1250 and 1004 allowed is 125, but difference was 246
        	    Test:       	TestSamplerConcurrent
        	    Messages:   	Unexpected number of logs
    FAIL

The test is intended to verify that
despite an onsalught of messages from multiple goroutines,
we only allow at most `logsPerTick` messages per `tick`.

This was accompilshed by spin-looping 10 goroutines for `numTicks`,
each logging one of `numMessages` different messages,
and then verifying the final log count.

The source of flakiness here was the non-determinism in
how far a goroutine would get in logging enough messages such that
the sampler would be engaged.

In #948, we added a `zapcore.Clock` interface with a ticker and
a mock implementation.
Move that to `ztest` for use here.

To unflake the test, use the mock clock to control time and
for each goroutine, log `logsPerTick*2` messages `numTicks` times.
This gives us,

    for numGoroutines (10)
        for numTicks (25)
            log logsPerTick * 2 (50) messages

We end up attempting to log a total of,

    (numGoroutines * numTicks * logsPerTick * 2) messages
    = (10 * 25 * 50) messages
    = 12500 messages

Of these, the following should be sampled:

    numMessages * numTicks * logsPerTick
    = 5 * 10 * 25
    = 1250

Everything else should be dropped.

For extra confidence, use a SamplerHook (added in #813) to verify that
the number of sampled and dropped messages meet expectations.

Refs GO-873
cgxxv pushed a commit to cgxxv/zap that referenced this issue Mar 25, 2022
This adds support for monitoring sampling decisions made by the Sampler
core with a `func(Entry, SamplingDecision)` where `SamplingDecision` is
a bit field.

To allow plumbing the hook to the sampler, this additionally deprecates
the `NewSampler` constructor in favor of `NewSamplerWithOptions`.

    type SamplerOption

    func SamplerHook(func(Entry, SamplingDecision))

    func NewSamplerWithOptions(/* ... */, opts ...SamplerOption)

This functionality is usable from the `zap` package via the new `Hook`
field of `zap.SamplingConfig`.

Refs T5056227
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants