Switch all klog to slog#841
Conversation
Migrates all logging call sites from k8s.io/klog/v2 to log/slog. Each binary now configures slog.SetDefault with a text handler keyed off a new --slog_level flag (the GCP server keeps its existing Cloud Logging exporter wiring). klog.V(N) call sites collapse to slog.Debug, and klog.Exitf/Fatalf are replaced with slog.Error + os.Exit(1). Adds golangci-lint config enforcing sloglint (attr-only, context: all) and a depguard rule denying further klog imports.
The klog --v=N and --logtostderr flags no longer exist after the slog migration. Replace --v=1 with --slog_level=-4 (slog.LevelDebug) where verbose logging was being requested, drop --logtostderr (slog defaults to stderr for the text handler), and update READMEs. Fixes the POSIX conformance CI run that aborted on "flag provided but not defined: -v".
| } | ||
|
|
||
| klog.V(2).Infof("%s: request %v %q => %s", a.log.origin, r.Method, r.URL, a.name) | ||
| slog.DebugContext(logCtx, "request received", slog.String("origin", a.log.origin), slog.String("method", r.Method), slog.String("url", r.URL.String()), slog.String("name", a.name)) |
There was a problem hiding this comment.
This will be extremely noisy for a public-facing log if debug is enabled, but I guess that's the name of the game. I think it's orthogonal to this PR, but a bunch of this logging should be reviewed to see if it's still needed. Perhaps traces and events are a better way of mapping out the "what happened on this request?" story.
There was a problem hiding this comment.
Agreed this is orthogonal to the migration — filed #843 to track the audit (drop / fold into OTel spans / keep with justification).
There was a problem hiding this comment.
Before this PR, we could control the verbosity of klog.V(X) by setting -v=X, and most binaries were configured with -v=1.
All those entries have been migrated to slog.DebugContext, and -v=X flags to slog_level=-4. That means that if this PR is rolled out, the number of logging entries will be much (much) higher. Specifically, all the requestlog.go logs had a -v level of 9 and had always been disabled, now these entries are indistinguishable from the ones that had a -v level of 1.
We've observed performance regression because of logging in the past (#756), so we should be mindful of this. I don't know what the impact is now that we've migrated to slog, but that's one of the thing that we need to keep an eye on as part of landing the slog work.
There was a problem hiding this comment.
I rolled this out to the conformance pipeline, and this resulted in a ~5-10X of logging entries: not rocket science but quickly look at logs it went from ~30k entries to ~210k.
roger2hk
left a comment
There was a problem hiding this comment.
Note that there is a difference between klog.Exitf and the combination of slog.ErrorContext and os.Exit(1). The new behavior will not flush the logs before exiting. For those async logging, there is a chance to lose the final log.
There was a problem hiding this comment.
This file has its own structured logging format and should be migrated to slog as well. Quickly glancing at this PR, I think it does most of it for this file, maybe except for "RL"? Just flagging this such that whoever picks this up knows to pay attention to this one. I'm not even sure that we want to keep this request log as it is, I haven't looked at it in a while.
There was a problem hiding this comment.
Filed #842 to track this — same conclusion as you reached: the RL: prefix is preserved verbatim in the migration, but it's worth a separate look at whether to keep, refactor, or drop the per-request log channel entirely.
klog.Exitf flushed buffered log entries before terminating; the slog.ErrorContext + os.Exit(1) replacement in the GCP server skips that, so entries buffered in the async Cloud Logging exporter could be lost. Adds a logFlushers slice populated by initLogging and a fatal() helper that logs, drains the flushers, then exits 1. Replaces all slog.ErrorContext + os.Exit(1) callsites in the GCP binary (main, otel) with fatal(). The text-handler-based binaries (aws, posix, fsck, hammer, migrate, fetch_roots) write synchronously to stderr and don't need this.
|
@roger2hk good catch on the flush behaviour — addressed in 6c7be3e. The GCP binary now has a The text-handler-based binaries (aws, posix, fsck, hammer, migrate, fetch_roots) write synchronously to stderr, so they don't need the same treatment. |
| } | ||
|
|
||
| klog.V(2).Infof("%s: request %v %q => %s", a.log.origin, r.Method, r.URL, a.name) | ||
| slog.DebugContext(logCtx, "request received", slog.String("origin", a.log.origin), slog.String("method", r.Method), slog.String("url", r.URL.String()), slog.String("name", a.name)) |
There was a problem hiding this comment.
Before this PR, we could control the verbosity of klog.V(X) by setting -v=X, and most binaries were configured with -v=1.
All those entries have been migrated to slog.DebugContext, and -v=X flags to slog_level=-4. That means that if this PR is rolled out, the number of logging entries will be much (much) higher. Specifically, all the requestlog.go logs had a -v level of 9 and had always been disabled, now these entries are indistinguishable from the ones that had a -v level of 1.
We've observed performance regression because of logging in the past (#756), so we should be mindful of this. I don't know what the impact is now that we've migrated to slog, but that's one of the thing that we need to keep an eye on as part of landing the slog work.
| formatlist("--roots_remote_fetch_url=%s", var.roots_remote_fetch_url), | ||
| "--roots_remote_fetch_interval=${var.roots_remote_fetch_interval}", | ||
| "-v=2", | ||
| "--slog_level=-4", |
There was a problem hiding this comment.
This needs to be rolled out, the conformance test is broken in the mean time, I'm on it.
This PR implements two new structured logging levels: - LevelDebugExtra (-8) - LevelExtreme (-12) It adds contextual convenience wrappers in the logger package. The new custom levels are filtered client-side to restrict ingestion volumes and automatically normalize down to standard DEBUG severities when exported to Google Cloud Logging. This should allow very similar filtering as we had before #841.
Migrates all logging call sites from k8s.io/klog/v2 to log/slog. Each binary now configures slog.SetDefault with a text handler keyed off a new --slog_level flag (the GCP server keeps its existing Cloud Logging exporter wiring). klog.V(N) call sites collapse to slog.Debug, and klog.Exitf/Fatalf are replaced with slog.Error + os.Exit(1).
Adds golangci-lint config enforcing sloglint (attr-only, context: all) and a depguard rule denying further klog imports.
Towards #416.