Skip to content

Switch all klog to slog#841

Merged
patflynn merged 3 commits into
mainfrom
klog-to-slog
Apr 29, 2026
Merged

Switch all klog to slog#841
patflynn merged 3 commits into
mainfrom
klog-to-slog

Conversation

@patflynn
Copy link
Copy Markdown
Contributor

@patflynn patflynn commented Apr 28, 2026

Migrates all logging call sites from k8s.io/klog/v2 to log/slog. Each binary now configures slog.SetDefault with a text handler keyed off a new --slog_level flag (the GCP server keeps its existing Cloud Logging exporter wiring). klog.V(N) call sites collapse to slog.Debug, and klog.Exitf/Fatalf are replaced with slog.Error + os.Exit(1).

Adds golangci-lint config enforcing sloglint (attr-only, context: all) and a depguard rule denying further klog imports.

Towards #416.

Migrates all logging call sites from k8s.io/klog/v2 to log/slog. Each
binary now configures slog.SetDefault with a text handler keyed off a
new --slog_level flag (the GCP server keeps its existing Cloud Logging
exporter wiring). klog.V(N) call sites collapse to slog.Debug, and
klog.Exitf/Fatalf are replaced with slog.Error + os.Exit(1).

Adds golangci-lint config enforcing sloglint (attr-only, context: all)
and a depguard rule denying further klog imports.
@patflynn patflynn requested a review from a team as a code owner April 28, 2026 14:54
@patflynn patflynn requested review from phbnf and removed request for a team April 28, 2026 14:54
The klog --v=N and --logtostderr flags no longer exist after the slog
migration. Replace --v=1 with --slog_level=-4 (slog.LevelDebug) where
verbose logging was being requested, drop --logtostderr (slog defaults
to stderr for the text handler), and update READMEs.

Fixes the POSIX conformance CI run that aborted on
"flag provided but not defined: -v".
@phbnf phbnf requested a review from mhutchinson April 29, 2026 08:54
Comment thread internal/ct/handlers.go
}

klog.V(2).Infof("%s: request %v %q => %s", a.log.origin, r.Method, r.URL, a.name)
slog.DebugContext(logCtx, "request received", slog.String("origin", a.log.origin), slog.String("method", r.Method), slog.String("url", r.URL.String()), slog.String("name", a.name))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be extremely noisy for a public-facing log if debug is enabled, but I guess that's the name of the game. I think it's orthogonal to this PR, but a bunch of this logging should be reviewed to see if it's still needed. Perhaps traces and events are a better way of mapping out the "what happened on this request?" story.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed this is orthogonal to the migration — filed #843 to track the audit (drop / fold into OTel spans / keep with justification).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, we could control the verbosity of klog.V(X) by setting -v=X, and most binaries were configured with -v=1.

All those entries have been migrated to slog.DebugContext, and -v=X flags to slog_level=-4. That means that if this PR is rolled out, the number of logging entries will be much (much) higher. Specifically, all the requestlog.go logs had a -v level of 9 and had always been disabled, now these entries are indistinguishable from the ones that had a -v level of 1.

We've observed performance regression because of logging in the past (#756), so we should be mindful of this. I don't know what the impact is now that we've migrated to slog, but that's one of the thing that we need to keep an eye on as part of landing the slog work.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rolled this out to the conformance pipeline, and this resulted in a ~5-10X of logging entries: not rocket science but quickly look at logs it went from ~30k entries to ~210k.

Copy link
Copy Markdown
Contributor

@roger2hk roger2hk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that there is a difference between klog.Exitf and the combination of slog.ErrorContext and os.Exit(1). The new behavior will not flush the logs before exiting. For those async logging, there is a chance to lose the final log.

Comment thread internal/ct/requestlog.go
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has its own structured logging format and should be migrated to slog as well. Quickly glancing at this PR, I think it does most of it for this file, maybe except for "RL"? Just flagging this such that whoever picks this up knows to pay attention to this one. I'm not even sure that we want to keep this request log as it is, I haven't looked at it in a while.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #842 to track this — same conclusion as you reached: the RL: prefix is preserved verbatim in the migration, but it's worth a separate look at whether to keep, refactor, or drop the per-request log channel entirely.

klog.Exitf flushed buffered log entries before terminating; the
slog.ErrorContext + os.Exit(1) replacement in the GCP server skips
that, so entries buffered in the async Cloud Logging exporter could be
lost.

Adds a logFlushers slice populated by initLogging and a fatal() helper
that logs, drains the flushers, then exits 1. Replaces all
slog.ErrorContext + os.Exit(1) callsites in the GCP binary
(main, otel) with fatal().

The text-handler-based binaries (aws, posix, fsck, hammer, migrate,
fetch_roots) write synchronously to stderr and don't need this.
@patflynn
Copy link
Copy Markdown
Contributor Author

@roger2hk good catch on the flush behaviour — addressed in 6c7be3e. The GCP binary now has a logFlushers slice that initLogging populates with the Cloud Logging client's Close(), and a fatal() helper that drains them before os.Exit(1). All slog.ErrorContext + os.Exit(1) callsites in cmd/tesseract/gcp/{main,otel}.go now go through fatal().

The text-handler-based binaries (aws, posix, fsck, hammer, migrate, fetch_roots) write synchronously to stderr, so they don't need the same treatment.

@patflynn patflynn merged commit e25bb65 into main Apr 29, 2026
14 checks passed
@patflynn patflynn deleted the klog-to-slog branch April 29, 2026 16:17
Comment thread internal/ct/handlers.go
}

klog.V(2).Infof("%s: request %v %q => %s", a.log.origin, r.Method, r.URL, a.name)
slog.DebugContext(logCtx, "request received", slog.String("origin", a.log.origin), slog.String("method", r.Method), slog.String("url", r.URL.String()), slog.String("name", a.name))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, we could control the verbosity of klog.V(X) by setting -v=X, and most binaries were configured with -v=1.

All those entries have been migrated to slog.DebugContext, and -v=X flags to slog_level=-4. That means that if this PR is rolled out, the number of logging entries will be much (much) higher. Specifically, all the requestlog.go logs had a -v level of 9 and had always been disabled, now these entries are indistinguishable from the ones that had a -v level of 1.

We've observed performance regression because of logging in the past (#756), so we should be mindful of this. I don't know what the impact is now that we've migrated to slog, but that's one of the thing that we need to keep an eye on as part of landing the slog work.

formatlist("--roots_remote_fetch_url=%s", var.roots_remote_fetch_url),
"--roots_remote_fetch_interval=${var.roots_remote_fetch_interval}",
"-v=2",
"--slog_level=-4",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be rolled out, the conformance test is broken in the mean time, I'm on it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

mhutchinson added a commit that referenced this pull request Apr 30, 2026
This PR implements two new structured logging levels:
 - LevelDebugExtra (-8)
 - LevelExtreme (-12)

It adds contextual convenience wrappers in the logger package. The new custom levels are filtered client-side to restrict ingestion volumes and automatically normalize down to standard DEBUG severities when exported to Google Cloud Logging. This should allow very similar filtering as we had before #841.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants