setup-docker-builder: add boltdb integrity check metrics #54
Merged
adityamaru merged 1 commit into `main` on Nov 17, 2025
094f532 to 379f771
aayushshah15 (Contributor) requested changes on Nov 17, 2025 and left a comment:
And then we should turn that debug into a warning.
Add metrics tracking for BoltDB integrity check results to monitor how many organizations are experiencing database corruption issues. This metric is sent to the FA agent's internal metrics endpoint and forwarded to Grafana via OpenTelemetry with the following attributes:

- database_file: history.db or cache.db
- result: passed, failed, timeout, or oom
- repo: repository name
- region: Blacksmith region
- installation_id: organization installation ID
- duration_ms: check duration

The metric helps us understand the prevalence and nature of BoltDB integrity issues across our customer base without relying on logs.
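The `result` attribute above distinguishes four outcomes. The following is a minimal TypeScript sketch of how a `bbolt check` run could be mapped onto them. `classifyIntegrityCheck`, `runBboltCheck`, and the rule that an unexpected `SIGKILL` indicates an OOM kill are assumptions for illustration, not the PR's code. Only a wall-clock timeout is shown, because the PR does not say how its memory limit is enforced.

```typescript
import { spawnSync } from "node:child_process";

// The four values of the `result` attribute.
type IntegrityResult = "passed" | "failed" | "timeout" | "oom";

// The fields of a spawnSync result that the classifier looks at.
interface CheckOutcome {
  status: number | null; // exit code, or null when the process was killed by a signal
  signal: string | null; // terminating signal, if any
  timedOut: boolean;     // true when our own timeout fired
}

// Map a finished `bbolt check` run onto a result value.
// Assumption (heuristic): a SIGKILL we did not send ourselves is treated as an
// OOM kill, e.g. by the kernel or cgroup OOM killer.
function classifyIntegrityCheck(o: CheckOutcome): IntegrityResult {
  if (o.timedOut) return "timeout";
  if (o.signal === "SIGKILL") return "oom";
  return o.status === 0 ? "passed" : "failed";
}

// Hypothetical runner: `bbolt check <path>` with a wall-clock timeout only.
// No memory limit is applied here because the PR does not describe its mechanism.
function runBboltCheck(dbPath: string, timeoutMs: number): { result: IntegrityResult; durationMs: number } {
  const start = Date.now();
  const r = spawnSync("bbolt", ["check", dbPath], {
    timeout: timeoutMs,
    killSignal: "SIGTERM", // sent on timeout; distinct from the SIGKILL treated as OOM
    stdio: "ignore",       // only the exit status matters for classification
  });
  const code = (r.error as { code?: string } | undefined)?.code;
  // A spawn error other than our timeout (e.g. ENOENT: bbolt not installed) is not a check result.
  if (r.error && code !== "ETIMEDOUT") throw r.error;
  return {
    result: classifyIntegrityCheck({ status: r.status, signal: r.signal, timedOut: code === "ETIMEDOUT" }),
    durationMs: Date.now() - start,
  };
}
```

Checking `timedOut` before the signal matters: on timeout the child is killed by our own `SIGTERM`, so its termination says nothing about memory pressure.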
0fcb166 to 09980d2
aayushshah15 (Contributor) approved these changes on Nov 17, 2025 and left a comment:
And then we should turn that debug into a warning.
Summary
Add a metric that reports BoltDB integrity check failures (and only failures) to monitor database corruption issues.
Changes
- `reportIntegrityCheckFailure` function in `reporter.ts` that sends failure metrics to the FA agent's internal metrics endpoint
- `checkBoltDbIntegrity` function in `main.ts`

Metric Details
The metric is sent to the FA agent's `/internal` endpoint and forwarded to Grafana via OpenTelemetry with a single attribute:

- `database_file`: `history.db` or `cache.db`

Metric name: `boltdb_integrity_check_failure`
Value: always 1 (one data point per failure)
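A hedged sketch of the failure reporter described above. The metric name, the value of 1, and the `database_file` attribute come from the PR; the JSON payload shape, the `agentBaseUrl` parameter, the injectable `fetchImpl`, and the helper names `buildIntegrityFailureMetric` and `sendIntegrityFailureMetric` (standing in for `reportIntegrityCheckFailure`) are hypothetical, since the FA agent's `/internal` request format is not shown.

```typescript
// Attribute values listed in the PR description.
type DatabaseFile = "history.db" | "cache.db";

// Hypothetical wire format; the real body accepted by /internal is not shown in the PR.
interface MetricPayload {
  name: string;
  value: number;
  attributes: Record<string, string>;
}

function buildIntegrityFailureMetric(dbFile: DatabaseFile): MetricPayload {
  return {
    name: "boltdb_integrity_check_failure",
    value: 1, // always 1: one data point per failure
    attributes: { database_file: dbFile },
  };
}

// Minimal fetch-like signature so the sender can be exercised without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

// Hypothetical stand-in for reportIntegrityCheckFailure(dbFile).
// agentBaseUrl is an assumption; the agent's address is not given in the PR.
async function sendIntegrityFailureMetric(
  dbFile: DatabaseFile,
  agentBaseUrl: string,
  fetchImpl: FetchLike,
): Promise<boolean> {
  try {
    const res = await fetchImpl(`${agentBaseUrl}/internal`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildIntegrityFailureMetric(dbFile)),
    });
    return res.ok;
  } catch {
    // Best effort: a reporting failure must never fail the build.
    return false;
  }
}
```

Returning `false` instead of throwing keeps metric reporting out of the critical path, so an unreachable agent only loses a data point.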
Related PRs
Related Issue
Relates to BLA-2024: stickydisk,docker: dig into continued sticky disk failure
This provides clean visibility into which specific database files (history.db vs cache.db) are experiencing corruption issues.
Note
Adds BoltDB integrity-check failure reporting to the agent, runs checks before startup and during cleanup, and includes filesystem usage in sticky disk commits.
- `*.db` under `/var/lib/buildkit`; run `bbolt check` with memory/time limits; log durations and sizes.
- `reportIntegrityCheckFailure` (sends `boltdb_integrity_check_failure` with `database_file`).
- `logDatabaseHashes`.
- `reportIntegrityCheckFailure(dbFile)` in `reporter.ts` posts to the agent `/internal` endpoint.
- `df` and pass `fsDiskUsageBytes` to `commitStickyDisk` (only when valid).

Written by Cursor Bugbot for commit 09980d2.
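The summary says `fsDiskUsageBytes` is passed to `commitStickyDisk` only when the `df` reading is valid. One way to express that guard is sketched below, assuming the command parsed is GNU `df -B1 --output=used <mountpoint>`, which prints a header line followed by the used byte count. `parseDfUsedBytes` and `diskUsageField` are hypothetical helpers, not functions from the PR.

```typescript
// Parse output of (assumed) GNU `df -B1 --output=used <mountpoint>`:
// a header line, then the used space in bytes. Returns undefined when the
// value cannot be trusted, so callers can omit it entirely.
function parseDfUsedBytes(stdout: string): number | undefined {
  const lines = stdout
    .trim()
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  if (lines.length < 2) return undefined; // header only, or empty output
  const raw = lines[lines.length - 1];
  if (!/^\d+$/.test(raw)) return undefined; // e.g. "-" or an error message
  const bytes = Number(raw);
  return Number.isSafeInteger(bytes) ? bytes : undefined;
}

// Hypothetical helper: yields the optional field to spread into the
// commitStickyDisk arguments, present only when df produced a valid number.
function diskUsageField(dfStdout: string): { fsDiskUsageBytes?: number } {
  const used = parseDfUsedBytes(dfStdout);
  return used === undefined ? {} : { fsDiskUsageBytes: used };
}
```

Omitting the field, rather than sending 0 or `NaN`, keeps a failed `df` call from being recorded as an empty disk.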