Description
What version of Badger are you using?
v4.5.1
Does this issue reproduce with the latest master?
We have tested this with the latest version.
What operating system and version are you using?
OS: Windows 10 Pro
What Go version are you using?
Go version: 1.22.12
What did you do? (Steps to reproduce)
1. Start with a BadgerDB v4 database that contains a corrupted SSTable. The corruption appears to affect reading of the table index and triggers an initial panic in t.initIndex() or the related FlatBuffers parsing.
2. Attempt to open this database using badger.Open(opts). The opts include a custom logger.
3. badger.Open() may return successfully, without a synchronous error or panic.
4. Shortly after the database appears to open, a background goroutine started by BadgerDB (apparently from newLevelsController, for table loading/initialization) hits the corrupted table.
What did you expect to see?
We expected that if BadgerDB encounters severe data corruption in a background task that leads to an internal panic (even a deliberate re-panic for debug purposes), it should:
1. Catch this panic within its own background goroutine.
2. Log a detailed error through the provided logger.
3. Transition the DB instance to an error state or signal the application in a manageable way (e.g., by making subsequent operations return errors, or by closing the DB and making Open fail on the next attempt if it has not already).
The application should not crash due to an unhandled panic in a BadgerDB background goroutine.
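The expected behavior could look roughly like the following sketch. It is not Badger's code: loadTable, guard and openAll are hypothetical names, and the short-buffer panic only stands in for the index corruption. It shows a background goroutine converting a panic into an error that the opener can log and return.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// loadTable is a hypothetical stand-in for per-table initialization.
// It panics with an index-out-of-range error on a truncated buffer,
// similar in spirit to parsing a corrupted table index.
func loadTable(data []byte) error {
	_ = data[4] // panics when data has fewer than 5 bytes
	return nil
}

// guard runs fn and converts any panic raised inside it into an error.
func guard(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	return fn()
}

// openAll loads each table in its own goroutine and joins all failures
// into a single error instead of letting a panic kill the process.
func openAll(tables [][]byte) error {
	errCh := make(chan error, len(tables))
	for _, tbl := range tables {
		go func(b []byte) {
			errCh <- guard(func() error { return loadTable(b) })
		}(tbl)
	}
	var errs []error
	for range tables {
		if err := <-errCh; err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...) // nil when no table failed
}

func main() {
	// The second "table" is truncated and triggers the simulated corruption.
	if err := openAll([][]byte{{1, 2, 3, 4, 5}, {1, 2}}); err != nil {
		log.Printf("open failed: %v", err)
	}
}
```

The recover has to live inside the goroutine that panics, which is why the application cannot add this protection itself from outside the library.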
What did you see instead?
The application crashes with runtime.fatalpanic. The stack trace indicates that an initial panic (P1) during table index processing (e.g., in flatbuffers.GetUint32) was recovered by a deferred function in table.(*Table).initBiggestAndSmallest. That deferred function collected debug information and then ran a nested defer that intentionally re-panicked (P2) with this debug information. Nothing further up the call stack of BadgerDB's background goroutine recovered P2, so it ended in fatalpanic.
Key Stack Trace Snippet of the fatal panic:
runtime.fatalpanic (panic.go:1217) runtime
runtime.gopanic (panic.go:779) runtime
table.(*Table).initBiggestAndSmallest.func1.1 (table.go:352) github.com/dgraph-io/badger/v4/table // <<< P2 (re-panic with debug info) occurs here
runtime.deferreturn (panic.go:602) runtime // Part of defer handling for P1
table.(*Table).initBiggestAndSmallest.func1 (table.go:398) github.com/dgraph-io/badger/v4/table // Outer defer in initBiggestAndSmallest that caught P1
runtime.gopanic (panic.go:770) runtime // This is gopanic for P1
runtime.goPanicIndex (panic.go:114) runtime // P1 (e.g., index out of bounds)
flatbuffers.GetUint32 (encode.go:47) github.com/google/flatbuffers/go // P1 might originate here
flatbuffers.GetUOffsetT (encode.go:121) github.com/google/flatbuffers/go
fb.GetRootAsTableIndex (TableIndex.go:14) github.com/dgraph-io/badger/v4/fb
table.(*Table).readTableIndex (table.go:706) github.com/dgraph-io/badger/v4/table
table.(*Table).initIndex (table.go:463) github.com/dgraph-io/badger/v4/table
table.(*Table).initBiggestAndSmallest (table.go:402) github.com/dgraph-io/badger/v4/table // Call leading to P1
table.OpenTable (table.go:308) github.com/dgraph-io/badger/v4/table
badger.newLevelsController.func1 (levels.go:151) github.com/dgraph-io/badger/v4 // This is likely running in a new goroutine
badger.newLevelsController.gowrap2 (levels.go:166) github.com/dgraph-io/badger/v4
runtime.goexit (asm_amd64.s:1695) runtime // Goroutine exit
- asynchronous stack trace
badger.newLevelsController (levels.go:130) github.com/dgraph-io/badger/v4
(Optional) Relevant code analysis / Source of the re-panic:
// In (t *Table).initBiggestAndSmallest()
// ...
defer func() { // D1 - Outer defer
	if r := recover(); r != nil { // Catches P1 from initIndex or other table ops
		var debugBuf bytes.Buffer
		defer func() { // D2 - Nested defer
			// THIS LINE INTENTIONALLY RE-PANICS (P2) WITH DEBUG INFO
			panic(fmt.Sprintf("%s\n== Recovered ==\n", debugBuf.String()))
		}()
		// Code to populate debugBuf with file ID, size, checksums, index info, etc.
		fmt.Fprintf(&debugBuf, "\n== Recovering from initIndex crash ==\n") // Part of the P2 message
		// ... more Fprintf calls ...
	}
}()
// ...
This deliberate re-panic P2, if unhandled in the background goroutine where initBiggestAndSmallest is executed, causes the fatalpanic.
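The P1/P2 control flow can be reproduced in isolation. The sketch below is not Badger's code: initIndex, initWithDebug and callerRecovers are hypothetical names, and the short buffer only simulates corruption. It mirrors the D1/D2 shape of the snippet above and shows that P2 is an ordinary panic, which a recover further up the same goroutine's call stack would catch.

```go
package main

import (
	"bytes"
	"fmt"
)

// initIndex is a hypothetical stand-in that raises P1: an index-out-of-range
// panic when the buffer is too short, simulating a corrupted index.
func initIndex(buf []byte) uint32 {
	return uint32(buf[3])
}

// initWithDebug mirrors the shape of the snippet: D1 recovers P1, fills a
// debug buffer, and D2 re-panics (P2) with the buffer's contents.
func initWithDebug(buf []byte) {
	defer func() { // D1
		if r := recover(); r != nil {
			var debugBuf bytes.Buffer
			defer func() { // D2
				panic(fmt.Sprintf("%s\n== Recovered ==\n", debugBuf.String()))
			}()
			fmt.Fprintf(&debugBuf, "\n== Recovering from initIndex crash ==\n")
			fmt.Fprintf(&debugBuf, "cause: %v\n", r)
		}
	}()
	initIndex(buf)
}

// callerRecovers plays the role of a frame higher up on the same goroutine.
// It returns the P2 message, or "" when no panic occurred.
func callerRecovers(buf []byte) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	initWithDebug(buf)
	return ""
}

func main() {
	// A one-byte buffer triggers P1, then P2, which callerRecovers catches.
	fmt.Print(callerRecovers([]byte{1}))
}
```

Removing the recover in callerRecovers makes the program terminate on P2, which matches the crash reported here. Since a panic can only be recovered on the goroutine that raised it, the equivalent recover would need to sit in the goroutine started by newLevelsController.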