Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binlog JSON Parser: handle inline types correctly for large documents #8187

Merged
merged 6 commits into from
Jun 3, 2021

Conversation

rohit-nayak-ps
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps commented May 25, 2021

Description

In JSON Objects and Arrays some data types are inlined. The existing code only expected that 2 byte values (int16s, literals, null) could be inlined. However for "large" documents (> 64K) mysql also inlines 4 byte values (int32s). This resulted in out-of-bounds panics because the inline 32-bit integers were treated as offsets into the document.

While the main reason of this PR is to fix the inline bug, we take this opportunity to:

  • some refactoring to remove unnecessary logic, simplify code, improve comments
  • adds defensive guards before accessing data to check if calculated offsets are out of bounds. This would mean that our parser is not complete but now it logs the full data as well as information about all columns parsed to-date. This will tell us both the binary value that is failing as well as (ideally) the PK of that row so that we can get the actual json value as well
  • add new tests including one with a json doc that caused the failure mode reported below

Signed-off-by: Rohit Nayak rohit@planetscale.com

Error Snippet

A MoveTables failed due to the json parsing logic panicking for a binlog event with this error:

vttablet: rpc error: code = Unknown desc = uncaught panic: runtime error: slice bounds out of range [33331247:65855]
. Stack-trace:
/app/.vendor/go1.15.10/go/src/runtime/panic.go:116 (0x43cce4)
/app/.gopath/src/vitess.io/vitess/go/mysql/binlog_event_json.go:586 (0xfc9b92)
/app/.gopath/src/vitess.io/vitess/go/mysql/binlog_event_json.go:111 (0xfc6352)
/app/.gopath/src/vitess.io/vitess/go/mysql/binlog_event_json.go:587 (0xfc963c)
/app/.gopath/src/vitess.io/vitess/go/mysql/binlog_event_json.go:111 (0xfc6352)
/app/.gopath/src/vitess.io/vitess/go/mysql/binlog_event_json.go:99 (0xfc6116)
/app/.gopath/src/vitess.io/vitess/go/mysql/binlog_event_json.go:67 (0xfc5f04)
/app/.gopath/src/vitess.io/vitess/go/mysql/binlog_event_rbr.go:838 (0xfcc784)
/app/.gopath/src/vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:876 (0x12a6b1d)
/app/.gopath/src/vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:807 (0x12a6184)
/app/.gopath/src/vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:565 (0x12a31c4)
/app/.gopath/src/vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:311 (0x12a0ba4)
/app/.gopath/src/vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:185 (0x12a0564)
/app/.gopath/src/vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/vstreamer.go:164 (0x12a0296)
/app/.gopath/src/vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/uvstreamer.go:377 (0x129ecc4)
/app/.gopath/src/vitess.io/vitess/go/vt/vttablet/tabletserver/vstreamer/engine.go:225 (0x1292510)
/app/.gopath/src/vitess.io/vitess/go/vt/vttablet/tabletserver/tabletserver.go:1125 (0x1335566)
/app/.gopath/src/vitess.io/vitess/go/vt/vttablet/grpcqueryservice/server.go:349 (0x1425684)
/app/.gopath/src/vitess.io/vitess/go/vt/proto/queryservice/queryservice.pb.go:1064 (0x141b0cd)
/app/.gopath/pkg/mod/google.golang.org/grpc@v1.24.0/server.go:1199 (0xa6213d)
/app/.gopath/pkg/mod/google.golang.org/grpc@v1.24.0/server.go:1279 (0xa63a2c)
/app/.gopath/pkg/mod/google.golang.org/grpc@v1.24.0/server.go:710 (0xa71aa4)
/app/.vendor/go1.15.10/go/src/runtime/asm_amd64.s:1374 (0x478020)

…e detail in binlog parsing to identify errant data

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps rohit-nayak-ps changed the title Binlog JSON Parser: Add defensive guards to avoid panic in json parsing logic and log more detail in error Do Not Merge: Binlog JSON Parser: Add defensive guards to avoid panic in json parsing logic and log more detail in error May 25, 2021
@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review May 25, 2021 13:14
…ensive comments. Add test for reported failure mode and improve other tests

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps rohit-nayak-ps changed the title Do Not Merge: Binlog JSON Parser: Add defensive guards to avoid panic in json parsing logic and log more detail in error Do Not Merge: Binlog JSON Parser: handle inline types correctly for large documents May 27, 2021
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps rohit-nayak-ps changed the title Do Not Merge: Binlog JSON Parser: handle inline types correctly for large documents Binlog JSON Parser: handle inline types correctly for large documents Jun 2, 2021
@rohit-nayak-ps
Copy link
Contributor Author

This is now ready for review: applying a patch with the inline fix in this PR fixed the MoveTables workflow that was failing.

} else {
fmt.Printf("%02d ", c)
s += fmt.Sprintf("%02d ", c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concatenation will end up inefficient. Consider using strings.Builder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is only used for local debugging of the functionality. It is off in production. It will be removed soon once we have a few more prod tests with the json data type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, cool. In that case, keep it as it is.

}

// only used for logging/debugging
var jsonTypeToName = map[uint]string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there no boolean type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the literal type is used for booleans and nulls

Comment on lines +259 to +292
return length, pos
}

// getElem returns the json value found inside json objects and arrays at the provided position
func getElem(data []byte, pos int, large bool) (*ajson.Node, int, error) {
var elem *ajson.Node
var err error
var offset int
typ := jsonDataType(data[pos])
pos++
if isInline(typ, large) {
elem, err = binlogJSON.getNode(typ, data, pos)
if err != nil {
return nil, 0, err
}
if large {
pos += 4
} else {
pos += 2
}
} else {
offset, pos = readInt(data, pos, large)
if offset >= len(data) { // consistency check, should only come here is there is a bug in the code
log.Errorf("unable to decode element")
return nil, 0, fmt.Errorf("unable to decode element: %+v", data)
}
newData := data[offset:]
//newPos ignored because this is an offset into the "extra" section of the buffer
elem, err = binlogJSON.getNode(typ, newData, 1)
if err != nil {
return nil, 0, err
}
}
return elem, pos, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gonna trust you on this 😅

@rohit-nayak-ps rohit-nayak-ps merged commit 52004e0 into vitessio:main Jun 3, 2021
@rohit-nayak-ps rohit-nayak-ps deleted the rn-json-data-panic branch June 3, 2021 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants