VDiff: Use byte compare if weight_string() returns null for either source or target #7696

rohit-nayak-ps · 2021-03-16T12:41:23Z

Description

VDiff compares text columns using the weight_string() of the value. However, weight_string() can sometimes return NULL. Per the MySQL reference:

String-valued functions return NULL if the length of the result would be greater than the value of the max_allowed_packet system variable.

Depending on how the source and target servers are configured, weight_string() may be NULL on only one side or on both. Currently, if only one is NULL, VDiff reports a mismatch, and if both are NULL, it treats the rows as a match. Both outcomes are incorrect: in such cases VDiff needs to compare the actual data.

This PR detects such a condition and uses a byte compare instead.
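To illustrate the two failure modes described above, here is a minimal sketch, not the actual VDiff code. It assumes a nil slice stands in for a NULL weight_string(), and the names oldMatch and newMatch are hypothetical. The weight bytes are arbitrary placeholders, not real collation weights.

```go
package main

import (
	"bytes"
	"fmt"
)

// oldMatch mimics the behavior described above: only the weight strings are
// compared, and NULL (nil here, by assumption) matches only NULL.
func oldMatch(srcWeight, tgtWeight []byte) bool {
	if srcWeight == nil || tgtWeight == nil {
		return srcWeight == nil && tgtWeight == nil
	}
	return bytes.Equal(srcWeight, tgtWeight)
}

// newMatch falls back to a byte compare of the actual column data whenever
// either weight string is NULL.
func newMatch(srcWeight, tgtWeight, srcData, tgtData []byte) bool {
	if srcWeight == nil || tgtWeight == nil {
		return bytes.Equal(srcData, tgtData)
	}
	return bytes.Equal(srcWeight, tgtWeight)
}

func main() {
	// Same data, but only the source weight string is NULL.
	same := []byte("same")
	w := []byte{0x01, 0x02} // placeholder weight bytes
	fmt.Println(oldMatch(nil, w), newMatch(nil, w, same, same)) // prints: false true

	// Different data, both weight strings NULL.
	x, y := []byte("x"), []byte("y")
	fmt.Println(oldMatch(nil, nil), newMatch(nil, nil, x, y)) // prints: true false
}
```

The first case is the false mismatch and the second is the missed difference. With the fallback, both are decided by the data itself.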

See related issue for more details and repro.
Signed-off-by: Rohit Nayak <rohit@planetscale.com>

Related Issue(s)

VDiff false positive MismatchedRows when WEIGHT_STRING result is greater than max_allowed_packet #7296

Checklist

Should this PR be backported?
Tests were added or are not required
Documentation was added or is not required

Impacted Areas in Vitess

Components that this PR will affect:

deepthi

LGTM

shlomi-noach · 2021-03-21T07:01:20Z

go/vt/wrangler/vdiff.go

+		// This detects if we are using weight_string() to compare this (text) column.
+		// If either source or target weight_string is null we fallback to a byte compare
+		if sourceRow[col].IsNull() && sourceRow[i].IsText() && col > i {
+			col = i
+		}
+		if targetRow[col].IsNull() && targetRow[i].IsText() && col > i {
+			col = i
+		}


I confess I'm completely lost in this logic. What is the logical connection between col (column) and i (array index)? What does it mean when col > i, and what does it mean to assign col = i?

See https://github.com/vitessio/vitess/blob/98e4e0e9bfae1e1483e2f7fed952df455fe528af/go/vt/wrangler/vdiff.go#L437

VDiff adds pseudo-columns to the rows that contain the weight_string() for text fields, and points to those columns for comparison purposes. When those are null (for reasons mentioned in #7296), the PR falls back to the original column.

Thank you! I still don't understand the connection between col and i. They seem to be in different spaces. i is an array index. col is, ... col. Why would you test col > i or assign col = i?

col is initialized to be the index i and then overwritten for text fields by pointing to the weight_string(): https://github.com/vitessio/vitess/blob/98e4e0e9bfae1e1483e2f7fed952df455fe528af/go/vt/wrangler/vdiff.go#L436

I see that compare is being called with td.comparePKs in one case and td.compareCols in the other. I feel like this logic may not work correctly in the case of comparePKs.

Also, I think it will become more readable if we changed cols to be a struct that contains a compareCol and an originalCol. If compareCol fails, then we can fall back to originalCol.

Refactored based on your suggestion. @sougou, please review both this change and the updated tests.

@shlomi-noach, when you get a chance, please review the changes and see if it is easier to understand now.

shlomi-noach

asking for clarifications, please see inline

shlomi-noach

Thanks, this is now more readable

rohit-nayak-ps requested review from shlomi-noach, sougou, deepthi and a team March 16, 2021 14:12

rohit-nayak-ps marked this pull request as ready for review March 16, 2021 14:16

deepthi approved these changes Mar 19, 2021

shlomi-noach reviewed Mar 21, 2021

rohit-nayak-ps marked this pull request as draft April 19, 2021 12:26

rohit-nayak-ps force-pushed the rn-vdiff-null-weight-string branch from 98e4e0e to a506a80 Compare April 19, 2021 20:25

rohit-nayak-ps marked this pull request as ready for review April 20, 2021 09:35

shlomi-noach approved these changes Apr 20, 2021

rohit-nayak-ps added 2 commits April 27, 2021 20:44

a317b2d Use byte compare if weight_string() returns null for either source or target. Signed-off-by: Rohit Nayak <rohit@planetscale.com>

921c225 Refactor code to make data and weight_string columns explicit. Lots of tests needed changes. Signed-off-by: Rohit Nayak <rohit@planetscale.com>

rohit-nayak-ps force-pushed the rn-vdiff-null-weight-string branch from a506a80 to 921c225 Compare April 27, 2021 18:45

rohit-nayak-ps added Component: VReplication Type: Bug labels May 3, 2021

rohit-nayak-ps merged commit 3823544 into vitessio:master May 3, 2021

rohit-nayak-ps deleted the rn-vdiff-null-weight-string branch May 3, 2021 15:14

rohit-nayak-ps mentioned this pull request Aug 4, 2021

VDiff false positive MismatchedRows when WEIGHT_STRING result is greater than max_allowed_packet #7296

Closed
