Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VDiff: Use byte compare if weight_string() returns null for either source or target #7696

Merged

Conversation

rohit-nayak-ps
Copy link
Contributor

@rohit-nayak-ps rohit-nayak-ps commented Mar 16, 2021

Description

VDiff compares text columns using the weight_string() of the value. However there are times when weight_string() can return a null value. Per the mysql reference

String-valued functions return NULL if the length of the result would be greater than the value of the max_allowed_packet system variable.

It is possible that just one of the weight_string(), for source or target is null or both are null, depending on the configuration of the source and target servers. Currently if one is null vdiff detects a mismatch and if both are null then it is considered a match. This is incorrect since, in such cases, vdiff needs to compare the actual data.

This PR detects such a condition and uses a byte compare instead.

See related issue for more details and repro.
Signed-off-by: Rohit Nayak rohit@planetscale.com

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review March 16, 2021 14:16
Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 970 to 991
// This detects if we are using weight_string() to compare this (text) column.
// If either source or target weight_string is null we fallback to a byte compare
if sourceRow[col].IsNull() && sourceRow[i].IsText() && col > i {
col = i
}
if targetRow[col].IsNull() && targetRow[i].IsText() && col > i {
col = i
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confess I'm completely lost in this logic. What is the logical connection between col (column) and i (array index)? What does it mean where col > i and what does it mean to assign col = i?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/vitessio/vitess/blob/98e4e0e9bfae1e1483e2f7fed952df455fe528af/go/vt/wrangler/vdiff.go#L437

VDiff adds additional pseudo-columns to the rows which contain the weight_string() for text fields and points to that column for comparison purposes. When those are null (for reasons mentioned in #7296) the PR falls back to the original column)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I still don't understand the connection between col and i. They seem to be in different spaces. i is an array index. col is, ... col. Why would you test col > i or assign col = i`?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

col is initialized to be the index i and then overwritten for text fields by pointing to the weight_string(): https://github.com/vitessio/vitess/blob/98e4e0e9bfae1e1483e2f7fed952df455fe528af/go/vt/wrangler/vdiff.go#L436

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that compare is being called with td.comparePKs in one case and td.cmpareCols in the other. I feel like this logic may not work correctly in the case of comparePKs.

Also, I think it will become more readable if we changed cols to be a struct that contains a compareCol and an originalCol. If compareCol fails, then we can fall back to originalCol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that compare is being called with td.comparePKs in one case and td.cmpareCols in the other. I feel like this logic may not work correctly in the case of comparePKs.

Also, I think it will become more readable if we changed cols to be a struct that contains a compareCol and an originalCol. If compareCol fails, then we can fall back to originalCol.

Refactored based on your suggestion. @sougou, please review both this change and the updated tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I still don't understand the connection between col and i. They seem to be in different spaces. i is an array index. col is, ... col. Why would you test col > i or assign col = i`?

@shlomi-noach, when you get a chance please review the changes and see if it easier to understand now

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asking for clarifications, please see inline

@rohit-nayak-ps rohit-nayak-ps marked this pull request as draft April 19, 2021 12:26
@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review April 20, 2021 09:35
Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is now more readable

… target

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
…f tests needed changes.

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants