Fix panic in VTOrc #10519

Merged
merged 3 commits into from
Jun 22, 2022

Conversation

GuptaManan100
Member

@GuptaManan100 GuptaManan100 commented Jun 16, 2022

Description

VTOrc panicked with the following log:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0xec5205]

goroutine 1444 [running]:
vitess.io/vitess/go/vt/orchestrator/logic.electNewPrimary({_, _}, {{{0xc00134cd70, 0xc}, 0xcea}, {{0xc00134ce80, 0xc}, 0xcea}, 0x2, {0x0, ...}, ...}, ...)
	vitess.io/vitess/go/vt/orchestrator/logic/topology_recovery.go:1778 +0x485
vitess.io/vitess/go/vt/orchestrator/logic.executeCheckAndRecoverFunction({{{0xc00134cd70, 0xc}, 0xcea}, {{0xc00134ce80, 0xc}, 0xcea}, 0x2, {0x0, 0x0, 0x0}, ...}, ...)
	vitess.io/vitess/go/vt/orchestrator/logic/topology_recovery.go:1395 +0xbbe
vitess.io/vitess/go/vt/orchestrator/logic.CheckAndRecover.func1()
	vitess.io/vitess/go/vt/orchestrator/logic/topology_recovery.go:1483 +0x5f
created by vitess.io/vitess/go/vt/orchestrator/logic.CheckAndRecover
	vitess.io/vitess/go/vt/orchestrator/logic/topology_recovery.go:1482 +0x470

Investigation showed that the returned *events.Reparent value was nil and that the code used it without checking for nil.

This PR fixes this panic by checking if ev is nil before using its fields.
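
A minimal sketch of the kind of nil guard described above. The `Tablet` and `Reparent` types, the `reparentShard` function, and the alias value are hypothetical stand-ins for illustration, not the actual Vitess definitions; the only thing taken from this PR is the idea of checking `ev` before reading its fields.

```go
package main

import (
	"errors"
	"fmt"
)

// Tablet and Reparent are hypothetical stand-ins for the real event types.
type Tablet struct {
	Alias string
}

type Reparent struct {
	NewPrimary *Tablet
}

// reparentShard is a hypothetical operation that can fail before it builds
// an event, in which case it returns a nil *Reparent together with an error.
func reparentShard(fail bool) (*Reparent, error) {
	if fail {
		return nil, errors.New("could not lock shard")
	}
	return &Reparent{NewPrimary: &Tablet{Alias: "zone1-0000000101"}}, nil
}

// promotedAlias returns the alias of the new primary, or "" when the event
// or its NewPrimary field is nil. Both checks happen before any field access.
func promotedAlias(ev *Reparent) string {
	if ev == nil || ev.NewPrimary == nil {
		return ""
	}
	return ev.NewPrimary.Alias
}

func main() {
	for _, fail := range []bool{false, true} {
		ev, err := reparentShard(fail)
		fmt.Printf("alias=%q err=%v\n", promotedAlias(ev), err)
	}
}
```

Keeping the guard in one small helper means every caller gets the same behavior when the event is missing, instead of each call site having to remember the check.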

This PR also increases the default values of LockShardTimeoutSeconds and WaitReplicasTimeoutSeconds to 30 seconds each.
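
As a rough illustration of what such defaults could look like, here is a sketch. The `Configuration` struct, `newConfiguration`, and the `seconds` helper are assumptions made for this example; only the two field names and the 30-second values come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// Configuration is a hypothetical config struct; only the two field names
// are taken from the PR description.
type Configuration struct {
	LockShardTimeoutSeconds    int
	WaitReplicasTimeoutSeconds int
}

// newConfiguration returns a config populated with the defaults described
// in the PR: 30 seconds for each timeout.
func newConfiguration() *Configuration {
	return &Configuration{
		LockShardTimeoutSeconds:    30,
		WaitReplicasTimeoutSeconds: 30,
	}
}

// seconds converts a whole number of seconds into a time.Duration.
func seconds(n int) time.Duration {
	return time.Duration(n) * time.Second
}

func main() {
	cfg := newConfiguration()
	fmt.Println("lock shard timeout:", seconds(cfg.LockShardTimeoutSeconds))
	fmt.Println("wait replicas timeout:", seconds(cfg.WaitReplicasTimeoutSeconds))
}
```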

Related Issue(s)

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

@github-actions
Contributor

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive.
  • If a workflow is added or modified, each item in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

Member

@deepthi deepthi left a comment

The changes look good.
I'm just wondering if it is possible to write a test for this at all.
If that is too hard, can you confirm that this was verified via manual testing?

Signed-off-by: Manan Gupta <manan@planetscale.com>
Signed-off-by: Manan Gupta <manan@planetscale.com>
…t config

Signed-off-by: Manan Gupta <manan@planetscale.com>
@GuptaManan100
Member Author

@deepthi Yes, when I found the problem, I jumped in and fixed it without adding a reproducing test. I have fixed that now: there is a test that reproduces the panic (it fails on main) and passes after the changes!
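
For readers curious how a nil-dereference panic can be pinned down in a check, here is a small self-contained sketch. It is not the test added in this PR; `unguardedAlias`, `guardedAlias`, `didPanic`, and the types are hypothetical names used only to show the pattern of a check that fails on the unguarded code and passes on the guarded code.

```go
package main

import "fmt"

// Hypothetical stand-ins for the real event types.
type Tablet struct{ Alias string }
type Reparent struct{ NewPrimary *Tablet }

// unguardedAlias mimics the buggy shape: it reads fields of ev without
// checking whether ev is nil.
func unguardedAlias(ev *Reparent) string {
	return ev.NewPrimary.Alias
}

// guardedAlias mimics the fixed shape: it checks for nil first.
func guardedAlias(ev *Reparent) string {
	if ev == nil || ev.NewPrimary == nil {
		return ""
	}
	return ev.NewPrimary.Alias
}

// didPanic runs f and reports whether it panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	var got string
	fmt.Println("unguarded panics:", didPanic(func() { got = unguardedAlias(nil) }))
	fmt.Println("guarded panics:", didPanic(func() { got = guardedAlias(nil) }))
	fmt.Printf("guarded result: %q\n", got)
}
```

In a real Go test the same idea would usually live in a `_test.go` file and simply call the code path with a nil event, letting the test runner report the panic as a failure.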

@GuptaManan100 GuptaManan100 merged commit 82bfcbc into vitessio:main Jun 22, 2022
@GuptaManan100 GuptaManan100 deleted the vtorc-panic-fix branch June 22, 2022 01:03
GuptaManan100 added a commit to planetscale/vitess that referenced this pull request Jun 22, 2022
* test: reproduce the panic as a unit test

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: check ev is not nil before using its fields

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: increase timeout of LockShard and wait replicas in VTOrc default config

Signed-off-by: Manan Gupta <manan@planetscale.com>
GuptaManan100 added a commit that referenced this pull request Jun 22, 2022
* test: reproduce the panic as a unit test

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: check ev is not nil before using its fields

Signed-off-by: Manan Gupta <manan@planetscale.com>

* feat: increase timeout of LockShard and wait replicas in VTOrc default config

Signed-off-by: Manan Gupta <manan@planetscale.com>
Labels
Component: VTorc Vitess Orchestrator integration Type: Bug
3 participants