Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix tablegc flaky test #10230

Merged
merged 13 commits into from
May 9, 2022
Merged

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented May 8, 2022

Description

Fixes #7984

Hopefully this finally fixes the flaky TestPurge test in CI. The failure almost never reproduces locally, but does rarely. anyway, the solution is based on a bit of extended timeout, and also saves runtime where possible.

What I'm going to do is re-run this particular CI test (tabletmanager_tablegc) a few times, manually, even if successful..

Related Issue(s)

#7984

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@github-actions
Copy link
Contributor

github-actions bot commented May 8, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has the correct release notes label. release notes none should only be used for PRs that are so trivial that they need not be included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

@shlomi-noach
Copy link
Contributor Author

Bummer! After several manual re-runs, I still see flakiness:

2022-05-08T10:39:02.3093006Z === RUN   TestPopulateTable
2022-05-08T10:39:02.3093359Z --- PASS: TestPopulateTable (1.03s)
2022-05-08T10:39:02.3093627Z === RUN   TestHold
2022-05-08T10:39:02.3093918Z     tablegc_test.go:236: 
2022-05-08T10:39:02.3094298Z         	Error Trace:	tablegc_test.go:236
2022-05-08T10:39:02.3094656Z         	Error:      	Should be true
2022-05-08T10:39:02.3094982Z         	Test:       	TestHold
2022-05-08T10:39:02.3095261Z     tablegc_test.go:238: 
2022-05-08T10:39:02.3095655Z         	Error Trace:	tablegc_test.go:238
2022-05-08T10:39:02.3096068Z         	Error:      	Received unexpected error:
2022-05-08T10:39:02.3098131Z         	            	You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1 (errno 1064) (sqlstate 42000) during query: drop table if exists
2022-05-08T10:39:02.3098643Z         	Test:       	TestHold
2022-05-08T10:39:02.3098950Z --- FAIL: TestHold (17.03s)
2022-05-08T10:39:02.3099194Z === RUN   TestEvac
2022-05-08T10:39:02.3099481Z --- PASS: TestEvac (23.03s)
2022-05-08T10:39:02.3099720Z === RUN   TestDrop
2022-05-08T10:39:02.3100017Z --- PASS: TestDrop (22.02s)
2022-05-08T10:39:02.3100246Z === RUN   TestPurge
2022-05-08T10:39:02.3100534Z     tablegc_test.go:330: 
2022-05-08T10:39:02.3100930Z         	Error Trace:	tablegc_test.go:330
2022-05-08T10:39:02.3101290Z         	Error:      	Should be true
2022-05-08T10:39:02.3101607Z         	Test:       	TestPurge
2022-05-08T10:39:02.3101929Z --- FAIL: TestPurge (20.03s)
2022-05-08T10:39:02.3102191Z === RUN   TestPurgeView
2022-05-08T10:39:02.3102498Z --- PASS: TestPurgeView (20.03s)
2022-05-08T10:39:02.3102881Z FAIL
2022-05-08T10:39:02.3103200Z FAIL	vitess.io/vitess/go/test/endtoend/tabletmanager/tablegc	122.041s
2022-05-08T10:39:02.3103469Z FAIL

converting to draft while checking it.

@shlomi-noach shlomi-noach marked this pull request as draft May 8, 2022 10:41
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
…unt where table exists, validate drop table

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach marked this pull request as ready for review May 8, 2022 18:48
@shlomi-noach
Copy link
Contributor Author

OK this has been successful for multiple successive runs. Maybe solved. Better than before, for sure!
TL;DR: reduced intervals, increased timeouts, allowing multiple states.

@shlomi-noach shlomi-noach requested a review from a team May 8, 2022 18:50
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! ❤️ This is another one that was on my flakes list. 🙂

I had an overall nit about using variables for the time.Sleep() invocations but I will defer to you on whether it's really warranted.

@@ -258,19 +299,11 @@ func TestEvac(t *testing.T) {
}

time.Sleep(10 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but can we make this a variable ~ createTableTimeStampOffsetSecs and then use that everywhere we are doing time.Sleep in relation to that variable? We can then use this throughout:

time.Sleep(createTableTimeStampOffsetSecs * time.Second)
time.Sleep((createTableTimeStampOffsetSecs / 2) * time.Second)
time.Sleep((createTableTimeStampOffsetSecs * 3) * time.Second)
...

This way it's easier to tie all the timing behavior together and adjust it holistically as needed by increasing that one variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All time values now parameterized

@@ -326,21 +351,7 @@ func TestPurge(t *testing.T) {
}

time.Sleep(15 * time.Second) // purgeReentranceInterval
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar nit here. We could have a variable called purgeReentranceInterval so that those vars are all together and it's clear/easy to adjust the waits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

aaaand, I've made things worse.

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach
Copy link
Contributor Author

OK, remade things better

Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com>
@shlomi-noach shlomi-noach merged commit 4c80ca9 into vitessio:main May 9, 2022
@shlomi-noach shlomi-noach deleted the fix-tablegc-flaky-test branch May 9, 2022 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky TableGC tests
2 participants