-
Notifications
You must be signed in to change notification settings - Fork 772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve tasklist and implement history scavenger for SQL #4059
Conversation
Wow, thanks for the PR!!! These are lots of improvements. Btw, is this to replace #4019 ? |
This change picks up on that branch and includes that change, so this PR could be considered a replacement, yes. |
Yes, we have a very rough instructions about testing. Need more improvement, hehe
Totally acceptable. leaking is very slow, usually people already identify and fix it before they could be an issue. In fact it's quite hard to write the tests :( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making such a big contribution. Really, really appreciate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
above are mostly all my comments. This PR is very good work. Probably the best one from community so far.
cc @meiliang86
common/persistence/persistence-tests/visibilityPersistenceTest.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the contribution! Definitely one of the highest quality PRs from the open-source community and it requires a very deep understanding of Cadence internals. Thank you!
5fa5c78
to
e8cded5
Compare
838fd32
to
9dd7034
Compare
TaskList entries can be deleted out of the task_lists table before the corresponding tasks are drained from the tasks table. This leaves those tasks orphaned, and prior to this change nothing will delete them so they accumulate. This change enhances the tasklist scanner to watch for these orphans and remove them.
(sorry I have to edit something to hack around buildkite here)
9dd7034
to
d832cd5
Compare
There are deficiencies in the TaskList scanner (which only runs for SQL storage) that allow tasks to build up in the
tasks
table.task_lists
table while related entries remain in thetasks
table (no foreign key constraints.) In the case, the scavenger will never attempt to clean up those tasks.This changeset attempts to address these deficiencies in the following ways
In a similar vein, for SQL storage the history scanner does not run. This allows garbage to accumulate indefinitely into the
history_tree
andhistory_node
tables. Seemingly, the reason for this omission is that the GetAllHistoryBranches() query is unimplemented for SQL. This changeset implements that query, and then takes advantage of this to go ahead and enable the history scanner for SQL storage.Without this change, cruft will accumulate in the database indefinitely, increasing storage used and also increasing DB load and query latency.
These change are patched in from an operational fork of this project. The changes are working in operation, however the patch process may have introduced regressions, the post-patch code is much less well tested.
I ran the automated tests. Some tests that are likely important fail
If you're using Cassandra, nothing.
If using SQL, the scavenger could consume excessive resources, especially on the first run where there's an existing buildup of unmanaged garbage. It could delete tasks from the task table that should not be deleted. The history scanner similarly could consume excess resources, although the history scanner code is not new; it's identically the same code used for Cassandra. It's only newly enabled for SQL.