Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement GetAllHistoryTreeBranches for SQL persistence backends #3438

Conversation

MichaelSnowden
Copy link
Contributor

In this PR, I've added an implementation of the GetAllHistoryTreeBranches function for all of our SQL backends.

I made these changes for #3419. This change will allow the history scavenger to prune branches in clusters that use SQL backends for history storage.

There are existing test suites which verify that we can fetch all history branches after creating some. However, those tests were disabled for non-Cassandra databases. I removed that constraint, and watched the tests fail (because the GetAllHistoryTreeBranches method just panicked before this change). I then applied my fix, and watched the tests pass.

Worst case scenario, we somehow return the wrong data in GetAllHistoryTreeBranches, and that causes the scavenger to delete branches which it should not delete.

This change should not require a notification to the broader community, but there are some reactions on the description of the original issue, so we should notify them that this is now implemented.

@MichaelSnowden
Copy link
Contributor Author

This just adds the functionality to fetch history tree branches, but it doesn't turn it on.

@MichaelSnowden MichaelSnowden marked this pull request as ready for review September 28, 2022 18:17
@MichaelSnowden MichaelSnowden requested a review from a team as a code owner September 28, 2022 18:17
@MichaelSnowden MichaelSnowden force-pushed the 3419-need-history-scavenger-for-sql-based-persistence branch 2 times, most recently from c2b5171 to 3840941 Compare September 28, 2022 21:39
Copy link
Member

@yiminc yiminc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichaelSnowden MichaelSnowden force-pushed the 3419-need-history-scavenger-for-sql-based-persistence branch from 3840941 to 026de47 Compare September 29, 2022 22:41
@MichaelSnowden
Copy link
Contributor Author

@dnr , I did some analysis on the different strategies we can use, and I'm now sure that this is the best one for us:

As demonstrated by this script, it's important that we do pagination without offsets, and instead do cursor-based
pagination. Also, it's important that we place columns in our WHERE and ORDER BY queries in the same order as they
are in the primary key index.

SET @num_rows := 10000;
DROP PROCEDURE IF EXISTS generate_fake_data;
CREATE PROCEDURE generate_fake_data()
BEGIN
    DECLARE i INT DEFAULT 0;
    # noinspection SqlWithoutWhere
    DELETE FROM history_tree;
    test_loop :
    LOOP
        IF (i = @num_rows) THEN
            LEAVE test_loop;
        END IF;
        INSERT INTO history_tree
        VALUES (RAND() * 1000, RANDOM_BYTES(16), RANDOM_BYTES(16), '', '');
        SET i = i + 1;
    END LOOP;
END;
CALL generate_fake_data();
SELECT COUNT(*)
FROM history_tree;

-- This is the offset-based approach. You can see that we are using the index, but we still have to
-- iterate over all rows before offset, so this becomes really inefficient towards the end.
EXPLAIN ANALYZE
SELECT shard_id, tree_id, branch_id, data, data_encoding
FROM history_tree
ORDER BY shard_id, tree_id, branch_id
-- OFFSET can't be a variable unfortunately--make sure it's something like half of num_rows
LIMIT 100 OFFSET 5000;

# -> Limit/Offset: 100/5000 row(s)  (cost=262.01 rows=100) (actual time=1.641..1.671 rows=100 loops=1)
#     -> Index scan on history_tree using PRIMARY  (cost=262.01 rows=5100) (actual time=0.020..1.530 rows=5100 loops=1)

-- This is the cursor-based approach. You can see that we are using the index and never scanning
-- too many rows.
EXPLAIN ANALYZE
SELECT shard_id, tree_id, branch_id, data, data_encoding
FROM history_tree
WHERE (shard_id, tree_id, branch_id) > (RAND(), RANDOM_BYTES(16), RANDOM_BYTES(16))
ORDER BY shard_id, tree_id, branch_id
LIMIT 100;

# -> Limit: 100 row(s)  (cost=0.14 rows=100) (actual time=0.026..0.063 rows=100 loops=1)
#     -> Filter: ((history_tree.shard_id,history_tree.tree_id,history_tree.branch_id) > (rand(),random_bytes(16),random_bytes(16)))  (cost=0.14 rows=100) (actual time=0.024..0.056 rows=100 loops=1)
#         -> Index scan on history_tree using PRIMARY  (cost=0.14 rows=100) (actual time=0.019..0.047 rows=103 loops=1)

-- Note that the order is important when using a cursor-based approach. As you can see here, if we switch tree_id and
-- branch_id, we need to do a table scan.
EXPLAIN ANALYZE
SELECT shard_id, branch_id, tree_id, data, data_encoding
FROM history_tree
WHERE (shard_id, branch_id, tree_id) > (RAND(), RANDOM_BYTES(16), RANDOM_BYTES(16))
ORDER BY shard_id, branch_id, tree_id
LIMIT 100;

# -> Limit: 100 row(s)  (cost=1003.75 rows=100) (actual time=4.410..4.486 rows=100 loops=1)
#     -> Sort row IDs: history_tree.shard_id, history_tree.branch_id, history_tree.tree_id, limit input to 100 row(s) per chunk  (cost=1003.75 rows=10000) (actual time=4.409..4.481 rows=100 loops=1)
#         -> Filter: ((history_tree.shard_id,history_tree.branch_id,history_tree.tree_id) > (rand(),random_bytes(16),random_bytes(16)))  (cost=1003.75 rows=10000) (actual time=0.024..3.333 rows=9997 loops=1)
#             -> Table scan on history_tree  (cost=1003.75 rows=10000) (actual time=0.019..2.566 rows=10000 loops=1)

@MichaelSnowden MichaelSnowden force-pushed the 3419-need-history-scavenger-for-sql-based-persistence branch from 026de47 to 668c364 Compare September 29, 2022 23:14
common/persistence/persistenceInterface.go Show resolved Hide resolved
common/persistence/sql/history_store.go Show resolved Hide resolved
common/persistence/sql/sqlplugin/mysql/events.go Outdated Show resolved Hide resolved
common/persistence/sql/sqlplugin/postgresql/events.go Outdated Show resolved Hide resolved
common/persistence/sql/sqlplugin/sqlite/events.go Outdated Show resolved Hide resolved
@MichaelSnowden MichaelSnowden force-pushed the 3419-need-history-scavenger-for-sql-based-persistence branch from 668c364 to 8ad9f56 Compare September 29, 2022 23:38
@MichaelSnowden MichaelSnowden enabled auto-merge (squash) September 29, 2022 23:39
@MichaelSnowden MichaelSnowden merged commit 3815915 into temporalio:master Sep 30, 2022
@MichaelSnowden MichaelSnowden deleted the 3419-need-history-scavenger-for-sql-based-persistence branch September 30, 2022 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants