Implement GetAllHistoryTreeBranches for SQL persistence backends #3438
This just adds the functionality to fetch history tree branches, but it doesn't turn it on.
remove test_0928180931_yng
@dnr, I did some analysis of the different strategies we can use, and I'm now confident this is the best one for us. As the script below demonstrates, it's important that we avoid offset-based pagination and use cursor-based pagination instead.

SET @num_rows := 10000;
DROP PROCEDURE IF EXISTS generate_fake_data;
CREATE PROCEDURE generate_fake_data()
BEGIN
DECLARE i INT DEFAULT 0;
# noinspection SqlWithoutWhere
DELETE FROM history_tree;
test_loop :
LOOP
IF (i = @num_rows) THEN
LEAVE test_loop;
END IF;
INSERT INTO history_tree
VALUES (RAND() * 1000, RANDOM_BYTES(16), RANDOM_BYTES(16), '', '');
SET i = i + 1;
END LOOP;
END;
CALL generate_fake_data();
SELECT COUNT(*)
FROM history_tree;
-- This is the offset-based approach. You can see that we are using the index, but we still have to
-- iterate over all rows before offset, so this becomes really inefficient towards the end.
EXPLAIN ANALYZE
SELECT shard_id, tree_id, branch_id, data, data_encoding
FROM history_tree
ORDER BY shard_id, tree_id, branch_id
-- OFFSET can't be a variable unfortunately--make sure it's something like half of num_rows
LIMIT 100 OFFSET 5000;
# -> Limit/Offset: 100/5000 row(s) (cost=262.01 rows=100) (actual time=1.641..1.671 rows=100 loops=1)
# -> Index scan on history_tree using PRIMARY (cost=262.01 rows=5100) (actual time=0.020..1.530 rows=5100 loops=1)
-- This is the cursor-based approach. You can see that we are using the index and never scanning
-- too many rows.
EXPLAIN ANALYZE
SELECT shard_id, tree_id, branch_id, data, data_encoding
FROM history_tree
WHERE (shard_id, tree_id, branch_id) > (RAND(), RANDOM_BYTES(16), RANDOM_BYTES(16))
ORDER BY shard_id, tree_id, branch_id
LIMIT 100;
# -> Limit: 100 row(s) (cost=0.14 rows=100) (actual time=0.026..0.063 rows=100 loops=1)
# -> Filter: ((history_tree.shard_id,history_tree.tree_id,history_tree.branch_id) > (rand(),random_bytes(16),random_bytes(16))) (cost=0.14 rows=100) (actual time=0.024..0.056 rows=100 loops=1)
# -> Index scan on history_tree using PRIMARY (cost=0.14 rows=100) (actual time=0.019..0.047 rows=103 loops=1)
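The cursor-based loop that a caller of GetAllHistoryTreeBranches would run can be sketched as follows. This is only an illustration using Python's built-in sqlite3 (which supports the same row-value comparison syntax in SQLite 3.15+), not the actual Go persistence code; the table and column names mirror the script above, and the page size is arbitrary:

```python
import os
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE history_tree (
        shard_id INTEGER,
        tree_id BLOB,
        branch_id BLOB,
        data BLOB,
        data_encoding TEXT,
        PRIMARY KEY (shard_id, tree_id, branch_id)
    )
""")
rows = [(random.randrange(1000), os.urandom(16), os.urandom(16), b"", "")
        for _ in range(500)]
conn.executemany("INSERT OR IGNORE INTO history_tree VALUES (?, ?, ?, ?, ?)", rows)

def fetch_all_branches(conn, page_size=100):
    # Keyset (cursor-based) pagination: each page resumes strictly after the
    # last-seen primary key instead of skipping rows with OFFSET.
    cursor = (-1, b"", b"")  # sorts before every real key
    while True:
        page = conn.execute(
            """SELECT shard_id, tree_id, branch_id, data, data_encoding
               FROM history_tree
               WHERE (shard_id, tree_id, branch_id) > (?, ?, ?)
               ORDER BY shard_id, tree_id, branch_id
               LIMIT ?""",
            (*cursor, page_size),
        ).fetchall()
        if not page:
            return
        yield from page
        cursor = page[-1][:3]  # next page starts after this key

all_rows = list(fetch_all_branches(conn))
```

Because the WHERE clause and the ORDER BY use the same column order as the primary key, every page is a short index range scan, which is exactly what the EXPLAIN ANALYZE output above shows.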
-- Note that the column order is important with the cursor-based approach: it must match the
-- primary-key index. As you can see here, if we swap tree_id and branch_id, MySQL falls back
-- to a full table scan.
EXPLAIN ANALYZE
SELECT shard_id, branch_id, tree_id, data, data_encoding
FROM history_tree
WHERE (shard_id, branch_id, tree_id) > (RAND(), RANDOM_BYTES(16), RANDOM_BYTES(16))
ORDER BY shard_id, branch_id, tree_id
LIMIT 100;
# -> Limit: 100 row(s) (cost=1003.75 rows=100) (actual time=4.410..4.486 rows=100 loops=1)
# -> Sort row IDs: history_tree.shard_id, history_tree.branch_id, history_tree.tree_id, limit input to 100 row(s) per chunk (cost=1003.75 rows=10000) (actual time=4.409..4.481 rows=100 loops=1)
# -> Filter: ((history_tree.shard_id,history_tree.branch_id,history_tree.tree_id) > (rand(),random_bytes(16),random_bytes(16))) (cost=1003.75 rows=10000) (actual time=0.024..3.333 rows=9997 loops=1)
#         -> Table scan on history_tree (cost=1003.75 rows=10000) (actual time=0.019..2.566 rows=10000 loops=1)
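As an aside, for engines or driver versions where a row-value comparison like `(a, b, c) > (?, ?, ?)` isn't available, the same keyset predicate can be expanded into an equivalent OR-of-ANDs expression. The helper below is hypothetical and not part of this PR; it just shows the expansion:

```python
def keyset_predicate(cols):
    """Expand (c1, c2, c3) > (?, ?, ?) into the equivalent
    (c1 > ?) OR (c1 = ? AND c2 > ?) OR (c1 = ? AND c2 = ? AND c3 > ?)."""
    clauses = []
    for i in range(len(cols)):
        parts = [f"{c} = ?" for c in cols[:i]] + [f"{cols[i]} > ?"]
        clauses.append("(" + " AND ".join(parts) + ")")
    return " OR ".join(clauses)

predicate = keyset_predicate(["shard_id", "tree_id", "branch_id"])
```

Note that the cursor values must then be bound repeatedly (1 + 2 + 3 = 6 parameters for a three-column key), and some planners optimize the expanded form less well than the native row-value comparison.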
In this PR, I've added an implementation of the GetAllHistoryTreeBranches function for all of our SQL backends.
I made these changes for #3419. This change will allow the history scavenger to prune branches in clusters that use SQL backends for history storage.
There are existing test suites that verify we can fetch all history branches after creating some; however, those tests were disabled for non-Cassandra databases. I removed that constraint and watched the tests fail (GetAllHistoryTreeBranches simply panicked before this change), then applied my fix and watched them pass.
In the worst case, we somehow return the wrong data from GetAllHistoryTreeBranches, causing the scavenger to delete branches that it should not.
This change should not require a notification to the broader community, but there are some reactions on the description of the original issue, so we should notify them that this is now implemented.