Fix Catalog corruption issue when running CLI in parallel #472

anujc25 · 2023-08-24T16:57:48Z

What this PR does / why we need it

Fix the Catalog corruption issue when running the CLI in parallel.
Implements NewContextCatalog (for reading) and NewContextCatalogUpdater (for reading/writing) as separate APIs.
Uses a new lockedfile API to lock the file. Inspired by: https://go.googlesource.com/proposal/+/master/design/33974-add-public-lockedfile-pkg.md

Which issue(s) this PR fixes

Fixes #471

Describe testing done for PR

Wrote a bash script to manually verify the fix

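#!/bin/bash

# Name or path of the tanzu binary under test
TANZU=tanzu
$TANZU plugin install --group vmware-tkg/default
# 10 rounds, each launching 50 CLI invocations in the background
for i in {1..10}
do
    for x in {1..50}
    do
        $TANZU version &
        echo "running: $x $!"
    done
    echo "waiting"
    # Block until every background invocation of this round has exited
    wait
    echo "done"
done

# The plugins installed above should all still be listed
$TANZU plugin list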

Also added E2E tests that run tanzu commands in parallel when 2 telemetry plugins are present.
Also added E2E tests that install multiple plugins in parallel.
To verify the E2E tests, run them with the latest tanzu binary that includes the fix and with the old tanzu binary without the fix, as below:

cd test/e2e/

export TANZU_CLI_E2E_TEST_BINARY_PATH=tanzu-old ## Without the fix
make e2e-catalog-tests ## This should fail

export TANZU_CLI_E2E_TEST_BINARY_PATH=tanzu-new ## With the fix
make e2e-catalog-tests ## This should be successful

Release note

Fix the Catalog corruption issue (missing installed plugins) when running CLI in parallel

Additional information

Special notes for your reviewer

marckhouzam

This looks very nice to me.
There are, I think, some important comments to address, but this is a very clean solution.
Thanks @anujc25 !

vuil · 2023-08-30T17:15:08Z

Do we have any e2e or integ tests (for example, doing multiple concurrent catalog reads/writes) that can help uncover the issue we are trying to fix here? Having them earlier would have helped catch the issue, and having them now would give confidence that the issue is sufficiently addressed. Let's make sure we at least file a TODO to provide them.

anujc25 · 2023-08-31T16:02:37Z

I am planning to add e2e tests for this. I wanted to get high-level feedback on the approach before diving into writing tests for this PR, hence the draft PR. If the approach looks good, I will add some e2e tests for this change.

vuil · 2023-08-31T18:08:13Z

sgtm. The not-so-great part is an expectation that any valid unlock function returned has to be called by the caller, but the approach is a reasonable compromise.

marckhouzam

Nice effort. This is complicated stuff.
I haven't looked at the E2E tests yet because more concurrency questions came to my mind.

I'm wondering, in saveCatalogCache() do we need to use the lockedFile file descriptor to write to the file? I think doing that would make unit tests properly fail if Unlock() is called too early (which we had before, but you corrected it in your latest changes), because a call to Upsert() after the Unlock() would not be able to write to the closed file.

If cc.unlock is `nil`, treat the catalog as already unlocked and return a meaningful error when Upsert/Delete is called on the unlocked catalog.

marckhouzam

LGTM
Thanks for this complicated and important change!

vuil

Thanks for the updates and improvements after the reviews.
There is a nit about a typo in the comments, but the changes lgtm.

Fix Catalog corruption issue when running CLI in parallel

aa52f5f

vmwclabot added the cla-not-required label Aug 24, 2023

marckhouzam reviewed Aug 25, 2023

Address comments

cbfc837

anujc25 force-pushed the fix-parallel-run-catalog-corruption branch from 6f2c2e7 to cbfc837 on August 29, 2023 14:12

vuil reviewed Aug 30, 2023

marckhouzam reviewed Sep 7, 2023

anujc25 added 4 commits September 7, 2023 11:20

Regenerated the test central repos

ae38336

Add E2E tests for catalog updates in parallel

557afa4

Address comments

51f30da

Run e2e-catalog-tests as part of CI

aa9360d

anujc25 force-pushed the fix-parallel-run-catalog-corruption branch from e7dd402 to aa9360d on September 7, 2023 16:17

anujc25 added 2 commits September 7, 2023 12:29

Fix linter errors

66ff376

Fix e2e tests

613b7d9

anujc25 force-pushed the fix-parallel-run-catalog-corruption branch from 8292dce to 6df47bb on September 8, 2023 04:46

anujc25 added 2 commits September 8, 2023 01:05

Fix unit tests

2c58873

Fix linter error

a99b6ca

anujc25 force-pushed the fix-parallel-run-catalog-corruption branch from 6df47bb to a99b6ca on September 8, 2023 05:09

anujc25 commented Sep 8, 2023

anujc25 marked this pull request as ready for review September 8, 2023 05:11

anujc25 requested a review from a team as a code owner September 8, 2023 05:11

marckhouzam reviewed Sep 8, 2023

anujc25 added 3 commits September 8, 2023 11:30

Address comments 2

2d2ab5a

Use unlock function to verify if the catalog is locked or not

e3593eb

If cc.unlock is `nil`, treat the catalog as already unlocked and return a meaningful error when Upsert/Delete is called on the unlocked catalog.

Create new catalog only if file does-not-exists error

e33ecc1

anujc25 force-pushed the fix-parallel-run-catalog-corruption branch 2 times, most recently from abfd39e to 75c2b96 on September 11, 2023 22:18

marckhouzam approved these changes Sep 15, 2023

vuil approved these changes Sep 15, 2023

Use lockedFile to saveCatalogCache

118ba57

anujc25 force-pushed the fix-parallel-run-catalog-corruption branch from 75c2b96 to 118ba57 on September 15, 2023 22:59

anujc25 merged commit e1eaa9a into vmware-tanzu:main Sep 15, 2023
4 checks passed

marckhouzam added this to the 1.1.0 milestone Oct 20, 2023
