
[WIP] Sanity checking around backing up files #395

Open · wants to merge 5 commits into master

Conversation

@kiddom-kq (Contributor) commented Aug 24, 2021

While trying to deploy Medusa, I ran into a few issues. The root cause of #390 was an issue with file system permissions and the silent-failure behavior of Python's glob().
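The silent-failure behavior mentioned above is easy to reproduce: `glob()` does not raise for a path that does not exist (or that the process cannot read) — it simply returns an empty list, which is easy to mistake for "nothing to back up". A minimal illustration, using a deliberately nonexistent path:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A snapshot-like path that is guaranteed not to exist.
    missing = os.path.join(tmp, "no-such-snapshot-dir", "*")
    # glob() raises no exception here -- it just returns an empty
    # list, the silent failure mode described in this PR.
    print(glob.glob(missing))  # -> []
```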

This PR implements some of the debug logging that I wish I had had while troubleshooting and a basic sanity check to abort execution as soon as an error is observed rather than waiting for a (misleading) failure at a later point in execution.

┆Issue is synchronized with this Jira Task by Unito
┆friendlyId: K8SSAND-1398
┆priority: Medium

During node backup, the Python process running as the `medusa` user was unable to locate snapshot files on disk. If no plausible directories with snapshot files exist on the host, consider this a failure condition and abort. Do not let execution proceed erroneously, leading to a misleading "backup successful" message.
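A sanity check like the one this PR describes could look roughly as follows. This is only a sketch: the function name, glob pattern, and error message are hypothetical, not Medusa's actual code.

```python
import glob
import logging

def find_snapshot_dirs(data_path, snapshot_tag):
    """Locate snapshot directories, aborting early if none exist."""
    # Hypothetical Cassandra-style layout: <data>/<keyspace>/<table>/snapshots/<tag>
    pattern = f"{data_path}/*/*/snapshots/{snapshot_tag}"
    dirs = glob.glob(pattern)
    logging.debug("Globbed %s, found %d snapshot dir(s)", pattern, len(dirs))
    if not dirs:
        # Fail fast instead of later reporting a misleading
        # "backup successful" message.
        raise RuntimeError(
            f"No snapshot directories found under {data_path}; "
            "check file system permissions for the medusa user."
        )
    return dirs
```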
@kiddom-kq (Contributor, Author)

Would ValueError be more appropriate than the base Exception that's being raised? I didn't see many custom exception types in the Medusa code base, and of the few, none really fit this case. There are several places in the code base where raise Exception(error_msg) is used, so I'm not sure if there's a lint-disable comment I can leave there to prevent SonarCloud from objecting.

@sonarcloud (bot) commented Aug 30, 2021

Kudos, SonarCloud Quality Gate passed!

- Bugs: 0 (rating A)
- Vulnerabilities: 0 (rating A)
- Security Hotspots: 0 (rating A)
- Code Smells: 0 (rating A)
- Coverage: no coverage information
- Duplication: 0.0%

@adejanovski (Contributor)

@kiddom-kq, it looks like this broke the integration tests. You can run them locally using ./run_integration_tests.sh; the prerequisite is having ccm installed (pip install ccm).
Let me know if you need some assistance to debug this.

@adejanovski (Contributor)

Would ValueError be more appropriate than the base Exception that's being raised? I didn't see many custom exception types in the Medusa code base, and of the few, none really fit this case. There are several places in the code base where raise Exception(error_msg) is used, so I'm not sure if there's a lint-disable comment I can leave there to prevent SonarCloud from objecting.

Any improvement toward a more specific use of exceptions is welcome. Feel free to make any change necessary.
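One way to act on this suggestion is a small domain-specific exception class, so callers can catch the failure precisely and generic-exception linters stop objecting. A sketch — the class and function names here are hypothetical, not part of Medusa:

```python
class NoSnapshotDirectoriesError(RuntimeError):
    """Raised when a backup finds no snapshot directories to upload."""


def check_snapshot_dirs(dirs):
    # Replacing a bare `raise Exception(...)` with a named subclass
    # is the kind of change linters like SonarCloud no longer flag,
    # and callers can catch this specific failure mode.
    if not dirs:
        raise NoSnapshotDirectoriesError(
            "No snapshot directories found; aborting backup."
        )
    return dirs
```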

@kiddom-kq (Contributor, Author)

@adejanovski

I am having some trouble understanding the test.

 ./run_integration_tests.sh --test=16  --cassandra-version=2.2.19 -vv

That run fails because the assert statement I added is triggered.

What is supposed to happen when find_dirs() has no dirs?

Looking at the test:

    Scenario Outline: Perform a differential backup over gRPC , verify its index, then delete it over gRPC with Jolokia
        Given I have a fresh ccm cluster with jolokia "<client encryption>" running named "scenario16"
        Given I am using "<storage>" as storage provider in ccm cluster "<client encryption>" with gRPC server
        Then the gRPC server is up
        When I create the "test" table in keyspace "medusa"
        When I load 100 rows in the "medusa.test" table
        When I run a "ccm node1 nodetool flush" command
        When I perform a backup over gRPC in "differential" mode of the node named "grpc_backup"
        Then the backup index exists
        Then I verify over gRPC that the backup "grpc_backup" exists
        Then I can see the backup index entry for "grpc_backup"
        Then I can see the latest backup for "127.0.0.1" being called "grpc_backup"
        Then I delete the backup "grpc_backup" over gRPC
        Then I verify over gRPC the backup "grpc_backup" does not exist
        Then I shutdown the gRPC server

It looks like When I perform a backup over gRPC in "differential" mode of the node named "grpc_backup" is the first time that a backup is taken of the scenario16 cluster. If that is correct, how can a table with 100 rows in it have no directories that need to be backed up?

When I disable the assert, the execution continues to call add_backup_finish_to_index(storage, node_backup) and then set_latest_backup_in_index(storage, node_backup)

Those functions put a record of the backup happening in the storage/index... even though nothing was backed up.
The next line of the test: Then the backup index exists would see that something was added to the index and the test continues on happy to delete the (empty) backup that it just made.

Can you confirm that the differential backup should return 0 directories?

@adejanovski (Contributor)

@kiddom-kq, differential backups put the sstables into a data directory that's at the same level as the backup directories. So you should have something like this:
[screenshot: differential backup directory layout]

While full backups use this layout:
[screenshot: full backup directory layout]

Those 100 rows should definitely be backed up in the data directory.
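Since the screenshots are not reproduced here, the difference described above can be sketched as a directory tree. The exact path names are illustrative only, inferred from the comment rather than from Medusa's documentation:

```
# differential: one shared data/ directory beside the per-backup dirs
<prefix>/<node>/data/            <- sstables live here, shared
<prefix>/<node>/backup1/meta/
<prefix>/<node>/backup2/meta/

# full: each backup carries its own data directory
<prefix>/<node>/backup1/data/    <- sstables copied per backup
<prefix>/<node>/backup1/meta/
```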

@adejanovski (Contributor)

Hi @kiddom-kq, is this still something you're working on?

@rzvoncek (Contributor)

ping @kiddom-kq
