Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docdb] For yb_backup checksums change to xxh64 from sha256 #8893

Closed
bmatican opened this issue Jun 14, 2021 · 4 comments
Closed

[docdb] For yb_backup checksums change to xxh64 from sha256 #8893

bmatican opened this issue Jun 14, 2021 · 4 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug kind/perf priority/medium Medium priority issue

Comments

@bmatican
Copy link
Contributor

bmatican commented Jun 14, 2021

Jira Link: DB-618
should be about 3x faster / less cpu intensive, just TBD if we have access to it by default, or we'd need Platform support to pre-package it somehow

cc @OlegLoginov

@bmatican bmatican added the area/docdb YugabyteDB core features label Jun 14, 2021
@bmatican bmatican self-assigned this Jun 14, 2021
@bmatican bmatican added this to Backlog in YBase features via automation Jun 14, 2021
@bmatican bmatican added this to To do in Master components via automation Jun 14, 2021
@bmatican bmatican added this to To do in Backups via automation Jun 14, 2021
@jaki
Copy link
Contributor

jaki commented Jun 14, 2021

Also consider xxhash or other fast, collision-resistant checksums, like murmurhash, which we already use in the codebase.

@OlegLoginov
Copy link
Contributor

@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 1, 2022
@OlegLoginov OlegLoginov added kind/enhancement This is an enhancement of an existing feature kind/perf and removed kind/bug This issue is a bug labels Jun 1, 2022
@OlegLoginov OlegLoginov changed the title [docdb] For yb_backup checksums change to crc32 from sha256 [docdb] For yb_backup checksums change to xxh64 from sha256 Jun 1, 2022
@OlegLoginov
Copy link
Contributor

Vars-07 added a commit that referenced this issue Jun 25, 2022
…instead of SHA256

Summary:
**For yb_backup checksums change  from sha256 to xxh64**
Changes done as part of the diff:
1) Install xxh64sum on new universe node & as part of upgrading existing nodes.
2) Compute checksum via xxh64sum if binaries present else fallback to sha256
3) Store checksum algorithm used in the manifest.json file.

Reference Doc:
https://docs.google.com/document/d/10I1n3WKlR2P_Oy-9BpAKVzR3xdg6aKteCHsEmC01d70/edit#

Test Plan:
Jenkins: all tests
Test cases to check:
Case 1: old backup - no Manifest
Case 2: in Manifest: 'hash-algorithm' is not available
Case 3: in Manifest: 'hash-algorithm' == 'sha256'
Case 4: in Manifest: 'hash-algorithm' == 'xxh64'

    xxh64sum is AVAILABLE [12 test-cases]

1.1.1-4 Cases 1-4 for: Manifest has no checksum file
1.2.1-4 Cases 1-4 for: Manifest has sha256 checksum file
1.3.1-4 Cases 1-4 for: Manifest has xxh64 checksum file

    xxh64sum is NOT available [12 test-cases]

2.1.1-4 Cases 1-4 for: Manifest has no checksum file
2.2.1-4 Cases 1-4 for: Manifest has sha256 checksum file
2.3.1-4 Cases 1-4 for: Manifest has xxh64 checksum file

    --disable_check_sum

TOTAL: 25 test cases for backup-restore

And 3 cases for backup-create:

    xxh64sum is - NOT available
    xxh64sum is - AVAILABLE
    --disable_check_sum (we should not check the tools in the mode)

Reviewers: vpatibandla, kkg, spotachev, oleg

Reviewed By: oleg

Subscribers: jenkins-bot

Differential Revision: https://phabricator.dev.yugabyte.com/D17257
Vars-07 added a commit that referenced this issue Jun 25, 2022
…instead of SHA256

Summary:
**For yb_backup checksums change  from sha256 to xxh64**
Changes done as part of the diff:
1) Install xxh64sum on new universe node & as part of upgrading existing nodes.
2) Compute checksum via xxh64sum if binaries present else fallback to sha256
3) Store checksum algorithm used in the manifest.json file.

Reference Doc:
https://docs.google.com/document/d/10I1n3WKlR2P_Oy-9BpAKVzR3xdg6aKteCHsEmC01d70/edit#

Test Plan:
Jenkins: all tests
Test cases to check:
Case 1: old backup - no Manifest
Case 2: in Manifest: 'hash-algorithm' is not available
Case 3: in Manifest: 'hash-algorithm' == 'sha256'
Case 4: in Manifest: 'hash-algorithm' == 'xxh64'

    xxh64sum is AVAILABLE [12 test-cases]

1.1.1-4 Cases 1-4 for: Manifest has no checksum file
1.2.1-4 Cases 1-4 for: Manifest has sha256 checksum file
1.3.1-4 Cases 1-4 for: Manifest has xxh64 checksum file

    xxh64sum is NOT available [12 test-cases]

2.1.1-4 Cases 1-4 for: Manifest has no checksum file
2.2.1-4 Cases 1-4 for: Manifest has sha256 checksum file
2.3.1-4 Cases 1-4 for: Manifest has xxh64 checksum file

    --disable_check_sum

TOTAL: 25 test cases for backup-restore

And 3 cases for backup-create:

    xxh64sum is - NOT available
    xxh64sum is - AVAILABLE
    --disable_check_sum (we should not check the tools in the mode)

Reviewers: vpatibandla, kkg, spotachev, oleg

Reviewed By: oleg

Subscribers: jenkins-bot

Differential Revision: https://phabricator.dev.yugabyte.com/D17257
@Vars-07
Copy link
Contributor

Vars-07 commented Jun 25, 2022

Resolved as part of this commit 2b8d01c

@Vars-07 Vars-07 closed this as completed Jun 25, 2022
YBase features automation moved this from Backlog to Done Jun 25, 2022
Master components automation moved this from To do to Done Jun 25, 2022
Backups automation moved this from To do to Done Jun 25, 2022
mbautin added a commit that referenced this issue Jun 26, 2022
…xh64sum instead of SHA256"

Summary: This reverts commit 2b8d01c ( https://phabricator.dev.yugabyte.com/D17257 ) because it breaks a lot of backup-related core database tests.

Test Plan: Jenkins: run all tests

Reviewers: svarshney, vpatibandla, kkg, spotachev, oleg, aagrawal

Reviewed By: aagrawal

Subscribers: jenkins-bot, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D17926
Vars-07 added a commit that referenced this issue Jul 8, 2022
…instead of SHA256

Summary:
**For yb_backup's change checksum computation algorithm to use xxh64 instead of sha256**
On a 67GB file, the xxh64sum vs. sha256 takes about the same amount of time (is that because of disk being the bottleneck), but CPU utilization is much lower (14% vs. 80%).

Fixes in the yb_backup script:
  - Check for xxh64sum binaries in `backup_table` for create & `load_or_create_manifest` for restore/delete backups in case checksum is not disabled.
  - If xxh64sum binaries exist, `xxh64sum_binary_present` flag is set.
  - New Backups will use `xxh64sum` if above flag is set.
  - For restore, we try to download `xxh64` checksum files if above flag is set, if this fails(in case of old backups) we try to download the sha256 checksum files, so as to support backward compatibility for old backups.
  - Persist `hash-algorithm` information for new backups in `Manifest.json`. If this is not present for a particular, then we assume that the backup used `sha256` for computing the checksum.

Reference Doc:
https://docs.google.com/document/d/10I1n3WKlR2P_Oy-9BpAKVzR3xdg6aKteCHsEmC01d70/edit#

Test Plan:
Jenkins: all tests
Test cases to check:
[Backup/Restore]
Case 1: old backup - no Manifest
Case 2: in Manifest: 'hash-algorithm' is not available
Case 3: in Manifest: 'hash-algorithm' == 'sha256'
Case 4: in Manifest: 'hash-algorithm' == 'xxh64'

    xxh64sum is AVAILABLE [12 test-cases]

1.1.1-4 Cases 1-4 for: Manifest has no checksum file
1.2.1-4 Cases 1-4 for: Manifest has sha256 checksum file
1.3.1-4 Cases 1-4 for: Manifest has xxh64 checksum file

    xxh64sum is NOT available [12 test-cases]

2.1.1-4 Cases 1-4 for: Manifest has no checksum file
2.2.1-4 Cases 1-4 for: Manifest has sha256 checksum file
2.3.1-4 Cases 1-4 for: Manifest has xxh64 checksum file

    --disable_check_sum

TOTAL: 25 test cases for backup-restore

And 3 cases for backup-create:

    xxh64sum is - NOT available
    xxh64sum is - AVAILABLE
    --disable_check_sum (we should not check the tools in the mode)\

[Package Installation]
As part of software upgrade `xxhash` must be installed on the exisiting db nodes.

Reviewers: vpatibandla, kkg, oleg, spotachev, muthu

Reviewed By: oleg, spotachev, muthu

Subscribers: kannan, yugaware, ybase, jenkins-bot

Differential Revision: https://phabricator.dev.yugabyte.com/D17257
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug and removed kind/enhancement This is an enhancement of an existing feature labels Jul 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug kind/perf priority/medium Medium priority issue
Projects
Backups
  
Done
Development

No branches or pull requests

5 participants