
[Platform][Backup] Backup is failing in itests #10907

Closed
yorq opened this issue Dec 17, 2021 · 1 comment
Labels: area/platform (Yugabyte Platform), kind/bug (This issue is a bug)


yorq (Contributor) commented Dec 17, 2021

Description

Multiple tests are failing due to errors in the backup functionality.
Examples:

  1. https://jenkins.dev.yugabyte.com/view/Test%20Jobs/job/itest-developer/2809/ - 2.11x build
  2. https://jenkins.dev.yugabyte.com/view/Test%20Jobs/job/itest-developer/2828/ - 2.6x build

The error looks like this:
2021-12-17 12:53:54,319 ERROR: Failed to run command [[ scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /opt/yugabyte/yugaware/data/keys/0c53ead8-5a21-4982-81f8-d1ee0a4c709d/yb-itest-7b9dd2c6a4-20211217-123309_0c53ead8-5a21-4982-81f8-d1ee0a4c709d-key.pem -P 54422 -q /tmp/yb_backup_rflogzfcfxqstdgz/cloud_cfg yugabyte@10.9.203.45:/tmp/yb_backup_rflogzfcfxqstdgz ]]: code=1 output=scp: /tmp/yb_backup_rflogzfcfxqstdgz/cloud_cfg: Permission denied

This probably happens because two upload processes run simultaneously:
2021-12-17 12:53:53,158 INFO: Uploading /tmp/yb_backup_rflogzfcfxqstdgz/cloud_cfg to server 10.9.203.45
2021-12-17 12:53:53,538 INFO: Uploading /tmp/yb_backup_rflogzfcfxqstdgz/cloud_cfg to server 10.9.203.45
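To see why two simultaneous uploads of the same file can collide, here is a minimal local sketch (not the yb_backup code). It assumes, as the fix's commit message below states, that a second upload fails once the file already exists on the target. The hypothetical `upload_once()` helper models that by creating the destination with `O_CREAT | O_EXCL`, so the failure surfaces as `FileExistsError` rather than the `Permission denied` reported by scp.

```python
import os
import tempfile
import threading


def upload_once(src_data: bytes, dest_path: str) -> None:
    """Hypothetical stand-in for an upload that cannot overwrite the target.

    O_CREAT | O_EXCL makes creation atomic: if the file already exists,
    os.open raises FileExistsError instead of opening it.
    """
    fd = os.open(dest_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, src_data)
    finally:
        os.close(fd)


def run_concurrent_uploads(dest_path: str, workers: int = 2) -> list:
    """Start `workers` threads that all upload to the same path.

    Returns one entry per thread: None on success, or the raised OSError.
    """
    results = [None] * workers
    barrier = threading.Barrier(workers)  # release all threads together

    def task(i: int) -> None:
        barrier.wait()
        try:
            upload_once(b"cloud config", dest_path)
        except OSError as exc:
            results[i] = exc

    threads = [threading.Thread(target=task, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as remote_dir:
        dest = os.path.join(remote_dir, "cloud_cfg")
        outcome = run_concurrent_uploads(dest)
        failures = [r for r in outcome if r is not None]
        print(f"{len(failures)} of {len(outcome)} uploads failed")
```

Whichever thread creates the file first wins; the other always fails, regardless of how the threads are scheduled.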

streddy-yb (Contributor) commented:

Hi @OlegLoginov, can you do the 2.8 backport ASAP? The 2.8.1 release is going out later today. Thanks!

OlegLoginov added a commit that referenced this issue Dec 21, 2021
Summary:
The previous fix changed the `YBBackup.find_data_dirs` method:
Commit: d6b5658
Diff: https://phabricator.dev.yugabyte.com/D14323

In the old implementation, the method called `run_ssh_cmd()` (to run `egrep` on every TS, i.e. tablet server), and
`run_ssh_cmd()` implicitly runs `upload_cloud_config()` for every TS.
After the fix, the config upload happens at the next step instead, in `find_snapshot_directories()`. But that method
is called in parallel for every data dir, so if a TS has 2 data dirs, `upload_cloud_config()` is implicitly called
twice. (The second upload fails because the config file is already present on the remote node.)
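The call structure described above can be modeled with a small hypothetical class. The method names mirror those in the commit message, but the bodies are stand-ins that only count how often `upload_cloud_config()` is reached; they are not the real `YBBackup` implementation, and the `ls` command and data-dir paths are made up for illustration.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class BackupSketch:
    """Hypothetical model of the buggy call structure; only counts uploads."""

    def __init__(self, data_dirs_by_server):
        self.data_dirs_by_server = data_dirs_by_server
        self.uploads = Counter()       # server -> number of config uploads
        self._lock = threading.Lock()  # protects the counter across threads

    def upload_cloud_config(self, server):
        with self._lock:
            self.uploads[server] += 1

    def run_ssh_cmd(self, server, cmd):
        # Every SSH command implicitly uploads the cloud config first.
        self.upload_cloud_config(server)
        return f"{server}: {cmd}"

    def find_snapshot_directories(self, server, data_dir):
        return self.run_ssh_cmd(server, f"ls {data_dir}")  # hypothetical command

    def find_all_snapshot_dirs(self):
        # One parallel task per (server, data dir): a server with two data
        # dirs triggers two implicit, possibly concurrent, uploads.
        tasks = [(s, d) for s, dirs in self.data_dirs_by_server.items() for d in dirs]
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(lambda t: self.find_snapshot_directories(*t), tasks))


if __name__ == "__main__":
    b = BackupSketch({"10.9.203.45": ["/mnt/d0", "/mnt/d1"]})
    b.find_all_snapshot_dirs()
    print(dict(b.uploads))  # {'10.9.203.45': 2}
```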

The fix calls `upload_cloud_config()` explicitly before the parallel `find_snapshot_directories()` calls, so those
calls no longer race on the implicit upload.
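A minimal sketch of the fix, under one loudly stated assumption: that `upload_cloud_config()` skips servers it has already uploaded to, modeled here with a hypothetical `uploaded` set. Such a check-then-act guard is unsafe when two threads reach it for the same server at once. Calling it explicitly and sequentially first means the later parallel `find_snapshot_directories()` calls only read the set and never upload again. All names and bodies are illustrative stand-ins, not the real `YBBackup` code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class FixedBackupSketch:
    """Hypothetical model of the fix: upload once per server, then fan out."""

    def __init__(self, data_dirs_by_server):
        self.data_dirs_by_server = data_dirs_by_server
        self.uploaded = set()          # ASSUMED guard: servers that have the config
        self.upload_count = Counter()  # server -> real uploads performed

    def upload_cloud_config(self, server):
        # Check-then-act: racy if two threads run it for the same server at once.
        if server not in self.uploaded:
            self.upload_count[server] += 1
            self.uploaded.add(server)

    def run_ssh_cmd(self, server, cmd):
        self.upload_cloud_config(server)  # implicit upload, unchanged
        return f"{server}: {cmd}"

    def find_snapshot_directories(self, server, data_dir):
        return self.run_ssh_cmd(server, f"ls {data_dir}")  # hypothetical command

    def find_all_snapshot_dirs(self):
        # The fix: upload explicitly, one server at a time, before fanning out.
        for server in self.data_dirs_by_server:
            self.upload_cloud_config(server)
        tasks = [(s, d) for s, dirs in self.data_dirs_by_server.items() for d in dirs]
        # Parallel calls now find every server in `uploaded` and skip the upload.
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(lambda t: self.find_snapshot_directories(*t), tasks))


if __name__ == "__main__":
    b = FixedBackupSketch({"10.9.203.45": ["/mnt/d0", "/mnt/d1"]})
    b.find_all_snapshot_dirs()
    print(dict(b.upload_count))  # {'10.9.203.45': 1}
```

Doing the upload sequentially up front keeps the parallel phase read-only with respect to the guard, which avoids adding a lock around every SSH call.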

Test Plan:
ybd --cxx-test tools_yb-backup-test_ent
ybd --java-test org.yb.pgsql.TestYbBackup --tp 1
ybd --java-test org.yb.cql.TestYbBackup --tp 1
ybd --java-test org.yb.cql.ParameterizedTestYbBackup --tp 1

Reviewers: mihnea, achauhan

Reviewed By: achauhan

Subscribers: jenkins-bot, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D14441
OlegLoginov added a commit that referenced this issue Dec 21, 2021
…uploading.

Summary:
The previous fix changed the `YBBackup.find_data_dirs` method:
Commit: d6b5658
Diff: https://phabricator.dev.yugabyte.com/D14323

In the old implementation, the method called `run_ssh_cmd()` (to run `egrep` on every TS, i.e. tablet server), and
`run_ssh_cmd()` implicitly runs `upload_cloud_config()` for every TS.
After the fix, the config upload happens at the next step instead, in `find_snapshot_directories()`. But that method
is called in parallel for every data dir, so if a TS has 2 data dirs, `upload_cloud_config()` is implicitly called
twice. (The second upload fails because the config file is already present on the remote node.)

The fix calls `upload_cloud_config()` explicitly before the parallel `find_snapshot_directories()` calls, so those
calls no longer race on the implicit upload.

Original commit: 6ca4b2d
Original diff: https://phabricator.dev.yugabyte.com/D14441

Test Plan:
Jenkins: rebase: 2.8, hot

ybd --cxx-test tools_yb-backup-test_ent
ybd --java-test org.yb.pgsql.TestYbBackup --tp 1
ybd --java-test org.yb.cql.TestYbBackup --tp 1
ybd --java-test org.yb.cql.ParameterizedTestYbBackup --tp 1

Reviewers: mihnea, achauhan

Reviewed By: achauhan

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D14457
OlegLoginov added a commit that referenced this issue Dec 22, 2021
…uploading.

Summary:
The previous fix changed the `YBBackup.find_data_dirs` method:
Commit: d6b5658
Diff: https://phabricator.dev.yugabyte.com/D14323

In the old implementation, the method called `run_ssh_cmd()` (to run `egrep` on every TS, i.e. tablet server), and
`run_ssh_cmd()` implicitly runs `upload_cloud_config()` for every TS.
After the fix, the config upload happens at the next step instead, in `find_snapshot_directories()`. But that method
is called in parallel for every data dir, so if a TS has 2 data dirs, `upload_cloud_config()` is implicitly called
twice. (The second upload fails because the config file is already present on the remote node.)

The fix calls `upload_cloud_config()` explicitly before the parallel `find_snapshot_directories()` calls, so those
calls no longer race on the implicit upload.

Original commit: 6ca4b2d
Original diff: https://phabricator.dev.yugabyte.com/D14441

Test Plan:
Jenkins: rebase: 2.6, hot

ybd --cxx-test tools_yb-backup-test_ent
ybd --java-test org.yb.pgsql.TestYbBackup --tp 1
ybd --java-test org.yb.cql.TestYbBackup --tp 1

Reviewers: mihnea, achauhan

Reviewed By: achauhan

Subscribers: jenkins-bot, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D14458
OlegLoginov added the kind/bug label Dec 27, 2021