[Platform] Sending backups for Platform HA seems to be erroring out for replicated-based installs #7488
Comments
Yes, most likely renameTo is failing.
It may be because it's trying to move across volumes.
Looks like I need to use Files.move to make it foolproof.
This also explains why the issue showed up only when using Docker but not on a yugabundle install.
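For reference, a minimal sketch of the Files.move approach described above. The class and method names (`BackupMover`, `moveBackup`) are hypothetical and are not the actual PlatformReplicationManager code; only `uploadedFile` and `saveAsFile` mirror names from the report. Unlike File.renameTo, which just returns false on failure, Files.move throws an IOException with the reason, and for a regular file it falls back to copy-then-delete when the target is on a different file store (as long as ATOMIC_MOVE is not requested).

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only.
class BackupMover {

  // Moves the uploaded temp file to its final location.
  // Returns true on success, false if the move failed.
  static boolean moveBackup(File uploadedFile, Path saveAsFile) {
    try {
      // Make sure the destination directory exists before moving.
      Path parent = saveAsFile.getParent();
      if (parent != null) {
        Files.createDirectories(parent);
      }
      // Overwrite any stale file with the same name at the destination.
      Files.move(uploadedFile.toPath(), saveAsFile, StandardCopyOption.REPLACE_EXISTING);
      return true;
    } catch (IOException e) {
      // The exception carries the actual cause, which renameTo never reports.
      System.err.println("Could not move " + uploadedFile + " to " + saveAsFile + ": " + e);
      return false;
    }
  }
}
```

With this shape the caller can keep returning the existing "failed to copy backup" error when the helper returns false, while the log now records why the move failed.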
Summary: Switch to using Files.move, which promises to be system independent. Earlier we were using File.renameTo, which does not work when the source and destination file paths are on two different volumes. This bug shows up when we are running in a container with the `/tmp` folder and `/opt/yugabyte/yugaware/data` on two different volumes.
Test Plan: Unit tested with the existing test that backup copying is not broken.
Reviewers: daniel
Reviewed By: daniel
Subscribers: yugaware
Differential Revision: https://phabricator.dev.yugabyte.com/D10804
Seeing
2021-03-04 21:00:15.304 ERROR PlatformReplicationManager.java:368 [application-akka.actor.default-dispatcher-121] Could not store platform backup received from leader http://<REDACTED> via http://<REDACTED> as /opt/yugabyte/yugaware/data/platformReplication/<REDACTED>/backup_21-03-04-21-00.tgz
2021-03-04 21:00:15.304 ERROR ApiResponse.java:21 [application-akka.actor.default-dispatcher-121] Hit error 500, message: {"error":"failed to copy backup"}
consistently. The error looks to be coming from uploadedFile.renameTo(saveAsFile.toFile()) failing.
Also, it looks like in PlatformInstanceClient::syncBackups we are not throwing an exception if the syncBackups request doesn't return a 200. This causes downstream issues, such as the last backup timestamp being updated when it should not be, because we failed to send the backup. If the API response contains an "error" key, we should throw an error in this method so that everything downstream works correctly (see PlatformInstanceClient::makeRequest for an example of parsing the response for an error and throwing an exception if one exists).
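A rough sketch of the check suggested for syncBackups, under stated assumptions: the names `SyncBackupsCheck`, `BackupSyncException`, and `ensureSuccess` are hypothetical, and a plain `Map` stands in for the parsed JSON response body so the example needs no JSON library. The real method would presumably follow whatever PlatformInstanceClient::makeRequest already does.

```java
import java.util.Map;

// Hypothetical illustration; not the actual PlatformInstanceClient code.
class SyncBackupsCheck {

  // Hypothetical exception type used to signal a failed backup sync.
  static class BackupSyncException extends RuntimeException {
    BackupSyncException(String message) {
      super(message);
    }
  }

  // Throws if the response is not a 200 or if the parsed body has an "error" key.
  // `status` and `body` stand in for whatever the HTTP client returns.
  static void ensureSuccess(int status, Map<String, ?> body) {
    boolean hasErrorKey = body != null && body.containsKey("error");
    if (status != 200 || hasErrorKey) {
      Object detail = hasErrorKey ? body.get("error") : null;
      throw new BackupSyncException(
          "syncBackups failed with status " + status + (detail != null ? ": " + detail : ""));
    }
  }
}
```

Calling a check like this before updating the last backup timestamp would keep the timestamp unchanged whenever the follower rejects the backup, since the exception aborts the rest of the method.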