[Platform] Sending backups for Platform HA seems to be erroring out for replicated-based installs #7488

Open
daniel-yb opened this issue Mar 4, 2021 · 3 comments
Labels
area/platform Yugabyte Platform priority/high High Priority

Comments

@daniel-yb
Contributor

daniel-yb commented Mar 4, 2021

Seeing the following errors consistently:

2021-03-04 21:00:15.304 ERROR PlatformReplicationManager.java:368 [application-akka.actor.default-dispatcher-121] Could not store platform backup received from leader http://<REDACTED> via http://<REDACTED> as /opt/yugabyte/yugaware/data/platformReplication/<REDACTED>/backup_21-03-04-21-00.tgz
2021-03-04 21:00:15.304 ERROR ApiResponse.java:21 [application-akka.actor.default-dispatcher-121] Hit error 500, message: {"error":"failed to copy backup"}

The error looks to be coming from `uploadedFile.renameTo(saveAsFile.toFile())` failing.

Also, it looks like PlatformInstanceClient::syncBackups does not throw an exception when the syncBackups request returns something other than a 200. This causes downstream problems: for example, the last backup timestamp gets updated even though the backup was never sent successfully. If the API response contains an "error" key, this method should throw so that everything downstream behaves correctly (see PlatformInstanceClient::makeRequest for an example of parsing the response for an error and throwing an exception if one exists).
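To make the proposal concrete, here is a minimal sketch of the "throw on error" behaviour. It is not the actual PlatformInstanceClient code: the class `SyncBackupsSketch`, the exception type, and both method names are hypothetical, and the response is assumed to have already been parsed into a status code and a map.

```java
import java.util.Map;

// Hypothetical sketch; none of these names come from the Yugabyte Platform code.
final class SyncBackupsSketch {

  /** Hypothetical exception signalling that the follower rejected the backup. */
  static final class BackupSyncException extends RuntimeException {
    BackupSyncException(String message) {
      super(message);
    }
  }

  /**
   * Throws unless the reply is a clean success. statusCode and parsedBody are
   * assumptions standing in for whatever the real HTTP client returns.
   */
  static void requireSuccess(int statusCode, Map<String, ?> parsedBody) {
    Object error = parsedBody == null ? null : parsedBody.get("error");
    if (error != null) {
      throw new BackupSyncException("syncBackups failed: " + error);
    }
    if (statusCode != 200) {
      throw new BackupSyncException("syncBackups returned HTTP " + statusCode);
    }
  }

  /** Advances the last-backup timestamp only when the send actually succeeded. */
  static long nextLastBackupTime(
      int statusCode, Map<String, ?> parsedBody, long previousMillis, long nowMillis) {
    try {
      requireSuccess(statusCode, parsedBody);
      return nowMillis;
    } catch (BackupSyncException e) {
      // The send failed, so keep the old timestamp.
      return previousMillis;
    }
  }
}
```

With something like this in place, a 500 reply carrying `{"error":"failed to copy backup"}` would leave the last backup timestamp untouched instead of silently advancing it.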

@daniel-yb daniel-yb added the area/platform Yugabyte Platform label Mar 4, 2021
@daniel-yb daniel-yb added this to To do in Platform Mar 4, 2021
@daniel-yb daniel-yb added the priority/high High Priority label Mar 4, 2021
@sb-yb
Contributor

sb-yb commented Mar 5, 2021

Yes, most likely renameTo is failing:
bash-4.4# ls -l /tmp/playtemp5402282742644965262/
total 1040
-rw------- 1 root root 52828 Mar 5 02:17 multipartBody1191487410474946536asTemporaryFile
-rw------- 1 root root 52824 Mar 5 02:29 multipartBody1349878804314686537asTemporaryFile
-rw------- 1 root root 52824 Mar 5 02:21 multipartBody2715382633653750760asTemporaryFile
-rw------- 1 root root 52824 Mar 5 02:22 multipartBody2909114316643063311asTemporaryFile
-rw------- 1 root root 52827 Mar 5 02:25 multipartBody2923016595980151741asTemporaryFile
-rw------- 1 root root 52821 Mar 5 02:33 multipartBody3211945310272948446asTemporaryFile
-rw------- 1 root root 52820 Mar 5 02:16 multipartBody33577914806112641asTemporaryFile
-rw------- 1 root root 52823 Mar 5 02:18 multipartBody4068333173256035068asTemporaryFile
-rw------- 1 root root 52822 Mar 5 02:20 multipartBody4344251046618232564asTemporaryFile
-rw------- 1 root root 52823 Mar 5 02:24 multipartBody4548917454030738666asTemporaryFile
-rw------- 1 root root 52826 Mar 5 02:27 multipartBody4689643039031847131asTemporaryFile
-rw------- 1 root root 52823 Mar 5 02:19 multipartBody4948385827406685368asTemporaryFile
-rw------- 1 root root 52821 Mar 5 02:31 multipartBody5802608604294979098asTemporaryFile
-rw------- 1 root root 52822 Mar 5 02:32 multipartBody581028241425724123asTemporaryFile
-rw------- 1 root root 52823 Mar 5 02:34 multipartBody595672917708333885asTemporaryFile
-rw------- 1 root root 52823 Mar 5 02:28 multipartBody6790848905692799229asTemporaryFile
-rw------- 1 root root 52824 Mar 5 02:35 multipartBody6796472099889230412asTemporaryFile
-rw------- 1 root root 52822 Mar 5 02:23 multipartBody7592960244953046317asTemporaryFile
-rw------- 1 root root 52822 Mar 5 02:30 multipartBody8441473883966750512asTemporaryFile
-rw------- 1 root root 52822 Mar 5 02:26 multipartBody8938705844560261947asTemporaryFile

@sb-yb
Contributor

sb-yb commented Mar 5, 2021

It may be because it's trying to move the file across volumes.

bash-4.4# df /tmp
Filesystem           1K-blocks      Used Available Use% Mounted on
overlay              104846316  18680440  86165876  18% /
bash-4.4# df /opt/yugabyte/yugaware/data/
keys/                platformReplication/
bash-4.4# df /opt/yugabyte/yugaware/data/platformReplication
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/xvda1           104846316  18680752  86165564  18% /prometheus
bash-4.4#
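
The same check can be done from Java rather than with `df`. This is only a sketch under the assumption that comparing `FileStore` objects is a good enough proxy for "same volume"; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Yugabyte Platform code.
final class FileStoreCheckSketch {

  /** Returns true when both existing paths resolve to the same file store. */
  static boolean onSameFileStore(Path a, Path b) throws IOException {
    FileStore storeA = Files.getFileStore(a);
    FileStore storeB = Files.getFileStore(b);
    return storeA.equals(storeB);
  }

  public static void main(String[] args) throws IOException {
    // Compare the JVM temp directory with a directory given on the command line
    // (defaults to the current directory when no argument is passed).
    Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
    Path data = Paths.get(args.length > 0 ? args[0] : ".");
    System.out.println(tmp + " and " + data + " on same file store: "
        + onSameFileStore(tmp, data));
  }
}
```

Passing `/opt/yugabyte/yugaware/data/platformReplication` as the argument inside the container would be the equivalent of the two `df` calls above.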

Looks like I need to use Files.move to make it foolproof.
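
A minimal sketch of that change, for illustration only. The parameter names `uploadedTempFile` and `saveAsFile` echo the variables mentioned earlier in this issue, but the class and method are hypothetical and this is not the committed code. Unlike `File.renameTo`, which only returns `false` on failure, `Files.move` throws an `IOException`, and for a regular file it can fall back to copy-and-delete when the target is on a different file store.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not the actual PlatformReplicationManager code.
final class BackupMoveSketch {

  /** Moves the uploaded temp file to its final location, creating parent directories. */
  static Path storeBackup(Path uploadedTempFile, Path saveAsFile) throws IOException {
    Path parent = saveAsFile.toAbsolutePath().getParent();
    if (parent != null) {
      Files.createDirectories(parent);
    }
    // REPLACE_EXISTING overwrites an older backup with the same name.
    return Files.move(uploadedTempFile, saveAsFile, StandardCopyOption.REPLACE_EXISTING);
  }

  public static void main(String[] args) throws IOException {
    Path upload = Files.createTempFile("multipartBody", "asTemporaryFile");
    Files.write(upload, "backup bytes".getBytes(StandardCharsets.UTF_8));
    Path target = Files.createTempDirectory("platformReplication").resolve("backup.tgz");

    Path stored = storeBackup(upload, target);
    System.out.println("stored at " + stored
        + ", temp file still exists: " + Files.exists(upload));
  }
}
```

Note that adding `StandardCopyOption.ATOMIC_MOVE` here would bring the cross-volume problem back, since an atomic move to a different file store fails with `AtomicMoveNotSupportedException`.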

@sb-yb
Contributor

sb-yb commented Mar 5, 2021

This also explains why the issue showed up only when using Docker and not on a yugabundle install.

sb-yb added a commit that referenced this issue Mar 5, 2021
Summary:
Switch to using Files.move, which promises to be system independent.
Earlier we were using File.renameTo, which does not work when the source and destination
file paths are on two different volumes.
This bug shows up when running in a container with the `/tmp` folder and
`/opt/yugabyte/yugaware/data` on two different volumes.

Test Plan: Verified with the existing unit test that backup copying is not broken

Reviewers: daniel

Reviewed By: daniel

Subscribers: yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D10804
@sb-yb sb-yb moved this from To do to In Review in Platform Mar 5, 2021
polarweasel pushed a commit to lizayugabyte/yugabyte-db that referenced this issue Mar 9, 2021
… behavior
