Multipart Upload Failure #1119

Closed
nikhil-postgres opened this issue Oct 13, 2021 · 2 comments
Comments

@nikhil-postgres

Database name

PostgreSQL

Issue description

Multipart upload failure

Problem Desc:
wal-g version - 0.2.14
PostgreSQL version - 11.7

We take daily backups of multiple PostgreSQL instances to S3 using wal-g. The backups are failing with the errors below.

Error Log:

ERROR: 2021/10/12 22:00:46.049337 failed to upload 'basebackups_005/base_000000010000064200000029/tar_partitions/part_012.tar.lz4' to bucket '***bucketname***': MultipartUpload: upload multipart failed
					upload id: AQAAAXx4Aq1U03VjzpK1TIWSyvMVORYv4Xm-ZDE
				caused by: RequestError: send request failed
				caused by: Put https://endpoint/***buckentname***/basebackups_005/base_000000010000064200000029/tar_partitions/part_012.tar.lz4?partNumber=1&uploadId=AQAAAXx4Aq1U03VjzpK1TIWSyvMVORYv4Xm-ZDE: read tcp ip -> ip:443: read: connection reset by peer
				ERROR: 2021/10/12 22:00:46.049359 upload: could not upload 'base_000000010000064200000029/tar_partitions/part_012.tar.lz4'
				ERROR: 2021/10/12 22:00:46.049369 failed to upload 'basebackups_005/base_000000010000064200000029/tar_partitions/part_012.tar.lz4' to bucket '***bucketname***': MultipartUpload: upload multipart failed
					upload id: AQAAAXx4Aq1U03VjzpK1TIWSyvMVORYv4Xm-ZDE
				caused by: RequestError: send request failed
				caused by: Put https://endpoint/***buckentname***/basebackups_005/base_000000010000064200000029/tar_partitions/part_012.tar.lz4?partNumber=1&uploadId=AQAAAXx4Aq1U03VjzpK1TIWSyvMVORYv4Xm-ZDE: read tcp ip -> ip:443: read: connection reset by peer
				panic: packFileIntoTar: operation failed: PackFileTo: copy failed: io: read/write on closed pipe
				
				goroutine 6187 [running]:
				github.com/wal-g/wal-g/internal.(*Bundle).handleTar.func1(0xc00015e460, 0xc001dd5980, 0x2d, 0x186cde0, 0xc001d42c30, 0xc002f910a0, 0x0, 0x186c9c0, 0xc003288d40)
					/home/travis/gopath/src/github.com/wal-g/wal-g/internal/bundle.go:375 +0x10c
				created by github.com/wal-g/wal-g/internal.(*Bundle).handleTar
					/home/travis/gopath/src/github.com/wal-g/wal-g/internal/bundle.go:370 +0x5d0

Some sensitive data in the logs above has been masked. We are not sure why we are getting this error.

@nikhil-postgres
Author

Hi Team,

We also see that WAL archival is failing with the errors below:

/data/walbkpscript.ksh: line 28: 17577: Killed
2021-10-13 04:34:35 UTC [19178]: [3-1] user=,db=,app=,client= FATAL:  archive command was terminated by signal 9: Killed
2021-10-13 04:34:35 UTC [19178]: [4-1] user=,db=,app=,client= DETAIL:  The failed archive command was: /data/walbkpscript.ksh 5333 pg_wal/0000000400002AB500000017
2021-10-13 04:34:35 UTC [4789]: [11-1] user=,db=,app=,client= LOG:  archiver process (PID 19178) exited with exit code 1
--

@usernamedt
Member

usernamedt commented Oct 28, 2021

  		caused by: Put https://endpoint/***buckentname***/basebackups_005/base_000000010000064200000029/tar_partitions/part_012.tar.lz4?partNumber=1&uploadId=AQAAAXx4Aq1U03VjzpK1TIWSyvMVORYv4Xm-ZDE: read tcp ip -> ip:443: read: connection reset by peer

This is probably related to a bad network connection: the log shows the peer resetting the connection during the part upload. This PR might help solve it.
