DecompressLzo: write to pipe failed #22

Closed
dgarbus opened this issue Aug 29, 2017 · 17 comments

@dgarbus

dgarbus commented Aug 29, 2017

Versions

CentOS 7.3
wal-g v0.1.2
wal-e 1.0.3 (creator of source basebackup)

Problem

Two attempts to backup-fetch a ~1TB basebackup have resulted in wal-g failing with the following stack trace:

base/16417/12983_vm
base/16417/27620292
base/16417/10323582
base/16417/10324516
base/16417/33825612_fsm
2017/08/29 20:07:43 DecompressLzo: write to pipe failed
github.com/wal-g/wal-g.DecompressLzo
        /home/travis/gopath/src/github.com/wal-g/wal-g/decompress.go:126
github.com/wal-g/wal-g.tarHandler
        /home/travis/gopath/src/github.com/wal-g/wal-g/extract.go:66
github.com/wal-g/wal-g.ExtractAll.func2.2
        /home/travis/gopath/src/github.com/wal-g/wal-g/extract.go:138
runtime.goexit
        /home/travis/.gimme/versions/go1.8.3.linux.amd64/src/runtime/asm_amd64.s:2197
ExtractAll: lzo decompress failed
github.com/wal-g/wal-g.tarHandler
        /home/travis/gopath/src/github.com/wal-g/wal-g/extract.go:68
github.com/wal-g/wal-g.ExtractAll.func2.2
        /home/travis/gopath/src/github.com/wal-g/wal-g/extract.go:138
runtime.goexit
        /home/travis/.gimme/versions/go1.8.3.linux.amd64/src/runtime/asm_amd64.s:2197

In both cases, wal-g appeared to be near the end of the restore (over 1TB of data was written to the restore directory) and failed with the same trace. After inspecting the restore and attempting to start postgres, I can confirm that the restore is indeed incomplete.

The basebackup was taken with wal-e 1.0.3, which was also able to restore the same backup without any issues.

@x4m
Collaborator

x4m commented Oct 23, 2017

Hi!
Do you observe this on smaller databases?
We need to determine whether the database size is the root cause or whether something else in your environment is.

@diranged

diranged commented Jan 5, 2018

We just ran into this as well. We're trying to migrate from WAL-E 0.9.8 to WAL-G 0.1.3:

2018/01/05 03:08:49 DecompressLzo: write to pipe failed
github.com/wal-g/wal-g.DecompressLzo
	/home/travis/gopath/src/github.com/wal-g/wal-g/decompress.go:126
github.com/wal-g/wal-g.tarHandler
	/home/travis/gopath/src/github.com/wal-g/wal-g/extract.go:76
github.com/wal-g/wal-g.ExtractAll.func2.2
	/home/travis/gopath/src/github.com/wal-g/wal-g/extract.go:150
runtime.goexit
	/home/travis/.gimme/versions/go1.8.5.linux.amd64/src/runtime/asm_amd64.s:2197
ExtractAll: lzo decompress failed. Is archive encrypted?
github.com/wal-g/wal-g.tarHandler
	/home/travis/gopath/src/github.com/wal-g/wal-g/extract.go:78
github.com/wal-g/wal-g.ExtractAll.func2.2
	/home/travis/gopath/src/github.com/wal-g/wal-g/extract.go:150
runtime.goexit
	/home/travis/.gimme/versions/go1.8.5.linux.amd64/src/runtime/asm_amd64.s:2197

We don't have any smaller databases, but this failed around 450GB into a 600GB restore.

@x4m
Collaborator

x4m commented Jan 5, 2018

@diranged this is a known bug in the lzo library; see #35.
Is your data confidential? If not, I'm interested in fixing that library, but I don't have a repro yet. Unfortunately, I can neither guarantee a fix nor promise an ETA...

@x4m
Collaborator

x4m commented Sep 6, 2018

This issue is known to arise from network timeouts and from server-side throttling in some S3 implementations. Fixed in the latest release.

@x4m x4m closed this as completed Sep 6, 2018
@damirda

damirda commented Oct 9, 2018

I still have the same problem.
CentOS 7.5
wal-e v1.1.0 (creator of source basebackup) - python 3.5
wal-g v0.1.12.lzo

2018/10/09 13:46:16 ExtractAll: lzo decompress failed. Is archive encrypted?: io: read/write on closed pipe
2018/10/09 13:46:16 Iteration finished, failed tars:
2018/10/09 13:46:16 database_continuous/test_walg/10/basebackups_005/base_000000010000000000000003_00000040/tar_partitions/part_00000000.tar.lzo,
2018/10/09 13:46:16
2018/10/09 13:46:21 io: read/write on closed pipe
ExtractAll: lzo decompress failed. Is archive encrypted?
github.com/wal-g/wal-g.handleTar
        /home/travis/gopath/src/github.com/wal-g/wal-g/extract.go:73
github.com/wal-g/wal-g.tryExtractFiles.func1.2
        /home/travis/gopath/src/github.com/wal-g/wal-g/extract.go:161
runtime.goexit
        /home/travis/.gimme/versions/go1.10.4.linux.amd64/src/runtime/asm_amd64.s:2361

This is an empty, freshly created PostgreSQL cluster with a single test database and no tables.
If you need it for debugging, I can upload the backup somewhere.

@x4m
Collaborator

x4m commented Oct 9, 2018

@damirda can you please send me the backup, or just part_00000000.tar.lzo?
I'll try to debug lzo.

@damirda

damirda commented Oct 9, 2018

Yes, I can. As I said, it is a completely empty database made just for debugging.
https://damirda.com/backup.tar.xz
It is just 3MB.

@x4m
Collaborator

x4m commented Oct 10, 2018

@damirda these lzo archives are padded with zeroes at the end.
I can fix this next Monday; if you want it sooner, I will happily review a PR.
Thanks for reporting; you are making WAL-G better!

@damirda

damirda commented Oct 10, 2018

I would love to, but I don't know Go. :-(
Maybe I should start learning it... :-D

@x4m
Copy link
Collaborator

x4m commented Oct 10, 2018

LzoDecompressor currently uses io.Copy(). There is FastCopy in utils.go, which has a quite similar purpose; you can copy it into the lzo decompressor and modify it: if you get ErrClosedPipe, check that only zeroes remain until the end of the stream. If there are nonzero bytes, return the original error; if not, report success.
Or something like this.

@jsuchal

jsuchal commented Oct 26, 2018

Sorry for bumping this, but is the fix in the current release? I am having the same problem on the current (pre)release.

@x4m
Collaborator

x4m commented Oct 27, 2018

Uh... thanks for the bump, I will fix this soon. @jsuchal, do you have an environment to verify the fix?

@jsuchal

jsuchal commented Oct 27, 2018

Sure, ping me anytime and I'll test that.

@x4m
Collaborator

x4m commented Oct 27, 2018

@jsuchal can you test the version from the REL0_1_STABLE branch?

@x4m
Collaborator

x4m commented Oct 27, 2018

It works for me on @damirda's case.

@jsuchal

jsuchal commented Oct 27, 2018

@x4m hmm, I can't manage to build it. Could you share a binary?

@x4m
Collaborator

x4m commented Oct 27, 2018

I've cut a pre-release, v0.1.13; ping me if anything goes wrong.
