
Optimize tar stream generation #22

Merged · 2 commits · Dec 2, 2015
Conversation

tonistiigi (Contributor)

After the content-addressability PR, Docker runs a migration step the first time the daemon starts. This step calculates the sha256 checksum of all the existing data on disk.

This is quite time-consuming if you have lots of data, so I've tried to make it as fast as possible.

This branch has the changes in docker side: moby/moby@master...tonistiigi:migration-opt

Together, these optimizations make migration 55% faster in my test case. Half of that comes from parallel processing on the Docker side; the other half comes from the general optimizations, mostly in this PR.

The JSON parsing itself is not optimal yet, especially the part where it creates new buffers for decoding base64. Fixing this would probably yield a 5-10% speed increase, but I'm not sure it's worth it, considering the code would get quite a bit messier.
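For context, the per-decode buffer allocations described above could be avoided along these lines. This is a hedged sketch using `sync.Pool`; the pool, the 4 KiB size, and the `decodeBase64` helper are illustrative, not tar-split's actual code:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"sync"
)

// decodeBufPool hands out reusable byte slices so each base64 decode
// does not allocate a fresh destination buffer.
var decodeBufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 4096) },
}

// decodeBase64 decodes src into a pooled scratch buffer, copies out the
// result, and returns the scratch buffer to the pool for the next caller.
func decodeBase64(src string) ([]byte, error) {
	buf := decodeBufPool.Get().([]byte)
	defer decodeBufPool.Put(buf)

	need := base64.StdEncoding.DecodedLen(len(src))
	if cap(buf) < need {
		buf = make([]byte, need) // rare fallback for oversized inputs
	}
	buf = buf[:need]
	n, err := base64.StdEncoding.Decode(buf, []byte(src))
	if err != nil {
		return nil, err
	}
	out := make([]byte, n)
	copy(out, buf)
	return out, nil
}

func main() {
	b, _ := decodeBase64("aGVsbG8=")
	fmt.Printf("%s\n", b) // prints "hello"
}
```

The trade-off is exactly the messiness mentioned above: buffer ownership and copy-out rules leak into every call site.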

This avoids extra allocations in `ReadBytes` and the decoding buffers.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@tonistiigi (Contributor, Author)

Travis fails because sync.Pool doesn't exist in go1.2. Do you want to continue supporting it?

Also, can you explain why the unpacker has the duplicate-entry check? I'd expect to need something like that when creating a tar-split file, but why do we have to recheck a file that's already on disk?

@vbatts vbatts changed the title Optimize tar stram generation Optimize tar stream generation Nov 30, 2015
@vbatts vbatts mentioned this pull request Dec 1, 2015
@vbatts (Owner) commented Dec 1, 2015

adding a simple benchmark for the getter/putter, and the JSON looks nicer:

benchmark             old ns/op     new ns/op     delta
BenchmarkGetPut-4     1643964       1635639       -0.51%

benchmark             old allocs     new allocs     delta
BenchmarkGetPut-4     33             31             -6.06%

benchmark             old bytes     new bytes     delta
BenchmarkGetPut-4     5685          2107          -62.94%
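For readers unfamiliar with how numbers like these are produced: they come from Go's `testing.B` machinery. A minimal sketch of a getter/putter micro-benchmark of this shape follows; the pool and the 2 KiB buffer size are illustrative guesses, not the repository's actual benchmark code:

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// pool recycles fixed-size scratch buffers between iterations.
var pool = sync.Pool{
	New: func() interface{} { return make([]byte, 2048) },
}

// BenchmarkGetPut measures the cost of one Get/Put round trip, the same
// shape of micro-benchmark summarized in the table above.
func BenchmarkGetPut(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		buf := pool.Get().([]byte)
		buf[0] = byte(i) // pretend to use the buffer
		pool.Put(buf)
	}
}

func main() {
	// testing.Benchmark lets a benchmark run outside `go test`.
	r := testing.Benchmark(BenchmarkGetPut)
	fmt.Println(r.N > 0) // prints "true"
}
```

In a real repository this would live in a `_test.go` file and run via `go test -bench=.`; `b.ReportAllocs()` is what produces the allocs/op column in the comparison.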


func copyWithBuffer(dst io.Writer, src io.Reader, buf []byte) (written int64, err error) {
vbatts (Owner)
may be trivial, but would you add some tests?

tonistiigi (Contributor, Author)

To this func specifically? This is copied from https://github.com/golang/go/blob/master/src/io/io.go#L366

vbatts (Owner)

hah. fair.

vbatts (Owner)

perhaps a comment citing such

tonistiigi (Contributor, Author)

updated

- The new writeTo method avoids creating an extra pipe.
- Copy with a pooled buffer instead of allocating a new buffer for each file.
- Avoid extra object allocations inside the loop.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@vbatts (Owner) commented Dec 2, 2015

With a basic benchmark on tar/asm there is an improvement in the number of allocations, but the overall speed change is insignificant:

benchmark          old ns/op     new ns/op     delta
BenchmarkAsm-4     238968475     242592241     +1.52%

benchmark          old allocs     new allocs     delta
BenchmarkAsm-4     2449           2150           -12.21%

benchmark          old bytes     new bytes     delta
BenchmarkAsm-4     66841059      66558292      -0.42%

@tonistiigi (Contributor, Author)

@vbatts that last benchmark doesn't really reflect real-world usage, where you may have thousands of file accesses per tar-data file. Also, Docker never calls NewOutputTarStream now; these optimizations are only for the unpack side. The pack side is probably slower in the benchmark, but it isn't used in migration.

edit:

  • To see the pipe optimization you need to use the library the way Docker does. All cores must be busy; then you start seeing significant wait times in the blocking profile.
  • The other two optimizations specifically avoid allocations per file-get call, leaving only one per tar-data file.

@vbatts (Owner) commented Dec 2, 2015

I see. Well overall this LGTM. I wish it were a bit more comparable for benchmarking, but so it goes.

vbatts added a commit that referenced this pull request Dec 2, 2015
Optimize tar stream generation
@vbatts vbatts merged commit 1501fe6 into vbatts:master Dec 2, 2015
@vbatts (Owner) commented Dec 2, 2015

@tonistiigi I've tagged release v0.9.11 for this.

@vbatts (Owner) commented Dec 2, 2015

understood. feel free to offer up more appropriate benchmarks.

