
Optimize tar stream generation #22

Merged · 2 commits · Dec 2, 2015
Conversation

tonistiigi (Contributor)

After the content-addressability PR, Docker runs a migration step the first time the daemon starts. This step calculates the sha256 checksum of all the existing data on disk.

This is quite time-consuming if you have lots of data, so I've tried to make it as fast as possible.

This branch has the changes in docker side: moby/moby@master...tonistiigi:migration-opt

Together, these optimizations make migration 55% faster in my test case. Half of that comes from parallel processing on the Docker side; the other half comes from the general optimizations, mostly in this PR.

The JSON parsing itself is not optimal yet, especially the part where it creates new buffers for decoding base64. Fixing this would probably yield a 5-10% speed increase, but I'm not sure it's worth it, considering the code would get quite a bit messier.
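For context, the per-decode buffer allocations described above could be avoided along these lines. This is a hedged sketch using `sync.Pool`; the pool, the 4 KiB size, and the `decodeBase64` helper are illustrative, not tar-split's actual code:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"sync"
)

// decodeBufPool hands out reusable byte slices so each base64 decode
// does not allocate a fresh destination buffer.
var decodeBufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 4096) },
}

// decodeBase64 decodes src into a pooled scratch buffer, copies out the
// result, and returns the scratch buffer to the pool for the next caller.
func decodeBase64(src string) ([]byte, error) {
	buf := decodeBufPool.Get().([]byte)
	defer decodeBufPool.Put(buf)

	need := base64.StdEncoding.DecodedLen(len(src))
	if cap(buf) < need {
		buf = make([]byte, need) // rare fallback for oversized inputs
	}
	buf = buf[:need]
	n, err := base64.StdEncoding.Decode(buf, []byte(src))
	if err != nil {
		return nil, err
	}
	out := make([]byte, n)
	copy(out, buf)
	return out, nil
}

func main() {
	b, _ := decodeBase64("aGVsbG8=")
	fmt.Printf("%s\n", b) // prints "hello"
}
```

The trade-off is exactly the messiness mentioned above: buffer ownership and copy-out rules leak into every call site.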

This avoids extra allocations in `ReadBytes` and the decoding buffers.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@tonistiigi (Contributor, Author)

Travis fails because sync.Pool doesn't exist in go1.2. Do you want to continue supporting it?

Also, can you explain why the unpacker has the duplicate-entry check? I'd expect to need something like that when creating a tar-split file, but why do we have to recheck a file that's already on disk?

@vbatts vbatts changed the title Optimize tar stram generation Optimize tar stream generation Nov 30, 2015
@vbatts vbatts mentioned this pull request Dec 1, 2015
@vbatts (Owner) commented Dec 1, 2015

adding a simple benchmark for the getter/putter, and the JSON looks nicer:

benchmark             old ns/op     new ns/op     delta
BenchmarkGetPut-4     1643964       1635639       -0.51%

benchmark             old allocs     new allocs     delta
BenchmarkGetPut-4     33             31             -6.06%

benchmark             old bytes     new bytes     delta
BenchmarkGetPut-4     5685          2107          -62.94%
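For readers unfamiliar with how numbers like these are produced: they come from Go's `testing.B` machinery. A minimal sketch of a getter/putter micro-benchmark of this shape follows; the pool and the 2 KiB buffer size are illustrative guesses, not the repository's actual benchmark code:

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// pool recycles fixed-size scratch buffers between iterations.
var pool = sync.Pool{
	New: func() interface{} { return make([]byte, 2048) },
}

// BenchmarkGetPut measures the cost of one Get/Put round trip, the same
// shape of micro-benchmark summarized in the table above.
func BenchmarkGetPut(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		buf := pool.Get().([]byte)
		buf[0] = byte(i) // pretend to use the buffer
		pool.Put(buf)
	}
}

func main() {
	// testing.Benchmark lets a benchmark run outside `go test`.
	r := testing.Benchmark(BenchmarkGetPut)
	fmt.Println(r.N > 0) // prints "true"
}
```

In a real repository this would live in a `_test.go` file and run via `go test -bench=.`; `b.ReportAllocs()` is what produces the allocs/op column in the comparison.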


func copyWithBuffer(dst io.Writer, src io.Reader, buf []byte) (written int64, err error) {
vbatts (Owner)
may be trivial, but would you add some tests?

tonistiigi (Contributor, Author)

To this func specifically? This is copied from https://github.com/golang/go/blob/master/src/io/io.go#L366

vbatts (Owner)

hah. fair.

vbatts (Owner)

perhaps a comment citing such

tonistiigi (Contributor, Author)

updated

- The new writeTo method avoids creating an extra pipe.
- Copy with a pooled buffer instead of allocating a new buffer for each file.
- Avoid extra object allocations inside the loop.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@vbatts (Owner) commented Dec 2, 2015

With a basic benchmark on tar/asm there is an improvement in the number of allocations, but the overall speed change is insignificant:

benchmark          old ns/op     new ns/op     delta
BenchmarkAsm-4     238968475     242592241     +1.52%

benchmark          old allocs     new allocs     delta
BenchmarkAsm-4     2449           2150           -12.21%

benchmark          old bytes     new bytes     delta
BenchmarkAsm-4     66841059      66558292      -0.42%

@tonistiigi (Contributor, Author)

@vbatts that last benchmark doesn't really reflect real-world usage, where you may have thousands of file accesses per tar-data file. Also, Docker never calls NewOutputTarStream now; these optimizations are only for the unpack side. The pack side is probably slower in the benchmark, but it isn't used in migration.

edit:

  • To see the pipe optimization you need to use the library the way Docker does. All cores must be busy; then you start seeing significant wait times in the blocking profile.
  • The other two optimizations specifically avoid allocations per file-get call, leaving only one per tar-data file.

@vbatts (Owner) commented Dec 2, 2015

I see. Well overall this LGTM. I wish it were a bit more comparable for benchmarking, but so it goes.

vbatts added a commit that referenced this pull request Dec 2, 2015
Optimize tar stream generation
@vbatts vbatts merged commit 1501fe6 into vbatts:master Dec 2, 2015
@vbatts (Owner) commented Dec 2, 2015

@tonistiigi I've tagged release v0.9.11 for this.

@vbatts (Owner) commented Dec 2, 2015

understood. feel free to offer up more appropriate benchmarks.

