vendor: adding tar-split dependency for graph

tar-split is a facility to disassemble and reassemble tar archives

Signed-off-by: Vincent Batts <vbatts@redhat.com>
vbatts · Jul 21, 2015 · commit 5ddec2a · 1 parent 1ca7378

Showing 45 changed files with 4,983 additions and 1 deletion.
diff --git a/hack/vendor.sh b/hack/vendor.sh
@@ -33,8 +33,9 @@ clone git github.com/samuel/go-zookeeper d0e0d8e11f318e000a8cc434616d69e329edc37
 clone git github.com/coreos/go-etcd v2.0.0
 clone git github.com/hashicorp/consul v0.5.2
 
-# get distribution packages
+# get graph and distribution packages
 clone git github.com/docker/distribution 419bbc2da637d9b2a812be78ef8436df7caac70d
+clone git github.com/vbatts/tar-split v0.9.3
 
 clone git github.com/opencontainers/runc v0.0.2 # libcontainer
 # libcontainer deps (see src/github.com/docker/libcontainer/update-vendor.sh)

diff --git a/vendor/src/github.com/vbatts/tar-split/.travis.yml b/vendor/src/github.com/vbatts/tar-split/.travis.yml
@@ -0,0 +1,13 @@
+language: go
+go:
+  - 1.4.2
+  - 1.3.3
+
+# let us have pretty, fast Docker-based Travis workers!
+sudo: false
+
+# we don't need "go get" here <3
+install: go get -d ./...
+
+script:
+  - go test -v ./...
diff --git a/vendor/src/github.com/vbatts/tar-split/DESIGN.md b/vendor/src/github.com/vbatts/tar-split/DESIGN.md
@@ -0,0 +1,36 @@
+Flow of TAR stream
+==================
+
+The underlying use of `github.com/vbatts/tar-split/archive/tar` is most similar
+to stdlib.
+
+
+Packer interface
+----------------
+
+For ease of storage and usage of the raw bytes, there will be a storage
+interface that accepts an io.Writer (this way you could pass it an in-memory
+buffer or a file handle).
+
+Having a Packer interface can allow configuration of hash.Hash for file payloads
+and providing your own io.Writer.
+
+Instead of having a state directory to store all the header information for all
+Readers, we will leave that up to the user of Reader, because we cannot assume
+an ID for each Reader by which to keep that information differentiated.
+
+
+
+State Directory
+---------------
+
+Perhaps we could deduplicate the header info by hashing the raw bytes and
+storing them in a directory tree like:
+
+	./ac/dc/beef
+
+Then reference the hash of the header info, in the positional records for the
+tar stream. Though this could be a future feature, and not required for an
+initial implementation. Also, this would imply an owned state directory, rather
+than just writing storage info to an io.Writer.
+
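The State Directory idea in DESIGN.md above (hashing raw header bytes and storing them under a path like `./ac/dc/beef`) could be sketched roughly as follows. This is only an illustration and not part of tar-split: the helper name `headerPath`, the `state` root directory, and the choice of SHA-256 are all assumptions, since the design note names neither a hash function nor a layout beyond the example path.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// headerPath is a hypothetical helper (not part of tar-split). It maps the raw
// bytes of a tar header to a content-addressed location under root, using the
// first two bytes of the hex digest as nested directory names.
// SHA-256 is an assumption; DESIGN.md does not specify a hash.
func headerPath(root string, raw []byte) string {
	sum := sha256.Sum256(raw)
	digest := hex.EncodeToString(sum[:])
	return path.Join(root, digest[0:2], digest[2:4], digest[4:])
}

func main() {
	// Stand-in for one raw 512-byte tar header block.
	raw := make([]byte, 512)
	fmt.Println(headerPath("state", raw))
}
```

Identical header bytes always map to the same path, which is what would make deduplication possible; the positional records for the tar stream would then only need to reference the digest.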
diff --git a/vendor/src/github.com/vbatts/tar-split/LICENSE b/vendor/src/github.com/vbatts/tar-split/LICENSE
@@ -0,0 +1,19 @@
+Copyright (c) 2015 Vincent Batts, Raleigh, NC, USA
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff --git a/vendor/src/github.com/vbatts/tar-split/README.md b/vendor/src/github.com/vbatts/tar-split/README.md
@@ -0,0 +1,181 @@
+tar-split
+========
+
+[![Build Status](https://travis-ci.org/vbatts/tar-split.svg?branch=master)](https://travis-ci.org/vbatts/tar-split)
+
+Extend the upstream golang stdlib `archive/tar` library, to expose the raw
+bytes of the TAR, rather than just the marshalled headers and file stream.
+
+The goal being that by preserving the raw bytes of each header, padding bytes,
+and the raw file payload, one could reassemble the original archive.
+
+
+Docs
+----
+
+* https://godoc.org/github.com/vbatts/tar-split/tar/asm
+* https://godoc.org/github.com/vbatts/tar-split/tar/storage
+* https://godoc.org/github.com/vbatts/tar-split/archive/tar
+
+
+Caveat
+------
+
+Eventually this should detect TARs for which precise reassembly is not possible.
+
+For example, stored sparse files that have "holes" in them will be read as a
+contiguous file, though the archive contents may be recorded in sparse format.
+Therefore when adding the file payload to a reassembled tar, to achieve
+identical output, the file payload would need to be precisely re-sparsified.
+This is not something I seek to fix immediately, but I would rather have an
+alert that precise reassembly is not possible.
+(see more http://www.gnu.org/software/tar/manual/html_node/Sparse-Formats.html)
+
+
+Another caveat: while tar archives support having multiple file entries for the
+same path, we will not support this feature. If there is more than one entry
+with the same path, expect an err (like `ErrDuplicatePath`) or a resulting tar
+stream that does not validate your original checksum/signature.
+
+
+Contract
+--------
+
+Do not break the API of stdlib `archive/tar` in our fork (ideally find an
+upstream mergeable solution)
+
+
+Std Version
+-----------
+
+The version of golang stdlib `archive/tar` is from go1.4.1, and their master branch around [a9dddb53f](https://github.com/golang/go/tree/a9dddb53f)
+
+
+Example
+-------
+
+First we'll get an archive to work with. For repeatability, we'll make an
+archive from what you've just cloned:
+
+```
+git archive --format=tar -o tar-split.tar HEAD .
+```
+
+Then build the example main.go:
+
+```
+go build ./main.go
+```
+
+Now run the example over the archive:
+
+```
+$ ./main tar-split.tar
+2015/02/20 15:00:58 writing "tar-split.tar" to "tar-split.tar.out"
+pax_global_header pre: 512 read: 52
+.travis.yml pre: 972 read: 374
+DESIGN.md pre: 650 read: 1131
+LICENSE pre: 917 read: 1075
+README.md pre: 973 read: 4289
+archive/ pre: 831 read: 0
+archive/tar/ pre: 512 read: 0
+archive/tar/common.go pre: 512 read: 7790
+[...]
+tar/storage/entry_test.go pre: 667 read: 1137
+tar/storage/getter.go pre: 911 read: 2741
+tar/storage/getter_test.go pre: 843 read: 1491
+tar/storage/packer.go pre: 557 read: 3141
+tar/storage/packer_test.go pre: 955 read: 3096
+EOF padding: 1512
+Remainder: 512
+Size: 215040; Sum: 215040
+```
+
+*What are we seeing here?* 
+
+* `pre` is the header of a file entry, and potentially the padding from the
+  end of the prior file's payload. Also with particular tar extensions and pax
+  attributes, the header can exceed 512 bytes.
+* `read` is the size of the file payload from the entry
+* `EOF padding` is the expected 1024 null bytes on the end of a tar archive,
+  plus potential padding from the end of the prior file entry's payload
+* `Remainder` is the remaining bytes of an archive. This is typically dead space,
+  as most tar implementations will return after having reached the end of the
+  1024 null bytes. Some implementations do include extra bytes here, though,
+  which affect the checksum of the resulting tar archive, so they must be
+  accounted for as well.
+
+Ideally the input tar and the output `*.out` will match:
+
+```
+$ sha1sum tar-split.tar*
+ca9e19966b892d9ad5960414abac01ef585a1e22  tar-split.tar
+ca9e19966b892d9ad5960414abac01ef585a1e22  tar-split.tar.out
+```
+
+
+Stored Metadata
+---------------
+
+Since the raw bytes of the headers and padding are stored, you may be wondering
+what the size implications are. The headers are at least 512 bytes per
+file (sometimes more), there are at least 1024 null bytes on the end, and then
+various padding. With a naive storage implementation, the stored metadata
+therefore grows linearly with the number of files.
+
+Reusing our prior example's `tar-split.tar`, let's build the checksize.go example:
+
+```
+go build ./checksize.go
+```
+
+```
+$ ./checksize ./tar-split.tar
+inspecting "tar-split.tar" (size 210k)
+ -- number of files: 50
+ -- size of metadata uncompressed: 53k
+ -- size of gzip compressed metadata: 3k
+```
+
+So assuming you've managed the extraction of the archive yourself, for reuse of
+the file payloads from a relative path, the only additional storage cost is
+the compressed metadata, which here is as little as 3 kB.
+
+But let's look at a larger archive, with many files.
+
+```
+$ ls -sh ./d.tar
+1.4G ./d.tar
+$ ./checksize ~/d.tar 
+inspecting "/home/vbatts/d.tar" (size 1420749k)
+ -- number of files: 38718
+ -- size of metadata uncompressed: 43261k
+ -- size of gzip compressed metadata: 2251k
+```
+
+Here, an archive with 38,718 files has a compressed metadata footprint of about 2 MB.
+
+Folding the null bytes on the end of the archive into the total, we will assume
+a bytes-per-file rate for the storage implications.
+
+| uncompressed | compressed |
+| :----------: | :--------: |
+| ~1 kB per file | ~0.06 kB per file |
+
+
+What's Next?
+------------
+
+* More implementations of storage Packer and Unpacker
+ - could be a redis or mongo backend
+* More implementations of FileGetter and FilePutter
+ - could be a redis or mongo backend
+* cli tooling to assemble/disassemble a provided tar archive
+* would be interesting to have an assembler stream that implements `io.Seeker`
+
+License
+-------
+
+See LICENSE
+
+
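The `pre` and `EOF padding` figures in the README example output above follow from tar's 512-byte block layout, and can be reproduced with a small sketch. It assumes each entry's header occupies exactly one 512-byte block (so no extended pax or GNU headers precede it); the function names `padding`, `preSizes`, and `eofPadding` are made up for this illustration and are not part of tar-split.

```go
package main

import "fmt"

// blockSize is the tar block size in bytes; headers and payloads are both
// aligned to it.
const blockSize = 512

// padding returns how many null bytes follow a payload of the given size so
// that the next header starts on a block boundary.
func padding(size int64) int64 {
	return (blockSize - size%blockSize) % blockSize
}

// preSizes returns, for each entry, the bytes seen before its payload: the
// padding left over from the previous payload plus one header block.
// Assumption: every header is exactly one block.
func preSizes(reads []int64) []int64 {
	pre := make([]int64, len(reads))
	var carry int64
	for i, r := range reads {
		pre[i] = carry + blockSize
		carry = padding(r)
	}
	return pre
}

// eofPadding returns the padding after the last payload plus the two
// null blocks (1024 bytes) that terminate a tar archive.
func eofPadding(lastRead int64) int64 {
	return padding(lastRead) + 2*blockSize
}

func main() {
	// Payload sizes taken from the "read" column of the README output.
	names := []string{"pax_global_header", ".travis.yml", "DESIGN.md", "LICENSE", "README.md", "archive/"}
	reads := []int64{52, 374, 1131, 1075, 4289, 0}
	for i, pre := range preSizes(reads) {
		fmt.Printf("%s pre: %d read: %d\n", names[i], pre, reads[i])
	}
	// The last entry in the README output has a 3096-byte payload.
	fmt.Printf("EOF padding: %d\n", eofPadding(3096))
}
```

For these six entries the computed `pre` values are 512, 972, 650, 917, 973, and 831, and the computed EOF padding is 1512, which agree with the corresponding lines of the README output.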