-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy existing volume data to mounts [full ci] #4835
Copy existing volume data to mounts [full ci] #4835
Conversation
f567508
to
583ee64
Compare
lib/tether/tether.go
Outdated
// temp directory to copy existing data to mounts | ||
bindDir = "/.bind" | ||
// markfile to avoid re-copying existing data to mounts | ||
mountsCopied = "/.mountscopied" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is worth implementing since we're using the same path for shared volumes. In which case container1 and container2 and both mount the same volume (someVol
), and both will race on copying their contents to it.
Rather, lets remove this check and always copy the dirents in the underlying mount point (the bind mount) as long as they are newer or do not exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not copy multiple times.
As per docker's implementation, this operation is performed only once when the container is created.
Also it looks like docker copies ALL the data and not the new and non-existent ones.
lib/tether/tether.go
Outdated
MaxDeathRecords = 5 | ||
|
||
// the length of a truncated ID for use as hostname | ||
shortLen = 12 | ||
|
||
// temp directory to copy existing data to mounts | ||
bindDir = "/.bind" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest you use a path in /.tether
(a tmpfs) that's based off the volume label - that prevents any collisions and means we don't accidentally leave file system changes around if the container's killed in the middle of this operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a label for nfs mounts. We could use a combination of hostname and path replacing /
with -
. Or just base64 encode the URL
lib/tether/tether.go
Outdated
// temp directory to copy existing data to mounts | ||
bindDir = "/.bind" | ||
// markfile to avoid re-copying existing data to mounts | ||
mountsCopied = "/.mountscopied" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To protect against the future case of allowing dynamic addition of volumes, or vmfork implementation, this could be again /.tether/label.mountscopied
- I'm assuming from the file location that it's not intended to squash copy on subsequent uses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above.
- What do we do for NFS volumes with a URL?
- Do we create a different file per volume?
lib/tether/ops_linux.go
Outdated
return err | ||
} | ||
|
||
parentDir := filepath.Dir(filepath.Clean(source)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source
has already had filepath.Clean()
called on it at this point. I believe that filepath.Dir()
calls a clean on the supplied target. So maybe just eliminate the nested function call here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, had it twice.
We still need to call clean once even though filepath.Dir calls clean. We are trying to get the parent directory here. If the source has a trailing slash, filepath.Dir returns the same source instead of the parent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing 😄 yeah I was mainly just looking at the filepath.clean in the function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -252,6 +257,41 @@ func (t *tether) setMounts() error { | |||
return fmt.Errorf("unsupported volume mount type for %s: %s", k, v.Source.Scheme) | |||
} | |||
} | |||
return t.populateVolumes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is getting called after we already mount the nfs/device. I am not sure of the behavior of a bind mount after you have already mounted the target. I believe that you will actually end up with the contents of the mounted device. I have not tried it yet though, so I could be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works fine because we bind mount the parent directory and not the actual target. Bind mounting the parent directory results in mounting the original dir and not the mounted device
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, hence the Dir call we talked about above. makes sense, sorry about that.
13b16a4
to
df2a895
Compare
@@ -165,6 +165,13 @@ func (t *Mocker) MountTarget(ctx context.Context, source url.URL, target string, | |||
return nil | |||
} | |||
|
|||
// CopyExistingContent copies the underlying files shadowed by a mount on a directory | |||
// to the volume mounted on the directory | |||
func (t *Mocker) CopyExistingContent(source string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@caglar10ur Please think about how we can change this test setup when continuing with the tether restructure. It was painful at the time to have to duplicate code but with time to do it correctly there should be a better option.
aa8f4f5
to
686e812
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with the following:
- negative case test
- minor layout change for consistency
- follow up issue in Correctness: vet code paths for correct idempotent behaviour #4774
One question about nested volume behaviours.
@@ -37,6 +37,7 @@ type Operations interface { | |||
Apply(endpoint *NetworkEndpoint) error | |||
MountLabel(ctx context.Context, label, target string) error | |||
MountTarget(ctx context.Context, source url.URL, target string, mountOptions string) error | |||
CopyExistingContent(source string) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we do this after all of the volumes have been mounted? Another option would have been to inline this behaviour with the MountTarget logic.
I'm also curious what happens if we have the following volumes:
volumeA #already populated with data
volumeB #empty
the following usage (obviously with correct combination of -v and VOLUME config)
VOLUME /mnt/a/b
and usage:
-v volumeA:/mnt/a
Does the contents of a:/b
get copied into b
or do we get the underlying image files. Can this even be phrased or does it get rejected. What does docker-engine do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we do this after all of the volumes have been mounted? Another option would have been to inline this behaviour with the MountTarget logic.
I chose to follow this approach to avoid duplicating the code in MountLabel and MountTarget functions.
If I understood your second question correctly,
Contents are not copied from VolumeA to VolumeB, they are always from the base image to a volume. One, cannot mount two volumes on the same path.
lib/tether/tether.go
Outdated
return nil | ||
} | ||
// skip if this was done before | ||
if _, err := os.Stat(mountsCopied); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine for now, but somewhat violates our goal of idempotent behaviour - if Reload is called and a new volume has been configured (or the CopyMode has changed) then we will not populate newly added volumes.
If we add more complex Copy behaviours (rsync for example) then this would be more simply phrased as a checking function per Copy type that takes a source and dest and returns whether or not to perform the copy, or just copies if needed.
Please open a follow up issue as a child of #4774 to revisit this in a tech debt, with a reference in #696 as that's where we will likely need the revised behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Echo'ing @hickeng here. For NFS volumes where 2 containers mount the same volume, the 2nd container will attempt to copy the contents of the mount point again. I thought we discussed this and were simply going to skip copying anything into the volume if the volume was non-empty. This would make the mountsCopied
state file irrelevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
${rc} ${output}= Run And Return Rc And Output docker %{VCH-PARAMS} run jakedsouza/group-1-06-docker-verify-volume-files:1.0 cat /etc/example/testfile.txt | ||
Should Be Equal As Integers ${rc} 0 | ||
Should Contain ${output} TestFile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC you said that if a volume is explicitly specified with -v name:/path
then this copy behaviour doesn't occur.
Please add a test for this case as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong about the copy behaviour. As per the docker docs
Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization. (Note that this does not apply when mounting a host directory.)
So the only case when data is not copied is when a host directory is mounted. This is something vic does not support.
I added a test case to verify that the data is copied to a named volume. This is the same as dockers current behaviour
lib/config/executor/container_vm.go
Outdated
@@ -79,6 +79,17 @@ type ExitLog struct { | |||
Message string | |||
} | |||
|
|||
// CopyMode type to define whether to copy data from the base image on mount | |||
type CopyMode int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this to the top of the file with the State
enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
// NewVolume creates a Volume | ||
func NewVolume(store *url.URL, ID string, info map[string][]byte, device Disk) (*Volume, error) { | ||
func NewVolume(store *url.URL, ID string, info map[string][]byte, device Disk, copyMode executor.CopyMode) (*Volume, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably doesn't need a backwards compatibility check given CopyNew is the default (by virtual of 0
being the nil value of int
) but worth confirming that the backwards compatibility tests have a volume configuration that will exercise the copy paths in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CopyNever
is the default, CopyNew
has a value 1
Changed the enum to be iota+1 by default as requested by @caglar10ur below
lib/tether/ops_linux.go
Outdated
|
||
log.Debugf("creating directory %s", bindDir) | ||
if err := os.MkdirAll(bindDir, 0644); err != nil { | ||
log.Errorf("error creating directory %s: %+v", bindDir, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to check for EEXISTS
and continue in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For MkdirAll If path is already a directory, MkdirAll does nothing and returns nil.
so we dont need to check for EEXISTS
defer trace.End(trace.Begin(fmt.Sprintf("copyExistingContent from %s", source))) | ||
|
||
source = filepath.Clean(source) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before creating directories and bind
mounting the underlying mount point, it may be worth checking if there are even files there. There's no sense descending down the mount/copy path if there's nothing to copy, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need to bind anyways mount to check if there are any files in the underlying directory
To avoid doing a bind mount, we could change the code flow to the steps below
- In mountLabel and mountTarget, check if the original folder is empty
- Mount the volume
- If step 1 is false then go ahead with copy
lib/tether/ops_linux.go
Outdated
return err | ||
} | ||
log.Debugf("removing %s", bindDir) | ||
if err := os.Remove(bindDir); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do this in a defer
after 712 (in the err == nil case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lib/tether/ops_linux.go
Outdated
} | ||
|
||
log.Debugf("unmounting %s", bindDir) | ||
if err := Sys.Syscall.Unmount(bindDir, syscall.MNT_DETACH); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would defer this after Mount
is successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lib/tether/tether.go
Outdated
return nil | ||
} | ||
// skip if this was done before | ||
if _, err := os.Stat(mountsCopied); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Echo'ing @hickeng here. For NFS volumes where 2 containers mount the same volume, the 2nd container will attempt to copy the contents of the mount point again. I thought we discussed this and were simply going to skip copying anything into the volume if the volume was non-empty. This would make the mountsCopied
state file irrelevant.
// copy data from the bindDir to the source | ||
// e.g if source is /foo/bar, copy ./bindDir/bar to /foo/bar | ||
log.Debugf("copying contents from to %s to %s", mountedSource, source) | ||
if err := archive.CopyWithTar(mountedSource, source); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this an explicit copy? It will copy over the dest even if the dest exists, no? We really don't want that. Infact, we shouldn't be copying at all if anything exists in the volume (except for the lost+found
directory).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, also added a test case for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question before approving. What will happen if tether receives a reload in the middle of copy operation?
lib/config/executor/container_vm.go
Outdated
|
||
const ( | ||
// CopyNever Dont copy data on mount | ||
CopyNever CopyMode = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this iota + 1 so that we can differentiate zero value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -117,7 +118,7 @@ func (v *VolumeStore) VolumeCreate(op trace.Operation, ID string, store *url.URL | |||
return nil, fmt.Errorf("Unexpected scheme (%s) for volume store (%s)", u.Scheme, v.Name) | |||
} | |||
|
|||
vol, err := storage.NewVolume(v.SelfLink, ID, info, NewVolume(u, v.volDirPath(ID))) | |||
vol, err := storage.NewVolume(v.SelfLink, ID, info, NewVolume(u, v.volDirPath(ID)), executor.CopyNew) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just ranting, no change necessary. I'm really not a big fan of keep adding parameters to the functions. It would be really great to introduce option types for these NewX calls and pass them (like
Line 53 in 5126aeb
// ServerOptions represents the server options |
@@ -31,6 +31,7 @@ import ( | |||
|
|||
log "github.com/Sirupsen/logrus" | |||
"github.com/d2g/dhcp4" | |||
"github.com/docker/docker/pkg/archive" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pfffff, here comes a docker dep into tether :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't like this either. But, this is a fairly concise package. coughs
686e812
to
5d3d1d6
Compare
The |
5d3d1d6
to
b301f29
Compare
Can you try or write a test to make sure? |
lgtm |
b301f29
to
39e9739
Compare
@caglar10ur Tried to repro manually. I see a single error which I opened up as a separate issue. Im able to repro it on master #4881 |
@jakedsouza this looks great 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work!
Issue - Volumes in a docker file can contain files These are hidden when this anonymous volume is mounted Fix- Copy the original files to the mount on container start
1. Loop over all mounts 2. For each copy existing data 3. Drop a markfile to not repeat the flow
0. Moved test from Docker run to Docker volume 1. Added additional tests 2. Moved unmount and rmdir to defer 3. Moved state to the top 4. Change to iota + 1 5. Skip copy if mounted volume is not empty (except for lost+found dir)
39e9739
to
2b59844
Compare
* Copy existing files to a mounted volume Issue - Volumes in a docker file can contain files These are hidden when this anonymous volume is mounted Fix- - Copy the original files to the mount on container start - Dont copy if mounts already have data - Call populatevolumes after completing mount - Skip copy if mounted volume is not empty (except for lost+found dir)
Issue:
A containers base image data is not copied when a volume is mounted, if the containers base image contains data at the mount point.
Fix:
After mounting the volumes, copy the underlying base image files to the mount point.
Changes -
populateVolumes
bind mount
the underlying parent directory to a tmp directory/.bind
.mountscopied
in the root so that we don't perform the operation againCopyMode
field in MountSpec. Which defines whether to copy the data or not.Currently it always copies once.
Fixes #3482