Permission update on docker entrypoint takes a long time #3194

Closed
miguelpeixe opened this Issue May 21, 2017 · 61 comments

@miguelpeixe

miguelpeixe commented May 21, 2017

With the current approach in the Docker entrypoint for updating files to the new custom UID/GID, the process takes forever to finish, long enough to time out any reasonable production container health check.

Why not just use chown -Rf mastodon:mastodon /mastodon/public/system instead of finding and filtering non-mastodon files?


  • I searched or browsed the repo’s other issues to ensure this is not a duplicate.
  • This bug happens on a tagged release and not on master (If you're a user, don't worry about this).
@Gargron

Member

Gargron commented May 21, 2017

@Wonderfall

Contributor

Wonderfall commented May 21, 2017

The goal is precisely not to chown /mastodon/public/system, because that would take a long time (believe me, I tried every possible combination). With -path path -prune -o, find does not even descend into that directory (-not -path path would exclude it from the results but still traverse it, which takes time). So the command should be quick: it updates all permissions except those under /public/system, which likely contains a lot of data.

So if I understand correctly, this command takes a long time for you?
Can you run something like time docker run -ti --rm mastodon true?
Can you send me your docker info?
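
To illustrate the difference between -prune and -not -path described above, here is a minimal sketch. It builds a throwaway directory tree (the paths are hypothetical stand-ins for /mastodon, not the real image layout) and runs both forms of find. Both print the same files; the difference is that the pruned form never walks into public/system, while the filtered form visits every entry there and only hides it from the output.

```shell
#!/bin/sh
# Sketch only: a scratch tree standing in for /mastodon (hypothetical layout).
root=$(mktemp -d)
mkdir -p "$root/app" "$root/public/system/media"
touch "$root/app/a.rb" "$root/public/index.html" "$root/public/system/media/big.png"

# -prune stops find from descending into public/system at all.
pruned=$(find "$root" -path "$root/public/system" -prune -o -type f -print | sort)

# -not -path only filters the output; find still walks everything underneath.
filtered=$(find "$root" -not -path "$root/public/system*" -type f -print | sort)

printf '%s\n' "$pruned"
rm -rf "$root"
```

On a directory holding a large media store, the pruned form should save the cost of the extra traversal, although it does nothing about the cost of the chown calls themselves.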

@fmauNeko

Contributor

fmauNeko commented May 21, 2017

I'm having the same issue, here's some info:
docker info

Containers: 53
 Running: 53
 Paused: 0
 Stopped: 0
Images: 130
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 477
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.8.0-46-generic
Operating System: Ubuntu 17.04
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.66GiB
Name: concorde.dissidence.ovh
ID: UEKG:ZQYV:I6EF:VIMY:TV5W:HXRD:SEDE:GPHJ:LKUV:OFGB:NQDY:VVNM
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: fmauneko
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

time docker run --rm -it gargron/mastodon:v1.4rc2 true

Creating mastodon user (UID : 991 and GID : 991)...
Updating permissions...
Executing process...
docker run --rm -it gargron/mastodon:v1.4rc2 true  0.01s user 0.01s system 0% cpu 1:29.87 total
@Wonderfall

Contributor

Wonderfall commented May 21, 2017

No problem with :

# docker info
Containers: 34
 Running: 34
 Paused: 0
 Stopped: 0
Images: 51
Server Version: 17.05.0-ce
Storage Driver: btrfs
 Build Version: Btrfs v4.7.3
 Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.11.1
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.16GiB
Name: drogon
ID: TGPJ:KNGK:XV7N:LHP3:BXZG:AHLJ:QOR3:AL6F:ZHME:LHFZ:YBRP:MT4M
Docker Root Dir: /docker
Debug Mode (client): false
Debug Mode (server): false
Username: wonderfall
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: true
# time docker run -ti --rm mastodon true
Creating mastodon user (UID : 991 and GID : 991)...
Updating permissions...
Executing process...
docker run -ti --rm mastodon true  0.01s user 0.01s system 0% cpu 5.303 total
@fmauNeko

Contributor

fmauNeko commented May 21, 2017

The only difference I see is the storage driver, and indeed, here is my desktop computer, which uses btrfs:
docker info

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 15
Server Version: 17.05.0-ce
Storage Driver: btrfs
 Build Version: Btrfs v4.10.2
 Library Version: 102
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.10.13-1-ARCH
Operating System: Arch Linux
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.55GiB
Name: izanami
ID: LY2X:EOSA:YURV:H2OK:MVIU:HWPB:DXCX:GTUJ:DDKJ:2CUC:IZZT:RNFA
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

time docker run --rm -it gargron/mastodon:v1.4rc2 true

Creating mastodon user (UID : 991 and GID : 991)...
Updating permissions...
Executing process...
docker run --rm -it gargron/mastodon:v1.4rc2 true  0,02s user 0,02s system 0% cpu 12,941 total
@Wonderfall

Contributor

Wonderfall commented May 21, 2017

I tried on my Mac, which uses aufs, and it takes around 5 seconds. Perhaps that's because of the SSD. But aufs is clearly less performant than btrfs.

# time find /mastodon -path /mastodon/public/system -prune -o -not -user mastodon -not -group mastodon -print0 | xargs -0 chown -f mastodon:mastodon
real	0m 4.26s
user	0m 0.10s
sys	0m 0.45s

# I ran another container to try a "non-optimised" command
# time chown -R 991:991 *
real	0m 5.45s
user	0m 0.18s
sys	0m 1.55s
@fmauNeko

Contributor

fmauNeko commented May 21, 2017

Yeah so I guess we can accurately say that this is not an issue with Mastodon, but that it's linked to the Docker storage driver choice.

@Wonderfall

Contributor

Wonderfall commented May 21, 2017

I agree with @fmauNeko, unfortunately we can't do more here.
If you're curious, this article also explains why we should use an entrypoint rather than hardcode something in the Dockerfile: https://denibertovic.com/posts/handling-permissions-with-docker-volumes/ (I just found it, but it seems we had exactly the same idea! That's also what I'm doing for all my images, which you can find at wonderfall/dockerfiles.)

cc @xataz if you have an idea.

@miguelpeixe

miguelpeixe commented May 21, 2017

You are right: it took 9m20s to update permissions on my cloud server (overlay2 storage driver) and 7s on my local machine (aufs storage driver). I thought the find command would cost more, but that's not the case. It looks like a storage driver issue. I'll look into improving my cloud Docker setup.

@katarpilar

katarpilar commented May 21, 2017

I tried to change the storage driver from overlay2 to aufs on my Debian jessie VC1S Scaleway instance, but the Docker daemon fails to start because my kernel doesn't support aufs.
I will build my image without the chown command in the entrypoint script.
But I think it would be better to tell admins in the release notes to run the command before pulling new images than to force every admin on overlayfs to build their own image if they don't want to wait 30 minutes (yes, with 3 containers it takes that long) for their containers to start.

@miguelpeixe

miguelpeixe commented May 21, 2017

It's weird to have such bad performance on overlay2, which should be better than aufs. I'm also running on scaleway (VC1M).

@katarpilar

katarpilar commented May 21, 2017

OK, I can override the entrypoint script: https://docs.docker.com/compose/compose-file/#entrypoint
Maybe we should add this in the documentation?

@miguelpeixe

miguelpeixe commented May 21, 2017

@katarpilar I believe it would be best to figure out why we have such bad performance on our cloud setup and add the solution to the troubleshooting docs. I assume a lot of admins use Scaleway services.

@miguelpeixe

miguelpeixe commented May 21, 2017

FYI, I just got a 13s run on another cloud service that uses aufs and has no SSD.

@Wonderfall

Contributor

Wonderfall commented May 21, 2017

That's the reason why I gave up on overlay2. The performance is terrible, not with this command in particular, but in general.

That being said, aufs is still the standard storage driver for Docker, so I thought no one would complain (but I was wrong :sad:). Note that overlayfs isn't really recommended for production environments, even though it has been seen as a potential successor to aufs:

As promising as OverlayFS is, it is still relatively young. Therefore caution should be taken before using it in production Docker environments.

Source : Docker documentation

Speaking of OverlayFS, I noticed this performance issue a while ago. I came up with a hack which consists of refreshing (somehow) the files in the layers (so basically this could fix your issue), but it stopped working with a Docker update.

@fmauNeko

Contributor

fmauNeko commented May 21, 2017

Well I have bad performance with aufs myself, so that's strange.

@oxynux

oxynux commented May 22, 2017

Same issue (overlay2) :

# docker info
Containers: 14
 Running: 14
 Paused: 0
 Stopped: 0
Images: 61
Server Version: 17.05.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Kernel Version: 4.9.0-0.bpo.2-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.26GiB
Name: tortank
ID: JHJB:GWQI:ZGEE:WEZ4:RLH5:RVRR:UWPA:ZFAH:5MK4:LBAM:4DRA:ZZ36
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: oxynux
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: true
@malicioustoker

malicioustoker commented May 22, 2017

I am also having HORRIBLE load times with permissions running locally on macOS. 9+ minutes

@miguelpeixe

miguelpeixe commented May 22, 2017

@malicioustoker is it the overlay2 storage driver? You can get that information with the docker info command.
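
For anyone comparing setups, a small sketch of pulling just the storage driver out of docker info output. The helper name storage_driver is made up for illustration; it simply reads text on stdin, so it is shown here against sample lines copied from the reports in this thread rather than a live daemon. On a host with Docker installed, docker info --format '{{.Driver}}' should give the same value directly.

```shell
#!/bin/sh
# storage_driver (hypothetical helper): print the value of the
# "Storage Driver:" line found in `docker info` text read from stdin.
storage_driver() {
    sed -n 's/^[[:space:]]*Storage Driver:[[:space:]]*//p' | head -n 1
}

# Typical use on a Docker host:
#   docker info 2>/dev/null | storage_driver
# Sample input taken from the docker info dumps posted in this thread:
sample='Server Version: 17.05.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs'

driver=$(printf '%s\n' "$sample" | storage_driver)
echo "storage driver: $driver"
```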

@malicioustoker

malicioustoker commented May 22, 2017

@malicioustoker

malicioustoker commented May 22, 2017

How can I change it from Overlay2 to something else? That's just the default option when Docker is installed on macOS

@Wonderfall

Contributor

Wonderfall commented May 22, 2017

@malicioustoker Interesting, I still have aufs on my macOS machine (that said I don't have the latest version yet).

But you can change your storage-driver easily :

[screenshot, 2017-05-22 17:50:07]

I really think we should open an issue at Docker (moby/moby) rather than arguing about this change, because what else can we do? overlay2 shouldn't have performance issues, while btrfs is as fast as it would be on a classic filesystem.

I do understand your frustration if it's taking too long (10 minutes??? Come on!); I suffered from this bug for months before finally giving up on overlay2. The thing is, overlay2 will overtake aufs in the future (btrfs, zfs, and devicemapper, which shouldn't be used, remain as alternatives), so I'm concerned too.

@malicioustoker

malicioustoker commented May 22, 2017

@miguelpeixe

miguelpeixe commented May 22, 2017

For Scaleway users, this is how you change to aufs:

  1. Make sure your server bootscript is set to docker. If not, set and reboot.
  2. Test whether the aufs driver is available by running sudo modprobe aufs. If it exits with no output, the module is there.
  3. Edit /etc/docker/daemon.json (create if missing) and add the following:
{
  "storage-driver": "aufs"
}

Restart docker service and that's it.

@malicioustoker

malicioustoker commented May 22, 2017

Changing to aufs fixed the issue - it now takes no more than 5 seconds to change permissions - thanks everyone!

@Wonderfall

Contributor

Wonderfall commented May 24, 2017

Can someone try the --squash option? Someone still using overlay2 I mean.

docker build --squash -t mastodon .

For this to work you'll have to enable experimental features in Docker, put this in /etc/docker/daemon.json :

{
    "experimental": true
}
@malicioustoker

malicioustoker commented May 24, 2017

@Wonderfall

Contributor

Wonderfall commented May 24, 2017

OverlayFS implements a union mount, it's supposed to be faster, and it's in the upstream Linux kernel. It will overtake aufs for these reasons once it's mature (and that has already happened in RHEL/CentOS).

That said there are other alternatives :

  • Btrfs (the one I'm using, no problem)
  • ZFS
  • Devicemapper
@fmauNeko

Contributor

fmauNeko commented May 24, 2017

overlay2 already improves a lot on overlay, but OverlayFS is still a bit immature, which is why stable Docker still uses aufs as the default (Edge Docker uses overlay2 now, AFAIK).
Its only advantage right now is that it's already in the upstream kernel source.

If you need to set up a new server for Docker though, use btrfs or zfs, as they are natively copy-on-write filesystems.

@xsteadfastx

xsteadfastx commented May 29, 2017

I'm on overlay2 too and it takes over 30 minutes for me. Upgrading has taken me all morning because the recreate and migrate commands each start the chown in the entrypoint.

@Wonderfall

Contributor

Wonderfall commented May 30, 2017

@xsteadfastx No you're not stuck. You can still use btrfs :

  • Make a new volume formatted with Btrfs.
  • Mount the volume somewhere else like /docker.
  • Edit /etc/docker/daemon.json :
    • Change /var/lib/docker to /docker
    • Change the storage-driver (or add it) value to "btrfs"
  • Restart Docker.
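
The daemon.json step above could look roughly like the sketch below. This is an assumption-laden example, not a tested migration: it assumes the Btrfs volume is mounted at /docker, and it assumes the key for relocating /var/lib/docker is data-root, which I believe is the name used from Docker 17.05 onward (older releases used graph). The script only writes a candidate file to a scratch location so it can be reviewed before being copied to /etc/docker/daemon.json.

```shell
#!/bin/sh
# Sketch: write a candidate daemon.json to a scratch file for review.
# Assumptions: Btrfs volume mounted at /docker; "data-root" is the key
# that relocates /var/lib/docker (Docker 17.05+, "graph" on older releases).
candidate=$(mktemp)
cat > "$candidate" <<'EOF'
{
  "data-root": "/docker",
  "storage-driver": "btrfs"
}
EOF

cat "$candidate"
```

Keep in mind that images and containers created under the previous storage driver are not visible after the switch, so they would need to be pulled or rebuilt.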
@xsteadfastx

xsteadfastx commented May 30, 2017

@fmauNeko yep... installed...

Containers: 21
 Running: 7
 Paused: 0
 Stopped: 14
Images: 165
Server Version: 17.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-78-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.736 GiB
Name: rorschach
ID: 46XI:SRHG:O332:EDNX:4Q6P:ZOEW:XRA2:EGIU:2ANA:I2AZ:JOJ7:NLUD
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
@miguelpeixe

miguelpeixe commented May 30, 2017

@xsteadfastx try installing linux-image-extra-virtual, then run sudo modprobe aufs

@miguelpeixe

miguelpeixe commented May 30, 2017

@xsteadfastx also, don't forget to add to /etc/docker/daemon.json

{
  "storage-driver": "aufs"
}
@xsteadfastx

xsteadfastx commented May 30, 2017

@miguelpeixe there is no kernel support for aufs on Ubuntu 16.04
@Wonderfall I don't have a spare partition for btrfs... too bad, otherwise I would test it right away

@miguelpeixe

miguelpeixe commented May 30, 2017

Hmm, that's weird, I'm using aufs on all my 16.04 servers, including my local setup.

@gc373

gc373 commented May 30, 2017

@xsteadfastx
now, I'm updating. overlay2 -> aufs is ok. (sorry, Ubuntu 16.10)

$ sudo apt-get update
$ sudo apt-get install "linux-image-extra-$(uname -r)" linux-image-extra-virtual
$ sudo modprobe aufs
$ grep aufs /proc/filesystems

$ sudo nano /etc/docker/daemon.json

{
  "storage-driver": "aufs"
}

$ sudo service docker restart
$ docker info

@xsteadfastx

xsteadfastx commented May 30, 2017

OK, I have to say sorry... modprobe aufs after installing linux-image-extra-virtual did the trick. Sorry for this discussion about the Docker image; it works pretty well. I was just in a bad mood because it took hours to upgrade Mastodon.

thanks for all the help.

@George3d6

George3d6 commented Oct 15, 2017

There seems to be no "fix" for this issue yet, and some of us don't have aufs-compatible kernels or the luxury of creating a VM or attaching an extra partition with btrfs or zfs for the sake of prototyping.

So my question here would be, how harmful exactly is just doing:

chown mastodon:mastodon /mastodon/public/system

instead of: find /mastodon -path /mastodon/public/system -prune -o -not -user mastodon -not -group mastodon -print0 | xargs -0 chown -f mastodon:mastodon

Would it suffice to just add a check to the script and, when overlay or overlay2 is detected, print a warning and just chown the entire system directory? It seems like a good compromise considering the speed "bug" is with Docker (or overlay2, depending on how you think about it) and may not be fixed for a while, while Docker is slowly migrating to overlay and Linux distros are slowly removing aufs support from their default kernels.

My current fix is to run the Mastodon instance with the original chown (so as not to risk any issues) and "manually" modify the script inside the image to chown only the directory, so that any other tasks (e.g. creating admin users) run much faster. But that is hardly the most convenient thing to do, since it requires a file edit every time I want to reboot my instance.
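
A rough sketch of the kind of check proposed above, to make the idea concrete. This is not what the Mastodon entrypoint does; the function names are made up, and it rests on the assumption that inside a container the root filesystem shows up with type overlay in /proc/mounts for both the overlay and overlay2 drivers. The mounts table path is a parameter so the logic can be tried against sample data.

```shell
#!/bin/sh
# root_fs_type (hypothetical): print the filesystem type mounted at /
# according to a mounts table (defaults to /proc/mounts).
root_fs_type() {
    awk '$2 == "/" { fs = $3 } END { print fs }' "${1:-/proc/mounts}"
}

# slow_chown_expected (hypothetical): succeed when the root filesystem
# is overlay-based, which is the case this thread reports as slow.
slow_chown_expected() {
    case "$(root_fs_type "$1")" in
        overlay*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative mounts table resembling a container on overlay2.
mounts=$(mktemp)
printf '%s\n' 'overlay / overlay rw,relatime 0 0' 'proc /proc proc rw 0 0' > "$mounts"

if slow_chown_expected "$mounts"; then
    echo "WARNING: overlay root filesystem detected; permission update may be slow" >&2
fi
```

An entrypoint could use such a check to print the warning and then choose a cheaper permission update, but which update is safe to skip is a decision for the maintainers.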

@fmauNeko

Contributor

fmauNeko commented Oct 15, 2017

Well it won't be recursive, and I'm not sure chown -R would be faster than the current solution.

@shuaiscott

shuaiscott commented Oct 21, 2017

Just tried it on my 2 CPU, 8 GB RAM server and it took 49 mins... 😦

@pierreozoux

Contributor

pierreozoux commented Feb 19, 2018

What do you think about
#6510

For me it would be a nice workaround.

@malicioustoker

malicioustoker commented Feb 20, 2018

@Gargron

Member

Gargron commented Feb 20, 2018

@moritzheiber

Member

moritzheiber commented Feb 20, 2018

The solution here would be to use the user/group names instead of the variables.

I'll come up with a PR.

@moritzheiber

Member

moritzheiber commented Feb 20, 2018

@malicioustoker This should be fixed now.

@malicioustoker

malicioustoker commented Feb 20, 2018
