Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix docker-worker-chunk-2 #5800

Merged
merged 3 commits into from Nov 29, 2022
Merged

Fix docker-worker-chunk-2 #5800

merged 3 commits into from Nov 29, 2022

Conversation

petemoore
Copy link
Member

Not sure what is causing #5799 at the moment, so will troubleshoot in this PR. Will keep in PR in DRAFT status until I've worked out what is going on and made an appropriate fix.

@petemoore
Copy link
Member Author

The additional logs reveal:

Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 4:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)

@petemoore
Copy link
Member Author

petemoore commented Nov 28, 2022

@lotas
Copy link
Contributor

lotas commented Nov 28, 2022

great find @petemoore . looks like we were missing a deployment documentation for this

@petemoore
Copy link
Member Author

Inside docker container, task is running as root. Perhaps docker-worker doesn't run as root on host. If that is the case, it looks like we can either:

  • grant additional linux capabilities to the user it does run as, or
  • run docker-worker as root, or
  • reduce the capabilities required for collecting performance metrics on the host by updating /proc/sys/kernel/perf_event_paranoid

If so, I suspect the first option is probably the better one (principle of least privilege).

@petemoore
Copy link
Member Author

Additionally: it looks like the disableSeccomp docker-worker capability is a pretty open-door, perhaps seccomp profiles might be better for fine-grained access? In any case, this shouldn't be too much of an issue when moving to generic-worker, as task users are generally non-privileged and ioslated from each other.

@petemoore
Copy link
Member Author

I reproduced on an AWS machine:

ubuntu@ip-172-31-1-54:~$ sudo docker run --security-opt seccomp=unconfined --rm -ti alpine /bin/sh -c 'whoami && apk add perf; perf stat ls'
root
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/x86_64/APKINDEX.tar.gz
(1/9) Installing libgcc (12.2.1_git20220924-r4)
(2/9) Installing libstdc++ (12.2.1_git20220924-r4)
(3/9) Installing binutils (2.39-r2)
(4/9) Installing libcap2 (2.66-r0)
(5/9) Installing libbz2 (1.0.8-r4)
(6/9) Installing musl-fts (1.2.7-r3)
(7/9) Installing xz-libs (5.2.8-r0)
(8/9) Installing libelf (0.187-r2)
(9/9) Installing perf (5.15.74-r0)
Executing busybox-1.35.0-r29.trigger
OK: 28 MiB in 24 packages
Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 4:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)

Then from the three options I mentioned earlier:

  • grant additional linux capabilities to the user it does run as, or
  • run docker-worker as root, or
  • reduce the capabilities required for collecting performance metrics on the host by updating /proc/sys/kernel/perf_event_paranoid

I tried the first one:

ubuntu@ip-172-31-1-54:~$ sudo docker run --cap-add CAP_PERFMON --rm -ti alpine /bin/sh -c 'whoami && apk add perf; perf stat ls'
docker: Error response from daemon: invalid CapAdd: unknown capability: "CAP_PERFMON".
See 'docker run --help'.

which led me to docker/docs#13731 which told me that it isn't yet possible to run docker with this additional privilege (afaict).

I can cheat with full privilege:

ubuntu@ip-172-31-1-54:~$ sudo docker run --privileged --rm -ti alpine /bin/sh -c 'whoami && apk add perf; perf stat ls'
root
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/x86_64/APKINDEX.tar.gz
(1/9) Installing libgcc (12.2.1_git20220924-r4)
(2/9) Installing libstdc++ (12.2.1_git20220924-r4)
(3/9) Installing binutils (2.39-r2)
(4/9) Installing libcap2 (2.66-r0)
(5/9) Installing libbz2 (1.0.8-r4)
(6/9) Installing musl-fts (1.2.7-r3)
(7/9) Installing xz-libs (5.2.8-r0)
(8/9) Installing libelf (0.187-r2)
(9/9) Installing perf (5.15.74-r0)
Executing busybox-1.35.0-r29.trigger
OK: 28 MiB in 24 packages
bin    dev    etc    home   lib    media  mnt    opt    proc   root   run    sbin   srv    sys    tmp    usr    var

 Performance counter stats for 'ls':

              0.44 msec task-clock                #    0.613 CPUs utilized          
                 0      context-switches          #    0.000 /sec                   
                 0      cpu-migrations            #    0.000 /sec                   
                37      page-faults               #   83.369 K/sec                  
   <not supported>      cycles                                                      
   <not supported>      instructions                                                
   <not supported>      branches                                                    
   <not supported>      branch-misses                                               

       0.000724529 seconds time elapsed

       0.000770000 seconds user
       0.000000000 seconds sys


Note, here I am not running with --security-opt seccomp=unconfined at all, so --privileged seems to be sufficient for allowing the host syscalls to be made. I will test this theory shortly.

@petemoore
Copy link
Member Author

Note, here I am not running with --security-opt seccomp=unconfined at all, so --privileged seems to be sufficient for allowing the host syscalls to be made. I will test this theory shortly.

So indeed, using the privileged capability instead of the disableSeccomp capability made it possible to collect performance counters. My guess is that @jschwartzentruber added the disableSeccomp capability to have a more confined way of enabling performance counters rather than when using the docker-worker privileged capability.

@jschwartzentruber can you confirm?

If so, I think this (currently) leaves us with the only option being to modify /proc/sys/kernel/perf_event_paranoid on the workers, until CAP_PERFMON is a supported docker capability (i.e. until docker run --cap-add CAP_PERFMON works in a future release of docker). At that time, we can pass in that capability when launching the task container as part of the docker-worker feature and restore /proc/sys/kernel/perf_event_paranoid to its original value.

@petemoore
Copy link
Member Author

petemoore commented Nov 29, 2022

Note, setting to 3 seems to be sufficient...

Not working:

ubuntu@ip-172-31-1-54:~$ sudo docker run --security-opt seccomp=unconfined --rm -ti alpine /bin/sh -c 'whoami && apk add perf; perf stat ls'
root
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/x86_64/APKINDEX.tar.gz
(1/9) Installing libgcc (12.2.1_git20220924-r4)
(2/9) Installing libstdc++ (12.2.1_git20220924-r4)
(3/9) Installing binutils (2.39-r2)
(4/9) Installing libcap2 (2.66-r0)
(5/9) Installing libbz2 (1.0.8-r4)
(6/9) Installing musl-fts (1.2.7-r3)
(7/9) Installing xz-libs (5.2.8-r0)
(8/9) Installing libelf (0.187-r2)
(9/9) Installing perf (5.15.74-r0)
Executing busybox-1.35.0-r29.trigger
OK: 28 MiB in 24 packages
Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 4:
  -1: Allow use of (almost) all events by all users
      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)

Check current value:

ubuntu@ip-172-31-1-54:~$ cat /proc/sys/kernel/perf_event_paranoid 
4

Reduce to 3:

ubuntu@ip-172-31-1-54:~$ echo 3 | sudo tee /proc/sys/kernel/perf_event_paranoid 
3

Check change was effective:

ubuntu@ip-172-31-1-54:~$ cat /proc/sys/kernel/perf_event_paranoid 
3

Try again:

ubuntu@ip-172-31-1-54:~$ sudo docker run --security-opt seccomp=unconfined --rm -ti alpine /bin/sh -c 'whoami && apk add perf; perf stat ls'
root
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/x86_64/APKINDEX.tar.gz
(1/9) Installing libgcc (12.2.1_git20220924-r4)
(2/9) Installing libstdc++ (12.2.1_git20220924-r4)
(3/9) Installing binutils (2.39-r2)
(4/9) Installing libcap2 (2.66-r0)
(5/9) Installing libbz2 (1.0.8-r4)
(6/9) Installing musl-fts (1.2.7-r3)
(7/9) Installing xz-libs (5.2.8-r0)
(8/9) Installing libelf (0.187-r2)
(9/9) Installing perf (5.15.74-r0)
Executing busybox-1.35.0-r29.trigger
OK: 28 MiB in 24 packages
bin    dev    etc    home   lib    media  mnt    opt    proc   root   run    sbin   srv    sys    tmp    usr    var

 Performance counter stats for 'ls':

              0.56 msec task-clock:u              #    0.559 CPUs utilized          
                 0      context-switches:u        #    0.000 /sec                   
                 0      cpu-migrations:u          #    0.000 /sec                   
                35      page-faults:u             #   63.018 K/sec                  
   <not supported>      cycles:u                                                    
   <not supported>      instructions:u                                              
   <not supported>      branches:u                                                  
   <not supported>      branch-misses:u                                             

       0.000993046 seconds time elapsed

       0.001039000 seconds user
       0.000000000 seconds sys


ubuntu@ip-172-31-1-54:~$ 

Worked!

@jschwartzentruber
Copy link
Contributor

jschwartzentruber commented Nov 29, 2022

Note, here I am not running with --security-opt seccomp=unconfined at all, so --privileged seems to be sufficient for allowing the host syscalls to be made. I will test this theory shortly.

So indeed, using the privileged capability instead of the disableSeccomp capability made it possible to collect performance counters. My guess is that @jschwartzentruber added the disableSeccomp capability to have a more confined way of enabling performance counters rather than when using the docker-worker privileged capability.

@jschwartzentruber can you confirm?

This might be a combination I never tested. I was going from the rr wiki Docker page, which recommends --cap-add=SYS_PTRACE --security-opt seccomp=unconfined, but now I see in the rr FAQ that they recommend only --privileged. So it may be that we don't need seccomp unconfined at all, if we can just run with privilege.

@petemoore
Copy link
Member Author

petemoore commented Nov 29, 2022

This might be a combination I never tested. I was going from the rr wiki Docker page, which recommends --cap-add=SYS_PTRACE --security-opt seccomp=unconfined, but now I see in the rr FAQ that they recommend only --privileged. So it may be that we don't need seccomp unconfined at all, if we can just run with privilege.

fwiw SYS_PTRACE alone isn't enough for collecting performance metrics (not sure whether rr needs those?):

ubuntu@ip-172-31-1-54:~$ sudo docker run --cap-add SYS_PTRACE --rm -ti alpine /bin/sh -c 'whoami && apk add perf; perf stat ls'
root
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.17/community/x86_64/APKINDEX.tar.gz
(1/9) Installing libgcc (12.2.1_git20220924-r4)
(2/9) Installing libstdc++ (12.2.1_git20220924-r4)
(3/9) Installing binutils (2.39-r2)
(4/9) Installing libcap2 (2.66-r0)
(5/9) Installing libbz2 (1.0.8-r4)
(6/9) Installing musl-fts (1.2.7-r3)
(7/9) Installing xz-libs (5.2.8-r0)
(8/9) Installing libelf (0.187-r2)
(9/9) Installing perf (5.15.74-r0)
Executing busybox-1.35.0-r29.trigger
OK: 28 MiB in 24 packages
Error:
No permission to enable task-clock event.

so indeed if you are ok with using privileged in these tasks, then we won't need to alter the system-wide /proc/sys/kernel/perf_event_paranoid setting on workers, which is better for us. However, specifying --privileged opens up a larger attack surface to your tasks than a finely tuned seccomp profile would do, so it might be worth creating a seccomp profile at some point, and adding syscalls one by one until you've found all the required calls needed, and in future when running under generic-worker, if docker is still used, that seccomp profile could be specified in the docker run command.

@petemoore petemoore force-pushed the issue5799 branch 2 times, most recently from 8a51d9b to 6da6c15 Compare November 29, 2022 20:33
@petemoore petemoore marked this pull request as ready for review November 29, 2022 20:34
@petemoore petemoore requested a review from a team as a code owner November 29, 2022 20:34
@petemoore petemoore requested review from lotas and matt-boris and removed request for a team November 29, 2022 20:34
@matt-boris
Copy link
Contributor

Successful rerun of docker worker chunk 4 here https://community-tc.services.mozilla.com/tasks/G-b1mZ_rTbyrTCFJl3KPUw

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants