Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow running buildah on newer nodes #806

Merged
merged 2 commits into from Jun 24, 2022

Conversation

ibotty
Copy link
Contributor

@ibotty ibotty commented May 24, 2022

Changes

This will allow running buildah on nodes with kernel 5.12 or later.

Buildah requires the SETFCAP capability to work. This pull requests updates OpenShift SCC to allow adding this capability and the included buildah clustertask to request it.

I do not have access to clusters with older kernels so I don't know whether it will work on these nodes.

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

See the contribution guide for more details.

Release Notes

buildah ClusterTask is now allowed SETFCAP capability

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented May 24, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: ibotty / name: Tobias Florek (025b581)

@tekton-robot tekton-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label May 24, 2022
@tekton-robot
Copy link
Contributor

Hi @ibotty. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 24, 2022
@ibotty
Copy link
Contributor Author

ibotty commented May 24, 2022

@ibotty: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

No release notes are necessary (afaict), so what should I do?

@nikhil-thomas
Copy link
Member

/ok-to-test

@tekton-robot tekton-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 25, 2022
@nikhil-thomas
Copy link
Member

@gabemontero what are your thoughts 🧑‍💻

@nikhil-thomas
Copy link
Member

@ibotty: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

No release notes are necessary (afaict), so what should I do?

i think a 1 line release will help our users to be aware of this change.

but to get rid of the block without adding a release-note add

```release-note
NONE
\```

at the end of our pr description.

@concaf
Copy link
Contributor

concaf commented May 25, 2022

a release note like this will help our users:

buildah ClusterTask is now allowed SETFCAP capability

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 25, 2022
@ibotty
Copy link
Contributor Author

ibotty commented May 25, 2022

I added the release notes as suggested. You are certainly right.

BTW: I plan to do a pull request in tektoncd's catalog when the review here is done.

@concaf
Copy link
Contributor

concaf commented May 25, 2022

/assign @piyush-garg

@nikhil-thomas
Copy link
Member

@piyush-garg do we need to add similar securityContext blocks to all the s2i tasks in our addons here

@piyush-garg
Copy link
Contributor

@ibotty is this PR tested on any OpenShift cluster? I am not able to run the tasks as a normal user

@ibotty
Copy link
Contributor Author

ibotty commented May 25, 2022

Yes, after manually changing the scc and copying the buildah-task to add the changes I can run it on OKD 4.8. I did not built a different operator and let it change the crds though.

@gabemontero
Copy link

ok @ibotty @nikhil-thomas @piyush-garg

  • first, yes cap_setfcap was the additional permission that @nalind said was needed with the latest coreos changes on 4.11 and so on the surface this looks like the approach, though yeah, as was mentioned in slack, any of the s2i items in the catalog that are using buildah would need this too
  • for background, here are what the man pages at https://man7.org/linux/man-pages/man7/capabilities.7.html say about it:
CAP_SETFCAP (since Linux 2.6.24)
              Set arbitrary capabilities on a file.

              Since Linux 5.12, this capability is also needed to map
              user ID 0 in a new user namespace; see [user_namespaces(7)](https://man7.org/linux/man-pages/man7/user_namespaces.7.html)
              for details.

for @nikhil-thomas 's question to myself, @adambkaplan, and @sbose78 about the security implications, I'm not sure I feel qualified to respond with confidence on how good or bad that additional change is, but I believe @nalind did tell me this was not a temporary state of affairs. That it was what would be needed moving forward.

@nalind - do you have any input on the relative safety or lack there of around CAP_SETFCAP

  • lastly, when @nalind and I last talked, he mentioned he had buildah changes he needed to make to get things to work even with CAP_SETFCAP. @nalind - were you in fact able to make those changes yet? @nikhil-thomas said they are now using registry.redhat.io/rhel8/buildah@sha256:23fb7971ea6ac4aaaaa1139473a602df0df19222a3b5a76b551b2b9ddd92e927 and that it is working. Just wanted to make sure that lines up with your expectations.

that's all I got - thanks

@gabemontero
Copy link

note: as I understand it, "the problem" and need for this new permission only applies to running on OCP 4.11

@nikhil-thomas
Copy link
Member

/hold

let us get some more 👀 on this.
🧑‍💻

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 25, 2022
@piyush-garg
Copy link
Contributor

@ibotty is this PR tested on any OpenShift cluster? I am not able to run the tasks as a normal user

It did not work for me as a normal user, but worked as a kubeadmin

@ibotty
Copy link
Contributor Author

ibotty commented May 25, 2022

@piyush-garg: Are you sure you the pipelines-scc got updated? Mine looks like

oc get scc pipelines-scc
NAME            PRIV    CAPS          SELINUX     RUNASUSER   FSGROUP     SUPGROUP   PRIORITY   READONLYROOTFS   VOLUMES
pipelines-scc   false   ["SETFCAP"]   MustRunAs   RunAsAny    MustRunAs   RunAsAny   10         false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]

@ibotty
Copy link
Contributor Author

ibotty commented May 25, 2022

note: as I understand it, "the problem" and need for this new permission only applies to running on OCP 4.11

And on OKD since, I guess, forever.

@nalind
Copy link

nalind commented May 25, 2022

for @nikhil-thomas 's question to myself, @adambkaplan, and @sbose78 about the security implications, I'm not sure I feel qualified to respond with confidence on how good or bad that additional change is, but I believe @nalind did tell me this was not a temporary state of affairs. That it was what would be needed moving forward.

That's right, I have no reason to expect the kernel to revert this.

@nalind - do you have any input on the relative safety or lack there of around CAP_SETFCAP

I'm not an expert on that.

* lastly, when @nalind and I last talked, he mentioned he had buildah changes he needed to make to get things to work even with CAP_SETFCAP.  @nalind - were you in fact able to make those changes yet?  @nikhil-thomas said they are now using `registry.redhat.io/rhel8/buildah@sha256:23fb7971ea6ac4aaaaa1139473a602df0df19222a3b5a76b551b2b9ddd92e927` and that it is working.  Just wanted to make sure that lines up with your expectations.

I must have misspoken. The most we can do there is check for that capability (near the existing check for CAP_SYS_ADMIN) and suggest it as a possible cause when we fail to set up the ID mappings.

@ibotty
Copy link
Contributor Author

ibotty commented May 25, 2022

Argh! I forgot to mention, I am using the task with BUILDER_IMAGE=quay.io/buildah/stable:latest, because of missing redhat registry credentials.

@adambkaplan
Copy link

It should be noted that CAP_SETFCAP is permitted under the Baseline pod security standard:

https://kubernetes.io/docs/concepts/security/pod-security-standards/#baseline

@gabemontero
Copy link

It should be noted that CAP_SETFCAP is permitted under the Baseline pod security standard:

https://kubernetes.io/docs/concepts/security/pod-security-standards/#baseline

good catch @adambkaplan forgot to look there

yes that is probably the best answer to say it is "OK" for pipeline SA to pick this up

@piyush-garg
Copy link
Contributor

@piyush-garg: Are you sure you the pipelines-scc got updated? Mine looks like

oc get scc pipelines-scc
NAME            PRIV    CAPS          SELINUX     RUNASUSER   FSGROUP     SUPGROUP   PRIORITY   READONLYROOTFS   VOLUMES
pipelines-scc   false   ["SETFCAP"]   MustRunAs   RunAsAny    MustRunAs   RunAsAny   10         false            ["configMap","downwardAPI","emptyDir","persistentVolumeClaim","projected","secret"]

Yes, i see it like this and not able to create pod. taskrun not able to convert to pod

@piyush-garg
Copy link
Contributor

@ibotty This PR #813 worked for me

@sm43
Copy link
Member

sm43 commented Jun 7, 2022

I tried this pr on OCP 4.11
by creating a non admin user too
got error as below for both users (admin and nonadmin users)

failed to create task run pod "buildah-run-kqc8j-run-buildah":
              pods "buildah-run-kqc8j-run-buildah-pod" is forbidden: unable to validate
              against any security context constraint: [provider "anyuid": Forbidden:
              not usable by user or serviceaccount, spec.containers[0].securityContext.capabilities.add:
              Invalid value: "CAP_SETFCAP": capability may not be added, spec.containers[1].securityContext.capabilities.add:
              Invalid value: "CAP_SETFCAP": capability may not be added, spec.containers[2].securityContext.capabilities.add:
              Invalid value: "CAP_SETFCAP": capability may not be added, provider
              "restricted": Forbidden: not usable by user or serviceaccount, provider
              "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider
              "hostmount-anyuid": Forbidden: not usable by user or serviceaccount,
              provider "machine-api-termination-handler": Forbidden: not usable by
              user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable
              by user or serviceaccount, provider "hostnetwork": Forbidden: not usable
              by user or serviceaccount, provider "hostaccess": Forbidden: not usable
              by user or serviceaccount, provider "node-exporter": Forbidden: not
              usable by user or serviceaccount, provider "privileged": Forbidden:
              not usable by user or serviceaccount]. Maybe missing or invalid Task
              test/buildah

steps performed


pods created are using pipeline-scc

@ibotty
Copy link
Contributor Author

ibotty commented Jun 7, 2022

Argh. While trying to debug the problem I noticed, that I did make a copy-paste error: the capabilities are not prepended with CAP_ in kubernetes. My "fixed" task had addCapabilities: [SETFCAP] set.
🤦

Should I update the PR to add (the right) capability to all other s2i tasks as well?


- name: digest-to-results
image: $(params.BUILDER_IMAGE)
script: cat $(workspaces.source.path)/image-digest | tee /tekton/results/IMAGE_DIGEST
securityContext:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont need to add the capability for this step

@piyush-garg
Copy link
Contributor

Argh. While trying to debug the problem I noticed, that I did make a copy-paste error: the capabilities are not prepended with CAP_ in kubernetes. My "fixed" task had addCapabilities: [SETFCAP] set. facepalm

Should I update the PR to add (the right) capability to all other s2i tasks as well?

Aah thanks, i also misunderstood that we need to add the capabilities as like in linux but it is different for containers.

@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 7, 2022
@piyush-garg
Copy link
Contributor

Should I update the PR to add (the right) capability to all other s2i tasks as well?

Yes we need to do the change for all s2i tasks too, but only for the build and push steps, not all

@tekton-robot tekton-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 7, 2022
@sbose78
Copy link

sbose78 commented Jun 23, 2022

Followed up, looks good 👍

@vdemeester
Copy link
Member

/lgtm
I think we can lift the hold

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2022
@sm43
Copy link
Member

sm43 commented Jun 24, 2022

/lgtm
/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 24, 2022
@tekton-robot tekton-robot merged commit 71a7770 into tektoncd:main Jun 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet