Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode scripts as base 64 to avoid k8s mangling "$$" #3963

Merged
merged 1 commit into from Jun 14, 2021
Merged

Encode scripts as base 64 to avoid k8s mangling "$$" #3963

merged 1 commit into from Jun 14, 2021

Conversation

ghost
Copy link

@ghost ghost commented May 20, 2021

Changes

Fixes #3871

Kubernetes replaces instances of "$$" in container args fields with "$". This can muck up the contents of script fields because scripts are passed into a TaskRun Pod as an arg to an init container. In particular, $$ is used for the current process id in bash scripts.

Prior to this commit we tried to prevent the replacement from happening by:

  1. putting scripts into annotations on a pod and projecting them using downward
    API

    • con: the max size of the annotations map is capped to ~250kB. The aggregate size of all scripts in a single Task therefore becomes constrained by this. Any other systems using annotations will reduce the available headroom. Backwards incompatible.
  2. replacing instances of "$$" in scripts with "$$$$" for k8s to then process
    back to "$$"

    • con: k8s doesn't actually process all instances of "$$". For example, if you write an arg with format "echo $(eval $$foo)" then k8s will see the first "$(", assume it's a variable reference, and pass it through verbatim. So user's scripts with bash variable become broken by tekton's new replacement. Backwards incompatible.

This commit takes a third approach, proposed by @MartinKanters, encoding scripts as base64 in the controller and then having them decoded in the init container. This bypasses Kubernetes' args processing completely because dollar
signs aren't used in the default base64 alphabet.

This approach avoids introducing a backwards-incompatible limit to the aggregate script size and doesn't mangle existing bash scripts with variable replacements.

The most noticeable trade-offs we now make are:

  1. Tiny scripts can be up to 300% bigger, but as scripts get longer the max increase gets closer to 133%.
  2. Also the TaskRun's initContainer YAML is a bit less human readable:
    initContainers:
    - args:
      - -c
      - |
        tmpfile="/tekton/scripts/script-0-f8fmf"
        touch ${tmpfile} && chmod +x ${tmpfile}
        cat > ${tmpfile} << '_EOF_'
        IyEvYmluL3NoCnNldCAteGUKZWNobyAibm8gc2hlYmFuZyI=
        _EOF_
        /tekton/tools/entrypoint decode-script "${tmpfile}"

The entrypoint is extended to decode base64 files so that the shellImage
(which is used to write scripts to disk for Step containers) is not required
to package a base64 binary.

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Docs included if any changes are user facing
  • Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including
    functionality, content, code)
  • Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

Scripts in Tasks are now written into the Task's pod using base64 to avoid kubernetes' built-in arg processing. This means they're a little larger than they were prior to this release but otherwise should continue working as expected.

@tekton-robot tekton-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label May 20, 2021
@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 20, 2021
@ghost ghost requested a review from imjasonh May 20, 2021 15:18
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands.go Do not exist 0.0%

@ghost
Copy link
Author

ghost commented May 20, 2021

/kind bug

@tekton-robot tekton-robot added the kind/bug Categorizes issue or PR as related to a bug. label May 20, 2021
@ghost ghost added this to the Pipelines v0.25 milestone May 20, 2021
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands.go Do not exist 0.0%

@ghost
Copy link
Author

ghost commented May 20, 2021

/test tekton-pipeline-unit-tests

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands.go Do not exist 0.0%

@ghost
Copy link
Author

ghost commented May 20, 2021

TODO:

  • Split decodeScript into a few pieces: decodeFromFile, writeFile
  • Add unit tests for decodeFromFile (maybe using new fs package?)

@tekton-robot tekton-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 21, 2021
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 0.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 56.5%
cmd/entrypoint/subcommands/subcommands.go Do not exist 0.0%

# Test that bash scripts with variable evaluations work as expected
- name: bash-variable-evaluations
image: bash:5.1.8
workingDir: /workspace

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this workspace needed here? (also for the v1beta1 version)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch, it isn't needed when running as root. I think I was playing around with different runAsUser settings and added this for non-root. Removed from both examples, cheers!

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 0.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 56.5%
cmd/entrypoint/subcommands/subcommands.go Do not exist 0.0%

@ghost
Copy link
Author

ghost commented May 22, 2021

/test pull-tekton-pipeline-alpha-integration-tests

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 0.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 56.5%
cmd/entrypoint/subcommands/subcommands.go Do not exist 0.0%

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 0.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 56.5%
cmd/entrypoint/subcommands/subcommands.go Do not exist 0.0%

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 0.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 56.5%
cmd/entrypoint/subcommands/subcommands.go Do not exist 0.0%

@ghost
Copy link
Author

ghost commented May 22, 2021

/test pull-tekton-pipeline-build-tests

@ghost
Copy link
Author

ghost commented May 24, 2021

/test pull-tekton-pipeline-integration-tests

}
return SubcommandSuccessful{message: fmt.Sprintf("Decoded script %s", src)}
}
default:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean invalid args are ignored? (or does this just stop it from processing something that doesnt look like a subcommand?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, just trying to capture the fact that the default operation is to "do nothing, return nil".

@bobcatfish
Copy link
Collaborator

ugh this was a surprisingly tricky one!!! ps. thanks for the FABULOUS commit message 😍 !!!

k8s doesn't actually process all instances of "$$"

🙃

image

feels like these could potentially benefit from some test coverage, what do you think? im thinking especially subcommands

just lemme know if you feel its not worth it, i noticed you included unit tests for decode script so maybe you felt it wasnt worth it to maintain unit tests for these other tiny files - i think it's the indexing into the args object in Process that has me really feeling like "maybe some unit tests"

Copy link
Author

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah good point, I added tests for decodeScript because I wanted to try out the new fstest built-in package, unfortunately it only lets you mock a read-only filesystem so I didn't bother writing additional new tests for cp and subcommands as they both write. I'll add some for these using a TempDir.

}
return SubcommandSuccessful{message: fmt.Sprintf("Decoded script %s", src)}
}
default:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, just trying to capture the fact that the default operation is to "do nothing, return nil".

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 90.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 56.5%
cmd/entrypoint/subcommands/subcommands.go Do not exist 0.0%

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 90.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 56.5%
cmd/entrypoint/subcommands/subcommands.go Do not exist 56.2%

@chmouel
Copy link
Member

chmouel commented Jun 1, 2021

Pretty neat @sbwsg and thanks for the great commit message, did base-122 was investigated for size?

This blog post details it, https://blog.kevinalbs.com/base122 and the go library from the same author has as well some details https://github.com/kevinAlbs/Base122

I was wondering if the argument had a size limit, but it seems that k8/container runtime is limited by what the linux kernel is providing, which is pretty much unlimited as the system provide it, from the manpage https://linux.die.net/man/2/execve :

On kernel 2.6.23 and later, most architectures support a size limit derived from the soft RLIMIT_STACK resource limit (see getrlimit(2)) that is in force at the time of the execve() call. (Architectures with no memory management unit are excepted: they maintain the limit that was in effect before kernel 2.6.23.) This change allows programs to have a much larger argument and/or environment list. For these architectures, the total size is limited to 1/4 of the allowed stack size. (Imposing the 1/4-limit ensures that the new program always has some stack space.) Since Linux 2.6.25, the kernel places a floor of 32 pages on this size limit, so that, even when RLIMIT_STACK is set very low, applications are guaranteed to have at least as much argument and environment space as was provided by Linux 2.6.23 and earlier. (This guarantee was not provided in Linux 2.6.23 and 2.6.24.) Additionally, the limit per string is 32 pages (the kernel constant MAX_ARG_STRLEN), and the maximum number of strings is 0x7FFFFFFF.

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 1, 2021
@ghost
Copy link
Author

ghost commented Jun 7, 2021

Cheers @chmouel I hadn't considered using a different encoding like Base 122! My only hesitation is that the design and code are controlled by a single person it looks like, and it sounds from the readme that the author still considers the design a work-in-progress.

Given that we're not exposing the encoding as part of our API we could always iterate on it and try out better / more compact representations over time if users complain about size or other issues.

@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 7, 2021
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 90.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 82.6%
cmd/entrypoint/subcommands/subcommands.go Do not exist 75.0%

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 90.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 80.8%
cmd/entrypoint/subcommands/subcommands.go Do not exist 75.0%

Copy link
Member

@afrittoli afrittoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, it looks good - and better than all previous options.
Great PR description!!
/approve

set -xe
var1=var1_value
var2=var1
echo $(eval echo \$$var2) > tmpfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, nice :)

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2021
@chmouel
Copy link
Member

chmouel commented Jun 11, 2021

/lgtm

thanks @sbwsg , let's hope this fix our variables encoding issues 🤞🏻

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2021
@ghost
Copy link
Author

ghost commented Jun 11, 2021

oops. I'm missing a file truncate somewhere in the switch back to the os package.

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [afrittoli,vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vdemeester
Copy link
Member

/retest

@tekton-robot tekton-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2021
@ghost
Copy link
Author

ghost commented Jun 11, 2021

OK, decoded script files should be correctly truncated now. With my previous change the script file was ending up as the decoded script + the tail of the base64 string.

@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 90.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 82.6%
cmd/entrypoint/subcommands/subcommands.go Do not exist 75.0%

@@ -0,0 +1,30 @@
package subcommands
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this refactoring ❤️ awesome work @sbwsg 👏

@@ -0,0 +1,30 @@
package subcommands
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing copyright license headers 😜

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks @pritidesai ; added the license headers.

Kubernetes replaces instances of "$$" in container args fields with "$". This
can muck up the contents of script fields because scripts are passed into a
TaskRun Pod as an arg to an init container.

Prior to this commit we [tried to prevent the
replacement](#3888) from happening by:

1. putting scripts into annotations on a pod and projecting them using downward
API
    - con: the max size of the annotations map is capped to ~250kB. The
      aggregate size of all scripts in a single Task therefore becomes
      constrained by this. Any other systems using annotations will reduce the
      available headroom. Backwards incompatible.

2. replacing instances of "$$" in scripts with "$$$$" for k8s to then process
back to "$$"
    - con: k8s doesn't actually process _all_ instances of "$$". For example,
      if you write an arg with format "echo $(eval \$$foo)" then k8s will see
the first "$(", assume it's a variable reference, and pass it through verbatim.
So user's scripts with bash variable become broken by tekton's new replacement.
Backwards incompatible.

This commit takes a third approach, proposed by @MartinKanters, encoding
scripts as base64 in the controller and then having them decoded in the init
container. This bypasses Kubernetes' args processing completely because dollar
signs aren't used in base64 encodings. It also doesn't introduce a
backwards-incompatible limit to the aggregate script size. And it doesn't
mangle existing bash scripts with variable replacements.

The most noticeable trade-offs we now make are:

1. Tiny scripts can be up to 300% bigger, but as scripts get longer the max
increase gets closer to 133%.
2. Also the TaskRun's `initContainer` YAML is a bit less human readable:

```
    initContainers:
    - args:
      - -c
      - |
        tmpfile="/tekton/scripts/script-0-f8fmf"
        touch ${tmpfile} && chmod +x ${tmpfile}
        cat > ${tmpfile} << '_EOF_'
        IyEvYmluL3NoCnNldCAteGUKZWNobyAibm8gc2hlYmFuZyI=
        _EOF_
        /tekton/tools/entrypoint decode-script "${tmpfile}"
```

The entrypoint is extended to decode base64 files so that the `shellImage`
(which is used to write scripts to disk for Step containers) is not required
to package a `base64` binary.
@tekton-robot
Copy link
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
cmd/entrypoint/subcommands/cp.go Do not exist 90.0%
cmd/entrypoint/subcommands/decode_script.go Do not exist 82.6%
cmd/entrypoint/subcommands/subcommands.go Do not exist 75.0%

@jerop
Copy link
Member

jerop commented Jun 14, 2021

/test pull-tekton-pipeline-integration-tests

@ghost
Copy link
Author

ghost commented Jun 14, 2021

/test pull-tekton-pipeline-alpha-integration-tests

initContainer.Args[1] += fmt.Sprintf(`tmpfile="%s"
touch ${tmpfile} && chmod +x ${tmpfile}
cat > ${tmpfile} << '%s'
%s
%s
/tekton/tools/entrypoint decode-script "${tmpfile}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: can we use something more meaningful here instead of tmpfile, may be a script file or a encoded script file? looks odd to see a tmpfile in pod description:

kubectl get pod/dollar-dollar-taskrun-lztnb-pod-tvs2h -o json |  jq .spec.initContainers
...
{
    "args": [
      "-c",
      "tmpfile=\"/tekton/scripts/script-0-ck986\"\ntouch ${tmpfile} && chmod +x ${tmpfile}\ncat > ${tmpfile} << '_EOF_'\nIyEvYmluL3NoCnNldCAteGUKZWNobyAndHdvIGRvbGxhciBzaWduczogJCQnCmVjaG8gJ2ZvdXIgZG9sbGFyIHNpZ25zOiAkJCQkJwplY2hvICd0d28gZG9sbGFyIHNpZ25zIGZyb20gYW4gZW52aXJvbm1lbnQgdmFyaWFibGU6ICcgJERPTExBUl9ET0xMQVJfRU5WCg==\n_EOF_\n/tekton/tools/entrypoint decode-script \"${tmpfile}\"\n"
...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, created #4041

@pritidesai
Copy link
Member

Also the TaskRun's initContainer YAML is a bit less human readable:

It's a very strong disadvantage but there is no way around to deal with such special characters.

thanks @sbwsg

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 14, 2021
@tekton-robot tekton-robot merged commit 584b527 into tektoncd:main Jun 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tekton converts double dollar symbols to one
8 participants