
Make generated pods only request the maximum necessary resources #723

Merged (1 commit into tektoncd:master, Apr 12, 2019)

Conversation

@dwnusbaum (Contributor) commented Apr 4, 2019:

Changes

Before this change, if CPU, memory, or ephemeral storage resource requests were set in a Task's steps (which are Containers), the generated Pod would require the sum of all of the steps' requests to be scheduled on a Node. However, because Tekton overwrites Container entrypoints in Tasks to make the Containers logically execute one at a time, we want to make Pods generated by the TaskRun only request the maximum resources that will be necessary for any single Container rather than the sum of all resource requests.

To make this happen, when generating a Pod for a Task, we find, for each resource, the Step with the largest request among all Steps, and set that resource's request to 0 on every other Step. If no Step has an explicit resource request, all requests are set to 0. We set requests to 0 explicitly rather than unsetting them, because Kubernetes uses the limit as the request when the request is unset, which would defeat the purpose (and could end up making the Pod request more memory than it did in the first place).
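For a concrete picture, here is a minimal sketch of the approach in Go (not the merged code: the zeroing helper's signature matches the one visible in the review below, but the lookup helper's name and the details are illustrative):

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// findMaxResourceRequests returns, for each resource name, the index of the
// Step whose request for that resource is the largest (illustrative name).
func findMaxResourceRequests(steps []corev1.Container) map[corev1.ResourceName]int {
	maxIndices := map[corev1.ResourceName]int{}
	maxRequests := map[corev1.ResourceName]resource.Quantity{}
	for i, step := range steps {
		for name, request := range step.Resources.Requests {
			if current, seen := maxRequests[name]; !seen || request.Cmp(current) > 0 {
				maxRequests[name] = request
				maxIndices[name] = i
			}
		}
	}
	return maxIndices
}

// zeroNonMaxResourceRequests zeroes this step's request for every resource
// whose maximum request lives on some other step. The request is set to an
// explicit 0 rather than unset, because an unset request defaults to the limit.
func zeroNonMaxResourceRequests(step *corev1.Container, stepIndex int, maxIndices map[corev1.ResourceName]int) {
	if step.Resources.Requests == nil {
		step.Resources.Requests = corev1.ResourceList{}
	}
	for _, name := range []corev1.ResourceName{
		corev1.ResourceCPU, corev1.ResourceMemory, corev1.ResourceEphemeralStorage,
	} {
		if maxIndex, seen := maxIndices[name]; !seen || maxIndex != stepIndex {
			step.Resources.Requests[name] = resource.MustParse("0")
		}
	}
}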

I did not add any e2e tests, but would be happy to do so if desired by reviewers. I think the tests would need to look up Nodes in the cluster to find the maximum allowed resources and then create some tasks dynamically that would have been unschedulable before this change but work after the change.

CC @bbrowning, @abayer

Fixes #598

Submitter Checklist

These are the criteria that every PR should meet; please check them off as you review them. See the contribution guide for more details.

Release Notes

Set CPU, memory, and ephemeral storage resource requests for each
step in a Task to zero if the step does not have the largest resource
request out of all steps in the Task. This ensures that the Pod that
executes the Task will only request the resources necessary to
execute any single step in the Task, rather than requesting the sum
of all of the steps' resource requests, and is safe because Tekton
executes steps sequentially rather than all at once.

@googlebot added the cla: yes label Apr 4, 2019
@tekton-robot added the size/L label Apr 4, 2019
pkg/reconciler/v1alpha1/taskrun/resources/pod.go (review thread)

	ignorePrivateResourceFields = cmpopts.IgnoreUnexported(resource.Quantity{})
	nopContainer = corev1.Container{
	resourceQuantityCmp = cmp.Comparer(func(x, y resource.Quantity) bool {
		return x.Cmp(y) == 0

@dwnusbaum (Contributor, Author) commented:
I added this comparator in 3 places. It would be nice to have a default comparator with some helpful default settings for use in tests, but I didn't see any kind of util package or anything for that kind of thing. Is there a place for something like that?
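(For illustration, a shared default comparator could live in a test util package like the hypothetical sketch below; the testutil package and variable name are assumptions, since no such package existed at the time:)

package testutil

import (
	"github.com/google/go-cmp/cmp"
	"k8s.io/apimachinery/pkg/api/resource"
)

// ResourceQuantityCmp compares resource.Quantity values semantically
// (e.g. "1Gi" equals "1024Mi") rather than by their unexported fields,
// so tests can pass it as a cmp option wherever Quantities appear.
var ResourceQuantityCmp = cmp.Comparer(func(x, y resource.Quantity) bool {
	return x.Cmp(y) == 0
})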

	}
	}

	func Memory(val string) ResourceListOp {

@dwnusbaum (Contributor, Author) commented Apr 4, 2019:
Alternatively we could just have func Resource(name corev1.ResourceName, val string) so we don't need a separate function for every resource type, but that would be a little more verbose in the tests.
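(A sketch of that alternative, assuming ResourceListOp is a function that mutates a corev1.ResourceList, as the surrounding builder code suggests; this Resource function is the suggestion, not merged code:)

// Resource sets an arbitrary named resource, replacing per-resource
// builders like Memory with one generic function.
func Resource(name corev1.ResourceName, val string) ResourceListOp {
	return func(r corev1.ResourceList) {
		r[name] = resource.MustParse(val)
	}
}

In tests this would read Resource(corev1.ResourceMemory, "1Gi") instead of Memory("1Gi"), which is the extra verbosity mentioned above.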

@dwnusbaum (Contributor, Author) commented:
/test pull-tekton-pipeline-integration-tests

@dwnusbaum (Contributor, Author) commented:
I guess the tests are failing for the reason Nader described here?

/test pull-tekton-pipeline-integration-tests

@dwnusbaum (Contributor, Author) commented:
/test pull-tekton-pipeline-integration-tests

@vdemeester (Member) left a comment:
Design-wise I'm torn between:

  • the current implementation, which leaves the max request of each type on its original container (step)
  • an implementation where we would take the max for each resource and apply it to the first container (step)

The latter makes it easier to know where to look for those resource requests.

@dwnusbaum @bobcatfish wdyt? 🚴‍♂️

@dwnusbaum (Contributor, Author) commented:
@vdemeester No preference from me, it would be relatively easy to switch to applying the max requests to the first container. It might be a bit cleaner to do things that way rather than needing to track indices.

@dwnusbaum (Contributor, Author) commented:
@vdemeester Actually I think moving the max requests to the first container is problematic because of resource limits. Take the following example:

steps:
  - name: step1
    ...
    resources:
      limits:
        memory: "1Gi"
      requests:
        memory: "1Gi"
  - name: step2
    ...
    resources:
      requests:
        memory: "5Gi"

The current approach turns this into:

steps:
  - name: step1
    ...
    resources:
      limits:
        memory: "1Gi"
      requests:
        memory: "0"
  - name: step2
    ...
    resources:
      requests:
        memory: "5Gi"

But the other approach would cause the first step to request resources beyond its limit:

steps:
  - name: step1
    ...
    resources:
      limits:
        memory: "1Gi"
      requests:
        memory: "5Gi"
  - name: step2
    ...
    resources:
      requests:
        memory: "0"

We could adjust the limit in these cases, but I'd prefer to avoid modifying limits if we don't have to.
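(For reference, Kubernetes rejects a container whose request exceeds its limit for the same resource, which is the constraint described above. The first-container approach would therefore need a guard along these hypothetical lines before raising a request:)

// canRaiseRequest reports whether a container's request for the named
// resource can be raised to newRequest without exceeding its limit.
func canRaiseRequest(c corev1.Container, name corev1.ResourceName, newRequest resource.Quantity) bool {
	limit, hasLimit := c.Resources.Limits[name]
	return !hasLimit || newRequest.Cmp(limit) <= 0
}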

@vdemeester (Member) commented:
@vdemeester Actually I think moving the max requests to the first container is problematic because of resource limits. Take the following example:

[…]

But the other approach would cause the first step to request resources beyond its limit:

[…]

We could adjust the limit in these cases, but I'd prefer to avoid modifying limits if we don't have to.

Ah! Makes sense 😅 Let's go for the initial take then 👼

@dwnusbaum force-pushed the container-resource-limits branch 2 times, most recently from db2eab3 to d9c4eb1 on April 5, 2019 17:45
@@ -129,6 +129,12 @@ or container images that you define:
the configuration file.
- Each container image runs until completion or until the first failure is
detected.
- The CPU, memory, and ephemeral storage resource requests will be set to zero
  if the container image does not have the largest resource request out of all
  container images in the Task. This ensures that the Pod that executes the Task
  only requests the resources necessary to execute any single container image,
  rather than the sum of all of the container images' resource requests.
@dwnusbaum (Contributor, Author) commented:
Any thoughts on a better place to put this documentation or a better way to phrase it would be welcome!

@vdemeester (Member) left a comment:
/lgtm
/hold

Waiting for review from @abayer and/or @bobcatfish 👼 🙏

@tekton-robot added the do-not-merge/hold, lgtm, and approved labels Apr 6, 2019
@abayer (Contributor) commented Apr 11, 2019:

/lgtm
/cancel hold

@abayer (Contributor) commented Apr 11, 2019:

/hold cancel

...forgot the syntax. =)

@tekton-robot removed the do-not-merge/hold label Apr 11, 2019
@dwnusbaum (Contributor, Author) commented:
Looks like the unit tests are failing after other PRs were merged (maybe #748?), probably a logical conflict somewhere with the changes in this PR. I'll rebase and fix the tests.

@vdemeester (Member) commented:
arf sorry @dwnusbaum 😓 🙇‍♂️

@dwnusbaum (Contributor, Author) commented:
@vdemeester No problem, probably trivial to fix 😄

@dwnusbaum added commit a8c6493: Make generated pods only request the maximum necessary resources. (The commit message repeats the PR description above, ending with "Fixes tektoncd#598".)
@tekton-robot removed the lgtm label Apr 12, 2019
@vdemeester (Member) left a comment:
/lgtm

@tekton-robot added the lgtm label Apr 12, 2019
@tekton-robot (Collaborator) commented:
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dwnusbaum, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abayer (Contributor) commented Apr 12, 2019:

/lgtm

@tekton-robot merged commit 9f52dad into tektoncd:master Apr 12, 2019
@bobcatfish (Collaborator) left a comment:
Beautiful commit message!!! a8c6493 ❤️ 😻 ❤️

I did not add any e2e tests

Totally reasonable imo! We often add too many end to end tests tbh

// one at a time, so we want pods to only request the maximum resources needed
// at any single point in time. If no container has an explicit resource
// request, all requests are set to 0.
func zeroNonMaxResourceRequests(container *corev1.Container, containerIndex int, maxIndicesByResource map[corev1.ResourceName]int) {
@bobcatfish (Collaborator) commented:
these functions are excellent! Nice focused interface and clear, short functions. My only too-late-to-the-party request would be to have unit tests covering these functions directly as well, but I'm fully expecting to be ignored since I'm so late haha :D

@dwnusbaum (Contributor, Author) commented:
Good point! I'm going to file an issue for a followup I thought of the other day, and if I'm in this area again I'll add some unit tests.

@bobcatfish (Collaborator) commented:
awesome, that sounds great @dwnusbaum ❤️ !!
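To illustrate the kind of direct unit test discussed in the thread above, here is a minimal sketch using the illustrative helper names from the earlier sketch (not actual test code from the repository):

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func TestZeroNonMaxResourceRequests(t *testing.T) {
	steps := []corev1.Container{
		{Name: "small", Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("1Gi")},
		}},
		{Name: "big", Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("5Gi")},
		}},
	}
	maxIndices := findMaxResourceRequests(steps)
	for i := range steps {
		zeroNonMaxResourceRequests(&steps[i], i, maxIndices)
	}
	// The smaller step's memory request should be zeroed out...
	if got := steps[0].Resources.Requests[corev1.ResourceMemory]; !got.IsZero() {
		t.Errorf("expected step %q memory request to be zeroed, got %v", steps[0].Name, got)
	}
	// ...while the step holding the maximum keeps its request.
	if got := steps[1].Resources.Requests[corev1.ResourceMemory]; got.Cmp(resource.MustParse("5Gi")) != 0 {
		t.Errorf("expected step %q to keep its 5Gi request, got %v", steps[1].Name, got)
	}
}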

Labels: approved, cla: yes, lgtm, size/L

Successfully merging this pull request may close these issues:

#598: All containers in a TaskRun's Pod are combined to calculate required resources