[TEP-0076] Propose array results and indexing 🐠 #477

bobcatfish · 2021-07-15T22:43:30Z

This TEP proposes adding more support for array params by adding
array results as well as the ability to index into arrays.

It refers to TEP-0075 which will be added in a separate commit, which
proposes adding support for dictionary types.

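Related issues:

* [pipelines#1393 Consider removing type from params (or _really_ support types)](tektoncd/pipeline#1393)
* [pipelines#3255 Arguments of type array cannot be access via index](tektoncd/pipeline#3255)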

/kind tep

bobcatfish · 2021-07-16T16:05:03Z

p.s. why 76 you might ask? what happened to 74 and 75 XD? well you see, i started working on 74, then i realized i needed 75 (dictionary params + results) and THEN i realized i needed 76, so here we are 😆 (should see 74 + 75 as well asap!!)


chhsia0 · 2021-07-19T16:08:44Z

/assign @sbwsg @pierretasci


pierretasci · 2021-07-20T21:29:05Z

teps/0076-array-result-types.md

+
+* What if the file contains json we don't support? (e.g. we don't yet support dictionaries, and wouldn't
+  initially support arrays of arrays)
+    * The TaskRun (or Pipeline Task) would fail


We could at least partially validate this on the creation of the resource too

could you explain a bit more? I'm not sure how we'd know until runtime that something was written to the file in an unexpected format

Yeah that is what I mean. We can validate it at runtime and potentially fail the pipeline

I'm adding a section "Validating JSON results at runtime" to cover this in more detail and moving this list out of the notes/caveats section
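For the runtime validation discussed above, here is a minimal sketch in Go (the language Tekton Pipelines is written in) of what the check could look like. It assumes the result file for an array result should hold a single JSON array of strings. `validateArrayResult` is a hypothetical name, not an existing Tekton function. Anything it rejects (non-JSON, objects, nested arrays, non-string elements) is what would fail the TaskRun.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// validateArrayResult is a hypothetical helper: it checks that the content a
// Step wrote to an array result file is a JSON array of strings, and returns
// an error (which would fail the TaskRun) for anything else, including nested
// arrays or objects, which are not supported initially.
func validateArrayResult(content string) ([]string, error) {
	var raw []any
	if err := json.Unmarshal([]byte(content), &raw); err != nil {
		return nil, fmt.Errorf("result is not a JSON array: %w", err)
	}
	out := make([]string, 0, len(raw))
	for i, v := range raw {
		s, ok := v.(string)
		if !ok {
			// Covers numbers, booleans, null, nested arrays and objects.
			return nil, fmt.Errorf("element %d has unsupported type %T; only strings are allowed", i, v)
		}
		out = append(out, s)
	}
	return out, nil
}

func main() {
	fmt.Println(validateArrayResult(`["cat", "dog"]`))
	fmt.Println(validateArrayResult(`[["nested"]]`))
	fmt.Println(validateArrayResult(`{"not": "an array"}`))
}
```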

pierretasci · 2021-07-20T21:31:37Z

teps/0076-array-result-types.md

+  tasks:
+    - name: deploy
+      params:
+        - name: environment
+          value: '$(params.environments[0])'


this is where I think dynamism could be super cool. What if we did:

Suggested change

-  tasks:
-    - name: deploy
-      params:
-        - name: environment
-          value: '$(params.environments[0])'
+  tasks:
+    - name: deploy
+      params:
+        - name: environment
+          value: '$(params.environments[])'

Having the non-indexed array there, similar to jq, would produce a copy of task deploy for each value in the input params.environments. Seems fairly straightforward and a reasonable addition to the API.

I disagree there, or at least about it being in the scope of this TEP. This might be something that appears confusing to users, and most importantly, I would like to make sure we really need this (and that custom tasks are not enough) before committing to that level of complexity.

I can think of many use cases related to test sharding where this would be super useful. We also have many "build these X services and deploy them to an ephemeral cluster" use cases. All involve a ton of copy+pasta of the same exact spec with different params.

Right, but as of today, this is what the task-loop custom task is there for, isn't it?

@pierretasci 100% agree with the test sharding use case and i have a languishing todo list item about trying to write up a TEP about better supporting test sharding specifically (which ultimately would lead to needing looping support)

I also agree with @vdemeester though that I don't personally see that as being exactly within the scope of this particular TEP to add looping support BUT I do see array result support as a requirement for being able to support looping (and sharding) - and maybe that's what you're getting at @pierretasci (not that this TEP includes implementing looping, but that we need this TEP for looping and sharding is a great use case to list to justify adding array results)

Well I was specifically advocating for supporting it directly in the Tekton API because I think that unlocks many more use-cases but I can also see the argument against.

i am 💯 in favor of looping in the pipelines api at this point but i think we need to explore the use cases and alternatives a bit (and hopefully @vdemeester will find them convincing :D )

#600 🎉 🎉 🎉

pierretasci

Overall, I think this is a more than worthwhile change and use case. My only comment is that I think we should take it one step further but otherwise, this is LGTM from me.

/LGTM

vdemeester · 2021-07-22T14:21:52Z

/hold

ghost

I think array params require an author to explicitly declare the type:

spec:
  params:
  - name: foo
    type: array

If type isn't specified then the default is assumed to be string. A TaskRun cannot supply an array of results when the Task doesn't specify the param type. Validation fails with something like invalid input params for task mytask: param types don't match the user-specified type: [foo].

Assuming that we either keep type and add type: object or deprecate type in favor of schema I think we can avoid introducing a new result path - echoing what @pierretasci already suggested - the result's type/schema in the task definition would explicitly tell the entrypoint/controller how the file content should have been written, should be parsed, and whether it's valid or not. I think this would be totally backwards compatible since type would default to string.

wdyt? I might be missing something here though.

Edited to include mention of the possible type deprecation.
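Here is a rough sketch of the type-driven parsing described above. It assumes the declared result type is a plain string field where an empty value means `string`. `parseResult` is hypothetical and does not reflect how the entrypoint or controller is actually structured. It only shows why defaulting to `string` keeps existing results backwards compatible: the same bytes come back verbatim unless the Task opts in with `type: array`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseResult is a hypothetical helper showing type-driven parsing: the
// declared result type tells the controller how to read the file content.
// An empty type defaults to "string", so existing results pass through as-is.
func parseResult(declaredType, content string) (any, error) {
	switch declaredType {
	case "", "string":
		return content, nil
	case "array":
		// Note: a JSON null element decodes to "" here rather than an error.
		var items []string
		if err := json.Unmarshal([]byte(content), &items); err != nil {
			return nil, fmt.Errorf("result declared as array but content is not a JSON array of strings: %w", err)
		}
		return items, nil
	default:
		return nil, fmt.Errorf("unsupported result type %q", declaredType)
	}
}

func main() {
	// No type declared: JSON-looking content stays a plain string.
	fmt.Println(parseResult("", `["a", "b"]`))
	// Declared as an array: the same content is parsed into two elements.
	fmt.Println(parseResult("array", `["a", "b"]`))
}
```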


ghost · 2021-07-23T17:33:55Z

Ah ok, sorry, just read the dictionary proposal and I see now that we're discussing type being deprecated in favor of schema.

ghost · 2021-07-23T18:19:16Z

My questions are entirely focused around the implementation and since we're only at the proposal stage I'd def like to see this move ahead!

/approve

vdemeester · 2021-07-23T18:32:44Z

/assign

afrittoli

Thank you! Just added a few comments, mostly I'm not 100% sure about indexing, but in general this looks good!


afrittoli · 2021-08-02T14:07:01Z

teps/0076-array-result-types.md

+
+### Use Cases
+
+1. Looping (i.e. providing values for an interface such as the task-loop custom task


Since looping is not part of the core API today, I wonder if we could add another use case here.
Even if an array is not used as an input to a loop, it might be used as args for another task.
I could have a task that builds the args for the next task, by producing an array of parameters.

+💯 on what @afrittoli just wrote. Given the current feature set of pipeline, I think that should appear as the main use case 👼🏼

afrittoli · 2021-08-02T14:13:32Z

teps/0076-array-result-types.md

+
+* Add two missing pieces of array support
+  * As results
+  * Indexing into arrays


I think there's no use case defined to support the indexing goal.
If we add indexing, we might need to introduce an operator that allows checking the size of an array too?

Good point 😅

This could be helpful in loops when we want to loop over certain indices of the array. But if the loop reconciler can take the results as strings and extract them into arrays, then we can workaround without array indexing.

another +1 to including the size as a variable replacement, that could be a way to solve tektoncd/pipeline#4097

(will need to check if json schema provides something like this or if we would need to pave new ground)

@bobcatfish This seems like a good use case to have in the TEP! 👍

circling back to this, I forgot that @sbwsg and I had already discussed this a bit and I'm back to leaving this out of scope for now (discussion: #477 (comment) TL;DR not clear how to support this for other types e.g. objects, or where we'd draw the line - also we can always add this later!) - it's currently listed in the alternatives section of the doc
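For the indexing goal, here is a minimal Go sketch of resolving a whole-value reference like `$(params.environments[0])` against known arrays. The regexp, the `resolveIndex` name and the error wording are all hypothetical. The point is that an out-of-range index can only be caught once the array's length is known, which is why a size check keeps coming up.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// indexRef matches a whole-value reference such as $(params.environments[0]).
// Hypothetical: Tekton's real substitution code may differ.
var indexRef = regexp.MustCompile(`^\$\(([a-zA-Z0-9_.-]+)\[(\d+)\]\)$`)

// resolveIndex replaces an indexed reference with a single string element.
// Out-of-range indexes are reported as errors.
func resolveIndex(value string, arrays map[string][]string) (string, error) {
	m := indexRef.FindStringSubmatch(value)
	if m == nil {
		return value, nil // not an indexed reference; leave untouched
	}
	arr, ok := arrays[m[1]]
	if !ok {
		return "", fmt.Errorf("unknown array %q", m[1])
	}
	i, err := strconv.Atoi(m[2])
	if err != nil {
		return "", err
	}
	if i >= len(arr) {
		return "", fmt.Errorf("index %d out of range for %q (length %d)", i, m[1], len(arr))
	}
	return arr[i], nil
}

func main() {
	arrays := map[string][]string{"params.environments": {"staging", "prod"}}
	fmt.Println(resolveIndex("$(params.environments[0])", arrays))
	fmt.Println(resolveIndex("$(params.environments[5])", arrays))
}
```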


afrittoli · 2021-08-02T14:55:15Z

teps/0076-array-result-types.md

+   * Con: Without indexing, arrays will only work with arrays. This means you cannot combine arrays with Tasks that
+     only take string parameters and this limits the interoperability between Tasks.


It makes sense; I kind of struggle to imagine an actual example though.
I would need an array parameter where certain positions in the array hold special meaning I guess?

jerop · 2021-08-02T18:53:15Z

/assign

bobcatfish · 2022-02-02T23:41:55Z

Ahh I see, thanks for explaining in so much detail @mogsie . How strongly do you feel about the json text sequence syntax being supported right from the start? If I'm understanding correctly it seems like a) we'll likely want support for plain old json regardless of whether we support text sequences or not (we need to parse the content of each line anyway which is itself a complete json type) and b) this can be additively supported in a backwards compatible way.

My preference is to keep this as simple as we can in the initial iterations (which is why I don't even want to support type nesting initially!) and expand the functionality later, including potentially adding support for json text sequences. What do you think?

I've added JSON text sequences as an alternative in the proposal in the meantime - and I've tried to capture the properties you listed above

empty values represent exactly that.

I don't quite follow that one - is this an easier way to specify 'null'?

pierretasci · 2022-02-02T23:51:57Z

teps/0076-array-result-types.md

+* Kubernetes supports `yaml` and `json`, so it makes sense to narrow our choices to at least these, as opposed to
+  introducing a new format into the mix.
+
+We may want to consider also adding support for [JSON text sequences](#also-support-json-text-sequences) in future.


Another important note: YAML is a superset of JSON. All valid JSON is valid YAML, so in a sense, we get the best of both 😄

Good point! I'll add that as a bullet as well :D

pierretasci · 2022-02-02T23:52:36Z

/approve

mogsie · 2022-02-03T17:52:21Z

I understand that supporting object syntax is over-the-top for this "only arrays" TEP. I initially suggested just that, a pure "array of strings" syntax, but the point about "future proofness" of this didn't quite come across. My main reasoning was that if you serialize an array of strings using JSON Text Sequence, then you gain a lot of future compatibility with TEP-0075, you get to be able to join two arrays by concatenating them, and so on.

So instead of a result array being serialized as the string representation of a JSON array (depicted here as a "value" in YAML):

value: |
  [ "one", "two", "three" ]

I suggest using this (json text sequence compatible):

value: |
  "one"
  "two"
  "three"

How strongly do you feel about the json text sequence syntax being supported right from the start?

I'm just concerned that Tekton might miss out on opportunities if it selects an array syntax that isn't easily composable / needs yet another syntax for object notation. e.g. if you have a set of tasks, and each one emits a result and then you want to join these results together to form an array; if results are emitted as "single element json text sequences", they can just be joined together:

task 1 result = "foo"\n
task 2 result = "bar"\n
task 3 result = "baz"\n

input to some array = task 1 result + task 2 result + task 3 result =
"foo"
"bar"
"baz"

-- an array with three elements in it

empty values represent exactly that.

I think I was talking about how, when the value is empty (i.e. an empty array), it is just an empty string.

I don't quite follow that one - is this an easier way to specify 'null'?

Null is just the word "null" without quotes, the same as true and false. Here the second value is a null element:

value: |
  "one"
  null
  "three"
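To make the composability argument concrete, here is a minimal Go sketch of reading the newline-separated JSON values shown above with the standard library's streaming `json.Decoder`. Assumptions: `decodeSequence` is a hypothetical helper, and it accepts any whitespace-separated JSON values rather than strict RFC 7464 framing (which puts an ASCII record separator before each value). Joining three single-element outputs with plain string concatenation gives a three-element array, an empty string gives an empty array, and a bare null becomes a nil element.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// decodeSequence reads a stream of whitespace-separated JSON values and
// returns them as array elements. Strict RFC 7464 framing (an RS byte before
// each value) is not handled here.
func decodeSequence(s string) ([]any, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	var out []any
	for {
		var v any
		err := dec.Decode(&v)
		if errors.Is(err, io.EOF) {
			return out, nil // clean end of input; empty input yields no elements
		}
		if err != nil {
			return nil, err // malformed or truncated value
		}
		out = append(out, v)
	}
}

func main() {
	// Each Task emits a single-element sequence...
	task1, task2, task3 := "\"foo\"\n", "\"bar\"\n", "\"baz\"\n"
	// ...and plain string concatenation yields a three-element array.
	joined, err := decodeSequence(task1 + task2 + task3)
	fmt.Println(joined, err)

	withNull, _ := decodeSequence("\"one\"\nnull\n\"three\"\n")
	fmt.Println(len(withNull), withNull[1] == nil)
}
```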

mogsie · 2022-02-03T18:08:09Z

teps/0076-array-result-types.md

+[TEP-0075](https://github.com/tektoncd/community/pull/479), dictionaries (technically "objects"). Possible other formats
+we could use are:
+
+* `yaml` (i dont think there is a syntax like jsonpath available for yaml? i.e. it'd be harder to expose via variable


yaml ends up being in-memory, a memory structure that's very similar to json, so you should be able to use jsonpath / jsonpointer etc on a yaml document.

mogsie · 2022-02-03T18:14:45Z

teps/0076-array-result-types.md

+    steps: ...
+    results:
+    - name: animals
+      type: array # this field is new for results


I know the whole proposal is about making array results, but this "type: array" is in conflict with "type: object" — have you considered a different attribute like "multiple: true" or something?

With both 0075 and 0076, you'd be able to define something like:

name: animals
type: object
properties:
  name:
    type: string
  weight:
    type: number
multiple: true

and emit (using json text sequence, of course)

cat <<EOF > $(results.animals.path)
{ "name": "cat", "weight": 33 }
{ "name": "dog", "weight": 44 }
EOF

Interesting - i was imagining the syntax being more json schema aligned to support the use case you're describing (an array of objects), e.g. something like this (based on https://json-schema.org/understanding-json-schema/reference/array.html and https://json-schema.org/learn/miscellaneous-examples.html#arrays-of-things):

name: animals
type: array
items:
  type: object
  properties:
    name:
      type: string
    weight:
      type: number

And if we supported json text sequences, i would think the schema would look the same, the difference would be in how the results written by the Task were formatted? (i.e. im not seeing these features as being in conflict but let me know if I'm missing something)

However I have intentionally tried to keep nested types out of the scope of this proposal and TEP-0075 as well. I would want to make sure nothing we propose here would stop us from adding nested type support in the future so if you see this being incompatible please continue to raise it.

Personally I would still lean toward using the json schema syntax for this vs. introducing a new multiple field.

This TEP proposes adding more support for array params by adding array results as well as the ability to index into arrays. It refers to TEP-0075 which will be added in a separate commit, which proposes adding support for dictionary types. Related issues: * [pipelines#1393 Consider removing type from params (or _really_ support types)](tektoncd/pipeline#1393) * [pipelines#3255 Arguments of type array cannot be access via index](tektoncd/pipeline#3255)

bobcatfish · 2022-02-04T19:15:03Z

I initially suggested just that, a pure "array of strings" syntax, but the point about "future proofness" of this didn't quite come across. My main reasoning was that if you serialize an array of strings using JSON Text Sequence, then you gain a lot of future compatibility with TEP-0075, you get to be able to join two arrays by concatenating them, and so on.

Thanks for clarifying @mogsie - I've tried to add more detail to the section I added on JSON text sequences to try to capture how this syntax could make composability easier for Tasks in a Pipeline, let me know if I'm still not capturing it (and feel free if you want to write something verbatim to add into the proposal, I'd be happy to add you as an author if you want).

My preference still remains to keep this out of scope from the initial proposal - provided none of the choices we make here preclude adding support in the future.

jerop · 2022-02-04T20:02:08Z

teps/0076-array-result-types.md

+   strings.
+
+   If we add support for array results before we add support for looping (and until we have an approved proposal, there is
+   no guarantee we ever will!) there will be no way to use an array result from one Task with a Task that expects a


getting there soon 😉

https://github.com/tektoncd/community/blob/main/teps/0090-matrix.md

tekton-robot · 2022-02-04T20:02:20Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: afrittoli, jerop, pierretasci, sbwsg, vdemeester


Needs approval from an approver in each of these files:

~~teps/OWNERS~~ [afrittoli,jerop,vdemeester]


dibyom · 2022-02-07T17:15:34Z

/cc @pritidesai to review

tekton-robot · 2022-02-07T17:15:36Z

@dibyom: GitHub didn't allow me to request PR reviews from the following users: to, review.

Note that only tektoncd members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @pritidesai to review


pritidesai · 2022-02-11T19:17:26Z

thank you @bobcatfish for addressing my comments!
/lgtm

@mogsie can this be merged given that your proposal is added as a possible extension to the existing proposal?

bobcatfish · 2022-02-11T23:37:52Z

I'm just concerned that Tekton might miss out on opportunities if it selects an array syntax that isn't easily composable / needs yet another syntax for object notation. e.g. if you have a set of tasks, and each one emits a result and then you want to join these results together to form an array; if results are emitted as "single element json text sequences", they can just be joined together:

@mogsie another thought on this, which may (or may not) help is that in a pipeline, users would be able to concatenate array results together without needing to be aware of anything json specific at all

our syntax for using array params currently looks like this:

spec:
  params:
    - name: flags
      type: array
      description: List of flags
  tasks:
    - name: build-skaffold-web
      params:
        - name: flags
          value: ["$(params.flags[*])"]

The array has to be explicitly expanded into a field of type array: value: ["$(params.flags[*])"]

Once Tasks can emit array results, the same syntax would be used, which means that combining the array results from two Tasks would look like this:

  tasks:
    - name: build-skaffold-web
      params:
        - name: flags
          value: ["$(tasks.prev-task.results.some-flags[*])", "$(tasks.another-task.results.other-flags[*])"]

As far as I can tell, the question of supporting json text sequences vs just json would only come into play within the steps of a Task (not between Tasks in a Pipeline) - and again this is something we could expand to support both formats later on.
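Here is a minimal Go sketch of the [*] expansion described above. It assumes each element of an array-typed value is either a literal string or exactly one $(NAME[*]) reference; partial-string references such as prefix-$(params.flags[*]) are not handled. `expandArrayValue` and its regexp are hypothetical, not Tekton's actual substitution code. The sketch shows why concatenating two Tasks' array results needs nothing JSON-specific at the Pipeline level.

```go
package main

import (
	"fmt"
	"regexp"
)

// starRef matches an element that is exactly "$(NAME[*])".
// Hypothetical helper, not Tekton's actual substitution code.
var starRef = regexp.MustCompile(`^\$\(([a-zA-Z0-9_.-]+)\[\*\]\)$`)

// expandArrayValue expands each "$(NAME[*])" element of an array-typed param
// value into all elements of the referenced array, in order. Other elements
// are kept as they are, so several array results can be concatenated.
func expandArrayValue(value []string, arrays map[string][]string) ([]string, error) {
	var out []string
	for _, v := range value {
		m := starRef.FindStringSubmatch(v)
		if m == nil {
			out = append(out, v)
			continue
		}
		arr, ok := arrays[m[1]]
		if !ok {
			return nil, fmt.Errorf("unknown array %q", m[1])
		}
		out = append(out, arr...)
	}
	return out, nil
}

func main() {
	results := map[string][]string{
		"tasks.prev-task.results.some-flags":     {"--verbose"},
		"tasks.another-task.results.other-flags": {"--tag=v1", "--push"},
	}
	value := []string{
		"$(tasks.prev-task.results.some-flags[*])",
		"$(tasks.another-task.results.other-flags[*])",
	}
	fmt.Println(expandArrayValue(value, results))
}
```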

pritidesai · 2022-02-14T17:15:36Z

/cancel hold

pritidesai · 2022-02-14T17:15:45Z

/hold cancel

pritidesai · 2022-02-14T17:16:13Z

API WG - ready to merge, cancelling the hold

xchapter7x · 2022-03-04T16:32:14Z

/area s3c

While working on (finally) updating the array result and object param/result TEPs (tektoncd/community#479 tektoncd/community#477) I realized I hadn't included an example of how to specify defaults for the new format, so I looked for an example of how we currently do this for arrays, but we had none! Hopefully now we do :D

tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Jul 15, 2021

tekton-robot requested review from hrishin and sthaha July 15, 2021 22:43

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 15, 2021

bobcatfish commented Jul 16, 2021


tekton-robot assigned pierretasci and ghost Jul 19, 2021

bobcatfish mentioned this pull request Jul 19, 2021

Variable to get a list of results of a task tektoncd/pipeline#4097

Open

pierretasci reviewed Jul 20, 2021


pierretasci reviewed Jul 20, 2021


pierretasci reviewed Jul 20, 2021


tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 20, 2021

bobcatfish mentioned this pull request Jul 20, 2021

Changing the way Result Parameters are stored. tektoncd/pipeline#4012

Open

tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2021

ghost reviewed Jul 23, 2021


ghost mentioned this pull request Jul 23, 2021

[TEP-0075] Propose object (dictionary) param and result types 🤓 #479

Merged

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2021

tekton-robot assigned vdemeester Jul 23, 2021

jerop mentioned this pull request Jul 27, 2021

TEP-0048: task results without results - problem statement #240

Merged

afrittoli reviewed Aug 2, 2021


tekton-robot assigned jerop Aug 2, 2021

pierretasci reviewed Feb 2, 2022


mogsie reviewed Feb 3, 2022


bobcatfish force-pushed the tep_array branch from 4a249da to 43e6b1e February 4, 2022 19:15

jerop reviewed Feb 4, 2022


jerop approved these changes Feb 4, 2022


tekton-robot requested a review from pritidesai February 7, 2022 17:15

jerop mentioned this pull request Feb 7, 2022

TEP-0090: Matrix [Proposal] #600

Merged

tekton-robot assigned pritidesai Feb 11, 2022

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 11, 2022

tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 14, 2022

tekton-robot merged commit e0e7d73 into tektoncd:main Feb 14, 2022

tekton-robot added the area/s3c Issues or PRs that are related to Secure Software Supply Chain (S3C) label Mar 4, 2022

bobcatfish mentioned this pull request Mar 18, 2022

[TEP-0075, 0076] Make array and object support implementable 👷‍♀️ #661

Merged

