Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add design docs for task-leve advanced scheduling policy #1630

Merged
merged 1 commit into from Oct 25, 2021

Conversation

hwdef
Copy link
Member

@hwdef hwdef commented Jul 19, 2021

Signed-off-by: hwdef hwdefcom@outlook.com

Ref: #1627

@volcano-sh-bot volcano-sh-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 19, 2021
@hwdef hwdef marked this pull request as draft July 19, 2021 10:33
Copy link
Member

@shinytang6 shinytang6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And l think we should also add corresponding check in the webhook

}
```

2.3 `probe` is used to detect the status of the task in dependOn. This field can be used with the `probe` struct in kubernetes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t quite understand what the field probe is used for, can you elaborate more?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think probe applies to both cases in lines 29 and 30 of the documentation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do you mean the probe configured here should be applied to each containers in pod template?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for both HTTP and TCP, there is no need to consider the issue of multiple containers. But exec method should be selected containers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of two ways:

  • Add a field to specify the container name
  • dependOn.taskname in the format of taskName/containerName

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we should only care about taskName in task spec, so this field is not so necessary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If don't add fields, then how to specify the container? Do you have any suggestions?

Copy link
Member

@Thor-wl Thor-wl Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If don't add fields, then how to specify the container? Do you have any suggestions?

To simplify the implementation, I think we can only consider one container per pod temporarily for most users take it into practice.

```go
type TriggerSpec struct {
Status string
Time time.Duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Time means the time period after reaching the state or the period after the task starting to run

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time is the time period after reaching the state

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such as the task has been running for 10s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ls there any usage scenarios? Generally, l think Status is enough, wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thought of any scenarios yet, but I think it will be more flexible with the addition of the time field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of realization, if we want to support this, we also need to record the transition time of task (e.g maybe the time when all replicas under one task converting to running), which is not available right now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right, But I think maybe we can just use task to meet the minavailable time

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right, But I think maybe we can just use task to meet the minavailable time

That's an available solution.


```go
type TriggerSpec struct {
Status string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l think we should describe how we define task status

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestions. I will add some documents.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated documents and modified some fields, let us discuss on this basis.

}
```

2.3 `probe` is used to detect the status of the task in dependOn. This field can be used with the `probe` struct in kubernetes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do you mean the probe configured here should be applied to each containers in pod template?

```go
type TriggerSpec struct {
Status string
Time time.Duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ls there any usage scenarios? Generally, l think Status is enough, wdyt?

@Thor-wl Thor-wl requested review from Thor-wl, jasonliu747, shinytang6, william-wang and wpeng102 and removed request for k82cn and yuanchen8911 July 19, 2021 11:11
@hwdef hwdef marked this pull request as ready for review July 19, 2021 14:16
@william-wang
Copy link
Member

/hold

@volcano-sh-bot volcano-sh-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 21, 2021

* Suppose there are two tasks and task 2 needs to use the calculation results of task 1,in this case, task1 needs to be in the complete state, and then create the pod for task2.

* some ETL applications
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give more details about ETL applcations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.wikipedia.org/wiki/Extract,_transform,_load

https://baike.baidu.com/item/ETL/1251949?fr=aladdin

Here are some links about ETL. Simply put, ETL represents the three steps of processing data, and there is a dependency between these three steps, so we need a specific order to start the task.


Examples of time points:

* running
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IC, Are there any other time points? Suggest to list all possbible time points in order to take them into consideration when doing feature design.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of another time pointes at the moment.


```go
type TriggerSpec struct {
Event string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Event should also be a struct. It includes not only the task status but also the conditions list above.

```go
type TriggerSpec struct {
Status string
Time time.Duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right, But I think maybe we can just use task to meet the minavailable time

That's an available solution.

}
```

2.3 `probe` is used to detect the status of the task in dependOn. This field can be used with the `probe` struct in kubernetes
Copy link
Member

@Thor-wl Thor-wl Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If don't add fields, then how to specify the container? Do you have any suggestions?

To simplify the implementation, I think we can only consider one container per pod temporarily for most users take it into practice.

- taskName: taskE
trigger:
event: complete
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of another scenario. Not sure if it exists. If task A depends on task B and C,it may need to wait for both B and C compeletes or either B or C completes. How do you think about it? @shinytang6 @hwdef

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these situations really need to be considered, and we should use a method to distinguish them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these situations really need to be considered, and we should use a method to distinguish them.

So we need a field at the same level with dependon to indicates the way to watch the status of previous tasks

@hwdef
Copy link
Member Author

hwdef commented Jul 29, 2021

@shinytang6 @Thor-wl
I have updated this docs. Please review it again.

@Thor-wl
Copy link
Member

Thor-wl commented Jul 30, 2021

@shinytang6 @Thor-wl
I have updated this docs. Please review it again.

Got it. I'll review it again.

@hwdef hwdef changed the title add design docs for task-DAG-scheduling add design docs for task-leve advanced schedulingpolicy Jul 30, 2021
@hwdef hwdef changed the title add design docs for task-leve advanced schedulingpolicy add design docs for task-leve advanced scheduling policy Jul 30, 2021

## Introduction

This feature provides the ability to customize the order in which tasks are launched.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the goal mentioned above. Namely, this feature aims to provide the ability to make task start in order instead of provide a Dag. It mathes more with the scenarios, which we discussed at last weekly meeting.

@@ -0,0 +1,98 @@
# Task-leve advanced scheduling policy
Copy link
Member

@Thor-wl Thor-wl Jul 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe Task Launch Order Within Job is more accurate ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your suggestions. I changed the title.


## Design

1. Add a field that is an array named `dependsOn` in task spec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to achieve the following goals:

  • keep job api brief and focus on its basic functions
  • separate task scheduling logic from job
  • make scheduling logic abstract and can be reused

I make the following suggestions:

  • create a new CRD TaskFlow, it indicates the start order of tasks within a job
  • divide the scenarios into two categories, namely, Dag and StartOrder. Dag means the following task starts depending on the status of the previous task. StartOrder means the following task starts depending on some condition custom self-defined such as health check/liveness check/target output. New api should take them into consideration

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with most. But for the definition of DAG, In previous community discussions, most people thought that DAG means that the previous task reaches the complete state before the next task can be executed? So we should not use the concept of DAG, because our definition of state includes Complete,Runningand other states.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with most. But for the definition of DAG, In previous community discussions, most people thought that DAG means that the previous task reaches the complete state before the next task can be executed? So we should not use the concept of DAG, because our definition of state includes Complete,Runningand other states.

OK. the definition can be stressed.

@Thor-wl
Copy link
Member

Thor-wl commented Jul 31, 2021

CI has been fixed and please pull and rebase from master branch to make CI happy.
You can execute git pull --rebase origin/master to do that. @hwdef

@Thor-wl
Copy link
Member

Thor-wl commented Aug 9, 2021

/approve
/lgtm

@Thor-wl
Copy link
Member

Thor-wl commented Aug 24, 2021

/cc @kevin-wangzefeng @k82cn Please take a reivew

@volcano-sh-bot
Copy link
Contributor

@Thor-wl: GitHub didn't allow me to request PR reviews from the following users: take, a, reivew, Please.

Note that only volcano-sh members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @kevin-wangzefeng @k82cn Please take a reivew

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


## Introduction

This feature provides the ability to customize the order in which tasks are launched.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the "Scope" and "Out of scope" section.
From the content below, the scope of this design is target to resolve both the task dag in podgroup and task launch order?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the scope, please review it again.

@hwdef hwdef force-pushed the task-dag-doc branch 3 times, most recently from 9c16419 to 9c2cc88 Compare September 2, 2021 03:24
@volcano-sh-bot volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 2, 2021
@hwdef
Copy link
Member Author

hwdef commented Sep 2, 2021

@william-wang @Thor-wl @kevin-wangzefeng

I update the design doc, please review it again.


### Field

Add a field in `vcjob.spec.task.DependentTask`, it represents the name of the task that this task needs to depend on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add an object instead of a field which is more extendable :)

Copy link
Member Author

@hwdef hwdef Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k82cn
After yesterday’s discussion, I think that several other reviewers do not want the possibility of expansion of this feature.Currently only the running state is retained. If a field has only one value, then the field should not exist.

dependency:
   name: taskA
   status: running

For example, in the above example, there is no reason for the status field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After yesterday’s discussion, I think that several other reviewers do not want the possibility of expansion of this feature.

Extensibility is important! It's ok to support only one task right now, but it's better to be extendable, something like that:

depends:
   name: taskA

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. we should support multi tasks.
  2. In yesterday's meeting, I felt that other reviewers did not pay attention to scalability. Please unify internally
  3. depends just have one field named name, this kind of design is an over-design.Yesterday, we made it clear that we will only solve existing problems and not consider future scenarios.

@ZhangZhenhua
Copy link

ZhangZhenhua commented Sep 2, 2021

Task level dependency scheduling is giant feature for Volcano, I agree that we should give more extra thoughts about it.

From my point of view, I think we may first answer one basic question: Do we need to add task level dependency in VCJob?

If no, then the current behavior is OK, pods of one VCJob are created out of order.

If yes, we should design it. And the current design of adding only one dependsOn field is not enough.

@Thor-wl
Copy link
Member

Thor-wl commented Sep 3, 2021

Task level dependency scheduling is giant feature for Volcano, I agree that we should give more extra thoughts about it.

From my point of view, I think we may first answer one basic question: Do we need to add task level dependency in VCJob?

If no, then the current behavior is OK, pods of one VCJob are created out of order.

If yes, we should design it. And the current design of adding only one dependsOn field is not enough.

I agree with @ZhangZhenhua most. Obviously, task dependency is valuable in applications. And we should design it. We should leave space for furture expansion such as custom detection means.

@ZhangZhenhua
Copy link

Task level dependency scheduling is giant feature for Volcano, I agree that we should give more extra thoughts about it.
From my point of view, I think we may first answer one basic question: Do we need to add task level dependency in VCJob?
If no, then the current behavior is OK, pods of one VCJob are created out of order.
If yes, we should design it. And the current design of adding only one dependsOn field is not enough.

I agree with @ZhangZhenhua most. Obviously, task dependency is valuable in applications. And we should design it. We should leave space for furture expansion such as custom detection means.

I talked to some customers recently, some of them are using Volcano right now, and almost all of the Volcano users are using Argo(or something else similar) together with Volcano. I think this is a clear evidence that volcano can't handle complex Jobs/tasks alone.

If this is the right way to go, I think it's OK.

If we just want to add some basic dependency into VCJob, like starting tasks sequentially, we can just add one annotation to handle it and leave the current job.spec unchanged.

@hwdef
Copy link
Member Author

hwdef commented Sep 3, 2021

And I do not understand. Why reviewers are very cautious about adding a crd, but very open to introducing new projects. The introduction of argo and kubegenen projects will introduce many crd, at least more than one, but this is the recommended approach. I don’t know if there is a unified standard for crd.

@william-wang @kevin-wangzefeng


| example | number of tasks | task dependencies | concurrent execution | Solution Status | Disadvantages of the current solution |
| ---------- | -------- | ------------ | -------- | -------------------------------------------------- | -------------------------- |
| MPI | 2 | linear dependencies | yes | 1. Add initcontainer to the task<br/>2.Using user-written check scripts in initcontainer | 1.Increase the cost of use for users<br />2.Still using resources while waiting |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be better that VCJob provides a convenient image for initContainer and maintains the related codes like MPI-Operator does.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many limitations to using initcontainer, For example, for windows cluster support, and support for multiple detection methods will not be very convenient. I think add initcontainer for pod is not a good way.

Signed-off-by: hwdef <hwdefcom@outlook.com>
@hwdef
Copy link
Member Author

hwdef commented Sep 24, 2021

I update the API, please review this again, thanks!
@william-wang @Thor-wl @k82cn @shinytang6

@Thor-wl
Copy link
Member

Thor-wl commented Oct 25, 2021

/lgtm
/approve

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 25, 2021
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Thor-wl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Thor-wl Thor-wl removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 25, 2021
@volcano-sh-bot volcano-sh-bot merged commit 72363cf into volcano-sh:master Oct 25, 2021
@hwdef hwdef mentioned this pull request Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. retest-not-required-docs-only size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants