add design docs for task-level advanced scheduling policy #1630
Conversation
And I think we should also add a corresponding check in the webhook.
docs/design/task-DAG-scheduling.md
Outdated
2.3 `probe` is used to detect the status of the task in `dependOn`. This field can be used with the `probe` struct in Kubernetes.
I don't quite understand what the field `probe` is used for. Can you elaborate more?
I think `probe` applies to both cases in lines 29 and 30 of the documentation.
So do you mean the probe configured here should be applied to each container in the pod template?
I think for both HTTP and TCP, there is no need to consider the issue of multiple containers. But the exec method requires selecting a container.
I can think of two ways:
- Add a field to specify the container name
- `dependOn.taskName` in the format of `taskName/containerName`
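A minimal sketch of the two options discussed above. The type, helper function, and the "empty means the only container" convention are all hypothetical illustrations, not part of the current proposal:

```go
package main

import (
	"fmt"
	"strings"
)

// Option 1 (hypothetical): an explicit container field next to the task name.
type Dependency struct {
	TaskName      string
	ContainerName string // empty could mean "the pod's only container"
}

// Option 2 (hypothetical): encode both in one string, "taskName/containerName",
// falling back to task-only when no "/" is present.
func splitTaskContainer(ref string) (task, container string) {
	parts := strings.SplitN(ref, "/", 2)
	if len(parts) == 2 {
		return parts[0], parts[1]
	}
	return parts[0], ""
}

func main() {
	task, container := splitTaskContainer("taskA/worker")
	fmt.Println(task, container) // taskA worker

	task, container = splitTaskContainer("taskB")
	fmt.Println(task) // taskB
}
```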
IMO, we should only care about `taskName` in the task spec, so this field is not really necessary.
If we don't add a field, then how do we specify the container? Do you have any suggestions?
> If we don't add a field, then how do we specify the container? Do you have any suggestions?

To simplify the implementation, I think we can consider only one container per pod for now, so that most users can put it into practice.
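Under that simplification (one container per pod), a dependency carrying a probe needs no container selector. A speculative sketch of the shape; the types below only mirror the Kubernetes probe idea and are redefined here so the example is self-contained:

```go
package main

import "fmt"

// Speculative shapes only; ExecAction/Probe mirror the Kubernetes probe
// idea but are not the real k8s.io/api/core/v1 types.
type ExecAction struct {
	Command []string
}

type Probe struct {
	Exec *ExecAction
}

// With one container per pod, the probe needs no container selector.
type Dependency struct {
	TaskName string
	Probe    *Probe
}

func main() {
	d := Dependency{
		TaskName: "taskA",
		Probe:    &Probe{Exec: &ExecAction{Command: []string{"cat", "/tmp/ready"}}},
	}
	fmt.Println(d.TaskName, d.Probe.Exec.Command[0]) // taskA cat
}
```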
docs/design/task-DAG-scheduling.md
Outdated
```go
type TriggerSpec struct {
	Status string
	Time   time.Duration
```
Does `Time` mean the time period after reaching the state, or the period after the task starts running?
`time` is the time period after reaching the state.
For example, the task has been running for 10s.
Are there any usage scenarios? Generally, I think `Status` is enough, wdyt?
I haven't thought of any scenarios yet, but I think it will be more flexible with the addition of the time field.
In terms of implementation, if we want to support this, we also need to record the transition time of the task (e.g. the time when all replicas under one task convert to running), which is not available right now.
Yes, you are right. But I think maybe we can just use the time when the task meets `minAvailable`.
> Yes, you are right. But I think maybe we can just use the time when the task meets `minAvailable`.

That's a workable solution.
docs/design/task-DAG-scheduling.md
Outdated
```go
type TriggerSpec struct {
	Status string
```
I think we should describe how we define task status.
Thanks for your suggestions. I will add some documents.
I have updated the documents and modified some fields; let's discuss on this basis.
docs/design/task-DAG-scheduling.md
Outdated
2.3 `probe` is used to detect the status of the task in `dependOn`. This field can be used with the `probe` struct in Kubernetes.
So do you mean the probe configured here should be applied to each container in the pod template?
docs/design/task-DAG-scheduling.md
Outdated
```go
type TriggerSpec struct {
	Status string
	Time   time.Duration
```
Are there any usage scenarios? Generally, I think `Status` is enough, wdyt?

/hold
docs/design/task-DAG-scheduling.md
Outdated
* Suppose there are two tasks and task 2 needs to use the calculation results of task 1. In this case, task 1 needs to be in the complete state before the pod for task 2 is created.
* some ETL applications
Can you give more details about ETL applications?
https://en.wikipedia.org/wiki/Extract,_transform,_load
https://baike.baidu.com/item/ETL/1251949?fr=aladdin
Here are some links about ETL. Simply put, ETL represents the three steps of processing data, and there are dependencies between these three steps, so we need a specific order to start the tasks.
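As an illustration of the ETL case, the three steps form a linear `dependsOn` chain, and a launch order can be derived by repeatedly starting tasks whose dependencies are all satisfied. The task names and the resolver below are invented for this example (and assume the graph is acyclic); this is not Volcano code:

```go
package main

import "fmt"

// launchOrder derives a start order from dependsOn edges by repeatedly
// picking tasks whose dependencies have all been satisfied.
// Assumes the dependency graph is acyclic.
func launchOrder(deps map[string][]string) []string {
	done := map[string]bool{}
	var order []string
	for len(order) < len(deps) {
		for task, reqs := range deps {
			if done[task] {
				continue
			}
			ready := true
			for _, r := range reqs {
				if !done[r] {
					ready = false
					break
				}
			}
			if ready {
				done[task] = true
				order = append(order, task)
			}
		}
	}
	return order
}

func main() {
	// The linear ETL chain: extract -> transform -> load.
	deps := map[string][]string{
		"extract":   {},
		"transform": {"extract"},
		"load":      {"transform"},
	}
	fmt.Println(launchOrder(deps)) // [extract transform load]
}
```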
docs/design/task-DAG-scheduling.md
Outdated
Examples of time points:

* running
I see. Are there any other time points? I suggest listing all possible time points so they can be taken into consideration when doing the feature design.
I can't think of any other time points at the moment.
docs/design/task-DAG-scheduling.md
Outdated
```go
type TriggerSpec struct {
	Event string
```
I think `Event` should also be a struct. It includes not only the task status but also the conditions listed above.
docs/design/task-DAG-scheduling.md
Outdated
```go
type TriggerSpec struct {
	Status string
	Time   time.Duration
```
> Yes, you are right. But I think maybe we can just use the time when the task meets `minAvailable`.

That's a workable solution.
docs/design/task-DAG-scheduling.md
Outdated
2.3 `probe` is used to detect the status of the task in `dependOn`. This field can be used with the `probe` struct in Kubernetes.
> If we don't add a field, then how do we specify the container? Do you have any suggestions?

To simplify the implementation, I think we can consider only one container per pod for now, so that most users can put it into practice.
docs/design/task-DAG-scheduling.md
Outdated
```yaml
- taskName: taskE
  trigger:
    event: complete
```
I think of another scenario, not sure if it exists: if task A depends on tasks B and C, it may need to wait until both B and C complete, or until either B or C completes. What do you think about it? @shinytang6 @hwdef
Both of these situations really need to be considered, and we should use a method to distinguish them.
> Both of these situations really need to be considered, and we should use a method to distinguish them.

So we need a field at the same level as `dependsOn` to indicate how to combine the statuses of the previous tasks. @shinytang6 @Thor-wl
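A hedged sketch of such a sibling field. The names `Relation`, `"all"`, and `"any"` are invented for illustration, not an agreed API:

```go
package main

import "fmt"

// Hypothetical: a field beside the dependency list choosing how multiple
// dependencies combine ("all" = wait for B and C; "any" = wait for B or C).
type DependsOn struct {
	Tasks    []string
	Relation string // "all" or "any"; names are illustrative
}

// satisfied reports whether the dependency condition holds, given which
// tasks have reached the required state.
func satisfied(d DependsOn, finished map[string]bool) bool {
	switch d.Relation {
	case "any":
		for _, t := range d.Tasks {
			if finished[t] {
				return true
			}
		}
		return false
	default: // treat "all" as the default
		for _, t := range d.Tasks {
			if !finished[t] {
				return false
			}
		}
		return true
	}
}

func main() {
	finished := map[string]bool{"taskB": true, "taskC": false}
	deps := []string{"taskB", "taskC"}
	fmt.Println(satisfied(DependsOn{Tasks: deps, Relation: "any"}, finished)) // true
	fmt.Println(satisfied(DependsOn{Tasks: deps, Relation: "all"}, finished)) // false
}
```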
Got it. I'll review it again.
## Introduction

This feature provides the ability to customize the order in which tasks are launched.
I agree with the goal mentioned above. Namely, this feature aims to provide the ability to make tasks start in order instead of providing a DAG. It matches better with the scenarios we discussed at the last weekly meeting.
@@ -0,0 +1,98 @@
# Task-level advanced scheduling policy
Maybe "Task Launch Order Within Job" is more accurate?
Thanks for your suggestions. I changed the title.
## Design

1. Add a field that is an array named `dependsOn` in task spec
In order to achieve the following goals:
- keep the job API brief and focused on its basic functions
- separate task scheduling logic from job
- make the scheduling logic abstract and reusable

I make the following suggestions:
- create a new CRD `TaskFlow`, which indicates the start order of tasks within a job
- divide the scenarios into two categories, namely Dag and StartOrder. Dag means the following task starts depending on the status of the previous task. StartOrder means the following task starts depending on some self-defined condition such as a health check / liveness check / target output. The new API should take both into consideration.
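A rough Go sketch of what such a `TaskFlow` spec might look like. Every type and field below is a guess at the reviewer's suggestion, not an agreed API:

```go
package main

import "fmt"

// Speculative sketch of the suggested TaskFlow CRD spec: the start order
// of tasks within one job, under either of the two suggested modes.
type TaskFlowSpec struct {
	JobName string
	Mode    string // "Dag" or "StartOrder", per the two categories above
	Flows   []Flow
}

// Flow describes one task's position in the flow. Condition is a
// placeholder for a StartOrder-style custom condition (e.g. a health check).
type Flow struct {
	TaskName  string
	DependsOn []string
	Condition string
}

func main() {
	tf := TaskFlowSpec{
		JobName: "etl-job",
		Mode:    "Dag",
		Flows: []Flow{
			{TaskName: "extract"},
			{TaskName: "transform", DependsOn: []string{"extract"}},
		},
	}
	fmt.Println(tf.Mode, len(tf.Flows)) // Dag 2
}
```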
I agree with most of it. But for the definition of DAG: in previous community discussions, most people thought that DAG means the previous task reaches the complete state before the next task can be executed. So we should not use the concept of DAG, because our definition of state includes `Complete`, `Running`, and other states.
> I agree with most of it. But for the definition of DAG: in previous community discussions, most people thought that DAG means the previous task reaches the complete state before the next task can be executed. So we should not use the concept of DAG, because our definition of state includes `Complete`, `Running`, and other states.

OK, the definition can be stressed.
CI has been fixed; please pull and rebase from the master branch to make CI happy.
/approve
/cc @kevin-wangzefeng @k82cn Please take a reivew
@Thor-wl: GitHub didn't allow me to request PR reviews from the following users: take, a, reivew, Please. Note that only volcano-sh members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
## Introduction

This feature provides the ability to customize the order in which tasks are launched.
Please add "Scope" and "Out of scope" sections. From the content below, is the scope of this design targeted at resolving both the task DAG in the podgroup and the task launch order?
I added the scope, please review it again.
Force-pushed from 9c16419 to 9c2cc88.
@william-wang @Thor-wl @kevin-wangzefeng I updated the design doc, please review it again.
### Field

Add a field in `vcjob.spec.task.DependentTask`; it represents the name of the task that this task needs to depend on.
Let's add an object instead of a field, which is more extendable :)
@k82cn After yesterday's discussion, I think that several other reviewers do not want the possibility of expanding this feature. Currently only the running state is retained. If a field has only one value, then the field should not exist.

```yaml
dependency:
  name: taskA
  status: running
```

For example, in the above example, there is no reason for the status field.
> After yesterday's discussion, I think that several other reviewers do not want the possibility of expanding this feature.

Extensibility is important! It's OK to support only one task right now, but it's better to be extendable, something like:

```yaml
depends:
  name: taskA
```
> - we should support multi tasks.

In yesterday's meeting, I felt that other reviewers did not pay attention to scalability. Please unify internally. A `depends` that has just one field named `name` is over-design. Yesterday, we made it clear that we will only solve existing problems and not consider future scenarios.
Task-level dependency scheduling is a giant feature for Volcano, and I agree that we should give it more thought. From my point of view, I think we may first answer one basic question: do we need to add task-level dependency in VCJob? If no, then the current behavior is OK: pods of one VCJob are created out of order. If yes, we should design it. And the current design of adding only one
I mostly agree with @ZhangZhenhua. Obviously, task dependency is valuable in applications, and we should design it. We should leave space for future expansion, such as custom detection means.
I talked to some customers recently; some of them are using Volcano right now, and almost all of the Volcano users are using Argo (or something similar) together with Volcano. I think this is clear evidence that Volcano can't handle complex jobs/tasks alone. If this is the right way to go, I think it's OK. If we just want to add some basic dependency into VCJob, like starting tasks sequentially, we can just add one annotation to handle it and leave the current job.spec unchanged.
And I do not understand why reviewers are very cautious about adding a CRD but very open to introducing new projects. Introducing the Argo and KubeGene projects will introduce many CRDs, at least more than one, but this is the recommended approach. I don't know if there is a unified standard for CRDs.
| example | number of tasks | task dependencies | concurrent execution | Solution Status | Disadvantages of the current solution |
| ------- | --------------- | ----------------- | -------------------- | --------------- | ------------------------------------- |
| MPI | 2 | linear dependencies | yes | 1. Add an initContainer to the task<br/>2. Use user-written check scripts in the initContainer | 1. Increases the cost of use for users<br/>2. Still uses resources while waiting |
It'd be better if VCJob provided a convenient image for the initContainer and maintained the related code, like MPI-Operator does.
There are many limitations to using an initContainer: for example, Windows cluster support, and supporting multiple detection methods would not be very convenient. I think adding an initContainer to the pod is not a good way.
Signed-off-by: hwdef <hwdefcom@outlook.com>
I updated the API, please review this again, thanks!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Thor-wl. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Signed-off-by: hwdef <hwdefcom@outlook.com>
Ref: #1627