Improve Job API to support specific types of job #1034

Closed
hzxuzhonghu opened this issue Sep 7, 2020 · 9 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@hzxuzhonghu
Collaborator

Currently, to run TensorFlow training, users must first define the TF job with the svc and env plugins set.

To run an MPI job, however, they need to set the svc and ssh plugins instead. This is not user-friendly, so we want to improve it. Doing so will require an API change to support specific types of jobs without requiring users to be aware of the inner plugins.

@hzxuzhonghu
Collaborator Author

/kind feature

@volcano-sh-bot volcano-sh-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 7, 2020
@shinytang6
Member

I have some questions:

  1. Completely removing the job plugin configuration will inevitably introduce incompatibility and may break existing user behavior. For example, users may already have implemented their own plugins on top of the current plugin framework. Are we sure we want to do this? Even if we do, I think we may need a major version release because of the incompatibility.
  2. If we are certain we want to do this, I think we should set up a mapping between job types and their corresponding job plugins. For example: tensorflow job -> [svc, env], MPI job -> [ssh, svc], default -> []. One problem remains: if users want to specify parameters for the job plugins, we may need to add more user-configurable parameters.

I'm not sure my understanding of this feature is correct or complete. Please correct me if it isn't; discussion is welcome :)
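
The mapping in point 2 could be sketched as a plain lookup table. This is only an illustration of the idea: the table `frameworkPlugins`, the function `PluginsFor`, and the framework keys are hypothetical names and are not part of Volcano's API.

```go
package main

import "fmt"

// frameworkPlugins is a hypothetical table mapping a job type to the job
// plugins it would enable by default. The entries follow the examples given
// in the comment above; nothing here is taken from Volcano's code.
var frameworkPlugins = map[string][]string{
	"tensorflow": {"svc", "env"},
	"mpi":        {"ssh", "svc"},
}

// PluginsFor returns the default plugins for a framework. Unknown or empty
// frameworks fall back to no plugins, matching the "default -> []" case.
func PluginsFor(framework string) []string {
	plugins, ok := frameworkPlugins[framework]
	if !ok {
		return []string{}
	}
	// Return a copy so callers cannot mutate the shared table.
	return append([]string(nil), plugins...)
}

func main() {
	for _, fw := range []string{"tensorflow", "mpi", ""} {
		fmt.Printf("%q -> %v\n", fw, PluginsFor(fw))
	}
}
```

A table like this only covers plugin names. It does not answer the open question of how users would pass parameters to the plugins.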

@hzxuzhonghu
Collaborator Author

cc @william-wang @Thor-wl

@k82cn
Member

k82cn commented Oct 14, 2020

I'm not sure why we need this feature. Is there any background? The purpose of vcjob is to support all kinds of high-performance workloads with one CRD.

@Thor-wl
Contributor

Thor-wl commented Oct 14, 2020

I haven't investigated in depth how different compute frameworks define a vcjob to work with Volcano. If all users of a given framework define their vcjob with the same plugins, I think it is a good idea to let users just set a "framework" field and have Volcano select the plugins for that framework automatically.

@shinytang6
Member

> I'm not sure why we need this feature. Is there any background? The purpose of vcjob is to support all kinds of high-performance workloads with one CRD.

As a Volcano user, I was confused by job plugins when I first used it. Take the env plugin as an example: I didn't actually know what it would do. From this point of view, it is useful to hide these details from users. I don't think this conflicts with "support all kinds of high-performance workloads with one CRD", because it still uses the single Job CRD definition. What we need to do is automatically load the corresponding job plugins according to a new field such as "framework". The only purpose of this field is to let users identify the job type and to let us load the job plugins dynamically.
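
One way this could stay compatible with jobs that already configure plugins is to let explicit settings take precedence over the framework defaults. The sketch below assumes that rule; `defaultPlugins`, `ResolvePlugins`, and the sample argument are hypothetical and only illustrate the merge, not how Volcano loads plugins.

```go
package main

import "fmt"

// defaultPlugins is a hypothetical framework -> plugin table, using the
// examples mentioned earlier in this thread.
var defaultPlugins = map[string][]string{
	"tensorflow": {"svc", "env"},
	"mpi":        {"ssh", "svc"},
}

// ResolvePlugins merges framework defaults with plugins the user set
// explicitly. The result maps plugin name -> arguments. Defaults get empty
// arguments; explicit entries override defaults and add any extra plugins,
// so a job that already lists its plugins keeps its configuration.
func ResolvePlugins(framework string, explicit map[string][]string) map[string][]string {
	resolved := make(map[string][]string)
	for _, name := range defaultPlugins[framework] {
		resolved[name] = []string{}
	}
	for name, args := range explicit {
		resolved[name] = args
	}
	return resolved
}

func main() {
	// A TensorFlow job that only overrides the arguments of the svc plugin.
	got := ResolvePlugins("tensorflow", map[string][]string{
		"svc": {"--example-arg=value"}, // hypothetical argument
	})
	fmt.Println(len(got), got["svc"], got["env"])
}
```

With an empty "framework" value and no explicit plugins, the result is empty, which corresponds to the default case with no plugins.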

@alcorj-mizar
Contributor

This is the point at which Volcano moves from being a scheduler framework to being a batch job framework. If we use a "framework" field, Volcano can still integrate with projects like Kubeflow and merely provide some automation when creating jobs. But if we use CRDs to define different types of workloads, we are trying to reinvent Kubeflow.

@stale

stale bot commented Jan 20, 2021

Hello 👋 Looks like there was no activity on this issue for last 90 days.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 20, 2021
@stale

stale bot commented Mar 21, 2021

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗


6 participants