Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Override vdk version of deployed job #377

Closed
antoniivanov opened this issue Oct 12, 2021 · 7 comments · Fixed by #476 or #482
Closed

Override vdk version of deployed job #377

antoniivanov opened this issue Oct 12, 2021 · 7 comments · Fixed by #476 or #482
Assignees
Labels

Comments

@antoniivanov
Copy link
Collaborator

antoniivanov commented Oct 12, 2021

What is the feature request? What problem does it solve?
Currently, all jobs use the same vdk version in which to run. This makes it easier to maintain thousands of jobs as they are uniform.
But if we want to do a canary release of vdk by rolling out the initial version of vdk to a subset of users as an initial test.
Or we want to do A/B testing with different flavors of vdk. We need to be able to override the vdk version for subset of jobs.

Another use-case is when vdk introduces a major release (aka breaking change). We'd need to enable users to use both previous and new major releases concurrently so they can gradually switch to the new release.

Suggested solution
During deploy we can set not only job version (as we do not) but also vdk version (by default it's left empty so it uses latest)
This would all us to override the vdk version per deployment basis

Acceptance Criteria

  • I can deploy a single job with a specific (possibly non-officially released) version of vdk in cloud runtime. While all the other jobs use the common version.
@antoniivanov
Copy link
Collaborator Author

antoniivanov commented Oct 21, 2021

Control Service API has an option by design to set vdk_version which will override default vdk version.

Use cases

  1. To enable users to upgrade in a self-service manner when there is major (breaking) upgrade of vdk.
    When there is a breaking change (in the VDK library for example) it may take time for user to migrate data jobs to new "syntax"/python interfaces. So we need to enable them to do it in their own time.

  2. Another one is to enable to run canary release of vdk. All jobs run in the Versatile Data Kit managed runtime use the same version and are upgraded automatically (and at the same time).This make sure that the new release indeed work , it is configured correctly and job would succeed.

  3. Similarly to 2 A/B testing - introduce different version of vdk with different semantics - e.g different query optimization logic and compare which one works better.

  4. Local development. vdk-server plugin seem useful for local development. If you are making changes on VDK project only , there's no need to use the Control Service code base at all. Start vdk server locally and test how the job un in managed runtime. But vdk-server need to allow to set a local version of vdk. One option is to be addressed by add option to configure the version of vdk when installing vdk-server #392 But if user can override vdk image per job during deployment - that's another option (vdk deploy --update --vdk-version=image:v.1.0.1

@antoniivanov
Copy link
Collaborator Author

antoniivanov commented Oct 22, 2021

Option 1. (WINNER)

Have vdk_version set the version of the distribution.
So user would do vdk deploy --update --vdk-version 1.1

Users can set the version. And we'd take care of them.
They can list versions of their distributions with pip index versions quickstart-vdk for example (quickstart-vdk is the name of the customer distribution)

Pros:

  • Simple for the user. Need to know only the previous versions (which can be seen with simple pip command)

Cons:

  • The image tag may not exist for that version (there's a dependency between what is in pip and what is in docker)
  • Implementaiton is more "hacky" as it requires setting tag
  • Use case 4 is not solved here

Option 2: (DISCARDED)

Pros:

  • Enables Use case 4
  • It is clear to the user what is the likely error and they can solve it themselves (they can upload the image if it is missing)

Cons:

  • It is more complex for the user. Docker is an "implementation" detail that vdk user need not know . We are forcing them to know it. As major goal of VDK is ease of use and lowering technical barrier - this does not.
  • It is less secure. As the container image can contain anything.
  • It is more brittle, The image can be in any registry - so network connectivity and Registry stability become more potential issues
  • If there's an authentication it may not work without administrative (Control Service Operators) help
  • It is less secure since the vdk image is no longer in control of Control Service Operator but is given to end user. Malicous image could be injected (for example tries to access root directories and so on).

Option 3 (DISCARDED)

(proposed by @gageorgiev )

See details for this option in #377 (comment)

Pros:

  • no dev cost is 0, it should work now
  • It's as simple for end user as option 1

Cons:

  • There would be corner cases that will not work (if different versions have the same modules/packages but different code/semantics). Though that seems like a deal-breaker. But it may be a viable workaround that lowers the priority of the other implementations.
  • Require full re-deployment of the data jobs - this makes use case 3 unlikely to work - would have to upgrade half of the deployed jobs.

Requirements matrix

See above post for details on use-cases.

Use case or requirement Option 1 Option 2 Option 3
1 (self service major upgrades) yes yes yes
2 (canary releases) yes yes yes
3 (A/B Testing) yes yes mostly no
4 (Local development) no yes perhaps yes
ease of use yes no yes
initial dev cost day(s) day(s) 0
maintenance cost smaller bigger biggest?
security low risk high risk low risk?

If we put more emphasis on ease of use requirement and accept we have other solutions for use case 4 I am inclined to favor option 1 more.

@gageorgiev
Copy link
Contributor

Maybe I'm missing something, but what's the issue with just including a vdk-core==1.2.3 line in the requirements.txt file of a data job?

@antoniivanov
Copy link
Collaborator Author

antoniivanov commented Oct 22, 2021

Maybe I'm missing something, but what's the issue with just including a vdk-core==1.2.3 line in the requirements.txt file of a data job?

@gageorgiev

Well it probably would be quickstart-vdk==1.2.3 (which would contain all the plugins and vdk-core) .
But I got your point. I discarded because I don't think it would work. But perhaps I was too quick.

The way a data job deployment is that it includes vdk and its dependencies as initContainer. While the main container stores the job with its dependencies and shared emptyDir volume between the two containers.

In general first we initialize the whole cycle by pulling latest vdk and copying its dependencies in the shared folder and then we use the vdk from that folder to run the job.

Data Job Deployment CronJob looks something like this

initContainers:
    - image: vkd-repo/vdk:release
      imagePullPolicy: Always
      name: vdk
      command:
        [
          "/bin/bash",
          "-c",
          "cp -r /usr/local/lib/python3.7/site-packages /vdk/. && cp /usr/local/bin/vdk /vdk/.",
        ]
      volumeMounts:
        - mountPath: /vdk
          name: vdk-volume

The job container includes all the dependencies pointed in the job's requirements.txt and does not include vdk. The job container spec looks like this

containers:
    - image: jobs/example:1.0.0
      imagePullPolicy: Always
      name: example # container with job code only
      command:
        [
          "/bin/bash",
          "-c",
          "export PYTHONPATH=/vdk/site-packages/ && /vdk/vdk run ./example",
        ]
      volumeMounts:
        - mountPath: /vdk
          name: vdk-volume

This enables Zero time upgrade of vdk for all jobs. Which is very beneficial when you have hundreds or thousands of jobs.

We start date job run this way
export PYTHONPATH=/vdk/site-packages/ && /vdk/vdk run ./example

/vdk/vdk is used from the vdk init container.

This does not mean it may not work entirely though. As it would search for packages in python path . and since the user python path should be preferred over vdk, it may pick the correct packages. (see https://docs.python.org/3/reference/import.html#searching)

Major downside is that It's going to be more brittle (it will not always work) though if vdk 1.1 and vdk 1.2 are different and have conflicting dependencies themselves it may cause issues in a

But let's considered it as Option 3

@antoniivanov
Copy link
Collaborator Author

Updated post #377 (comment) with Option 3 analysis as well.

@mivanov1988
Copy link
Collaborator

I am in favor of option 1. But I think the source of truth should be the docker registry. Users should get vdk versions from their docker registry (e.g. https://hub.docker.com/r/versatiledatakit/quickstart-vdk/tags).

In terms of use case 4, I think we just need to point the Control Service to the local docker registry (datajobs.vdk.image=docker-registry:5000/local-vdk-image:latest) during the deployment (vdk server --start). Then the users should build local docker image and push it to the local docker registry (this could be automated).

@mivanov1988
Copy link
Collaborator

Btw, very good explanation of the problem and solutions.

@antoniivanov antoniivanov linked a pull request Nov 2, 2021 that will close this issue
antoniivanov added a commit that referenced this issue Nov 4, 2021
Setting vdk version is someitmes necessary to enable canary releases and
A/B Testing. See #377

Adding command to set vdk version. It is hidden for now as it's
experimental

Signed-off-by: Antoni Ivanov <aivanov@vmware.com>
antoniivanov added a commit that referenced this issue Nov 5, 2021
Setting vdk version is someitmes necessary to enable canary releases and
A/B Testing. See #377

Adding command to set vdk version. It is hidden for now as it's
experimental

Signed-off-by: Antoni Ivanov <aivanov@vmware.com>
antoniivanov added a commit that referenced this issue Nov 5, 2021
Setting vdk version is someitmes necessary to enable canary releases and
A/B Testing. See #377

Adding command to set vdk version. It is hidden for now as it's
experimental

Signed-off-by: Antoni Ivanov <aivanov@vmware.com>
@antoniivanov antoniivanov linked a pull request Nov 5, 2021 that will close this issue
antoniivanov added a commit that referenced this issue Nov 10, 2021
Setting vdk version is someitmes necessary to enable canary releases and
A/B Testing. See #377

Adding command to set vdk version. It is hidden for now as it's
experimental

Signed-off-by: Antoni Ivanov <aivanov@vmware.com>
antoniivanov added a commit that referenced this issue Nov 10, 2021
* vdk-control-cli: add command to set vdk version

Setting vdk version is someitmes necessary to enable canary releases and
A/B Testing. See #377

Adding command to set vdk version. It is hidden for now as it's
experimental

Signed-off-by: Antoni Ivanov <aivanov@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants