Configurable Deployment via Helm Values #9

Closed
xaviertintin opened this issue Apr 10, 2024 · 10 comments

@xaviertintin
Owner

xaviertintin commented Apr 10, 2024

Make the behavior configurable by REANA admins at deployment time, so that each admin can choose, via Helm values, whether to use the classical approach or the new optional Kueue approach.

@xaviertintin xaviertintin added this to the Deployment Integration milestone Apr 10, 2024
@xaviertintin xaviertintin changed the title from "M2: Configurable Deployment via Helm Values" to "Configurable Deployment via Helm Values" Apr 10, 2024
@xaviertintin
Owner Author

The first approach to making the Kueue behavior configurable at deployment time is to integrate the Kueue Helm chart into the REANA Helm chart.

@xaviertintin
Owner Author

The Kueue Helm chart was added to the REANA deployment. Kueue is now deployed alongside REANA when the cluster is created in debug mode; this change can be seen here

@xaviertintin xaviertintin self-assigned this Apr 13, 2024
@xaviertintin
Owner Author

xaviertintin commented Apr 13, 2024

Kueue, used here as the scheduler for workflow execution, is optional at deployment time. The toggle was introduced as a click option on the cluster_deploy function, as detailed in this commit. It defaults to False, so the Kueue scheduler is not activated unless requested. To enable it, use the following command:

reana-dev cluster-deploy --admin-email john.doe@example.org --admin-password mysecretpassword --kueue=True

This command activates the Kueue scheduler during deployment, so that workflow execution is managed through Kueue.

@giuseppe-steduto

giuseppe-steduto commented Apr 15, 2024

Following our discussion, my personal take on this is that it's probably best to avoid the click option and use a Helm value directly. This is mainly for two reasons:

  1. reana-dev is mainly used by developers, so if someone else wants to deploy using Kueue, they should not need the reana-dev package to do so, but instead everything deployment-related should be configurable by means of Helm values as shown in the deployment pages.
  2. Code consistency: it's the way it is usually done in the rest of the code, and it is the pattern by which configuration options are passed to the REANA cluster. For a list of all the possible publicly configurable Helm values, see https://github.com/reanahub/reana/blob/master/helm/reana/README.md#configuration : for example, if I want to change the DB host used by REANA, that's where I should do it.

Let's take the example of the REANA_WORKFLOW_SCHEDULING_POLICY variable, which lets us choose whether REANA should schedule workflows following a simple FIFO logic, or take a more "balanced" and smart approach that considers the complexity of the workflow (I chose a random one, but it's very similar for most of the other configuration variables). This variable is documented here and, rather than through a click option, it is customizable directly via the Helm value: here it is in the values file, and one can set it to either "fifo" or "balanced".

What happens when you deploy the cluster is that, since this is among the components.reana_server.environment values, REANA_WORKFLOW_SCHEDULING_POLICY becomes an environment variable in the r-server pod. It is then read by the reana-server code in the config.py file: https://github.com/reanahub/reana-server/blob/b7cc00afddb5035a3ed8f964ea33fb94bca3e2d8/reana_server/config.py#L83

A similar approach for your use case could look like this:

  • Create a value in the values.yaml file under the components.reana_workflow_controller.environment list: for example, you can call it USE_KUEUE, and set it by default to False
  • In reana_workflow_controller/config.py, read that variable with e.g. USE_KUEUE = bool(strtobool(os.getenv("USE_KUEUE", "false"))) and use it wherever you need it in your r-w-controller code :) I think another reason why this is better is that the config variable never changes once it is set, which makes for cleaner code and a clearer approach
  • In your development environment you can set USE_KUEUE to true. Probably the best way to do that is to update the values-dev.yaml file (the "development" Helm values file that extends and overrides the normal values.yaml when you deploy in debug mode) by setting components.reana_workflow_controller.environment.USE_KUEUE to True
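
The config.py step can be sketched as follows. This is only an illustration under assumptions, not the actual REANA code: `env_flag` is a hypothetical helper standing in for `distutils.util.strtobool`, which is no longer part of the standard library from Python 3.12 onwards.

```python
import os

# Accepted spellings, mirroring the ones strtobool used to recognise.
_TRUE = {"1", "true", "yes", "on", "y", "t"}
_FALSE = {"0", "false", "no", "off", "n", "f"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Fail loudly on typos instead of silently picking a scheduler.
    raise ValueError(f"Invalid boolean value for {name}: {raw!r}")


# Evaluated once at import time, like other settings in a config module.
USE_KUEUE = env_flag("USE_KUEUE")
```

Raising on unknown values is a design choice: a misspelled Helm value then surfaces at pod start-up rather than as an unexpected scheduling mode.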

P.S. My view is just one perspective on the matter; it's of course not necessarily definitive. What do you think?

@mdonadoni

One small addition to Giuseppe's comment: I personally prefer defining new values in the Helm chart (e.g. kueue.enabled, kueue.cpu_quota...) instead of (ab)using components.*.environment. The former lets us define the value once and use it in multiple components, while the latter forces us to define the same variable multiple times, once for each component. But this is nitpicking :)

@xaviertintin
Owner Author

Thank you both for your feedback, I have taken both observations into account.

I have added 2 environment variables:

  • components.reana_workflow_controller.environment.KUEUE_ENABLED
  • components.reana_job_controller.environment.KUEUE_ENABLED

Both REANA components now use the Kueue scheduler only if KUEUE_ENABLED is set to true in the Helm chart. This works with reana-dev using the following command:

reana-dev cluster-deploy --admin-email john.doe@example.org --admin-password mysecretpassword --mode=debug --kueue=True

The REANA components have been altered so the admin can choose whether to use the classical approach or the new optional Kueue approach, via Helm values.

See the REANA Workflow Controller changes here

See the REANA Job Controller changes here

@xaviertintin
Owner Author

xaviertintin commented Apr 22, 2024

One observation: I am deploying REANA with reana-dev, which is not the usual deployment procedure for everyday REANA users. I have explored ways to add a Helm subchart that deploys Kueue when kueue.enabled is set, but I would also need to set the environment variables components.reana_workflow_controller.environment.KUEUE_ENABLED and components.reana_job_controller.environment.KUEUE_ENABLED to true so that the REANA components choose the right scheduling logic.

How would I use the kueue.enabled helm chart value to also set both environment variables to true? These true values play a big role when submitting jobs into the K8s backend.

@mdonadoni

Regarding reana-workflow-controller, you can define the environment variables in its related template, as we do here.

Regarding reana-job-controller, given that it is created by r-w-controller on demand and not from the Helm chart, you will need to add the env variable to the k8s specification that is created here.

To recap:

  1. You add a new value kueue.enabled to values.yaml of the Helm chart
  2. You pass this value as an env variable to r-w-controller in its template helm/reana/templates/reana-workflow-controller.yaml
  3. You read the env variable in config.py of r-w-controller, as suggested by Giuseppe
  4. You pass this value as an env variable to r-j-controller by modifying its k8s specification that is created in reana_workflow_controller/workflow_run_manager.py
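
Step 4 can be sketched with plain dictionaries, without assuming anything about how workflow_run_manager.py is really structured. The function names, the container name, and the image below are hypothetical; only the KUEUE_ENABLED variable name comes from this thread.

```python
def job_controller_env_vars(kueue_enabled: bool) -> list:
    """Build the env entries for the job controller container (hypothetical)."""
    return [
        # Kubernetes env values must be strings, so the bool is serialised.
        {"name": "KUEUE_ENABLED", "value": str(kueue_enabled).lower()},
    ]


def job_controller_container(image: str, kueue_enabled: bool) -> dict:
    """Return a container spec fragment in the dict form used by manifests."""
    return {
        "name": "job-controller",  # illustrative name only
        "image": image,
        "env": job_controller_env_vars(kueue_enabled),
    }


spec = job_controller_container("example/job-controller:latest", True)
```

Since r-w-controller already holds the parsed flag from its own config.py, forwarding it this way keeps kueue.enabled as the single source of truth for both components.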

@xaviertintin
Owner Author

Done, the changes are:

  1. reana
    1. reana/helm/reana/values.yaml
      1. Added Kueue value
    2. reana/helm/reana/templates/reana-workflow-controller.yaml
      1. Added environment variables
  2. reana-workflow-controller
    1. reana-workflow-controller/reana_workflow_controller/config.py
      1. Get environment variable as a bool
    2. reana-workflow-controller/reana_workflow_controller/workflow_run_manager.py
      1. Use environment variable to select deployment type: [standard, kueue]
      2. Pass environment variable to job_controller_env_vars
  3. reana-job-controller
    1. reana-job-controller/reana_job_controller/config.py
      1. Get environment variable as a bool
    2. reana-job-controller/reana_job_controller/kubernetes_job_manager.py
      1. Use environment variable to select deployment type: [standard, kueue]
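
For illustration, selecting the deployment type could look like the sketch below. It is not the code from the linked changes: the function and the default queue name are hypothetical. It assumes Kueue's documented convention of assigning a Job to a LocalQueue through the `kueue.x-k8s.io/queue-name` label and creating the Job suspended.

```python
import copy

# Label Kueue uses to assign a Job to a LocalQueue.
KUEUE_QUEUE_LABEL = "kueue.x-k8s.io/queue-name"


def prepare_job_manifest(job: dict, kueue_enabled: bool,
                         queue_name: str = "local-queue") -> dict:
    """Return a copy of a Job manifest, adapted for Kueue if enabled."""
    manifest = copy.deepcopy(job)  # leave the caller's dict untouched
    if not kueue_enabled:
        # Standard path: submit the Job exactly as it was built.
        return manifest
    labels = manifest.setdefault("metadata", {}).setdefault("labels", {})
    labels[KUEUE_QUEUE_LABEL] = queue_name
    # The Job starts suspended; Kueue resumes it once it is admitted.
    manifest.setdefault("spec", {})["suspend"] = True
    return manifest
```

Taking the queue name as a parameter rather than a constant would make it easy to feed from configuration later.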

Commands to deploy with reana-dev:

kind delete cluster --name kind
cd '/Users/alextintin/project/reana/src/reana'
pip install -e .
cd ~/project/reana/src
reana-dev cluster-create -m /var/reana:/var/reana --mode=debug

VERSION=v0.6.2
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
kubectl wait --for=condition=available deployment/kueue-controller-manager -n kueue-system --timeout=5m
kubectl apply -f reana/scripts/kueue/resourceFlavor.yaml
kubectl apply -f reana/scripts/kueue/clusterQueue.yaml
kubectl apply -f reana/scripts/kueue/localQueue.yaml

reana-dev cluster-build --parallel 8 --exclude-components r-a-vomsproxy -b DEBUG=1
reana-dev cluster-deploy --admin-email john.doe@example.org --admin-password mysecretpassword --mode=debug

@mdonadoni

Just a small comment:

2.i does not do what you expect: bool("False") is ... True! The string is non-empty, so it is considered a "truthy" value. You can use strtobool instead.
The same remark applies to 3.i.
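
The pitfall is easy to reproduce in isolation. The snippet below is a minimal demonstration; the set of accepted spellings is an assumption chosen for the example.

```python
raw = "False"  # what os.getenv would return for KUEUE_ENABLED=False

# Any non-empty string is truthy, whatever it spells.
naive = bool(raw)

# Comparing against the accepted spellings gives the intended result.
parsed = raw.strip().lower() in {"1", "true", "yes", "on"}

print(naive, parsed)  # True False
```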

The rest looks good, except for the hardcoded queue names that you already know about.

Projects
Status: Done

3 participants