
2.21.0 hybrid #29

Merged: 4 commits merged into zonca:branch_v2.21.0 on Feb 22, 2024

Conversation

@ana-v-espinoza (Author):

CC: @julienchastang

Hey Andrea,

Apologies for the long PR description, but I feel like it contains some relevant information.

Here are the changes needed to deploy a hybrid cluster. In short, do everything as you normally would, with the exception of these three things:

  1. Explicitly set number_of_k8s_nodes and number_of_k8s_nodes_no_floating_ip to 0. This is required.
  2. Declare the k8s_nodes variable. The availability zone (az), flavor, and floating_ip are the only required fields (see the sketch after this list).
  3. Specify which nodes will be GPU nodes by adding the "extra_groups": "gpu-node" value.
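
For illustration only (the node names and flavor IDs below are placeholders, not values taken from this repository), a minimal k8s_nodes declaration in cluster.tfvars might look like this:

    k8s_nodes = {
      "cpu-1" = {
        "az"          = "nova"
        "flavor"      = "<cpu-flavor-id>"
        "floating_ip" = false
      }
      "gpu-1" = {
        "az"           = "nova"
        "flavor"       = "<gpu-flavor-id>"
        "floating_ip"  = false
        "extra_groups" = "gpu-node"
      }
    }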

A few notes:

  • The necessary changes to containerd.yml to enable the NVIDIA container runtime are contained in an Ansible group_vars file specific to the "gpu-node" group. As such, there is no longer a need for a separate branch_v<version>_gpu branch for anything other than bookkeeping and documenting the changes necessary to enable GPU capability. Simply specify which nodes should be GPU enabled by adding them to the group, or, if deploying a fully GPU cluster, add them all to the group with the supplementary_node_groups var. See my note on this in cluster.tfvars.

  • When running a JupyterHub on top of a CPU/GPU hybrid cluster, it may be necessary to do two things: 1) create two separate single-user images, one for CPU usage and one for GPU usage, and set the appropriate kubespawner_override values; and 2) disable the hook image puller and the continuous image puller (see the JHub config snippet below). This is because some GPU-enabled single-user images are ultimately based on a CUDA image and will expect GPUs to be available, as is the case with the one currently in your JupyterHub deployment repository. In other words, attempting to run that image on a hybrid cluster will result in errors.

    I am currently working on creating some single user images that make this a non-problem by installing CUDA in a conda environment. You can see some preliminary work for this here. Expect a PR to address this problem in that repository soon.

    prePuller:
      hook:
        enabled: false
      continuous:
        enabled: false
  • In principle, this could be applied to things other than a CPU/GPU hybrid cluster. For example, we've run across instances where multiple people concurrently running computationally intensive tasks crash the JupyterHub. The solution is to run the JHub "core" pods on a dedicated node, which can probably be something smaller than the m3.medium that we typically use for our cluster. See more details about this here; a sketch of the relevant config follows this list.
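
As a minimal sketch of that last point, assuming the zero-to-jupyterhub Helm chart (the node name below is a placeholder): label the dedicated node with kubectl label nodes <core-node> hub.jupyter.org/node-purpose=core, then require the core pods to schedule onto nodes with that label:

    scheduling:
      corePods:
        nodeAffinity:
          matchNodePurpose: require

With matchNodePurpose set to require, the hub, proxy, and other core pods will only be placed on nodes carrying the hub.jupyter.org/node-purpose=core label.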

Let me know if you have any questions,

Ana

@zonca (Owner) left a comment:


I'll have more feedback when I test this

@@ -596,6 +596,7 @@ resource "openstack_compute_instance_v2" "k8s_nodes" {
user_data = each.value.cloudinit != null ? templatefile("${path.module}/templates/cloudinit.yaml.tmpl", {
extra_partitions = each.value.cloudinit.extra_partitions
}) : data.cloudinit_config.cloudinit.rendered
security_groups = var.port_security_enabled ? local.worker_sec_groups : null
@zonca (Owner):

is this possibly due to a bug in kubespray?

@ana-v-espinoza (Author):

I was wondering the same thing, and after some reading I concluded that this likely is a problem that merits another PR or at least further discussion. In summary, I think it was an oversight when removing port definitions and then fixing the broken security groups in a future commit. Since we hadn't used the k8s_nodes resource, it was never updated.

I think that series of commits was done to force Terraform to add instances to the auto_allocated_network. I would suggest we find a way to accomplish this without removing the ports resources so that we diverge as little as possible from "vanilla" Kubespray. This would make updating easier.

I can open a new issue that describes what I think the problem is in more depth and work on it when I have time.

@zonca (Owner):

yes, sure, thanks, keep it low priority, maybe we can reconsider this when we update kubespray the next time

inventory/kubejetstream/cluster.tfvars (resolved)
@zonca (Owner) commented on Feb 8, 2024:

As a first step I tested this branch by creating a simple CPU-only deployment and it worked fine; next I'll do a GPU-only cluster and then a hybrid one.

@zonca (Owner) commented on Feb 8, 2024:

@ana-v-espinoza what do you get as the container runtime for GPU nodes? I was expecting it to be different from the master node's runtime, but I get:

> k get nodes -o wide                                                                    
NAME                       STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP       OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME                            
kubejetstream-1            Ready    control-plane   3m17s   v1.25.6   10.0.74.120   149.165.171.16    Ubuntu 22.04.3 LTS   5.15.0-94-generic   containerd://1.6.15
kubejetstream-k8s-node-1   Ready    <none>          2m13s   v1.25.6   10.0.74.35    149.165.175.120   Ubuntu 22.04.3 LTS   5.15.0-94-generic   containerd://1.6.15           

so I am wondering if anything is wrong

@ana-v-espinoza (Author):

I get the same:

[openstack@6b897015278a jetstream_kubespray]$ kubectl get nodes -o wide
NAME                       STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
xxxx-1                   Ready    control-plane   22d   v1.25.6   xxxx   xxxx   Ubuntu 20.04.6 LTS   5.15.0-91-generic   containerd://1.6.15
xxxx-k8s-node-nf-cpu-1   Ready    <none>          22d   v1.25.6   xxxx    <none>          Ubuntu 20.04.6 LTS   5.15.0-91-generic   containerd://1.6.15
xxxx-k8s-node-nf-gpu-1   Ready    <none>          22d   v1.25.6   xxxx   <none>          Ubuntu 20.04.6 LTS   5.15.0-91-generic   containerd://1.6.15

As far as I know, this is normal: I believe this column only shows which container runtime interface (CRI) K8s is using for the node.

To see the actual container runtime, ssh into your GPU node and run sudo containerd config dump.

You should see a block of the config that looks like:

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"

This matches what is defined in the inventory/$CLUSTER/group_vars/gpu-node/containerd.yaml file.
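
For reference, a rough sketch of what such a group_vars file could contain, assuming kubespray's containerd_additional_runtimes variable (the exact variable name and structure may differ between kubespray versions, so treat this as illustrative rather than a copy of the file in this repository):

    # Register the NVIDIA runtime with containerd alongside the default runc runtime
    containerd_additional_runtimes:
      - name: nvidia
        type: "io.containerd.runc.v2"
        engine: ""
        root: ""
        options:
          BinaryName: "/usr/bin/nvidia-container-runtime"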

@zonca (Owner) commented on Feb 9, 2024:

ok, that works fine, thanks!
While working on this I also wanted to update the base image to Ubuntu 22.
However, while GPU containers work fine with Ubuntu 20, on Ubuntu 22 I see all containers on the GPU node crashing. Looking at the node, nothing seems wrong: memory, disk, and CPU all look normal.
Have you ever tested with Ubuntu 22?

@ana-v-espinoza (Author):

That's not something I considered. My first guess would be a mismatch between the CUDA version in the container and the version supported by the driver installed on Ubuntu 22.

What does nvidia-smi say?

Does a kubectl describe pod -n jhub <pod-name> reveal any useful information?

Or maybe the kubelet logs on the GPU node?: journalctl -ru kubelet

I may test this with Ubuntu 22 tomorrow myself if it seems like this will be a difficult problem to track down.

@zonca (Owner) commented on Feb 9, 2024:

nvidia-smi 
Fri Feb  9 02:15:39 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID A100X-10C                 On  | 00000000:04:00.0 Off |                    0 |
| N/A   N/A    P0              N/A /  N/A |      1MiB / 10240MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

@zonca (Owner) commented on Feb 9, 2024:

there are some strange errors in the pods, for example failing to mount a configmap:

  Warning  FailedMount     45m                    kubelet            MountVolume.SetUp failed for volume "kube-api-access-5r92f" : [failed to fetch token: Post "https://localhost:6443/api/v1/namespaces/kube-system/serviceaccounts/nodelocaldns/token": dial tcp 127.0.0.1:6443: connect: connection refused, failed to sync configmap cache: timed out waiting for the condition]
  Warning  FailedMount     45m (x2 over 45m)      kubelet            MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition

Maybe a networking issue?
I'll just revert back to Ubuntu 20 for this tutorial. Let's continue later on in zonca/jupyterhub-deploy-kubernetes-jetstream#73

@zonca (Owner) commented on Feb 20, 2024:

GPU-only deployment with Ubuntu 20 worked fine.

# "az" = "nova"
# "flavor": "10"
# "floating_ip": false
# "extra_groups": "gpu-node"
@zonca (Owner):

@ana-v-espinoza How do I create 2 profiles, 1 for CPU and 1 for GPU?
I see there is an extra group here, but it seems it is only in Terraform and not in Kubernetes.
This works for GPU pods, but not for CPU:
https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/blob/master/gpu/jupyterhub_gpu.yaml#L2-L9

@ana-v-espinoza (Author):

Hey Andrea,

I'm using the same "profiles" config option. Here's my snippet. You'll notice that I don't override the image in the GPU profile, as I'm using an image similar to that discussed in zonca/jupyterhub-deploy-kubernetes-jetstream#72

singleuser:
  image:
    name: "unidata/hybrid-gpu"
    tag: "minimal-tf"
  profileList:
  - display_name: "CPU Server"
    default: true
  - display_name: "GPU Server"
    kubespawner_override:
      extra_resource_limits:
        nvidia.com/gpu: "1"

@zonca (Owner):

The problem is that I request a CPU server but spawn on a GPU node. I am wondering if it would be better to restrict CPU-only users to CPU nodes.

@ana-v-espinoza (Author):

Ah, okay, I see what you mean! Yeah, you can taint the GPU node(s), then add a toleration in kubespawner_override for the GPU profile; see the sketch below.
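
A rough sketch of that approach (the taint key nvidia.com/gpu used here is an arbitrary choice, not something defined by this PR): taint the GPU node with kubectl taint nodes <gpu-node> nvidia.com/gpu=present:NoSchedule, then let only the GPU profile tolerate the taint via kubespawner_override:

    singleuser:
      profileList:
      - display_name: "CPU Server"
        default: true
      - display_name: "GPU Server"
        kubespawner_override:
          extra_resource_limits:
            nvidia.com/gpu: "1"
          tolerations:
            - key: "nvidia.com/gpu"
              operator: "Exists"
              effect: "NoSchedule"

CPU servers are then rejected by the taint, while GPU servers still land on the GPU node because of the nvidia.com/gpu resource request.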

@zonca (Owner) commented on Feb 21, 2024:

Deployment of Kubernetes on the hybrid cluster on Ubuntu 20 worked fine; now starting to test JupyterHub on the hybrid cluster.

@zonca (Owner) commented on Feb 22, 2024:

ok, all tests completed, it works great.
I posted it to https://www.zonca.dev/posts/2024-02-09-kubernetes-gpu-jetstream2
Can you please review it? Then I'll make a pull request to the JS2 docs.

@zonca merged commit b90bcd5 into zonca:branch_v2.21.0 on Feb 22, 2024
@ana-v-espinoza (Author):

@zonca Looks good to me, but I think my perception of the post might be skewed since you and I have been working on this in depth for some time now.

Perhaps @julienchastang can give better feedback about anything that might need more detail or clarification. Julien, could you please take a look at Andrea's new blog post about the work we've been doing in this PR? (also linked above)
https://www.zonca.dev/posts/2024-02-09-kubernetes-gpu-jetstream2

@julienchastang (Collaborator):

Thanks. I read it once, but I would like to study it more carefully and actually launch a cluster according to what you have described. I will try to find time to do that in the near future. One thing I noticed is that the image is still built on nvcr.io/nvidia/tensorflow:22.04-tf2-py3, which is rather heavyweight, as we discussed previously. However, that may be a separate issue to be dealt with independently.
