
2.21.0 hybrid #29

Merged: 4 commits merged into zonca:branch_v2.21.0 on Feb 22, 2024

Conversation

@ana-v-espinoza (Author):

CC: @julienchastang

Hey Andrea,

Apologies for the long PR description, but I feel like it contains some relevant information.

Here are the changes needed to deploy a hybrid cluster. In short, do everything as you normally would, with the exception of these three things:

  1. Explicitly set number_of_k8s_nodes and number_of_k8s_nodes_no_floating_ip to 0. This is required.
  2. Declare the k8s_nodes variable. The availability zone (az), flavor, and floating_ip are the only required fields (see the sketch after this list).
  3. Specify which nodes will be GPU nodes by adding the "extra_groups": "gpu-node" value.
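
For illustration only (the node names and flavor IDs below are placeholders, not values taken from this repository), a minimal k8s_nodes declaration in cluster.tfvars might look like this:

    k8s_nodes = {
      "cpu-1" = {
        "az"          = "nova"
        "flavor"      = "<cpu-flavor-id>"
        "floating_ip" = false
      }
      "gpu-1" = {
        "az"           = "nova"
        "flavor"       = "<gpu-flavor-id>"
        "floating_ip"  = false
        "extra_groups" = "gpu-node"
      }
    }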

A few notes:

  • The necessary changes to containerd.yml to enable the NVIDIA container runtime are contained in an Ansible group_vars file specific to the "gpu-node" group. As such, there is no longer a need for a separate branch_v<version>_gpu branch for anything other than bookkeeping and documenting the changes necessary to enable GPU capability. Simply specify which nodes should be GPU enabled by adding them to the group, or, if deploying a fully GPU cluster, add them all to the group with the supplementary_node_groups var. See my note on this in cluster.tfvars.

  • When running a JupyterHub on top of a CPU/GPU hybrid cluster, it may be necessary to do two things: 1) create two separate single-user images, one for CPU usage and one for GPU usage, and set the appropriate kubespawner_override values; and 2) disable the hook image puller and the continuous image puller (see the JHub config snippet below). This is because some GPU-enabled single-user images are ultimately based on a CUDA image and will expect GPUs to be available, as is the case with the one currently in your JupyterHub deployment repository. In other words, attempting to run that image on a hybrid cluster will result in errors.

    I am currently working on creating some single user images that make this a non-problem by installing CUDA in a conda environment. You can see some preliminary work for this here. Expect a PR to address this problem in that repository soon.

    prePuller:
      hook:
        enabled: false
      continuous:
        enabled: false
  • In principle, this could be applied to things other than a CPU/GPU hybrid cluster. For example, we've run across instances where multiple people concurrently running computationally intensive tasks crash the JupyterHub. The solution is to run the JHub "core" pods on a dedicated node, which can probably be something smaller than the m3.medium that we typically use for our cluster. See more details about this here; a sketch of the relevant config follows this list.
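
As a minimal sketch of that last point, assuming the zero-to-jupyterhub Helm chart (the node name below is a placeholder): label the dedicated node with kubectl label nodes <core-node> hub.jupyter.org/node-purpose=core, then require the core pods to schedule onto nodes with that label:

    scheduling:
      corePods:
        nodeAffinity:
          matchNodePurpose: require

With matchNodePurpose set to require, the hub, proxy, and other core pods will only be placed on nodes carrying the hub.jupyter.org/node-purpose=core label.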

Let me know if you have any questions,

Ana

@zonca (Owner) left a comment:


I'll have more feedback when I test this

@@ -596,6 +596,7 @@ resource "openstack_compute_instance_v2" "k8s_nodes" {
user_data = each.value.cloudinit != null ? templatefile("${path.module}/templates/cloudinit.yaml.tmpl", {
extra_partitions = each.value.cloudinit.extra_partitions
}) : data.cloudinit_config.cloudinit.rendered
security_groups = var.port_security_enabled ? local.worker_sec_groups : null
@zonca (Owner):

is this possibly due to a bug in kubespray?

@ana-v-espinoza (Author):

I was wondering the same thing, and after some reading I concluded that this likely is a problem that merits another PR or at least further discussion. In summary, I think it was an oversight when removing port definitions and then fixing the broken security groups in a future commit. Since we hadn't used the k8s_nodes resource, it was never updated.

I think that series of commits was done to force Terraform to add instances to the auto_allocated_network. I would suggest we find a way to accomplish this without removing the ports resources so that we diverge as little as possible from "vanilla" Kubespray. This would make updating easier.

I can open a new issue that describes what I think the problem is in more depth and work on it when I have time.

@zonca (Owner):

yes, sure, thanks, keep it low priority, maybe we can reconsider this when we update kubespray the next time

inventory/kubejetstream/cluster.tfvars (resolved)
@zonca (Owner) commented on Feb 8, 2024:

As a first step I tested this branch by creating a simple CPU-only deployment and it worked fine; next I'll do a GPU-only cluster and then a hybrid one.

@zonca (Owner) commented on Feb 8, 2024:

@ana-v-espinoza what do you get as the container runtime for GPU nodes? I was expecting it to be different from the master node's runtime, but I get:

> k get nodes -o wide                                                                    
NAME                       STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP       OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME                            
kubejetstream-1            Ready    control-plane   3m17s   v1.25.6   10.0.74.120   149.165.171.16    Ubuntu 22.04.3 LTS   5.15.0-94-generic   containerd://1.6.15
kubejetstream-k8s-node-1   Ready    <none>          2m13s   v1.25.6   10.0.74.35    149.165.175.120   Ubuntu 22.04.3 LTS   5.15.0-94-generic   containerd://1.6.15           

so I am wondering if anything is wrong

@ana-v-espinoza (Author):

I get the same:

[openstack@6b897015278a jetstream_kubespray]$ kubectl get nodes -o wide
NAME                       STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
xxxx-1                   Ready    control-plane   22d   v1.25.6   xxxx   xxxx   Ubuntu 20.04.6 LTS   5.15.0-91-generic   containerd://1.6.15
xxxx-k8s-node-nf-cpu-1   Ready    <none>          22d   v1.25.6   xxxx    <none>          Ubuntu 20.04.6 LTS   5.15.0-91-generic   containerd://1.6.15
xxxx-k8s-node-nf-gpu-1   Ready    <none>          22d   v1.25.6   xxxx   <none>          Ubuntu 20.04.6 LTS   5.15.0-91-generic   containerd://1.6.15

As far as I know, this is normal: I believe this column only shows which container runtime interface (CRI) K8s is using for the node.

To see the actual container runtime, ssh into your GPU node and run sudo containerd config dump.

You should see a block of the config that looks like:

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"

This matches what is defined in the inventory/$CLUSTER/group_vars/gpu-node/containerd.yaml file.
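
For reference, a rough sketch of what such a group_vars file could contain, assuming kubespray's containerd_additional_runtimes variable (the exact variable name and structure may differ between kubespray versions, so treat this as illustrative rather than a copy of the file in this repository):

    # Register the NVIDIA runtime with containerd alongside the default runc runtime
    containerd_additional_runtimes:
      - name: nvidia
        type: "io.containerd.runc.v2"
        engine: ""
        root: ""
        options:
          BinaryName: "/usr/bin/nvidia-container-runtime"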

@zonca (Owner) commented on Feb 9, 2024:

ok, that works fine, thanks!
While working on this I also wanted to update the base image to Ubuntu 22.
However, while GPU containers work fine with Ubuntu 20, on Ubuntu 22 I see all containers on the GPU node crashing. Looking at the node, nothing seems wrong: memory, disk, and CPU all look normal.
Have you ever tested with Ubuntu 22?

@ana-v-espinoza (Author):

That's not something I considered. My first guess would be a mismatch between the CUDA version in the container and the version supported by the driver installed on Ubuntu 22.

What does nvidia-smi say?

Does a kubectl describe pod -n jhub <pod-name> reveal any useful information?

Or maybe the kubelet logs on the GPU node?: journalctl -ru kubelet

I may test this with Ubuntu 22 tomorrow myself if it seems like this will be a difficult problem to track down.

@zonca (Owner) commented on Feb 9, 2024:

nvidia-smi 
Fri Feb  9 02:15:39 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID A100X-10C                 On  | 00000000:04:00.0 Off |                    0 |
| N/A   N/A    P0              N/A /  N/A |      1MiB / 10240MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

@zonca (Owner) commented on Feb 9, 2024:

there are some strange errors in the pods, for example failing to mount a configmap:

  Warning  FailedMount     45m                    kubelet            MountVolume.SetUp failed for volume "kube-api-access-5r92f" : [failed to fetch token: Post "https://localhost:6443/api/v1/namespaces/kube-system/serviceaccounts/nodelocaldns/token": dial tcp 127.0.0.1:6443: connect: connection refused, failed to sync configmap cache: timed out waiting for the condition]
  Warning  FailedMount     45m (x2 over 45m)      kubelet            MountVolume.SetUp failed for volume "config-volume" : failed to sync configmap cache: timed out waiting for the condition

Maybe a networking issue?
I'll just revert back to Ubuntu 20 for this tutorial. Let's continue later on in zonca/jupyterhub-deploy-kubernetes-jetstream#73

@zonca (Owner) commented on Feb 20, 2024:

GPU-only deployment with Ubuntu 20 worked fine.

# "az" = "nova"
# "flavor": "10"
# "floating_ip": false
# "extra_groups": "gpu-node"
@zonca (Owner):

@ana-v-espinoza How do I create 2 profiles, 1 for CPU and 1 for GPU?
I see there is an extra group here, but it seems it is only in Terraform and not in Kubernetes.
This works for GPU pods, but not for CPU:
https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/blob/master/gpu/jupyterhub_gpu.yaml#L2-L9

@ana-v-espinoza (Author):

Hey Andrea,

I'm using the same "profiles" config option. Here's my snippet. You'll notice that I don't override the image in the GPU profile, as I'm using an image similar to that discussed in zonca/jupyterhub-deploy-kubernetes-jetstream#72

singleuser:
  image:
    name: "unidata/hybrid-gpu"
    tag: "minimal-tf"
  profileList:
  - display_name: "CPU Server"
    default: true
  - display_name: "GPU Server"
    kubespawner_override:
      extra_resource_limits:
        nvidia.com/gpu: "1"

@zonca (Owner):

The problem is that I request a CPU server but spawn on a GPU node. I am wondering if it would be better to restrict CPU-only users to CPU nodes.

@ana-v-espinoza (Author):

Ah, okay, I see what you mean! Yeah, you can taint the GPU node(s), then add a toleration in kubespawner_override for the GPU profile; see the sketch below.
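
A rough sketch of that approach (the taint key nvidia.com/gpu used here is an arbitrary choice, not something defined by this PR): taint the GPU node with kubectl taint nodes <gpu-node> nvidia.com/gpu=present:NoSchedule, then let only the GPU profile tolerate the taint via kubespawner_override:

    singleuser:
      profileList:
      - display_name: "CPU Server"
        default: true
      - display_name: "GPU Server"
        kubespawner_override:
          extra_resource_limits:
            nvidia.com/gpu: "1"
          tolerations:
            - key: "nvidia.com/gpu"
              operator: "Exists"
              effect: "NoSchedule"

CPU servers are then rejected by the taint, while GPU servers still land on the GPU node because of the nvidia.com/gpu resource request.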

@zonca (Owner) commented on Feb 21, 2024:

Deployment of Kubernetes on the hybrid cluster on Ubuntu 20 worked fine; now starting to test JupyterHub on the hybrid cluster.

@zonca (Owner) commented on Feb 22, 2024:

ok, all tests completed, it works great.
I posted it to https://www.zonca.dev/posts/2024-02-09-kubernetes-gpu-jetstream2
Can you please review it? Then I'll make a pull request to the JS2 docs.

@zonca merged commit b90bcd5 into zonca:branch_v2.21.0 on Feb 22, 2024
@ana-v-espinoza (Author):

@zonca Looks good to me, but I think my perception of the post might be skewed since you and I have been working on this in depth for some time now.

Perhaps @julienchastang can give better feedback about anything that might need more detail or clarification. Julien, could you please take a look at Andrea's new blog post about the work we've been doing in this PR? (also linked above)
https://www.zonca.dev/posts/2024-02-09-kubernetes-gpu-jetstream2

@julienchastang (Collaborator):

Thanks. I read it once, but I would like to study it more carefully and actually launch a cluster according to what you have described. I will try to find time to do that in the near future. One thing I noticed is that the image is still built on nvcr.io/nvidia/tensorflow:22.04-tf2-py3, which is rather heavyweight, as we discussed previously. However, that may be a separate issue to be dealt with independently.
