2.21.0 hybrid #29
Changes from all commits: 1240276, 7cb843e, b759b9d, 0d49b38
@@ -41,6 +41,50 @@ number_of_k8s_nodes_no_floating_ip = 0

flavor_k8s_node = "4"

# # Uncomment when all nodes will be GPU nodes
# # If you wish to use this var for another reason, add the Ansible groups as a comma-separated list
# # e.g. "additional-group-1,additional-group2,etc"
# supplementary_node_groups = "gpu-node"

# BEGIN HYBRID CLUSTER CONFIG

# # Set to true by default, but we make it explicit here
# port_security_enabled = true

# # Must be uncommented and set to 0 to use the k8s_nodes variable
# number_of_k8s_nodes = 0
# number_of_k8s_nodes_no_floating_ip = 0

# # "<cluster-name>-k8s-node-" will be prepended to each key name and used to create the instance name.
# # e.g. the first item below would result in an instance named "<cluster-name>-k8s-node-nf-cpu-1"
# # For a full list of options see ./contrib/terraform/openstack/README.md#k8s_nodes
# k8s_nodes = {
#   "nf-cpu-1" = {
#     "az"          = "nova"
#     "flavor"      = "4"
#     "floating_ip" = false
#   },
#   "nf-cpu-2" = {
#     "az"          = "nova"
#     "flavor"      = "4"
#     "floating_ip" = false
#   },
#   "nf-gpu-1" = {
#     "az"           = "nova"
#     "flavor"       = "10"
#     "floating_ip"  = false
#     "extra_groups" = "gpu-node"
#   },
#   "nf-gpu-2" = {
#     "az"           = "nova"
#     "flavor"       = "10"
#     "floating_ip"  = false
#     "extra_groups" = "gpu-node"
Review comments on the k8s_nodes block:

zonca: @ana-v-espinoza How do I create 2 profiles, 1 for CPU and 1 for GPU?

ana-v-espinoza: Hey Andrea, I'm using the same "profiles" config option. Here's my snippet. You'll notice that I don't override the image in the GPU profile, as I'm using an image similar to that discussed in zonca/jupyterhub-deploy-kubernetes-jetstream#72:

singleuser:
  image:
    name: "unidata/hybrid-gpu"
    tag: "minimal-tf"
  profileList:
    - display_name: "CPU Server"
      default: true
    - display_name: "GPU Server"
      kubespawner_override:
        extra_resource_limits:
          nvidia.com/gpu: "1"

zonca: The problem is that when I request a CPU server, I can still spawn on a GPU node. I am wondering if it would be better to restrict CPU-only users to running on CPU nodes.

ana-v-espinoza: Ah okay, I see what you mean! Yes: you can taint the GPU node(s), then add a toleration in kubespawner_override for the GPU profile.
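A minimal sketch of that taint-plus-toleration approach (the taint key/value `dedicated=gpu` and the node name are illustrative, not taken from this PR): first taint each GPU node, e.g. `kubectl taint nodes <cluster-name>-k8s-node-nf-gpu-1 dedicated=gpu:NoSchedule`, then let only the GPU profile tolerate that taint via KubeSpawner's `tolerations` setting:

```yaml
# Hypothetical profileList: only the GPU profile tolerates the GPU-node taint,
# so CPU-profile pods can no longer be scheduled onto the tainted GPU nodes.
profileList:
  - display_name: "CPU Server"
    default: true
  - display_name: "GPU Server"
    kubespawner_override:
      extra_resource_limits:
        nvidia.com/gpu: "1"
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"
```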
#   },
# }

# END HYBRID CLUSTER CONFIG

# GlusterFS
# either 0 or more than one
#number_of_gfs_nodes_no_floating_ip = 0
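Once the cluster is deployed, a quick sanity check of the hybrid layout (the node name below is illustrative) is to confirm that the GPU nodes advertise the `nvidia.com/gpu` resource:

```shell
# List nodes, then check an assumed GPU node for the advertised GPU resource
# (requires the NVIDIA device plugin to be running)
kubectl get nodes
kubectl describe node <cluster-name>-k8s-node-nf-gpu-1 | grep -i nvidia.com/gpu
```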
@@ -0,0 +1,54 @@
---
# Please see roles/container-engine/containerd/defaults/main.yml for more configuration options

# containerd_storage_dir: "/var/lib/containerd"
# containerd_state_dir: "/run/containerd"
# containerd_oom_score: 0

containerd_default_runtime: "nvidia"
# containerd_snapshotter: "native"

containerd_runc_runtime:
  name: nvidia
  type: "io.containerd.runc.v2"
  engine: ""
  root: ""
  options:
    BinaryName: '"/usr/bin/nvidia-container-runtime"'

# containerd_additional_runtimes:
# Example for Kata Containers as additional runtime:
# - name: kata
#   type: "io.containerd.kata.v2"
#   engine: ""
#   root: ""

# containerd_grpc_max_recv_message_size: 16777216
# containerd_grpc_max_send_message_size: 16777216

# containerd_debug_level: "info"

# containerd_metrics_address: ""

# containerd_metrics_grpc_histogram: false

## An obvious use case is allowing insecure-registry access to self-hosted registries.
## Can be an IP address or a domain name,
## e.g. mirror.registry.io or 172.19.16.11:5000.
## Set "name": "url". Insecure URLs must start with http://.
## The port number is also needed if the default HTTPS port is not used.
# containerd_insecure_registries:
#   "localhost": "http://127.0.0.1"
#   "172.19.16.11:5000": "http://172.19.16.11:5000"

# containerd_registries:
#   "docker.io": "https://registry-1.docker.io"

# containerd_max_container_log_line_size: -1

# containerd_registry_auth:
#   - registry: 10.0.0.2:5000
#     username: user
#     password: pass
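For context, a rough sketch (my reading of how these Kubespray variables are rendered, not taken from this PR) of the containerd CRI configuration they correspond to in /etc/containerd/config.toml:

```toml
# Approximate resulting containerd config: "nvidia" becomes the default CRI
# runtime, using runc's v2 shim but invoking the NVIDIA runtime binary.
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v2"

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
```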
zonca: is this possibly due to a bug in kubespray?

ana-v-espinoza: I was wondering the same thing, and after some reading I concluded that this is likely a problem that merits another PR, or at least further discussion. In summary, I think it was an oversight: the port definitions were removed, and the resulting broken security groups were fixed in a later commit; since we hadn't used the k8s_nodes resource, it was never updated. I think that series of commits was done to force Terraform to add instances to the auto_allocated_network. I would suggest we find a way to accomplish this without removing the ports resources, so that we diverge as little as possible from "vanilla" Kubespray; this would make updating easier. I can open a new issue that describes what I think the problem is in more depth and work on it when I have time.

zonca: Yes, sure, thanks. Keep it low priority; maybe we can reconsider this when we next update kubespray.