Autoscaling Kubernetes on Jetstream with Cluster Autoscaler #15
OpenStack support was merged in March: kubernetes/autoscaler#1690
It is based on Magnum, so we should abandon kubespray anyway. Still, I think it is worth a try, as long as it doesn't require too much effort.
Deployment with Magnum works, see #16; next I'll work on this.
@rsignell-usgs @julienchastang @ktyle I deployed the autoscaler on top of the Magnum deployment: it authenticates fine, and when there are many pods pending it requests a new node from the OpenStack API. I asked XSEDE if there is anything we can do to speed it up, because if we have users waiting for a Jupyter Notebook, ideally we would like them to wait ~5 min. Otherwise, I'll have to recompile the container, as the wait time is a constant in the Go codebase.
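For reference, the knobs that control how quickly the autoscaler reacts are passed as command-line flags to the cluster-autoscaler container. A minimal sketch, assuming the Magnum cloud provider; the node-group name and the min/max counts are placeholders, not values from this deployment:

```bash
# Sketch of cluster-autoscaler flags relevant to scale-up speed.
# "DefaultNodeGroup" and 1:5 are assumed placeholders.
./cluster-autoscaler \
  --cloud-provider=magnum \
  --cloud-config=/config/cloud-config \   # OpenStack credentials/endpoint
  --nodes=1:5:DefaultNodeGroup \          # min:max:<node group>
  --scan-interval=10s \                   # how often pending pods are evaluated
  --max-node-provision-time=15m           # how long to wait for a new node to join
```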
One thing I have noticed is really long boot times when new nodes are provisioned.
On the IU cloud, all new deployments get a forced update. If your image is old and has lots of pending updates, that might be the cause. You can override it with cloud-init by putting something like this in a script:

```yaml
#cloud-config
package_update: false
final_message: "Boot completed in $UPTIME seconds"
```

This is noted here: http://wiki.jetstream-cloud.org/Using+cloud-init+scripts+with+the+Jetstream+API
From the CLI it's invoked with the --file switch at launch. With Terraform or other tools, I'm less sure how to include it, but hopefully it's possible. Looks like it might be:
thanks! it looks like a good idea. I need to understand how to modify the Heat templates in Magnum to provide a cloud-config; I'll try and report back here.
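As an aside, for a standalone instance launched directly with the CLI (outside of Magnum's Heat templates), a rough sketch of how the cloud-config above could be supplied as user data; the image, flavor, file, and server names are placeholders:

```bash
# Hypothetical example: pass the cloud-config above as user data at launch.
# Image, flavor, file, and server names are placeholders.
cat > no-forced-update.yml <<'EOF'
#cloud-config
package_update: false
final_message: "Boot completed in $UPTIME seconds"
EOF

openstack server create \
  --image Fedora-AtomicHost-28 \
  --flavor m1.medium \
  --user-data no-forced-update.yml \
  boot-time-test
```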
Also, simply working with a more up-to-date image that does not have so many out-of-date Debian packages could be another solution, and maybe a more secure one too.
That is the best solution. However, if it's the Fedora image that Magnum depends on, we've been trying to update to use CoreOS (as that's what Magnum is moving to) and it's just not working correctly. If you're using an Ubuntu image, creating an up-to-date snapshot on a regular basis is not a bad idea.
Also, @zonca, is this issue directly related to the ticket you opened today?
@jlf599 yes, exactly.
Can you try an experiment: update the image you're using with all of the latest updates, and see if you get fast boot/cluster growth?
I'm using …
If Julien is seeing lots of dpkg updates, that's an Ubuntu/Debian-based image, though. You could yum update the Fedora image, snapshot it, and try using it. It should work, but we all know how "should" messes with things. :)
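A rough sketch of that update-and-snapshot workflow, assuming a throwaway instance booted from the current image; the server and image names are placeholders, and Atomic hosts would use rpm-ostree rather than plain yum:

```bash
# Hypothetical sketch of "update the image, snapshot it, try it".
# 1. On a throwaway instance booted from the current image:
#      sudo yum -y update        # Debian/Ubuntu: apt-get; Atomic: rpm-ostree upgrade
#      sudo reboot
# 2. Snapshot the updated instance into a new image:
openstack server image create --name Fedora-AtomicHost-28-updated throwaway-server
# 3. Point the Magnum cluster template (or kubespray config) at the new image.
```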
@julienchastang is referring to the kubespray deployment; with Magnum I have Fedora, but it could be the same problem. Anyway, it doesn't hurt to have an updated image ;)
That image for kubespray probably could stand to be updated, too, then. Can you let me know how things go so I can handle the ticket accordingly?
See the log of an instance attached (console.txt); I think there is a big delay well before the packages are updated.
@jlf599 even with an updated image I still see ~15 min to provision 1 node.
@jlf599 actually I realized I didn't update the image; I am retesting it now.
that was it, with my updated image |
my impression is that the older kernel was hanging on something, while the new one works fine. |
Nice ... I just tried it (looks like the image name is now Fedora-AtomicHost-28-Updated-9-6-2019) and it completed in a little less than 8 minutes ... way faster than with the last image.
We are in the midst of creating an updated one -- you found it. :)
Okay. We've deactivated the old Atomic image and put this one on both clouds:
035d9554-086e-40f9-8da2-db023ea4b941 | Fedora-AtomicHost-28-Updated-9-6-2019
We'll be updating that every month or so with our featured images. Thanks for pointing out the issue!
@jlf599 unfortunately it looks like the new image, both your version and mine, gets to "CREATE_COMPLETE", but the Kubernetes cluster is broken. For example, in my current deployment the master node, even though it seems to be running, is not recognized by Kubernetes:
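The output itself did not make it into the thread; the kind of check being described would look roughly like this (a sketch, not the actual commands or output from that deployment):

```bash
# Compare what OpenStack reports with what the Kubernetes API actually registers.
openstack server list               # master and workers show as ACTIVE here...
kubectl get nodes -o wide           # ...but the master is missing or NotReady here
kubectl get pods -n kube-system     # control-plane pods may also be failing
```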
It looks like there are specific instructions on how to update the images for Magnum: https://docs.openstack.org/magnum/mitaka/dev/build-atomic-image.html
Can you please remind me which version of OpenStack is on Jetstream at IU? And can you please recover the old image? I'll try to update it using those instructions.
We're on the Rocky release presently. Planning to go to Stein by year's end. We didn't delete the old image -- just made it inactive, so we can re-enable it and see about getting it updated.
I reactivated 5f2f28a4-6e7c-4515-86c7-f7cbfaa19a30 | Fedora-AtomicHost-28-20180625 and deactivated 035d9554-086e-40f9-8da2-db023ea4b941 | Fedora-AtomicHost-28-Updated-9-6-2019. I'm trying to find Rocky instructions like those above for Mitaka, but haven't found them yet. Granted, I haven't spent much time on it yet.
thanks @jlf599 for the prompt response!
I had the wrong hash; however, that one is still deactivated:
Hrm:
5f2f28a4-6e7c-4515-86c7-f7cbfaa19a30 | Fedora-AtomicHost-28-20180625 | active
I did deactivate the one that wasn't working. Do you want that one back on?
(openstack) [IU] [Entropy] jeremy ~--> os image set --activate 035d9554-086e-40f9-8da2-db023ea4b941
Back on.
ok, I tried to test with the older Fedora Atomic 27, and that worked fine! It doesn't have the slow boot of Fedora Atomic 28, and everything now works.
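For context, pinning a Magnum cluster template to the image that boots cleanly might look roughly like this sketch; the template name, keypair, network, and flavors are placeholders, not the values used in the tutorial below:

```bash
# Hypothetical sketch: a Magnum cluster template pinned to a known-good image.
openstack coe cluster template create k8s-atomic-27 \
  --image Fedora-Atomic-27 \
  --coe kubernetes \
  --keypair mykey \
  --external-network public \
  --flavor m1.medium \
  --master-flavor m1.medium \
  --network-driver flannel

# Then create a cluster from it:
openstack coe cluster create k8s-autoscaler-test \
  --cluster-template k8s-atomic-27 \
  --node-count 2
```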
@julienchastang @jlf599 @ktyle @rsignell-usgs See the tutorial: https://zonca.github.io/2019/09/kubernetes-jetstream-autoscaler.html
I'll do more testing in the coming weeks and improve the tutorial, but everything seems to be working fine.
next: simulate load, see #23
Cluster Autoscaler is the official infrastructure for autoscaling Kubernetes on AWS and Google Cloud.
OpenStack support is being developed.