Hetzner Target Unhealthy #104

Closed
nelsonic opened this issue Mar 20, 2025 · 17 comments

@nelsonic
Member

2/3 of the Postgres Servers in the Autobase cluster are unhealthy:

Image Image

The Hetzner interface is not very good at clarifying why they are "unhealthy" or what that even means ... 🤷‍♂

So we need to investigate by logging into the machines (SSH) and figuring this out. 🔍

@nelsonic
Member Author

Continue: https://console.hetzner.cloud/projects/3886115/servers/57770120/overview

ssh root@88.99.81.115

@nelsonic nelsonic moved this from More ToDo ThanCanEver Be Done to In progress in Nelson's List Mar 21, 2025
@nelsonic
Member Author

*** System restart required ***

https://askubuntu.com/questions/258297/should-i-always-restart-the-system-when-i-see-system-restart-required

sudo reboot
Broadcast message from root@ubuntu-4gb-fsn1-2-autobase-console on pts/1 (Fri 2025-03-21 05:54:50 UTC):

The system will reboot now!

@nelsonic
Member Author

114 updates can be applied immediately.
1 of these updates is a standard security update.

https://askubuntu.com/questions/449032/29-packages-can-be-updated-how

sudo apt-get update
sudo apt-get upgrade

Took a few minutes. ⏳
Everything up-to-date now. ✅

0 updates can be applied immediately.
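For anyone repeating this on other machines: a minimal sketch of checking the "System restart required" state from a script rather than reading the login banner. It assumes the usual Ubuntu flag file `/var/run/reboot-required`; the function names and the optional path argument are made up for illustration.

```shell
# Sketch: report whether an Ubuntu host is asking for a reboot.
# The flag path is a parameter only so the functions can be tried
# against a temporary file; the default is the usual Ubuntu location.
needs_reboot() {
  local flag="${1:-/var/run/reboot-required}"
  [ -f "$flag" ]
}

report_reboot_status() {
  local flag="${1:-/var/run/reboot-required}"
  if needs_reboot "$flag"; then
    echo "reboot required"
    # Ubuntu usually lists the packages that set the flag in a sibling file.
    if [ -f "${flag}.pkgs" ]; then
      sort -u "${flag}.pkgs"
    fi
  else
    echo "no reboot required"
  fi
  return 0
}
```

Running `report_reboot_status` after `sudo apt-get upgrade` shows whether the `sudo reboot` step is still needed.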

@nelsonic
Member Author

Now I'm going to SSH into each of the Nodes in the cluster and run the same updates.

@nelsonic
Member Author

On the first node in the cluster, executed:

sudo apt-get upgrade

Got:

Setting up pgbouncer (1.24.0-3.pgdg24.04+1) ...

Configuration file '/etc/pgbouncer/pgbouncer.ini'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** pgbouncer.ini (Y/I/N/O/D/Z) [default=N] ?

Chose N to keep the currently-installed version.

Then:

sudo reboot

Did this for all 3 nodes in the cluster. ✅
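The same steps could be scripted so the `pgbouncer.ini` prompt never blocks. This is only a sketch: the node addresses are placeholders (not the real ones), and `upgrade_node` / `upgrade_all` are hypothetical helper names. The dpkg options `--force-confdef` and `--force-confold` keep the currently-installed config file, which matches answering N above. It defaults to a dry run that only prints the commands.

```shell
# Placeholder node addresses (assumption); replace with the real cluster nodes.
NODES="${NODES:-10.0.0.1 10.0.0.2 10.0.0.3}"

# Command to run on each node: refresh package lists, then upgrade while
# keeping locally modified config files (same choice as answering "N").
remote_upgrade_cmd() {
  printf '%s' "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -y -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold upgrade"
}

# With DRY_RUN=1 (the default) the ssh command is printed, not executed.
upgrade_node() {
  local host="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ssh root@${host} '$(remote_upgrade_cmd)'"
  else
    ssh "root@${host}" "$(remote_upgrade_cmd)"
  fi
}

# Upgrade the nodes one after another, never in parallel.
upgrade_all() {
  local host
  for host in $NODES; do
    upgrade_node "$host"
  done
}
```

Call `upgrade_all` to preview, then `DRY_RUN=0 upgrade_all` to run it for real. Reboots are deliberately left out of the loop: rebooting one node at a time and waiting for it to rejoin the cluster seems safer than restarting all three together.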

@nelsonic
Member Author

Sadly, that didn't do anything to improve the "health":

Image Image

@nelsonic
Member Author

Decided to DELETE the postgres-cluster-01-replica (Replica Load Balancer) because it's not needed:

Image

@nelsonic
Member Author

So now we only have the one Load Balancer:
https://console.hetzner.cloud/projects/3886115/loadbalancers/2226998/overview

Image

No progress on the "health" of the 2/3 targets. ⏳
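One thing worth ruling out (an assumption, not something confirmed in this thread): Autobase runs Patroni on each node, and Patroni's REST API answers `GET /primary` with HTTP 200 only on the current leader and `GET /replica` with 200 only on a replica. If the remaining Load Balancer's health check points at the primary endpoint, the two replicas would show as "unhealthy" by design. A rough sketch for checking what each node answers; port 8008 is Patroni's default, and the function names are made up:

```shell
# Map the HTTP status codes returned by Patroni's /primary and /replica
# endpoints to a role name. Patroni answers 200 only on a matching node.
role_from_codes() {
  local primary_code="$1" replica_code="$2"
  if [ "$primary_code" = "200" ]; then
    echo "primary"
  elif [ "$replica_code" = "200" ]; then
    echo "replica"
  else
    echo "unknown"
  fi
}

# Probe one node. Port 8008 is Patroni's default REST API port; whether the
# Load Balancer health check actually uses it is an assumption to verify.
probe_node() {
  local host="$1" port="${2:-8008}" p r
  # curl prints 000 as the status code when no HTTP response was received.
  p="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://${host}:${port}/primary" || true)"
  r="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://${host}:${port}/replica" || true)"
  echo "${host}: $(role_from_codes "$p" "$r") (primary=${p} replica=${r})"
}
```

Run `probe_node <node-ip>` from a machine that can reach the nodes and compare the result with the health check settings shown for the Load Balancer in the Hetzner console.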

@nelsonic
Member Author

I've done a bit of googling: "hetzner load balancer targets unhealthy" etc. 🔍
But none of the Reddit pages were helpful. 🙅

Reading: https://gitlab.com/postgres-ai/postgres-checkup

@nelsonic
Member Author

Better reading than the README of the project: https://postgres.ai/docs/checkup

@nelsonic
Member Author

Image

PDF snapshot: Overview-postgres-checkup_Postgres.AI.pdf

@nelsonic
Member Author

Viewing the Autobase dashboard:

Image

Status is "healthy" ...

And viewing the cluster there is no additional info or reason for concern:

Image

@nelsonic
Member Author

note: yes, it bugs me too that the Autobase console is http not https (i.e. insecure) by default ... 😕

Image

Attempting to view the page with https fails:

Image

I feel like this needs to be addressed as P1 because MITM is inevitable.

@nelsonic
Member Author

Feel like I need to address the insecure (by default) Autobase console issue as a priority ... 🔥 🧑‍🚒
If anyone between me and the server (quite a few hops) intercepts the request, they can use it to gain access to the Hetzner API keys and spin up servers for any nefarious purpose ... 😕

@nelsonic
Member Author

Side-quest: #105

@nelsonic
Member Author

A couple of days after the updates & reboots of the servers, nothing has changed on the "health" front:

Image

I will continue investigating. 🔍

@nelsonic
Member Author

Autobase reports the cluster as healthy:
https://autobase.dwy.is/clusters
Image

Closing as there's nothing else I can do on this. 🙃

@github-project-automation github-project-automation bot moved this from In progress to Done in Nelson's List Mar 25, 2025