tls transport: detect certificate expiration & auto-reload keypair on disk #202

jbliesener · 2019-07-22T14:01:29Z

zrepl has been running quite successfully for some time now, replicating two production systems to one backup server. It's a pull/source configuration in which the backup server connects to the production systems over TLS.

Today, I ran into an unexpected problem when the backup server complained about an expired TLS certificate on one of the production systems.

When I checked the configuration, I noticed that the Let's Encrypt TLS certificate on the production system had expired a month ago, but it had already been successfully renewed by a cron-scheduled task.

So, while the certificate on disk was actually valid, it seems that zrepl continued to use the old, expired certificate, which had already been deleted from the file system.

Of course, I could add a zrepl restart to my certificate renewal procedure. On the other hand, I think it would be reasonable to either document this behavior or have zrepl reload the certificate.

For sink jobs, the reload could happen at the start of an outgoing connection; for source jobs, it could happen at the end of an incoming connection.

jbliesener · 2019-07-22T14:03:45Z

I forgot to mention: a simple `systemctl restart zrepl` on the production system got it working again.

problame · 2019-07-22T20:56:52Z

I triaged the issue, but it's pretty low on my priority list.

Implementation concerns:

- Figure out how to merge the clients from the old server (old keypair) and new server (new keypair) for both gRPC and dataconn
  - Alternative: job-level hard restart
- Only try reloading if the certificate is about to expire (<10s) or already expired
- Be pessimistic: we could race with the certificate renewal and read a new key file + old cert file or vice versa
  - Retry a few times with 1s gaps in between

jbliesener · 2019-07-22T21:18:38Z

Wouldn't it be sufficient to check certificate validity BEFORE every outgoing connection and AFTER every incoming connection?

No need to merge anything as long as all certificates are issued by a single CA. If the CA certificate expires (a pretty rare condition), you'd most probably need to restart anyway, as it would affect more than just zrepl.

The race condition would be temporary as well and would resolve itself on the next retry.

And yes, it's low priority. I just wanted to mention it.

problame changed the title ~~TLS certificate seems to be loaded statically~~ tls transport: detect certificate expiration & updated keypair on disk Jul 22, 2019

problame added feature good first issue labels Jul 22, 2019

problame added this to To do in Usability & UX via automation Jul 22, 2019

problame changed the title ~~tls transport: detect certificate expiration & updated keypair on disk~~ tls transport: detect certificate expiration & auto-reload keypair on disk Jul 22, 2019

problame added the data_model label Jan 17, 2020

problame mentioned this issue Oct 4, 2020

Reload Config without downtime #264

Open

problame mentioned this issue May 14, 2023

Documentation for how to use TLS connection type without self-signed certs #696

Closed
