Let's Encrypt: add-water control socket (127.0.0.1:9977) lacks SO_REUSEADDR — back-to-back runs fail with "Connection refused" #2121

@marcos-mendez

Component

confconsole Let's Encrypt plugin — plugins.d/Lets_Encrypt/add-water-srv

Environment

  • Appliance: turnkey-odoo-18.0-bookworm-amd64
  • confconsole: 2.1.6.1
  • dehydrated: 0.7.0-3
  • python3: 3.11.2

Summary

The internal control socket used by add-water-srv binds to 127.0.0.1:9977
without setting SO_REUSEADDR. If add-water is stopped and started again
within the TCP TIME_WAIT window (~60s), bind() fails with
OSError: [Errno 98] Address already in use. The handle_token_input thread
dies, so the Bottle server on port 80 comes up but nothing is listening on
the control port. When dehydrated then calls the deploy_challenge hook,
add-water-client tries to connect to 127.0.0.1:9977 and gets
ConnectionRefusedError: [Errno 111] Connection refused, and the whole
certificate run aborts.

Root cause

In add-water-srv, handle_token_input():

def handle_token_input():
    host = '127.0.0.1'
    port = 9977
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))   # no SO_REUSEADDR -> fails while previous socket is in TIME_WAIT
    sock.listen(1)

There is no SO_REUSEADDR (confirmed: grep -rn "SO_REUSEADDR\|setsockopt" over
the plugin directory returns nothing).

Steps to reproduce

1. Run the wrapper successfully once:
/usr/lib/confconsole/plugins.d/Lets_Encrypt/dehydrated-wrapper --register --force --log-info
2. Immediately run it again (within ~60s).
3. The second run fails.
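
The same failure mode can be sketched without the appliance. The script below is a minimal stand-in, not the plugin's code: run_once and restart_succeeds are hypothetical names, it uses an OS-assigned loopback port instead of 9977, and it assumes the server side closes the connection first, so that the socket left in TIME_WAIT sits on the listening port.

```python
import socket


def run_once(port: int, reuse: bool) -> int:
    """Bind a loopback listener, serve one connection, then shut down.

    The server side closes first, so its end of the connection is the
    one left in TIME_WAIT on the listening port.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("127.0.0.1", port))  # port 0 lets the OS pick a free port
        srv.listen(1)
        bound_port = srv.getsockname()[1]

        cli = socket.create_connection(("127.0.0.1", bound_port), timeout=5)
        conn, _ = srv.accept()
        conn.close()   # active close on the server side
        cli.recv(1)    # returns b"" once the server's FIN arrives
        cli.close()
        return bound_port
    finally:
        srv.close()


def restart_succeeds(reuse: bool) -> bool:
    """Run twice back to back on the same port; report whether run 2 binds."""
    port = run_once(0, reuse)
    try:
        run_once(port, reuse)
        return True
    except OSError as exc:
        print(f"second bind failed: {exc}")
        return False


if __name__ == "__main__":
    print("without SO_REUSEADDR:", restart_succeeds(reuse=False))
    print("with SO_REUSEADDR:   ", restart_succeeds(reuse=True))
```

On a Linux host the run without SO_REUSEADDR should report the same Errno 98 as the journal output, while the run with the option set should bind cleanly. The exact behavior of the first case depends on the platform's TIME_WAIT handling.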

Actual output (second run)

[...] dehydrated-wrapper: INFO: stopping apache2
[...] dehydrated-wrapper: INFO: running dehydrated
 + Deploying challenge tokens...
[...] confconsole.hook.sh: INFO: Deploying challenge for <domain>
Traceback (most recent call last):
  File ".../add-water-client", line 41, in <module>
    sock.connect((host, port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR: deploy_challenge hook returned with non-zero exit code
[...] dehydrated-wrapper: FATAL: dehydrated exited with a non-zero exit code.

And in journalctl -u add-water:
Bottle v0.12.23 server starting up (using WSGIRefServer())...
Listening on http://:::80/
Exception in thread Thread-1 (handle_token_input):
  File ".../add-water-srv", line 87, in handle_token_input
    sock.bind((host, port))
OSError: [Errno 98] Address already in use

Expected behavior

Re-running the wrapper shortly after a previous run should succeed (e.g. retry
after a transient failure) instead of aborting on a lingering control socket.

Proposed fix

Set SO_REUSEADDR before binding the control socket in handle_token_input():

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # add this
sock.bind((host, port))
sock.listen(1)
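
For review or testing, the same change can be written as a standalone helper. This is only a sketch: open_control_socket is a hypothetical name that does not exist in add-water-srv, and the defaults simply mirror the host and port quoted above.

```python
import socket


def open_control_socket(host: str = "127.0.0.1", port: int = 9977) -> socket.socket:
    """Return a listening TCP socket with SO_REUSEADDR set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option only takes effect if it is set before bind().
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()  # do not leak the descriptor if bind/listen fails
        raise
    return sock


if __name__ == "__main__":
    s = open_control_socket(port=0)  # port 0: any free port, for a quick check
    print("listening on", s.getsockname())
    print("SO_REUSEADDR:", s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR))
    s.close()
```

Reading the option back with getsockopt() gives a quick way to confirm that a patched build actually sets it.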

Notes / impact

- This does NOT affect the daily renewal cron (it only runs once per day, so it
never hits the TIME_WAIT window). It bites operators who retry manually after a
failed/aborted run, or who test against staging and then immediately switch to
production.
- Possibly related minor quirk: dehydrated-wrapper logs
"WARNING: Python is still listening on port 80" on successful runs too, then
force-stops add-water. Might be worth tightening the shutdown ordering so the
warning only appears when something is genuinely wrong.
