spilo/patroni not able to elect new leader if previous leader, last working member failed due to full disk? #172

Closed
joar opened this issue Jun 27, 2017 · 0 comments


joar commented Jun 27, 2017

Scenario

  • GKE Kubernetes
  • spilo Pods via StatefulSet: patroni-set-0003

    kind: StatefulSet
    # [...]
    metadata:
      name: patroni-set-0003
    spec:
      replicas: 3
      # [...]
      template:
        spec:
          containers:
            - name: spilo
              # [...]
              env:
                - name: SCOPE
                  value: the-scope
              volumeMounts:
                - mountPath: /home/postgres/pgdata
                  name: pgdata
      volumeClaimTemplates:
        - metadata:
            name: pgdata
          spec:
            # [...]
            resources:
              requests:
                storage: 500Gi

Unfortunately, /home/postgres/pgdata ran out of space (in all pods, it seems,
probably almost simultaneously) and spilo/patroni started logging:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/patroni/async_executor.py", line 39, in run
    wakeup = func(*args) if args else func()
  File "/usr/local/lib/python3.5/dist-packages/patroni/postgresql.py", line 1067, in _do_follow
    self.write_recovery_conf(primary_conninfo)
  File "/usr/local/lib/python3.5/dist-packages/patroni/postgresql.py", line 911, in write_recovery_conf
    f.write("{0} = '{1}'\n".format(name, value))
OSError: [Errno 28] No space left on device
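The failing call is a plain `f.write()` hitting ENOSPC while Patroni writes `recovery.conf`. As a minimal sketch of the failure mode (not Patroni's actual behavior; the helper name and the 1 MiB threshold are assumptions), one could check for headroom before writing a small config file:

```python
import errno
import os
import shutil

def write_if_space(path, content, min_free_bytes=1 << 20):
    """Write `content` to `path` only if the filesystem has headroom.

    Illustrative only -- Patroni does not do this check; the 1 MiB
    threshold and this helper are assumptions for the sketch.
    """
    usage = shutil.disk_usage(os.path.dirname(path) or ".")
    if usage.free < min_free_bytes:
        # Fail early with the same errno the traceback shows (ENOSPC == 28)
        raise OSError(errno.ENOSPC, "No space left on device", path)
    with open(path, "w") as f:
        f.write(content)
```

Once the volume is full, even this tiny write fails, which is why every member of the cluster broke at roughly the same time.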

I believe the last leader before all pods went out of disk was
either patroni-set-0003-1 or patroni-set-0003-2.

Recovery

In order to solve the issue I:

  1. Scaled down patroni-set-0003 to 1 replica (the remaining pod still
     failing with OSError: No space left on device).
     Note that this left me without any running old leader, broken or not;
     I believe this could be a key to my issue.
  2. Created a new StatefulSet, patroni-set-0004, with the same configuration
     as patroni-set-0003 except:
     metadata.name: patroni-set-0004
     spec.replicas: 1
     spec.volumeClaimTemplates[0].spec.resources.requests.storage: 1Ti
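Step 2 amounts to a near-copy of the original manifest with three fields changed; a sketch of the resulting StatefulSet, abridged to the fields that differ (everything else as in patroni-set-0003):

```yaml
kind: StatefulSet
metadata:
  name: patroni-set-0004
spec:
  replicas: 1
  # [... same template as patroni-set-0003 ...]
  volumeClaimTemplates:
    - metadata:
        name: pgdata
      spec:
        # [...]
        resources:
          requests:
            storage: 1Ti
```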

With only the broken patroni-set-0003-0 running, patroni-set-0004-0 started
restoring from the WAL archive; I left it overnight to restore. During this time
both patroni-set-0003-0 and patroni-set-0004-0 were running, but patroni-set-0003-0 was out of disk.

Several hours later, patroni-set-0004-0 was logging lots of:

following a different leader because i am not the healthiest node
Lock owner: None; I am patroni-set-0004-0
wal_e.blobstore.gs.utils WARNING MSG: could no longer locate object while performing wal restore
DETAIL: The absolute URI that could not be located is gs://the-bucket/spilo/the-scope/wal/wal_005/the-file.lzo.
HINT: This can be normal when Postgres is trying to detect what timelines are available during restoration.
STRUCTURED: time=2017-06-26T12:05:23.646236-00 pid=207
lzop: <stdin>: not a lzop file
[...] 

I expected patroni-set-0004-0 to take over the master lock by this time.
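The "not the healthiest node" message suggests patroni-set-0004-0 was losing the leader race on WAL position: a member restored from the archive can lag behind the last WAL written by the (now broken) old members. A rough sketch of that kind of comparison (illustrative only, not Patroni's actual implementation; the member names and positions below are made up):

```python
def is_healthiest(my_name, wal_positions):
    """Return True if this member's WAL position is at least as advanced
    as every other known member's -- a simplified stand-in for the
    leader-race check (illustrative, not Patroni code)."""
    mine = wal_positions[my_name]
    return all(mine >= pos for name, pos in wal_positions.items()
               if name != my_name)

# Hypothetical positions: the archive-restored member lags behind the
# WAL last written by a crashed member of the old StatefulSet.
positions = {
    "patroni-set-0004-0": 1000,
    "patroni-set-0003-1": 1500,
}
```

Under that model, patroni-set-0004-0 would decline the lock as long as it believes a more advanced member exists, even if that member is currently broken.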


Debugging why the disk outage occurred, I found out about ext reserved
blocks, and then recovered 25Gi of disk space on patroni-set-0003-0's pgdata
by running tune2fs -m 0 /dev/$PGDATA_DEV. I realize in hindsight that simply
resizing the GCE PD would have been easier.
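The 25Gi recovered is consistent with ext2/3/4's default reserved-blocks percentage of 5% (tune2fs -m 5) applied to the 500Gi volume; a quick check of the arithmetic:

```python
volume_gib = 500        # volumeClaimTemplates storage request
reserved_ratio = 0.05   # ext default reserved-blocks percentage (-m 5)

reserved_gib = volume_gib * reserved_ratio
print(reserved_gib)     # 25.0 -- matches the space freed by tune2fs -m 0
```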

However, once patroni-set-0003-0 was given the extra space and restarted, it
still did not seem willing to take the leader role, despite there being no
current leader; it logged lots of:

Lock owner: None; I am patroni-set-0003-0
wal_e.blobstore.gs.utils WARNING MSG: could no longer locate object while performing wal restore
DETAIL: The absolute URI that could not be located is gs://the-bucket/spilo/the-scope/wal/wal_005/the-file.lzo.
HINT: This can be normal when Postgres is trying to detect what timelines are available during restoration.
STRUCTURED: time=2017-06-26T12:05:23.646236-00 pid=207
lzop: <stdin>: not a lzop file
[...] 

I expected patroni-set-0003-0 to take the leader role by this time.


I then did the same thing to patroni-set-0003-{1,2}, freeing up 25Gi of space.

Once patroni-set-0003-1 was given extra disk space and restarted, it took the
master lock.

@joar joar changed the title spilo/patroni not able to elect new leader if existing leader failed due to full disk? spilo/patroni not able to elect new leader if previous leader, last working member failed due to full disk? Jun 27, 2017