Description
Hi, I have configured WAL backups for my cluster, and the initial base backup is created on my object storage just fine.
For example, on DB initialisation I happily see the following object sent to my object storage:
spilo/foo-cluster/fdgdffg-dfgdfg-dfgdg-8fddfg8dc-dfgdfg/wal/basebackups_005/base_000000010000000000000002/tar_partitions/part_001.tar.lz4
This is the pod configuration I am using:
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-pod-config
  namespace: postgres-cluster
data:
  BACKUP_SCHEDULE: "0 */12 * * *"
  USE_WALG_BACKUP: "true"
  BACKUP_NUM_TO_RETAIN: "14"
  AWS_ACCESS_KEY_ID: xxxxx
  AWS_SECRET_ACCESS_KEY: xxxxxx
  AWS_ENDPOINT: https://sdfdsfdfgdfg.compat.objectstorage.us-east-1.oraclecloud.com
  AWS_S3_FORCE_PATH_STYLE: "true"
  AWS_REGION: us-east-1
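
In case it matters, this is roughly how I check that these values actually reach WAL-G inside the Spilo container (a sketch; the pod name is just an example, and /run/etc/wal-e.d/env is the envdir my Spilo image renders the credentials into, so adjust if yours differs):

# exec into the Postgres pod (pod name is hypothetical)
kubectl exec -it foo-cluster-0 -n postgres-cluster -- bash

# list the environment WAL-G actually sees (Spilo renders it as an envdir)
ls /run/etc/wal-e.d/env
# inspect any single variable, e.g. the endpoint
cat /run/etc/wal-e.d/env/AWS_ENDPOINT

# run WAL-G with exactly that environment to see if listing the bucket works
envdir /run/etc/wal-e.d/env wal-g backup-list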
However, after a while my container begins to OOM, and the wal-g processes look like the culprit:
[2090157.723139] Memory cgroup stats for /kubepods/podd31a1ea6-7005-41d2-9e00-a78a08402cf9/c39b84616fedb31df69edb62c423677a82d25bd74ca590b186c9807e610f6cdf: cache:20192KB rss:469716KB rss_huge:360448KB shmem:19932KB mapped_file:16744KB dirty:0KB writeback:0KB swap:0KB inactive_anon:6996KB active_anon:482652KB inactive_file:136KB active_file:124KB unevictable:0KB
[2090157.723148] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[2090157.723275] [24301] 0 24301 1098 193 8 3 0 -998 dumb-init
[2090157.723277] [24360] 0 24360 1159 459 8 3 0 -998 sh
[2090157.723279] [24499] 0 24499 12738 862 29 3 0 -998 su
[2090157.723281] [24500] 0 24500 1140 213 8 3 0 -998 runsvdir
[2090157.723283] [24501] 0 24501 1102 206 7 3 0 -998 runsv
[2090157.723284] [24502] 0 24502 1102 213 7 3 0 -998 runsv
[2090157.723286] [24503] 0 24503 1102 195 7 3 0 -998 runsv
[2090157.723288] [24504] 101 24504 173946 10175 101 3 0 -998 patroni
[2090157.723290] [24505] 0 24505 7089 705 19 3 0 -998 cron
[2090157.723292] [24506] 101 24506 27000 2014 52 3 0 -998 pgqd
[2090157.723294] [24510] 101 24510 279237 10805 86 5 0 -998 wal-g
[2090157.723296] [24708] 101 24708 78672 7358 88 3 0 -998 postgres
[2090157.723297] [24711] 101 24711 48841 1126 74 3 0 -998 postgres
[2090157.723299] [24713] 101 24713 99430 3186 86 3 0 -998 postgres
[2090157.723301] [24924] 101 24924 78705 3019 85 3 0 -998 postgres
[2090157.723303] [24925] 101 24925 78709 1618 81 3 0 -998 postgres
[2090157.723305] [24926] 101 24926 49436 1303 75 3 0 -998 postgres
[2090157.723307] [25076] 101 25076 79011 4159 87 3 0 -998 postgres
[2090157.723309] [25309] 101 25309 78672 2149 78 3 0 -998 postgres
[2090157.723311] [25310] 101 25310 78853 2123 80 3 0 -998 postgres
[2090157.723313] [25311] 101 25311 49371 1684 76 3 0 -998 postgres
[2090157.723315] [25312] 101 25312 78935 3006 84 3 0 -998 postgres
[2090157.723316] [25314] 101 25314 78813 2065 83 3 0 -998 postgres
[2090157.723318] [25315] 101 25315 78812 1723 79 3 0 -998 postgres
[2090157.723320] [26175] 101 26175 78977 2946 83 3 0 -998 postgres
[2090157.723322] [28384] 101 28384 79065 4629 88 3 0 -998 postgres
[2090157.723330] [31757] 0 31757 11288 700 27 3 0 -998 cron
[2090157.723332] [31758] 101 31758 1157 215 7 3 0 -998 sh
[2090157.723334] [31759] 101 31759 242369 17043 78 5 0 -998 wal-g
[2090157.723336] [31809] 101 31809 79065 4801 88 3 0 -998 postgres
[2090157.723338] [31834] 101 31834 1157 206 9 3 0 -998 sh
[2090157.723340] [31835] 101 31835 382233 95774 240 5 0 -998 wal-g
[2090157.723342] Memory cgroup out of memory: Kill process 24301 (dumb-init) score 0 or sacrifice child
[2090157.764125] Killed process 24360 (sh) total-vm:4636kB, anon-rss:116kB, file-rss:1720kB, shmem-rss:0kB
[2090413.393720] wal-g invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=-998
[2090413.393724] wal-g cpuset=a3937f08578a7c0522de5a73245347bf990a4fe182e0a424670fdedb1787ed61 mems_allowed=0
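For reference, the dump above shows several wal-g processes alive at once (pids 24510, 31759 and 31835). A simple way to keep an eye on them from inside the container is something like the following (assuming ps/watch from procps are available in the image):

# watch resident memory and age of all wal-g processes every 5 seconds
watch -n 5 'ps -o pid,rss,etime,args -C wal-g'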
When I look closer, it seems that after the initial base backup no more WAL segments are shipped, so they eventually fill up my disk. The Postgres logs show that archiving fails for everything after the initial base backup:
failed to upload 'spilo/foo-cluster/d210db5f-e8f3-4807-9359-4b8df275df6f/wal/wal_005/00000009.history.lz4' to bucket 'postgres-foo-wal-backup': SignatureDoesNotMatch: The required information to complete authentication was not provided.
status code: 403, request id: east-1:sfsdfgd231DIdfdgOIUX-_Q, host id:
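
I assume the place to start is re-running the failing upload by hand with exactly the environment WAL-G gets from Spilo, something like the sketch below (the envdir path /run/etc/wal-e.d/env and the data directory /home/postgres/pgdata/pgroot/data are what my Spilo image uses, and the history file name is taken from the error above, so treat them as assumptions):

# inside the Spilo container, switch to the postgres user
su - postgres

# check what Postgres itself reports about archiving failures
psql -c "SELECT * FROM pg_stat_archiver;"

# re-run the failing push with the same environment the archive_command uses
# (pg_wal on recent versions, pg_xlog on older ones)
envdir /run/etc/wal-e.d/env wal-g wal-push \
  /home/postgres/pgdata/pgroot/data/pg_wal/00000009.history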
This seems to be an issue on my side with how I authenticate with my cloud provider rather than with the operator, but I do not understand how the initial base backup succeeds while subsequent WAL uploads fail when they all use the same config. Can you give me any tips on debugging this?
Thanks!