Failed to install tobs OR timescaledb-single: FATAL: "/var/lib/postgresql/data" is not a valid data directory #106

Closed
inoam opened this issue Feb 13, 2021 · 7 comments

inoam commented Feb 13, 2021

Tried to install tobs on k8s. Followed the basic steps:

curl --proto '=https' --tlsv1.2 -sSLf  https://tsdb.co/install-tobs-sh |sh
tobs helm show-values > values.yaml

Then I changed the storage class to match the storage class I have in the cluster.
Then I installed:

tobs install -f values.yaml

Helm installation finished.
However, the tobs-timescaledb-0 pod keeps crashing with:

2021-02-13 18:48:09.610 GMT [80] LOG:  skipping missing configuration file "/var/run/postgresql/timescaledb.conf"
2021-02-13 18:48:09.611 GMT [80] LOG:  skipping missing configuration file "/var/run/postgresql/timescaledb.conf"
2021-02-13 18:48:09.611 GMT [80] LOG:  skipping missing configuration file "/var/lib/postgresql/data/postgresql.auto.conf"
2021-02-13 18:48:09.611 GMT [80] FATAL:  "/var/lib/postgresql/data" is not a valid data directory
2021-02-13 18:48:09.611 GMT [80] DETAIL:  File "/var/lib/postgresql/data/PG_VERSION" is missing.
running bootstrap script ... /var/run/postgresql:5432 - no response
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3/dist-packages/patroni/__init__.py", line 138, in patroni_main
    abstract_main(Patroni, schema)
  File "/usr/lib/python3/dist-packages/patroni/daemon.py", line 100, in abstract_main
    controller.run()
  File "/usr/lib/python3/dist-packages/patroni/__init__.py", line 108, in run
    super(Patroni, self).run()
  File "/usr/lib/python3/dist-packages/patroni/daemon.py", line 59, in run
    self._run_cycle()
  File "/usr/lib/python3/dist-packages/patroni/__init__.py", line 111, in _run_cycle
    logger.info(self.ha.run_cycle())
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1452, in run_cycle
    info = self._run_cycle()
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1346, in _run_cycle
    return self.post_bootstrap()
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1242, in post_bootstrap
    self.cancel_initialization()
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1235, in cancel_initialization
    raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'

I later tried to install only timescaledb-single from the chart and got the same failure.
I modified the sts to sleep before running the patroni init script, and I can see that the directory /var/lib/postgresql/data is created by the pod command and exists.
However, PG_VERSION indeed does not exist there; I am not sure what should create it, or when.

Full logs of timescaledb pod:

2021-02-13 18:47:56 - restore_or_initdb - Invoking initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
fixing permissions on existing directory /var/lib/postgresql/wal/pg_wal ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
Bus error (core dumped)
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/data"
initdb: removing contents of WAL directory "/var/lib/postgresql/wal/pg_wal"
2021-02-13 18:48:09,364 WARNING: max_connections setting is missing from pg_controldata output
2021-02-13 18:48:09,364 WARNING: max_prepared_xacts setting is missing from pg_controldata output
2021-02-13 18:48:09,364 WARNING: max_locks_per_xact setting is missing from pg_controldata output
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=archive_command value=/etc/timescaledb/scripts/pgbackrest_archive.sh %p from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=archive_mode value=on from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=archive_timeout value=1800s from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=autovacuum_analyze_scale_factor value=0.02 from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=autovacuum_max_workers value=10 from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=autovacuum_vacuum_scale_factor value=0.05 from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=cluster_name value=tobs from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=hot_standby value=on from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=listen_addresses value=0.0.0.0 from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=log_autovacuum_min_duration value=0 from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=log_checkpoints value=on from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=log_connections value=on from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=log_disconnections value=on from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=log_line_prefix value=%t [%p]: [%c-%l] %u@%d,app=%a [%e]  from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=log_lock_waits value=on from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=log_min_duration_statement value=1s from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=log_statement value=ddl from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=max_connections value=100 from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=max_locks_per_transaction value=64 from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=max_prepared_transactions value=150 from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=max_replication_slots value=10 from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=max_wal_senders value=10 from the config
2021-02-13 18:48:09,365 WARNING: Removing unexpected parameter=max_worker_processes value=8 from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=port value=5432 from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=shared_preload_libraries value=timescaledb,pg_stat_statements from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=ssl value=on from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=ssl_cert_file value=/etc/certificate/tls.crt from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=ssl_key_file value=/etc/certificate/tls.key from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=tcp_keepalives_idle value=900 from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=tcp_keepalives_interval value=100 from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=temp_file_limit value=1GB from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=track_commit_timestamp value=off from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=unix_socket_directories value=/var/run/postgresql from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=unix_socket_permissions value=0750 from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=wal_level value=hot_standby from the config
2021-02-13 18:48:09,366 WARNING: Removing unexpected parameter=wal_log_hints value=on from the config
2021-02-13 18:48:09.610 GMT [80] LOG:  skipping missing configuration file "/var/run/postgresql/timescaledb.conf"
2021-02-13 18:48:09.611 GMT [80] LOG:  skipping missing configuration file "/var/run/postgresql/timescaledb.conf"
2021-02-13 18:48:09.611 GMT [80] LOG:  skipping missing configuration file "/var/lib/postgresql/data/postgresql.auto.conf"
2021-02-13 18:48:09.611 GMT [80] FATAL:  "/var/lib/postgresql/data" is not a valid data directory
2021-02-13 18:48:09.611 GMT [80] DETAIL:  File "/var/lib/postgresql/data/PG_VERSION" is missing.
running bootstrap script ... /var/run/postgresql:5432 - no response
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3/dist-packages/patroni/__init__.py", line 138, in patroni_main
    abstract_main(Patroni, schema)
  File "/usr/lib/python3/dist-packages/patroni/daemon.py", line 100, in abstract_main
    controller.run()
  File "/usr/lib/python3/dist-packages/patroni/__init__.py", line 108, in run
    super(Patroni, self).run()
  File "/usr/lib/python3/dist-packages/patroni/daemon.py", line 59, in run
    self._run_cycle()
  File "/usr/lib/python3/dist-packages/patroni/__init__.py", line 111, in _run_cycle
    logger.info(self.ha.run_cycle())
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1452, in run_cycle
    info = self._run_cycle()
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1346, in _run_cycle
    return self.post_bootstrap()
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1242, in post_bootstrap
    self.cancel_initialization()
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1235, in cancel_initialization
    raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
@VineethReddy02
Contributor

Hi @inoam

This looks like an error caused by the storage class you have configured: the initial lines of the log show "Bus error (core dumped)", followed by "child process exited with exit code 135". I installed tobs a couple of minutes ago, and the TimescaleDB logs are as follows:

$ kubectl logs tobs-timescaledb-0 --namespace monitoring
2021-02-15 12:23:49 - restore_or_initdb - Invoking initdb
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
fixing permissions on existing directory /var/lib/postgresql/wal/pg_wal ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/data -l logfile start

2021-02-15 12:23:50.048 GMT [44] LOG:  skipping missing configuration file "/var/run/postgresql/timescaledb.conf"
2021-02-15 12:23:50.049 GMT [44] LOG:  skipping missing configuration file "/var/run/postgresql/timescaledb.conf"
/var/run/postgresql:5432 - no response
2021-02-15 12:23:50 UTC [44]: [602a67d6.2c-3] @,app= [00000] LOG:  starting PostgreSQL 12.5 (Debian 12.5-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-02-15 12:23:50 UTC [44]: [602a67d6.2c-4] @,app= [00000] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-02-15 12:23:50 UTC [44]: [602a67d6.2c-5] @,app= [00000] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-02-15 12:23:50 UTC [46]: [602a67d6.2e-1] @,app= [00000] LOG:  database system was shut down at 2021-02-15 12:23:49 UTC
2021-02-15 12:23:50 UTC [44]: [602a67d6.2c-6] @,app= [00000] LOG:  database system is ready to accept connections
2021-02-15 12:23:50 UTC [53]: [602a67d6.35-1] @,app= [00000] LOG:  skipping missing configuration file "/var/run/postgresql/timescaledb.conf"
2021-02-15 12:23:50 UTC [53]: [602a67d6.35-2] @,app= [00000] LOG:  TimescaleDB background worker launcher connected to shared catalogs
2021-02-15 12:23:51 UTC [57]: [602a67d7.39-1] [unknown]@[unknown],app=[unknown] [00000] LOG:  connection received: host=[local]
2021-02-15 12:23:51 UTC [57]: [602a67d7.39-2] postgres@postgres,app=[unknown] [00000] LOG:  connection authorized: user=postgres database=postgres application_name=pg_isready
2021-02-15 12:23:51 UTC [57]: [602a67d7.39-3] postgres@postgres,app=pg_isready [00000] LOG:  disconnection: session time: 0:00:00.001 user=postgres database=postgres host=[local]
/var/run/postgresql:5432 - accepting connections
2021-02-15 12:23:51 UTC [59]: [602a67d7.3b-1] [unknown]@[unknown],app=[unknown] [00000] LOG:  connection received: host=[local]
2021-02-15 12:23:51 UTC [59]: [602a67d7.3b-2] postgres@postgres,app=[unknown] [00000] LOG:  connection authorized: user=postgres database=postgres application_name=pg_isready
/var/run/postgresql:5432 - accepting connections
2021-02-15 12:23:51 UTC [59]: [602a67d7.3b-3] postgres@postgres,app=pg_isready [00000] LOG:  disconnection: session time: 0:00:00.003 user=postgres database=postgres host=[local]
2021-02-15 12:23:51 UTC [60]: [602a67d7.3c-1] [unknown]@[unknown],app=[unknown] [00000] LOG:  connection received: host=[local]
2021-02-15 12:23:51 UTC [60]: [602a67d7.3c-2] postgres@postgres,app=[unknown] [00000] LOG:  connection authorized: user=postgres database=postgres application_name=Patroni
2021-02-15 12:23:51 - post_init - Creating extension TimescaleDB in template1 and postgres databases
2021-02-15 12:23:51 UTC [65]: [602a67d7.41-1] [unknown]@[unknown],app=[unknown] [00000] LOG:  connection received: host=[local]
2021-02-15 12:23:51 UTC [65]: [602a67d7.41-2] postgres@postgres,app=[unknown] [00000] LOG:  connection authorized: user=postgres database=postgres application_name=psql
2021-02-15 12:23:51 UTC [66]: [602a67d7.42-1] [unknown]@[unknown],app=[unknown] [00000] LOG:  connection received: host=[local]
2021-02-15 12:23:51 UTC [66]: [602a67d7.42-2] postgres@template1,app=[unknown] [00000] LOG:  connection authorized: user=postgres database=template1 application_name=psql
You are now connected to database "template1" as user "postgres".
2021-02-15 12:23:51 UTC [65]: [602a67d7.41-3] postgres@postgres,app=psql [00000] LOG:  disconnection: session time: 0:00:00.008 user=postgres database=postgres host=[local]
SET
2021-02-15 12:23:51 UTC [66]: [602a67d7.42-3] postgres@template1,app=psql [00000] LOG:  statement: CREATE EXTENSION timescaledb;
2021-02-15 12:23:51 UTC [66]: [602a67d7.42-4] postgres@template1,app=psql [01000] WARNING:  
	WELCOME TO
	 _____ _                               _     ____________  
	|_   _(_)                             | |    |  _  \ ___ \ 
	  | |  _ _ __ ___   ___  ___  ___ __ _| | ___| | | | |_/ / 
	  | | | |  _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \ 
	  | | | | | | | | |  __/\__ \ (_| (_| | |  __/ |/ /| |_/ /
	  |_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
	               Running version 1.7.4
	For more information on TimescaleDB, please visit the following links:

Could you tell us more about your setup, i.e. the k8s platform, the storage type, and the steps you use to configure the storage class in the tobs values.yaml?
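
For reference, the exit code 135 quoted from the failing log follows the usual shell convention of 128 plus the signal number: 135 - 128 = 7, which is SIGBUS on Linux x86-64 (signal numbers differ on some other platforms). A minimal sketch of that decoding; the function name `signal_from_exit_code` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: map a "128 + N" exit status back to signal number N.
signal_from_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        echo $((code - 128))
    else
        echo "exit status $code does not indicate death by signal" >&2
        return 1
    fi
}

# Exit code reported by initdb's child process in the failing log.
signal_from_exit_code 135
```

On a typical Linux shell, `kill -l 7` should then print the signal name.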

Author

inoam commented Feb 15, 2021

Hi @VineethReddy02!
The bus error (core dumped) did indeed lead to the failure.
The cause was hugepages: the machine had hugepages turned on, so postgres must request them in the pod spec, as described here:
zalando/patroni#1393
And here:
kubernetes/kubernetes#71233
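
One way to confirm whether a node has huge pages pre-allocated is to read the HugePages_Total field of /proc/meminfo on that node. The sketch below assumes shell access to the node; the function name `hugepages_total` and its optional file argument are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: print HugePages_Total from a meminfo-style file.
# Defaults to /proc/meminfo; another path can be passed, e.g. for testing.
hugepages_total() {
    file=${1:-/proc/meminfo}
    if [ -r "$file" ]; then
        awk '/^HugePages_Total:/ { print $2 }' "$file"
    else
        echo 0
    fi
}

total=$(hugepages_total)
if [ "${total:-0}" -gt 0 ]; then
    echo "huge pages are pre-allocated on this node: $total"
else
    echo "no huge pages are pre-allocated on this node"
fi
```

A nonzero count would match the situation described in the linked issues.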

BTW, do you know where I could find this core dump? It was neither on the machine nor in the container (I ran patroni from inside the container without exec, so that the container would stay up; there was still no core dump in /var/crash).
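
On Linux in general, where a core dump ends up is governed by the kernel.core_pattern sysctl together with the core file size limit of the crashing process, so those are reasonable first things to inspect. This is a general sketch, not something confirmed in this thread; the function name `describe_core_pattern` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel.core_pattern value.
describe_core_pattern() {
    case "$1" in
        '|'*) echo "piped to a handler program" ;;
        /*)   echo "written to an absolute path" ;;
        '')   echo "empty pattern" ;;
        *)    echo "written relative to the process working directory" ;;
    esac
}

if [ -r /proc/sys/kernel/core_pattern ]; then
    describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi

# Core file size limit for this shell; 0 means no core file is written to disk.
ulimit -c
```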

@VineethReddy02
Contributor

Hi @inoam
I am not sure exactly where the core dump is placed. Did you manage to find a solution? I see some workarounds mentioned here

Author

inoam commented Feb 15, 2021

Yes, the issue was solved by applying the following to values.yaml (I guess the numbers could be much lower, but I did not check):

timescaledb-single:
  resources:
    limits:
      memory: 2Gi
      hugepages-2Mi: 128Mi
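
As a rough sizing check for the values above: a limit of 128Mi backed by 2Mi huge pages corresponds to 64 pages, so the node would need at least that much allocatable hugepages-2Mi capacity. The helper name `hugepage_count` is illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: number of huge pages behind a limit (both sizes in Mi).
hugepage_count() {
    limit_mi=$1
    page_mi=$2
    echo $((limit_mi / page_mi))
}

# 128Mi limit with 2Mi pages, as in the values.yaml snippet.
hugepage_count 128 2
```

Kubernetes requires huge page requests to equal the limits; when only limits are given, as in the snippet, the requests default to the same values.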

@VineethReddy02
Contributor

So @inoam, can we close this issue?

inoam closed this as completed Feb 16, 2021

Author

inoam commented Feb 16, 2021

@Davincible
Contributor

How do you disable them for timescaledb? The fixes in the other issues don't quite work with the timescale helm chart
