
Upgrading from PG 11 to 12 does not work using pg_upgrade #1844

Closed
pehlert opened this issue Apr 22, 2020 · 12 comments · Fixed by #1871

@pehlert

pehlert commented Apr 22, 2020

Hi all,

I tried to upgrade from PG 11 to PG 12.

What I did was:

  • upgrade extension to 1.7.0 on PG11
  • shutdown database
  • Initialize new PG12 cluster
  • Run pg_upgrade -O "-c timescaledb.restoring='on'" (data dirs are set using ENV vars)

What it returns:

[..] (some output omitted, just CREATE statements like the ones below)
pg_restore: creating FUNCTION "public.time_bucket_gapfill(interval, timestamp without time zone, timestamp without time zone, timestamp without time zone)"
pg_restore: creating FUNCTION "public.time_bucket_gapfill(interval, timestamp with time zone, timestamp with time zone, timestamp with time zone)"
pg_restore: creating FUNCTION "public.timescaledb_post_restore()"
pg_restore: creating FUNCTION "public.timescaledb_pre_restore()"
pg_restore: creating FUNCTION "public.update_current_machine_status()"
pg_restore: creating FUNCTION "public.update_current_order_stats()"
pg_restore: creating AGGREGATE "_timescaledb_internal.finalize_agg("text", "name", "name", "name"[], "bytea", "anyelement")"
pg_restore: creating AGGREGATE "public.first("anyelement", "any")"
pg_restore: creating AGGREGATE "public.histogram(double precision, double precision, double precision, integer)"
pg_restore: creating AGGREGATE "public.last("anyelement", "any")"
pg_restore: creating TABLE "_timescaledb_cache.cache_inval_bgw_job"
pg_restore: creating TABLE "_timescaledb_cache.cache_inval_extension"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 234; 1259 17367 TABLE cache_inval_extension mydb
pg_restore: error: could not execute query: ERROR:  OID lookup failed for table index "hypertable_pkey"
Command was:
-- For binary upgrade, must preserve pg_type oid
SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('17369'::pg_catalog.oid);


-- For binary upgrade, must preserve pg_type array oid
SELECT pg_catalog.binary_upgrade_set_next_array_pg_type_oid('17368'::pg_catalog.oid);


-- For binary upgrade, must preserve pg_class oids
SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('17367'::pg_catalog.oid);

CREATE TABLE "_timescaledb_cache"."cache_inval_extension" (
);

-- For binary upgrade, set heap's relfrozenxid and relminmxid
UPDATE pg_catalog.pg_class
SET relfrozenxid = '572', relminmxid = '1'
WHERE oid = '"_timescaledb_cache"."cache_inval_extension"'::pg_catalog.regclass;

-- For binary upgrade, handle extension membership the hard way
ALTER EXTENSION "timescaledb" ADD TABLE "_timescaledb_cache"."cache_inval_extension";
@trnubo

trnubo commented Apr 23, 2020

I've hit basically the same error doing a pg_upgrade from PG10 to PG11 with TimescaleDB 1.6.1:

pg_restore: creating TABLE "_timescaledb_cache.cache_inval_bgw_job"
pg_restore: creating TABLE "_timescaledb_cache.cache_inval_extension"
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 224; 1259 53878 TABLE cache_inval_extension mydb
pg_restore: [archiver (db)] could not execute query: ERROR:  OID lookup failed for table index "hypertable_pkey"
    Command was: 
-- For binary upgrade, must preserve pg_type oid
SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('53880'::pg_catalog.oid);


-- For binary upgrade, must preserve pg_type array oid
SELECT pg_catalog.binary_upgrade_set_next_array_pg_type_oid('53879'::pg_catalog.oid);


-- For binary upgrade, must preserve pg_class oids
SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('53878'::pg_catalog.oid);

CREATE TABLE "_timescaledb_cache"."cache_inval_extension" (
);

-- For binary upgrade, set heap's relfrozenxid and relminmxid
UPDATE pg_catalog.pg_class
SET relfrozenxid = '600033552', relminmxid = '1'
WHERE oid = '"_timescaledb_cache"."cache_inval_extension"'::pg_catalog.regclass;

-- For binary upgrade, handle extension membership the hard way
ALTER EXTENSION "timescaledb" ADD TABLE "_timescaledb_cache"."cache_inval_extension";

@trnubo

trnubo commented Apr 23, 2020

I managed to work past my issue by running SELECT timescaledb_pre_restore(); before the pg_upgrade and SELECT timescaledb_post_restore(); after. I'm not sure why this worked; I used -O "-c timescaledb.restoring='on'" during the failed upgrade, and there isn't any direct mention of using those restore functions during a PG upgrade.
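
Roughly, the sequence was something like this (the database name mydb and the PGBINOLD/PGBINNEW/PGDATAOLD/PGDATANEW variables are placeholders for illustration, not copied from my setup):

# On the old cluster, before shutting it down:
"$PGBINOLD/psql" -d mydb -c "SELECT timescaledb_pre_restore();"
"$PGBINOLD/pg_ctl" -D "$PGDATAOLD" stop

# Run the upgrade, still passing the restoring flag to the new server:
"$PGBINNEW/pg_upgrade" -O "-c timescaledb.restoring='on'"

# On the new cluster, after starting it again:
"$PGBINNEW/pg_ctl" -D "$PGDATANEW" -l upgrade.log start
"$PGBINNEW/psql" -d mydb -c "SELECT timescaledb_post_restore();"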

@pehlert
Author

pehlert commented Apr 23, 2020

I had the same suspicion that the timescaledb.restoring flag might be broken somehow. With the way our dev environments are set up, a manual dump and restore would be much more difficult to achieve. I hope we can get this fixed.

@mfreed
Member

mfreed commented Apr 23, 2020

Folks, thanks for this back-and-forth. We are investigating, but keep the insights and observations coming!

@mkindahl mkindahl added this to the 1.7.1 milestone Apr 27, 2020
@mkindahl
Contributor

Looking through the code and trying multiple ways to reproduce it, I still do not see why the upgrade as described above fails. The procedure I followed is the one described above, but I am repeating it below with details about what I did so that others can check.

Summary

To run an upgrade, it is necessary to

  • Set timescaledb.restoring when starting the server, which is done by passing -O "-c timescaledb.restoring=on" to pg_upgrade
  • Add timescaledb to shared_preload_libraries in the configuration file. At a minimum for the new server, but this should typically be done for both servers. (A sketch of both follows this list.)
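
A minimal sketch of those two pieces (values are examples only, matching the reproduction steps further down):

# In postgresql.conf of at least the new cluster:
shared_preload_libraries = 'timescaledb'

# Upgrade invocation, passing the setting to the new server via -O:
$PGBINNEW/pg_upgrade -O "-c timescaledb.restoring=on"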

The procedures timescaledb_pre_restore and timescaledb_post_restore set (and clear) the variable timescaledb.restoring and also stop (and start) the background worker threads. Apart from that, they do not do anything special. The only explanation for why adding them above removes the error would be that the timescaledb.restoring variable for some reason is not set during the execution of pg_upgrade above.

The binary pg_upgrade does, as part of one of its steps, copy the schema from the old server to the new server. If timescaledb.restoring is not set to on on the new server, then the error message above will be printed when the cache invalidation is done. This happens as part of executing the first statement on the new server. The actual error message is printed from inside ts_catalog_table_info_init when trying to create all the TimescaleDB tables, which is done as part of the cache invalidation procedure and only happens the first time the server is started with the extension.

I managed to get the error message (ERROR: OID lookup failed for table index "hypertable_pkey", to be exact) when running without timescaledb.restoring set to on, but only then. Adding the option removed the message. Investigating the code, I do not see any other path that could trigger this error message by directly or indirectly calling this function.

Reproduction steps

What I did was:

Set up the PG11 server:

  1. Create a database with version PG 11.6 (this is what I had locally and I have no reason to believe that 11.7 would be any different).
  2. Add timescaledb to shared_preload_libraries in the configuration file.
  3. Start the server.
  4. Populate it with the NYC data from Tutorial: Hello Timescale!
  5. Add a continuous aggregate for good measure (note that the background workers were involved above).
  6. Stop the server.

Create a PG12 server:

  1. Create a server with version PG 12.1 (this is what I had locally, no reason that 12.2 would behave differently).
  2. Add timescaledb to shared_preload_libraries in the configuration file.

Do the upgrade:

  1. Set environment variables (see below).
  2. Run $PGBINNEW/pg_upgrade -O "-c timescaledb.restoring=on"

Environment variables set are:

# Binary directories with local install
oldroot=/usr/local/pgsql/11.6
newroot=/usr/local/pgsql/12.1

# Configure directories and other information for old and new server
export PGBINOLD=$oldroot/bin
export PGBINNEW=$newroot/bin
export PGDATAOLD=$HOME/temp/postgres/data-node-11
export PGDATANEW=$HOME/temp/postgres/data-node-12
export PGUSER=postgres
export PGPORTOLD=`get_port $PGDATAOLD/postgresql.conf`
export PGPORTNEW=`get_port $PGDATANEW/postgresql.conf`

(Function get_port is a utility function that extracts the port from the configuration file.)
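
For completeness, a hypothetical stand-in for get_port (the real helper is not shown here):

# Extract the port setting from a postgresql.conf file, defaulting to 5432 if unset.
get_port () {
    local port
    port=$(sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    echo "${port:-5432}"
}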

@pehlert
Author

pehlert commented Apr 29, 2020

Thank you so much @mkindahl for the thorough investigation. I wish I could add more to this, but at least I verified in pg_upgrade's pg_upgrade_server.log that the servers are seemingly started correctly with the right flag set:

command: "/usr/lib/postgresql/12/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/postgresql/data/12" -o "-p 50432 -b -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c timescaledb.restoring='on' -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directories='/var/lib/postgresql'" start >> "pg_upgrade_server.log" 2>&1
waiting for server to start....2020-04-22 16:21:34.765 UTC [1549] LOG:  starting PostgreSQL 12.2 (Debian 12.2-2.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2020-04-22 16:21:34.765 UTC [1549] LOG:  listening on Unix socket "/var/lib/postgresql/.s.PGSQL.50432"
2020-04-22 16:21:34.778 UTC [1550] LOG:  database system was shut down at 2020-04-22 16:19:56 UTC
2020-04-22 16:21:34.783 UTC [1549] LOG:  database system is ready to accept connections
2020-04-22 16:21:34.796 UTC [1555] LOG:  TimescaleDB background worker launcher connected to shared catalogs
 done
server started


command: "/usr/lib/postgresql/12/bin/pg_ctl" -w -D "/var/lib/postgresql/data/12" -o "-c timescaledb.restoring='on'" -m smart stop >> "pg_upgrade_server.log" 2>&1
2020-04-22 16:21:35.206 UTC [1549] LOG:  received smart shutdown request
2020-04-22 16:21:35.206 UTC [1555] LOG:  terminating TimescaleDB background worker launcher due to administrator command
waiting for server to shut down...2020-04-22 16:21:35.206 UTC [1555] FATAL:  terminating connection due to administrator command
.2020-04-22 16:21:35.208 UTC [1549] LOG:  background worker "TimescaleDB Background Worker Launcher" (PID 1555) exited with exit code 1
2020-04-22 16:21:35.209 UTC [1549] LOG:  background worker "logical replication launcher" (PID 1556) exited with exit code 1
2020-04-22 16:21:35.209 UTC [1551] LOG:  shutting down
2020-04-22 16:21:35.217 UTC [1549] LOG:  database system is shut down
 done
server stopped

Let me know if there is anything else I can check.

@pehlert
Author

pehlert commented Apr 29, 2020

One more thing I tried without luck was to set timescaledb.restoring = 'on' in the actual postgresql.conf file for the PG 12 server.

@mkindahl mkindahl self-assigned this May 4, 2020
@mkindahl
Contributor

mkindahl commented May 5, 2020

@pehlert Can you turn on statement logging (log_statement = 'all'), run a pg_restore, and check if there are any lines like this in the restore log:

2020-05-05 09:18:29.533 CEST [3454] STATEMENT:  ALTER DATABASE "defaultdb" SET "search_path" TO '$user', 'public', 'topology', 'tiger';
        ALTER DATABASE "defaultdb" SET "timescaledb.restoring" TO 'off';
        
        -- For binary upgrade, set datfrozenxid and datminmxid.
        UPDATE pg_catalog.pg_database
        SET datfrozenxid = '548', datminmxid = '1'
        WHERE datname = 'defaultdb';

After digging through the code, it seems like database-specific settings are dumped with pg_dump and will take precedence over both command-line settings and configuration file settings.

If timescaledb.restoring is for some reason set for the database (not sure why this happens in my case, but it could be from a client connection or something that is done internally), it will take effect rather than your command-line version, because it is applied after the command-line options are processed.
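
For reference, one way to check whether such a per-database setting exists on the old cluster before running pg_upgrade (mydb is a placeholder; psql's \drds shows the same information):

psql -d mydb -c \
  "SELECT d.datname, s.setconfig
     FROM pg_db_role_setting AS s
     LEFT JOIN pg_database AS d ON d.oid = s.setdatabase;"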

@pehlert
Author

pehlert commented May 5, 2020

Hi @mkindahl, again thank you so much for taking care of this and for the brilliant investigation.

Your assumption is spot on; this is the relevant part of the log:

2020-05-05 21:21:58.338 UTC [340] FATAL:  database "template0" is not currently accepting connections
2020-05-05 21:21:58.340 UTC [313] LOG:  background worker "TimescaleDB Background Worker Scheduler" (PID 340) exited with exit code 1
2020-05-05 21:21:58.341 UTC [339] LOG:  statement: SELECT pg_catalog.pg_is_in_recovery()
2020-05-05 21:21:58.342 UTC [341] LOG:  statement: SELECT pg_catalog.set_config('search_path', '', false);
2020-05-05 21:21:58.342 UTC [341] LOG:  statement: SET statement_timeout = 0;

2020-05-05 21:21:58.342 UTC [341] LOG:  statement: SET lock_timeout = 0;

2020-05-05 21:21:58.342 UTC [341] LOG:  statement: SET idle_in_transaction_session_timeout = 0;

2020-05-05 21:21:58.342 UTC [341] LOG:  statement: SET client_encoding = 'UTF8';

2020-05-05 21:21:58.342 UTC [341] LOG:  statement: SET standard_conforming_strings = on;

2020-05-05 21:21:58.342 UTC [341] LOG:  statement: SELECT pg_catalog.set_config('search_path', '', false);

2020-05-05 21:21:58.344 UTC [341] LOG:  statement: SET check_function_bodies = false;

2020-05-05 21:21:58.344 UTC [341] LOG:  statement: SET xmloption = content;

2020-05-05 21:21:58.345 UTC [341] LOG:  statement: SET client_min_messages = warning;

2020-05-05 21:21:58.345 UTC [341] LOG:  statement: SET row_security = off;

2020-05-05 21:21:58.345 UTC [341] LOG:  statement:

2020-05-05 21:21:58.345 UTC [341] LOG:  statement: ALTER DATABASE "mydb" SET "timescaledb.restoring" TO 'off';

        -- For binary upgrade, set datfrozenxid and datminmxid.
        UPDATE pg_catalog.pg_database
        SET datfrozenxid = '562', datminmxid = '1'
        WHERE datname = 'mydb';



2020-05-05 21:21:58.348 UTC [341] LOG:  statement: SELECT pg_catalog.pg_is_in_recovery()

I can also confirm that an ALTER DATABASE mydb RESET "timescaledb.restoring" on the old database fixes the issue! A bit cumbersome to automate, but that workaround will do it for us.
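
Concretely, the reset we run against the old cluster before pg_upgrade looks like this (mydb is a placeholder; repeat for each affected database):

psql -d mydb -c 'ALTER DATABASE mydb RESET "timescaledb.restoring";'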

@mkindahl
Contributor

mkindahl commented May 6, 2020

@pehlert Thank you for testing this, it confirmed my suspicion and we have a patch that fixes this specific case of the issue.

However, given how option processing works, relying on timescaledb.restoring being set correctly in all cases is not a very solid approach, so we need some other means of detecting that we are restoring a database, one that does not require the user to set the variable.

mkindahl added a commit to mkindahl/timescaledb that referenced this issue May 6, 2020
If a binary upgrade is in progress (when using `pg_upgrade`) the
per-database setting of `timescaledb.restoring` can be included in the
dump, which causes `pg_upgrade` to fail.

This commit fixes this by checking the global variable
`IsBinaryUpgrade` and not refreshing cache if we are in the middle of
doing a binary upgrade.

Fixes timescale#1844
mkindahl added a commit to mkindahl/timescaledb that referenced this issue May 6, 2020
mkindahl added a commit to mkindahl/timescaledb that referenced this issue May 6, 2020
@pehlert
Author

pehlert commented May 6, 2020

Let me know if I can help with testing this. Unfortunately, as these are quite old datasets (I think we started with TSDB 1.3), I cannot pinpoint what could have set the parameter on the database in the first place.

mkindahl added a commit that referenced this issue May 7, 2020
@mkindahl
Contributor

mkindahl commented May 7, 2020

Let me know if I can help with testing this. Unfortunately, as these are quite old datasets (I think we started with TSDB 1.3), I cannot pinpoint what could have set the parameter on the database in the first place.

Precisely the reason to remove the option.

It is not feasible to expect anybody to track down rogue settings of timescaledb.restoring when restoring, so, IMHO, the option should be removed and we should make sure that pg_upgrade and pg_restore work without it.

I created issue #1878 to track this.

svenklemm pushed a commit that referenced this issue May 13, 2020
@mkindahl mkindahl added the pg-upgrade label (Issues related to upgrading the Postgres version) Oct 25, 2023