Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reporting WARNING after ERROR on PG15 #4837

Closed
sb230132 opened this issue Oct 17, 2022 · 3 comments · Fixed by #5041
Closed

Reporting WARNING after ERROR on PG15 #4837

sb230132 opened this issue Oct 17, 2022 · 3 comments · Fixed by #5041
Assignees
Labels

Comments

@sb230132
Copy link
Contributor

What type of bug is this?

Incorrect result

What subsystems and features are affected?

Other

What happened?

TimescaleDB when compiled against PG15 causes many a test to report WARNING after an ERROR.
ERROR should be the last one to be reported.

TimescaleDB version affected

2.9.0

PostgreSQL version used

15.0

What operating system did you use?

Mac OS X

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

make installcheck TESTS="remote_txn"

Below is the diff

 COMMIT;
 WARNING:  kill event: pre-commit
-WARNING:  transaction rollback on data node "loopback" failed
 ERROR:  [loopback]: terminating connection due to administrator command
+WARNING:  transaction rollback on data node "loopback" failed
 SELECT debug_waitpoint_release('remote_conn_xact_end');
  debug_waitpoint_release
 -------------------------
@@ -27
@sb230132 sb230132 added the bug label Oct 17, 2022
@horzsolt horzsolt added the pg15 label Oct 17, 2022
@sb230132 sb230132 self-assigned this Nov 15, 2022
@sb230132
Copy link
Contributor Author

Here is the testcase:
Change module paths accordinlgy.
Last commit will cause the problem.

CREATE DATABASE "db_remote_txn";
\c db_remote_txn
CREATE EXTENSION timescaledb;

CREATE SCHEMA IF NOT EXISTS test;
GRANT USAGE ON SCHEMA test TO PUBLIC;

CREATE OR REPLACE FUNCTION test.remote_exec(srv_name name[], command text)
RETURNS VOID
AS '/Volumes/Work/postgresql-15.0/build/lib/timescaledb-tsl-2.9.0-dev.so', 'ts_remote_exec'
LANGUAGE C;

CREATE OR REPLACE FUNCTION test.remote_exec_get_result_strings(srv_name name[], command text)
RETURNS TABLE("table_record" CSTRING[])
AS '/Volumes/Work/postgresql-15.0/build/lib/timescaledb-tsl-2.9.0-dev.so', 'ts_remote_exec_get_result_strings'
LANGUAGE C;

CREATE OR REPLACE FUNCTION remote_node_killer_set_event(text, text)
RETURNS VOID
AS '/Volumes/Work/postgresql-15.0/build/lib/timescaledb-tsl-2.9.0-dev.so', 'ts_remote_node_killer_set_event'
LANGUAGE C;

CREATE OR REPLACE FUNCTION test_remote_txn_persistent_record(srv_name name)
RETURNS VOID
AS '/Volumes/Work/postgresql-15.0/build/lib/timescaledb-tsl-2.9.0-dev.so', 'ts_test_remote_txn_persistent_record'
LANGUAGE C;

CREATE OR REPLACE FUNCTION debug_waitpoint_enable(TEXT)
RETURNS VOID
AS '/Volumes/Work/postgresql-15.0/build/lib/timescaledb-2.9.0-dev.so', 'ts_debug_point_enable'
LANGUAGE C VOLATILE STRICT;

CREATE OR REPLACE FUNCTION debug_waitpoint_release(TEXT)
RETURNS VOID
AS '/Volumes/Work/postgresql-15.0/build/lib/timescaledb-2.9.0-dev.so', 'ts_debug_point_release'
LANGUAGE C VOLATILE STRICT;

CREATE OR REPLACE FUNCTION add_loopback_server(
    server_name            NAME,
    host                   TEXT = 'localhost',
    database               NAME = current_database(),
    port                   INTEGER = inet_server_port(),
    if_not_exists          BOOLEAN = FALSE,
    bootstrap              BOOLEAN = TRUE,
    password               TEXT = NULL
) RETURNS TABLE(server_name NAME, host TEXT, port INTEGER, database NAME,
                server_created BOOL, database_created BOOL, extension_created BOOL)
AS '/Volumes/Work/postgresql-15.0/build/lib/timescaledb-tsl-2.9.0-dev.so', 'ts_unchecked_add_data_node'
LANGUAGE C;

SELECT server_name, database, server_created, database_created, extension_created FROM add_loopback_server('loopback', database => 'db_remote_txn', bootstrap => false);
SELECT server_name, database, server_created, database_created, extension_created FROM add_loopback_server('loopback2', database => 'db_remote_txn', bootstrap => false);

drop table "S 1"."T 1";
CREATE SCHEMA "S 1";
CREATE TABLE "S 1"."T 1" (
    "C 1" int NOT NULL,
    c2 int NOT NULL,
    c3 text,
    c4 timestamptz,
    c5 timestamp,
    c6 varchar(10),
    c7 char(10),
    CONSTRAINT t1_pkey PRIMARY KEY ("C 1")
);
ANALYZE "S 1"."T 1";

INSERT INTO "S 1"."T 1"
    SELECT id,
           id % 10,
           to_char(id, 'FM00000'),
           '1970-01-01'::timestamptz + ((id % 100) || ' days')::interval,
           '1970-01-01'::timestamp + ((id % 100) || ' days')::interval,
           id % 10,
           id % 10
    FROM generate_series(1, 1000) id;

SELECT test_remote_txn_persistent_record('loopback');
SET timescaledb.enable_2pc = false;

BEGIN;
    SELECT test.remote_exec('{loopback}', $$ INSERT INTO "S 1"."T 1" VALUES (20001,1,'bleh', '2001-01-01', '2001-01-01', 'bleh') $$);
COMMIT;

_timescaledb_internal.show_connection_cache() ORDER BY 1,4;
BEGIN;
    SELECT remote_node_killer_set_event('pre-commit', 'loopback');
    SELECT debug_waitpoint_enable('remote_conn_xact_end');
    SELECT test.remote_exec('{loopback}', $$ INSERT INTO "S 1"."T 1" VALUES (20003,1,'bleh', '2001-01-01', '2001-01-01', 'bleh') $$);

COMMIT;

SELECT debug_waitpoint_release('remote_conn_xact_end');

@sb230132
Copy link
Contributor Author

May be this is an existing issue:

commit af7d9fe4d63e727e4d326b0c8ee09974f3d214e4
Author: Matvey Arye <mat@timescale.com>
Date:   Thu Jun 7 15:51:07 2018 -0400

    Add remote txns support (1pc)
    
    This commit adds the ability to open remote transactions
    that share transactional semantics with the ongoing local
    transaction. This is done by adding txn hooks to the local
    transaction that performs txn commands on remote nodes.
    
    This commit adds the 1pc variant of the commit protocol.
    A 2-pc variant is going to be added in an upcoming PR.
    The 1pc variant exists mostly for performance testing (i.e. how much is the 2pc penalty).
    But, there are also cases where 1pc is good enough (non-critical
    data and/or data backed by Kafka).
    
    Note that the remote_txn test is not run during sanitizer tests due to
    an apparent bug in PostgreSQL. (The test fails on an assertion of a
    global variable `AbortStartTime`, which is non-atomic although read
    across the ServerLoop and signal handlers.)

@sb230132
Copy link
Contributor Author

sb230132 commented Dec 1, 2022

In short this difference in the order in which WARNINGS are displayed seems to be a timing issue and an issue specific to psql utility.

On server side "commit" execution flow is as follows:

  1. Kill all data nodes(DN) remote processes.
  2. As part of step 1 send kill-event warning.
  3. Access node(AN) sends "commit" to all DNs.
  4. Wait for response from DNs.
  5. Since all DNs are down, an error is sent to AN.
  6. "ERROR: [loopback]: terminating connection due to administrator command" is sent to server log as well as client.
  7. AN decides to abort the transaction and sends "abort" command to all DNs.
  8. As part of step 7 "transaction rollback on data node ... " warning is sent to server log as well as client.

On client side, as part of client-server protocol communication below is the behavior:

  1. Client reads kill-event warning and writes to console.
  2. Client reads ERROR message and saves it.
  3. Client reads "transaction rollback on data node ... " warning and writes to console.
  4. Client writes error message to console.

However on client side, sometimes the order in which WARNING/ERROR is written to console varies based on timing.
If i put psql in debug mode, i could not reproduce the problem.

Since its not timescaledb specific issue, i plan to mask the warning messages in the tests output files. A similar fix was done earlier too. Refer( commit 5fd363d).

sb230132 added a commit to sb230132/timescaledb that referenced this issue Dec 1, 2022
Testcases which kill remote processes on data nodes, tend of rollback
transactions and as part of the process, WARNINGS/ERROR are reported to
client. Client however based on timing, reports WARNINGS/ERROR in different
order.

Fixes timescale#4837
sb230132 added a commit to sb230132/timescaledb that referenced this issue Dec 1, 2022
Testcases which kill remote processes on data nodes, tend of rollback
transactions and as part of the process, WARNINGS/ERROR are reported to
client. Client however based on timing, reports WARNINGS/ERROR in different
order.

Fixes timescale#4837
sb230132 added a commit to sb230132/timescaledb that referenced this issue Dec 2, 2022
In remote_txn, testcases which kill remote processes on data nodes,
tend to rollback transactions and as part of the process, WARNINGS/ERROR
are reported to client. Client however reports WARNINGS/ERROR in different
order intermittently. This is an issue specific to psql utility. Placing
psql in gdb and trying to diagnise the problem does not reproduce the issue.

This patch fixes the tests by not reporting WARNINGS.

Fixes timescale#4837
sb230132 added a commit to sb230132/timescaledb that referenced this issue Dec 5, 2022
In remote_txn, testcases which kill remote processes on data nodes,
tend to rollback transactions and as part of the process, WARNINGS/ERROR
are reported to client. Client however reports WARNINGS/ERROR in different
order intermittently. This is an issue specific to psql utility. Placing
psql in gdb and trying to diagnise the problem does not reproduce the issue.

This patch fixes the tests by not reporting WARNINGS.

Fixes timescale#4837
sb230132 added a commit to sb230132/timescaledb that referenced this issue Dec 5, 2022
In remote_txn, testcases which kill remote processes on data nodes,
tend to rollback transactions and as part of the process, WARNINGS/ERROR
are reported to client. Client however reports WARNINGS/ERROR in different
order intermittently. This is an issue specific to psql utility. Placing
psql in gdb and trying to diagnise the problem does not reproduce the issue.

This patch fixes the tests by not reporting WARNINGS.

Fixes timescale#4837
sb230132 added a commit to sb230132/timescaledb that referenced this issue Dec 6, 2022
In remote_txn, testcases which kill remote processes on data nodes,
tend to rollback transactions and as part of the process, WARNINGS/ERROR
are reported to client. Client however reports WARNINGS/ERROR in different
order intermittently. This is an issue specific to psql utility. Placing
psql in gdb and trying to diagnise the problem does not reproduce the issue.

This patch fixes the tests by not reporting WARNINGS.

Fixes timescale#4837
sb230132 added a commit that referenced this issue Dec 6, 2022
In remote_txn, testcases which kill remote processes on data nodes,
tend to rollback transactions and as part of the process, WARNINGS/ERROR
are reported to client. Client however reports WARNINGS/ERROR in different
order intermittently. This is an issue specific to psql utility. Placing
psql in gdb and trying to diagnise the problem does not reproduce the issue.

This patch fixes the tests by not reporting WARNINGS.

Fixes #4837
SachinSetiya pushed a commit that referenced this issue Dec 8, 2022
In remote_txn, testcases which kill remote processes on data nodes,
tend to rollback transactions and as part of the process, WARNINGS/ERROR
are reported to client. Client however reports WARNINGS/ERROR in different
order intermittently. This is an issue specific to psql utility. Placing
psql in gdb and trying to diagnise the problem does not reproduce the issue.

This patch fixes the tests by not reporting WARNINGS.

Fixes #4837
svenklemm pushed a commit that referenced this issue Dec 15, 2022
In remote_txn, testcases which kill remote processes on data nodes,
tend to rollback transactions and as part of the process, WARNINGS/ERROR
are reported to client. Client however reports WARNINGS/ERROR in different
order intermittently. This is an issue specific to psql utility. Placing
psql in gdb and trying to diagnise the problem does not reproduce the issue.

This patch fixes the tests by not reporting WARNINGS.

Fixes #4837
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants