Test telemetry_stats fails on PG15. #5037

Closed
sb230132 opened this issue Nov 29, 2022 · 3 comments · Fixed by #5162

Comments

@sb230132
Contributor

sb230132 commented Nov 29, 2022

What type of bug is this?

Other

What subsystems and features are affected?

Multi-node

What happened?

The telemetry_stats test fails with the diff below.

@@ -712,7 +712,7 @@
     "indexes_size": 311296,
     "num_children": 9,
     "num_relations": 1,
-    "num_reltuples": 357
+    "num_reltuples": 368
 }
 (1 row)
 
@@ -742,7 +742,7 @@
     "indexes_size": 311296,
     "num_children": 9,
     "num_relations": 1,
-    "num_reltuples": 340
+    "num_reltuples": 329
 }
 (1 row)
 
@@ -818,7 +818,7 @@
         "compressed_toast_size": 16384,
         "num_compressed_chunks": 2,
         "uncompressed_heap_size": 16384,
-        "uncompressed_row_count": 72,
+        "uncompressed_row_count": 56,
         "compressed_indexes_size": 0,
         "uncompressed_toast_size": 0,
         "uncompressed_indexes_size": 65536,
@@ -827,7 +827,7 @@
     "indexes_size": 278528,
     "num_children": 9,
     "num_relations": 1,
-    "num_reltuples": 357
+    "num_reltuples": 368
 }
 (1 row)
 
@@ -848,7 +848,7 @@
         "compressed_toast_size": 16384,
         "num_compressed_chunks": 2,
         "uncompressed_heap_size": 16384,
-        "uncompressed_row_count": 44,
+        "uncompressed_row_count": 60,
         "compressed_indexes_size": 0,
         "uncompressed_toast_size": 0,
         "uncompressed_indexes_size": 65536,
@@ -857,7 +857,7 @@
     "indexes_size": 278528,
     "num_children": 9,
     "num_relations": 1,
-    "num_reltuples": 340
+    "num_reltuples": 329
 }
 (1 row)

TimescaleDB version affected

2.9.0

PostgreSQL version used

15.0

What operating system did you use?

MacOS

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

On PG15 run make installcheck TESTS="telemetry_stats"
@mkindahl mkindahl assigned mkindahl and unassigned mkindahl Nov 29, 2022
@sb230132 sb230132 self-assigned this Nov 29, 2022
@sb230132 sb230132 changed the title from "Test remote_txn fails on PG15." to "Test telemetry_stats fails on PG15." Nov 29, 2022
@sb230132 sb230132 assigned sb230132 and unassigned sb230132 Nov 29, 2022
@sb230132
Contributor Author

sb230132 commented Dec 2, 2022

Here is a shortened testcase that gives different statistics on PG14 vs PG15.

CREATE SCHEMA IF NOT EXISTS test;
GRANT USAGE ON SCHEMA test TO PUBLIC;

CREATE OR REPLACE FUNCTION test.remote_exec(srv_name name[], command text)
RETURNS VOID
AS '/Volumes/Work/postgresql-15.0/build/lib/timescaledb-tsl-2.9.0-dev.so', 'ts_remote_exec'
LANGUAGE C;

CREATE DATABASE "db_telemetry_stats";
\c db_telemetry_stats
CREATE EXTENSION timescaledb;

SET timescaledb.telemetry_level='no_functions';
SELECT setseed(1);

CREATE TABLE normal (time timestamptz NOT NULL, device int, temp float);
INSERT INTO normal
SELECT t, ceil(random() * 10)::int, random() * 30
FROM generate_series('2018-01-01'::timestamptz, '2018-02-28', '2h') t;
ANALYZE normal;

-- Become an access node by adding a data node
SELECT node_name, database, node_created, database_created, extension_created
FROM add_data_node('data_node_1', host => 'localhost', database => 'db_telemetry_stats_1');

SELECT node_name, database, node_created, database_created, extension_created
FROM add_data_node('data_node_2', host => 'localhost', database => 'db_telemetry_stats_2');

CREATE TABLE disthyper (LIKE normal);
SELECT create_distributed_hypertable('disthyper', 'time', 'device');
INSERT INTO disthyper SELECT * FROM normal;
ANALYZE disthyper;

-- Show data node stats
SELECT test.remote_exec(NULL, $$
	   SELECT
			jsonb_pretty(t -> 'relations' -> 'distributed_hypertables_data_node') AS distributed_hypertables_dn
	   FROM get_telemetry_report() t;
$$);

-- Add compression
ALTER TABLE disthyper SET (timescaledb.compress);
SELECT compress_chunk(c)
FROM show_chunks('disthyper') c ORDER BY c LIMIT 4;
ANALYZE disthyper;

-- Show data node stats
SELECT test.remote_exec(NULL, $$
	   SELECT
			jsonb_pretty(t -> 'relations' -> 'distributed_hypertables_data_node') AS distributed_hypertables_dn
	   FROM get_telemetry_report() t;
$$);

The statistics from the remote nodes differ.
On further analysis I found that, when the distributed hypertable is created and populated, rows are distributed to the data nodes differently on PG15.

On PG14, out of 697 total rows, data_node_1 gets 357 and data_node_2 gets 340.
On PG15, out of 697 total rows, data_node_1 gets 368 and data_node_2 gets 329.

Because of this difference in row distribution, the rest of the telemetry report also differs.
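
The per-node counts above can be checked with something like the following. This is only a sketch: it reuses the test.remote_exec helper from the testcase and assumes, as in the telemetry queries there, that passing NULL as the first argument runs the command on every data node. The column aliases are illustrative.

```sql
-- Run on the access node after "INSERT INTO disthyper SELECT * FROM normal".
-- Each data node reports how many rows of disthyper it stores locally.
SELECT test.remote_exec(NULL, $$
    SELECT count(*) AS rows_on_node FROM disthyper;
$$);

-- Total across all data nodes, as seen from the access node.
SELECT count(*) AS total_rows FROM disthyper;
```

The two per-node counts should add up to the total reported by the second query.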

This is how data is populated in the table.

INSERT INTO normal
SELECT t, ceil(random() * 10)::int, random() * 30
FROM generate_series('2018-01-01'::timestamptz, '2018-02-28', '2h') t;

In my opinion this is expected behaviour.

@horzsolt

horzsolt commented Dec 6, 2022

@erimatnor Can you please confirm if this is expected?

@erimatnor
Contributor

Different stats from the nodes are expected if rows are distributed differently in PG15. Probably the hash function changed.

I note that the total number of tuples (reltuples) is the same in both versions:

PG14: 357 + 340 = 697
PG15: 368 + 329 = 697

I don't think we can just ignore this and create separate test outputs. If users upgrade from PG14 to PG15, the data distribution will change while existing chunks still use the old distribution. This will be a problem.

I think we should try to see if we can use the "old" hash function from PG14 in order to avoid a conflict when upgrading.
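
One way to tell the two possibilities apart (a changed hash function versus changed input data) is to compare the generated partitioning values themselves on both versions. A minimal sketch, assuming the normal table from the shortened testcase above has been populated on each server after SELECT setseed(1):

```sql
-- Run on PG14 and on PG15 after populating "normal" as in the testcase.
-- If the per-device counts already differ between the two servers, the
-- input data changed, and the different split across data nodes does not
-- by itself imply a different hash function.
SELECT device, count(*) AS num_rows
FROM normal
GROUP BY device
ORDER BY device;
```

If the counts match on both versions but the data nodes still receive different rows, the partitioning function would be the next thing to look at.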

@horzsolt horzsolt assigned lkshminarayanan and unassigned sb230132 Jan 3, 2023
lkshminarayanan added a commit to lkshminarayanan/timescaledb that referenced this issue Jan 10, 2023
The telemetry_stats testcase uses random() with setseed(1) to generate the
column values on which the hypertable is partitioned. The Postgres commit
postgres/postgres@3804539e48 updates the random() implementation to use a
better algorithm, causing the test to generate a different set of rows in
PG15. Due to this the test failed in PG15, as the distribution stats of
the tuples have now changed. Fixed that by creating separate test
outputs for PG15 and other releases.

Fixes timescale#5037
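
The effect described in this commit message can be observed directly. A small sketch, assuming a psql session on each server version; the column aliases and the row count of 5 are illustrative only:

```sql
-- Run in a fresh session on PG14, then again on PG15, and compare the rows.
-- setseed() fixes the seed for subsequent random() calls in this session.
SELECT setseed(1);
SELECT i,
       ceil(random() * 10)::int AS device,  -- same expression as in the test
       random() * 30 AS temp                -- same expression as in the test
FROM generate_series(1, 5) AS i;
```

Within one major version the output should be repeatable for a given seed. Across PG14 and PG15 the values are expected to differ, per the commit message above.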
lkshminarayanan added a commit that referenced this issue Jan 10, 2023
sb230132 pushed a commit that referenced this issue Jan 24, 2023