Test telemetry_stats fails on PG15. #5037

sb230132 · 2022-11-29T09:09:14Z

What type of bug is this?

Other

What subsystems and features are affected?

Multi-node

What happened?

The telemetry_stats test fails with the following regression diff:

@@ -712,7 +712,7 @@
     "indexes_size": 311296,
     "num_children": 9,
     "num_relations": 1,
-    "num_reltuples": 357
+    "num_reltuples": 368
 }
 (1 row)
 
@@ -742,7 +742,7 @@
     "indexes_size": 311296,
     "num_children": 9,
     "num_relations": 1,
-    "num_reltuples": 340
+    "num_reltuples": 329
 }
 (1 row)
 
@@ -818,7 +818,7 @@
         "compressed_toast_size": 16384,
         "num_compressed_chunks": 2,
         "uncompressed_heap_size": 16384,
-        "uncompressed_row_count": 72,
+        "uncompressed_row_count": 56,
         "compressed_indexes_size": 0,
         "uncompressed_toast_size": 0,
         "uncompressed_indexes_size": 65536,
@@ -827,7 +827,7 @@
     "indexes_size": 278528,
     "num_children": 9,
     "num_relations": 1,
-    "num_reltuples": 357
+    "num_reltuples": 368
 }
 (1 row)
 
@@ -848,7 +848,7 @@
         "compressed_toast_size": 16384,
         "num_compressed_chunks": 2,
         "uncompressed_heap_size": 16384,
-        "uncompressed_row_count": 44,
+        "uncompressed_row_count": 60,
         "compressed_indexes_size": 0,
         "uncompressed_toast_size": 0,
         "uncompressed_indexes_size": 65536,
@@ -857,7 +857,7 @@
     "indexes_size": 278528,
     "num_children": 9,
     "num_relations": 1,
-    "num_reltuples": 340
+    "num_reltuples": 329
 }
 (1 row)

TimescaleDB version affected

2.9.0

PostgreSQL version used

15.0

What operating system did you use?

MacOS

What installation method did you use?

Source

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

On PG15, run make installcheck TESTS="telemetry_stats"


sb230132 · 2022-12-02T10:26:57Z

Here is a shortened test case that produces different statistics on PG14 and PG15.

CREATE SCHEMA IF NOT EXISTS test;
GRANT USAGE ON SCHEMA test TO PUBLIC;

CREATE OR REPLACE FUNCTION test.remote_exec(srv_name name[], command text)
RETURNS VOID
AS '/Volumes/Work/postgresql-15.0/build/lib/timescaledb-tsl-2.9.0-dev.so', 'ts_remote_exec'
LANGUAGE C;

CREATE DATABASE "db_telemetry_stats";
\c db_telemetry_stats
CREATE EXTENSION timescaledb;

SET timescaledb.telemetry_level='no_functions';
SELECT setseed(1);

CREATE TABLE normal (time timestamptz NOT NULL, device int, temp float);
INSERT INTO normal
SELECT t, ceil(random() * 10)::int, random() * 30
FROM generate_series('2018-01-01'::timestamptz, '2018-02-28', '2h') t;
ANALYZE normal;

-- Become an access node by adding a data node
SELECT node_name, database, node_created, database_created, extension_created
FROM add_data_node('data_node_1', host => 'localhost', database => 'db_telemetry_stats_1');

SELECT node_name, database, node_created, database_created, extension_created
FROM add_data_node('data_node_2', host => 'localhost', database => 'db_telemetry_stats_2');

CREATE TABLE disthyper (LIKE normal);
SELECT create_distributed_hypertable('disthyper', 'time', 'device');
INSERT INTO disthyper SELECT * FROM normal;
ANALYZE disthyper;

-- Show data node stats
SELECT test.remote_exec(NULL, $$
	   SELECT
			jsonb_pretty(t -> 'relations' -> 'distributed_hypertables_data_node') AS distributed_hypertables_dn
	   FROM get_telemetry_report() t;
$$);

-- Add compression
ALTER TABLE disthyper SET (timescaledb.compress);
SELECT compress_chunk(c)
FROM show_chunks('disthyper') c ORDER BY c LIMIT 4;
ANALYZE disthyper;

-- Show data node stats
SELECT test.remote_exec(NULL, $$
	   SELECT
			jsonb_pretty(t -> 'relations' -> 'distributed_hypertables_data_node') AS distributed_hypertables_dn
	   FROM get_telemetry_report() t;
$$);

The statistics reported by the data nodes differ.
On further analysis, I found that when the distributed hypertable is created, rows are distributed to the data nodes differently in PG15:

On PG14, out of 697 rows in total, data_node_1 gets 357 and data_node_2 gets 340.
On PG15, out of 697 rows in total, data_node_1 gets 368 and data_node_2 gets 329.

Because of this different distribution of rows, the rest of the telemetry report also differs.

This is how the data is populated in the table:

INSERT INTO normal
SELECT t, ceil(random() * 10)::int, random() * 30
FROM generate_series('2018-01-01'::timestamptz, '2018-02-28', '2h') t;
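The total of 697 rows can be checked by hand, because it is fixed by the generate_series() bounds and does not involve random() at all. From 2018-01-01 00:00 to 2018-02-28 00:00 there are 31 days in January plus 27 in February; assuming the session time zone has no daylight-saving shift in that window, the 2-hour steps give:

```latex
\text{hours} = (31 + 27) \times 24 = 1392, \qquad
\text{rows} = \frac{1392}{2} + 1 = 697
```

The +1 counts the starting timestamp, since both endpoints of the series are included. Only the device and temp values come from random(), so a different random() implementation can change how rows are split across data nodes, but not the total.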

In my opinion, this is expected behaviour.

horzsolt · 2022-12-06T10:39:16Z

@erimatnor, can you please confirm whether this is expected?

erimatnor · 2022-12-13T14:16:20Z

Different stats from the data nodes are expected if rows are distributed differently in PG15. Probably the hash function changed.

I am noting that the total number of tuples (reltuples) is the same in both versions:

PG14: 357 + 340 = 697
PG15: 368 + 329 = 697

I don't think we can just ignore this and create separate test outputs. If users upgrade from PG14 to PG15, the data distribution will change while existing chunks still use the old distribution. This will be a problem.

I think we should try to see if we can use the "old" hash function from PG14 in order to avoid a conflict when upgrading.
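One way to test the hash-function hypothesis is to compare the partitioning hash directly on both versions, independently of the test data. The sketch below assumes that TimescaleDB 2.9 exposes its default space-partitioning function as _timescaledb_internal.get_partition_hash(anyelement) (the schema may differ in other releases). Run it against a PG14 and a PG15 instance and diff the output: if the hash values match, the hash is not what changed, and the difference must come from the input data instead.

```sql
-- Assumption: TimescaleDB 2.9.x, where the default hash partitioning
-- function is _timescaledb_internal.get_partition_hash(anyelement).
-- Run on both PG14 and PG15 and compare the results row by row.
SELECT device,
       _timescaledb_internal.get_partition_hash(device) AS partition_hash
FROM generate_series(1, 10) AS device   -- the test's device values are 1..10
ORDER BY device;
```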

The telemetry_stats test case uses random() with seed(1) to generate the column values on which the hypertable is partitioned. The Postgres commit postgres/postgres@3804539e48 replaces the random() implementation with a better algorithm, which causes the test to generate a different set of rows in PG15. As a result, the test fails in PG15 because the distribution stats of the tuples have changed. Fixed by creating separate test outputs for PG15 and for other releases. Fixes timescale#5037
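The random() change can be checked without TimescaleDB by replaying the test's data generation on plain PostgreSQL and counting rows per device value. This diagnostic sketch reuses the test's setseed(1) call and INSERT expression verbatim, writing into a hypothetical scratch table normal_replay. Both random() calls per row are kept, because device only receives every other value from the seeded sequence. The per-device counts should differ between PG14 and PG15, while the total stays the same.

```sql
-- Same seed as the telemetry_stats test; must run in the same session.
SELECT setseed(1);

-- Hypothetical scratch table mirroring the test's "normal" table.
CREATE TEMP TABLE normal_replay (time timestamptz NOT NULL, device int, temp float);

-- Identical generator expression: two random() calls per row, as in the test.
INSERT INTO normal_replay
SELECT t, ceil(random() * 10)::int, random() * 30
FROM generate_series('2018-01-01'::timestamptz, '2018-02-28', '2h') t;

-- Per-device histogram: depends on the random() implementation,
-- so it is expected to differ between PG14 and PG15.
SELECT device, count(*) AS num_rows
FROM normal_replay
GROUP BY device
ORDER BY device;

-- Total row count: fixed by the generate_series() bounds.
SELECT count(*) AS total_rows FROM normal_replay;
```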


sb230132 added bug pg15 Multinode Experience Team labels Nov 29, 2022

sb230132 mentioned this issue Nov 29, 2022

Failing tests on PG15. #4835

Closed

mkindahl assigned mkindahl and unassigned mkindahl Nov 29, 2022

sb230132 self-assigned this Nov 29, 2022

sb230132 changed the title from "Test remote_txn fails on PG15." to "Test telemetry_stats fails on PG15." Nov 29, 2022

sb230132 assigned sb230132 and unassigned sb230132 Nov 29, 2022

horzsolt assigned lkshminarayanan and unassigned sb230132 Jan 3, 2023

lkshminarayanan mentioned this issue Jan 10, 2023

Fix telemetry_stats test in PG15 #5162

Merged

lkshminarayanan closed this as completed in #5162 Jan 10, 2023
