
[Bug]: Assertion failure in ts_hypertable_has_compression_table #4795

Closed · svenklemm opened this issue Oct 4, 2022 · 9 comments · Fixed by #5086
Labels: bug, flaky-test (Issue about a flaky test)

svenklemm (Member) commented:

This assertion failure is hit occasionally in the sanitizer test.

Failed run with coredump:

Query that triggers it (not guaranteed to reproduce):

SELECT
    count(*)
FROM
    metrics_compressed_summary
WHERE
    bucket > _timescaledb_internal.to_timestamp(
        _timescaledb_internal.cagg_watermark(
            (SELECT
                mat_hypertable_id
             FROM
                _timescaledb_catalog.continuous_agg
             WHERE
                user_view_name = 'metrics_compressed_summary')
        )
    );

Stacktrace:

#2  0x0000555ecac25f18 in ExceptionalCondition (conditionName=conditionName@entry=0x7f0774d9c4e0 "ht->fd.compression_state == HypertableCompressionEnabled", errorType=errorType@entry=0x7f0774d99380 "FailedAssertion",
    fileName=fileName@entry=0x7f0774d99480 "/home/runner/work/timescaledb/timescaledb/src/hypertable.c", lineNumber=lineNumber@entry=2909) at assert.c:69
#3  0x00007f0774c1bfe1 in ts_hypertable_has_compression_table (ht=0x6250000c3870) at /home/runner/work/timescaledb/timescaledb/src/hypertable.c:2909
#4  0x00007f0774d06a35 in timescaledb_get_relation_info_hook (root=0x6310000e10c8, relation_objectid=<optimized out>, inhparent=<optimized out>, rel=0x7f077324aac8) at /home/runner/work/timescaledb/timescaledb/src/planner/planner.c:1277
#5  0x0000555eca3090d1 in get_relation_info (root=root@entry=0x6310000e10c8, relationObjectId=<optimized out>, inhparent=inhparent@entry=false, rel=rel@entry=0x7f077324aac8) at plancat.c:482
#6  0x0000555eca324a3f in build_simple_rel (root=root@entry=0x6310000e10c8, relid=relid@entry=4, parent=parent@entry=0x0) at relnode.c:307
#7  0x00007f0774377b48 in decompress_chunk_add_plannerinfo (root=root@entry=0x6310000e10c8, info=info@entry=0x6310000eaa48, chunk=chunk@entry=0x7f077324bce8, chunk_rel=chunk_rel@entry=0x6310000e2fd0,
    needs_sequence_num=needs_sequence_num@entry=false) at /home/runner/work/timescaledb/timescaledb/tsl/src/nodes/decompress_chunk/decompress_chunk.c:1101
#8  0x00007f0774378ad0 in ts_decompress_chunk_generate_paths (root=root@entry=0x6310000e10c8, chunk_rel=chunk_rel@entry=0x6310000e2fd0, ht=ht@entry=0x625000131338, chunk=chunk@entry=0x7f077324bce8)
    at /home/runner/work/timescaledb/timescaledb/tsl/src/nodes/decompress_chunk/decompress_chunk.c:391
#9  0x00007f0774249980 in tsl_set_rel_pathlist_query (root=0x6310000e10c8, rel=0x6310000e2fd0, rti=<optimized out>, rte=<optimized out>, ht=0x625000131338) at /home/runner/work/timescaledb/timescaledb/tsl/src/planner.c:114
#10 0x00007f0774d0a8a6 in apply_optimizations (root=root@entry=0x6310000e10c8, reltype=reltype@entry=TS_REL_CHUNK_CHILD, rel=rel@entry=0x6310000e2fd0, rte=rte@entry=0x6310000e6698, ht=0x625000131338)
    at /home/runner/work/timescaledb/timescaledb/src/planner/planner.c:1072
#11 0x00007f0774d0b864 in timescaledb_set_rel_pathlist (root=0x6310000e10c8, rel=<optimized out>, rti=<optimized out>, rte=0x6310000e6698) at /home/runner/work/timescaledb/timescaledb/src/planner/planner.c:1219
#12 0x00007f0774cbe64e in set_rel_pathlist (root=root@entry=0x6310000e10c8, rel=rel@entry=0x6310000e2fd0, rti=rti@entry=2, rte=rte@entry=0x6310000e6698) at /home/runner/work/timescaledb/timescaledb/src/import/allpaths.c:251
#13 0x00007f0774cbf624 in ts_set_append_rel_pathlist (root=root@entry=0x6310000e10c8, rel=rel@entry=0x6310000e2b58, rti=rti@entry=1, rte=rte@entry=0x6310000e0028) at /home/runner/work/timescaledb/timescaledb/src/import/allpaths.c:176
#14 0x00007f0774d09a7e in reenable_inheritance (root=root@entry=0x6310000e10c8, rel=rel@entry=0x6310000e2b58, rti=rti@entry=1, rte=rte@entry=0x6310000e0028) at /home/runner/work/timescaledb/timescaledb/src/planner/planner.c:1043
#15 0x00007f0774d0b8bf in timescaledb_set_rel_pathlist (root=0x6310000e10c8, rel=<optimized out>, rti=<optimized out>, rte=0x6310000e0028) at /home/runner/work/timescaledb/timescaledb/src/planner/planner.c:1173
#16 0x0000555eca136c82 in set_rel_pathlist (root=root@entry=0x6310000e10c8, rel=rel@entry=0x6310000e2b58, rti=rti@entry=1, rte=0x6310000e0028) at allpaths.c:541
#17 0x0000555eca137176 in set_base_rel_pathlists (root=root@entry=0x6310000e10c8) at allpaths.c:353
#18 0x0000555eca1391c7 in make_one_rel (root=root@entry=0x6310000e10c8, joinlist=joinlist@entry=0x6310000e3f50) at allpaths.c:223
#19 0x0000555eca20d65e in query_planner (root=root@entry=0x6310000e10c8, qp_callback=qp_callback@entry=0x555eca20f4d8 <standard_qp_callback>, qp_extra=qp_extra@entry=0x7ffe76ca1e40) at planmain.c:276
#20 0x0000555eca231ca8 in grouping_planner (root=root@entry=0x6310000e10c8, tuple_fraction=<optimized out>, tuple_fraction@entry=0) at planner.c:1442
#21 0x0000555eca239bef in subquery_planner (glob=<optimized out>, parse=parse@entry=0x6310000dfeb8, parent_root=parent_root@entry=0x62d000141650, hasRecursion=hasRecursion@entry=false, tuple_fraction=tuple_fraction@entry=0)
    at planner.c:1019
#22 0x0000555eca12f2d5 in set_subquery_pathlist (root=root@entry=0x62d000141650, rel=rel@entry=0x6310000dea68, rti=rti@entry=4, rte=rte@entry=0x62900016bc48) at allpaths.c:2229
#23 0x0000555eca13014d in set_rel_size (root=root@entry=0x62d000141650, rel=rel@entry=0x6310000dea68, rti=rti@entry=4, rte=rte@entry=0x62900016bc48) at allpaths.c:423
#24 0x0000555eca13112a in set_append_rel_size (root=root@entry=0x62d000141650, rel=rel@entry=0x6310000de598, rti=rti@entry=1, rte=rte@entry=0x625000249140) at allpaths.c:1105
#25 0x0000555eca12ff43 in set_rel_size (root=root@entry=0x62d000141650, rel=rel@entry=0x6310000de598, rti=rti@entry=1, rte=rte@entry=0x625000249140) at allpaths.c:384
#26 0x0000555eca132bf7 in set_base_rel_sizes (root=root@entry=0x62d000141650) at allpaths.c:324
#27 0x0000555eca138e73 in make_one_rel (root=root@entry=0x62d000141650, joinlist=joinlist@entry=0x6310000de940) at allpaths.c:186
#28 0x0000555eca20d65e in query_planner (root=root@entry=0x62d000141650, qp_callback=qp_callback@entry=0x555eca20f4d8 <standard_qp_callback>, qp_extra=qp_extra@entry=0x7ffe76ca2650) at planmain.c:276
#29 0x0000555eca231ca8 in grouping_planner (root=root@entry=0x62d000141650, tuple_fraction=<optimized out>, tuple_fraction@entry=0) at planner.c:1442
#30 0x0000555eca239bef in subquery_planner (glob=glob@entry=0x62900016bac8, parse=parse@entry=0x625000006f80, parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=false, tuple_fraction=tuple_fraction@entry=0)
    at planner.c:1019
#31 0x0000555eca23b3e5 in standard_planner (parse=parse@entry=0x625000006f80,
    query_string=query_string@entry=0x625000005220 "SELECT\n    count(*)\nFROM\n    metrics_compressed_summary\nWHERE\n    bucket > _timescaledb_internal.to_timestamp(\n        _timescaledb_internal.cagg_watermark(\n            (SELECT\n
                mat_hypertable_id\n             FROM\n                _timescaledb_catalog.continuous_agg\n             WHERE\n                user_view_name = 'metrics_compressed_summary')\n        )\n    );",
    cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) at planner.c:400
#32 0x00007f0774d04892 in timescaledb_planner (parse=<optimized out>, query_string=<optimized out>, cursor_opts=<optimized out>, bound_params=<optimized out>) at /home/runner/work/timescaledb/timescaledb/src/planner/planner.c:578
#33 0x0000555eca23e7e5 in planner (parse=parse@entry=0x625000006f80,
    query_string=query_string@entry=0x625000005220 "SELECT\n    count(*)\nFROM\n    metrics_compressed_summary\nWHERE\n    bucket > _timescaledb_internal.to_timestamp(\n        _timescaledb_internal.cagg_watermark(\n            (SELECT\n
                mat_hypertable_id\n             FROM\n                _timescaledb_catalog.continuous_agg\n             WHERE\n                user_view_name = 'metrics_compressed_summary')\n        )\n    );",
    cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) at planner.c:269
#34 0x0000555eca6a2856 in pg_plan_query (querytree=querytree@entry=0x625000006f80,
    query_string=query_string@entry=0x625000005220 "SELECT\n    count(*)\nFROM\n    metrics_compressed_summary\nWHERE\n    bucket > _timescaledb_internal.to_timestamp(\n        _timescaledb_internal.cagg_watermark(\n            (SELECT\n
                mat_hypertable_id\n             FROM\n                _timescaledb_catalog.continuous_agg\n             WHERE\n                user_view_name = 'metrics_compressed_summary')\n        )\n    );",
    cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) at postgres.c:846
#35 0x0000555eca6a30e3 in pg_plan_queries (querytrees=0x6290001686f0,

Extra context: the whole Hypertable struct is filled with the 0x7f byte pattern that PostgreSQL uses to clobber freed memory in assert-enabled builds (CLOBBER_FREED_MEMORY), which points to a use-after-free:

(gdb) p *ht
$3 = {fd = {id = 2139062143, schema_name = {
      data = "\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177"}, table_name = {
      data = "\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177"}, associated_schema_name = {
      data = "\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177"}, associated_table_prefix = {
      data = "\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177"}, num_dimensions = 32639, chunk_sizing_func_schema = {
      data = "\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177"}, chunk_sizing_func_name = {
      data = "\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177"}, chunk_target_size = 9187201950435737471, compression_state = 32639, compressed_hypertable_id = 2139062143, replication_factor = 32639}, main_table_relid = 2139062143, chunk_sizing_func = 2139062143,
  space = 0x7f7f7f7f7f7f7f7f, chunk_cache = 0x7f7f7f7f7f7f7f7f, data_nodes = 0x7f7f7f7f7f7f7f7f}
svenklemm added the bug and flaky-test labels on Oct 5, 2022
konskov (Contributor) commented Nov 8, 2022

konskov (Contributor) commented Nov 9, 2022

konskov (Contributor) commented Nov 10, 2022

Another instance? https://github.com/timescale/timescaledb/actions/runs/3431435306/jobs/5719565367
But this time the trigger seems to be:

SELECT * FROM conditions_summary_daily
EXCEPT
SELECT * FROM conditions_summary_daily_new;

konskov (Contributor) commented Nov 14, 2022

jnidzwetzki (Contributor) commented Dec 8, 2022

Still happening: https://github.com/timescale/timescaledb/actions/runs/3642792492/jobs/6150310637

(gdb) +printf "query = '%s'\n\n", debug_query_string
Reading in symbols for postgres.c...
query = 'SELECT
    count(*)
FROM
    metrics_compressed_summary
WHERE
    bucket > _timescaledb_internal.to_timestamp(
        _timescaledb_internal.cagg_watermark(
            (SELECT
                mat_hypertable_id
             FROM
                _timescaledb_catalog.continuous_agg
             WHERE
                user_view_name = 'metrics_compressed_summary')
        )
    );'

(gdb) +frame function ExceptionalCondition
Reading in symbols for /home/runner/work/timescaledb/timescaledb/src/hypertable.c...
#2  0x000055db75e5886f in ExceptionalCondition (
    Reading in symbols for /home/runner/work/timescaledb/timescaledb/src/planner/planner.c...
conditionName=conditionName@entry=0x7f22dcf75e80 "ht->fd.compression_state == HypertableCompressionEnabled", 
    errorType=errorType@entry=0x7f22dcf72d80 "FailedAssertion", 
    fileName=fileName@entry=0x7f22dcf72e80 "/home/runner/work/timescaledb/timescaledb/src/hypertable.c", lineNumber=lineNumber@entry=2943) at assert.c:67
67		abort();
(gdb) +printf "condition = '%s'\n", conditionName
condition = 'ht->fd.compression_state == HypertableCompressionEnabled'
(gdb) +up 1
#3  0x00007f22dcdfd0fc in ts_hypertable_has_compression_table (
    Reading in symbols for plancat.c...
ht=0x6250000bbb38)
    at /home/runner/work/timescaledb/timescaledb/src/hypertable.c:2943
2943			Assert(ht->fd.compression_state == HypertableCompressionEnabled);
(gdb) +l
2938	bool
2939	ts_hypertable_has_compression_table(const Hypertable *ht)
2940	{
2941		if (ht->fd.compressed_hypertable_id != INVALID_HYPERTABLE_ID)
2942		{
2943			Assert(ht->fd.compression_state == HypertableCompressionEnabled);
2944			return true;
2945		}
2946		return false;
2947	}

akuzm (Member) commented Dec 8, 2022

Also reproduces with a different query (record 3 below):

ci_stats=> select distinct c, q
           from (select job_date, job_name,
                        regexp_matches(log_contents, '\nquery = ''(.*)''\n\n') q,
                        regexp_matches(log_contents, 'condition = ''(.*)''\n', 'n') c
                 from log natural join job
                 order by c, job_date desc, q) t
           where c[1] like '%compression%';
─[ RECORD 1 ]───────────────────────────────────────────────────────────────────────────────────────────────────────────
c │ {"!(ht->fd.compression_state == HypertableCompressionEnabled)"}
q │ {"SELECT                                                                                                            ↵
  │     count(*)                                                                                                        ↵
  │ FROM                                                                                                                ↵
  │     metrics_compressed_summary                                                                                      ↵
  │ WHERE                                                                                                               ↵
  │     bucket > _timescaledb_internal.to_timestamp(                                                                    ↵
  │         _timescaledb_internal.cagg_watermark(                                                                       ↵
  │             (SELECT                                                                                                 ↵
  │                 mat_hypertable_id                                                                                   ↵
  │              FROM                                                                                                   ↵
  │                 _timescaledb_catalog.continuous_agg                                                                 ↵
  │              WHERE                                                                                                  ↵
  │                 user_view_name = 'metrics_compressed_summary')                                                      ↵
  │         )                                                                                                           ↵
  │     );"}
─[ RECORD 2 ]───────────────────────────────────────────────────────────────────────────────────────────────────────────
c │ {"ht->fd.compression_state == HypertableCompressionEnabled"}
q │ {"SELECT                                                                                                            ↵
  │     count(*)                                                                                                        ↵
  │ FROM                                                                                                                ↵
  │     metrics_compressed_summary                                                                                      ↵
  │ WHERE                                                                                                               ↵
  │     bucket > _timescaledb_internal.to_timestamp(                                                                    ↵
  │         _timescaledb_internal.cagg_watermark(                                                                       ↵
  │             (SELECT                                                                                                 ↵
  │                 mat_hypertable_id                                                                                   ↵
  │              FROM                                                                                                   ↵
  │                 _timescaledb_catalog.continuous_agg                                                                 ↵
  │              WHERE                                                                                                  ↵
  │                 user_view_name = 'metrics_compressed_summary')                                                      ↵
  │         )                                                                                                           ↵
  │     );"}
─[ RECORD 3 ]───────────────────────────────────────────────────────────────────────────────────────────────────────────
c │ {"ht->fd.compression_state == HypertableCompressionEnabled"}
q │ {"SELECT * FROM metrics_compressed_summary m FULL JOIN devices d USING (device_id) ORDER BY m.bucket, m.device_id;"}

@mkindahl mkindahl self-assigned this Dec 8, 2022
@jnidzwetzki jnidzwetzki added this to the TimescaleDB 2.9 milestone Dec 8, 2022
mkindahl added a commit to mkindahl/timescaledb that referenced this issue Dec 9, 2022
A cache is used for the duration of a query to speed up planning time,
but when destroying the cache, an error in the destruction would leave
the cache pointer pointing to a partially or fully destroyed cache,
which would give incorrect cache hits for the next query.

This commit changes the order of the assignment so that the cache
pointer is cleared before destroying the cache. This will allocate a
new cache for the next query, even if the previous destruction had an
error.

Fixes timescale#4795
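
A minimal sketch of the ordering change this commit message describes. The names below (Cache, planner_cache, cache_destroy, planner_cache_release) are illustrative stand-ins, not the actual TimescaleDB symbols:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the TimescaleDB cache struct. */
typedef struct Cache
{
	int nentries;
} Cache;

/* Cache pointer kept for the duration of a query. */
static Cache *planner_cache = NULL;

static void
cache_destroy(Cache *cache)
{
	/* In the extension, destruction can error out partway through. */
	free(cache);
}

static void
planner_cache_release(void)
{
	Cache *cache = planner_cache;

	/*
	 * The reordering: clear the pointer BEFORE destroying the cache.
	 * If destruction errors out, planner_cache no longer points at a
	 * partially destroyed cache, so the next query allocates a fresh
	 * cache instead of taking bogus cache hits.
	 */
	planner_cache = NULL;
	cache_destroy(cache);
}

int
main(void)
{
	planner_cache = calloc(1, sizeof(Cache));
	planner_cache_release();
	printf("planner_cache after release: %p\n", (void *) planner_cache);
	return 0;
}

The old order (destroy first, then clear) left the pointer dangling whenever destruction failed between the two steps.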
mkindahl (Contributor) commented:
The planner uses a hypertable cache and a baserel cache. Since the planner calls can be recursive, they both need to handle recursion.

The hypertable cache uses a stack of caches and will typically push a pointer to the same cache on the stack. Once an invalidation event for the hypertable meta-table comes in, the current hypertable cache is replaced with a new version. In the end you might have a cache stack that looks like this, where an invalidation event caused a new hypertable cache H2 to be created:

H1 H1 ... H1 H2 ... H2

The baserel cache is created once for all recursions and destroyed when exiting the top-most recursive call of the planner.

After some investigation, this is the hypothesis to go by (a runnable sketch follows the list):

  1. Planner is called
  2. hypertable cache H1 is set up
  3. baserel cache is set up
  4. Recursive call into the planner
  5. Invalidation happens, so the topmost hypertable cache is replaced by H2
  6. The baserel cache records a hypertable entry, which points into H2
  7. Leaving the recursive call destroys the H2 cache
  8. The baserel cache now has a dangling pointer into the destroyed H2 cache, which can cause a crash
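
A condensed, self-contained sketch of that sequence; hcache_stack, baserel_entry and cache_create are hypothetical names used only for illustration:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the hypertable cache stack and a baserel
 * cache entry; the real structures live elsewhere in the extension. */
typedef struct Cache
{
	char version[8];
} Cache;

static Cache *
cache_create(const char *version)
{
	Cache *cache = calloc(1, sizeof(Cache));
	strncpy(cache->version, version, sizeof(cache->version) - 1);
	return cache;
}

#define MAX_DEPTH 8
static Cache *hcache_stack[MAX_DEPTH];
static int hcache_depth;

/* A baserel cache entry holding a pointer into a hypertable cache. */
static Cache *baserel_entry;

int
main(void)
{
	Cache *h1 = cache_create("H1");

	hcache_stack[hcache_depth++] = h1; /* steps 1-3: planner called, H1 and baserel cache set up */
	hcache_stack[hcache_depth++] = h1; /* step 4: recursive planner call */

	/* step 5: invalidation replaces the topmost cache with H2 */
	hcache_stack[hcache_depth - 1] = cache_create("H2");

	/* step 6: the baserel cache records an entry pointing into H2 */
	baserel_entry = hcache_stack[hcache_depth - 1];

	/* step 7: leaving the recursive call pops and destroys H2 */
	free(hcache_stack[--hcache_depth]);

	/*
	 * step 8: baserel_entry now dangles into the destroyed H2 cache.
	 * Dereferencing it on the next cache hit is a use-after-free, which
	 * matches the 0x7f-clobbered Hypertable in the gdb dump above.
	 */
	printf("dangling baserel entry: %p\n", (void *) baserel_entry);

	free(h1);
	return 0;
}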

svenklemm (Member, issue author) commented:

There is indeed entanglement between the hypertable cache and the baserel cache, but there are additional problems with the baserel cache: it uses the reloid as the cache key, which means it will not return correct results when a chunk occurs multiple times in a plan, since the occurrences may require different classifications.

akuzm (Member) commented Dec 13, 2022

> it uses the reloid as the cache key, which means it will not return correct results when a chunk occurs multiple times in a plan, since the occurrences may require different classifications.

This problem was fixed here: #4870

mkindahl added a commit to mkindahl/timescaledb that referenced this issue Dec 13, 2022
When popping the hypertable cache stack, it might happen that the
hypertable cache was invalidated between the push and the pop. In that
case, the baserel cache can contain invalid entries pointing to the now
popped hypertable cache, so we reset the baserel cache.

Fixes timescale#4795
mkindahl added a commit that referenced this issue Dec 13, 2022
When popping the hypertable cache stack, it might happen that the
hypertable cache was invalidated between the push and the pop. In that
case, the baserel cache can contain invalid entries pointing to the now
popped hypertable cache, so we reset the baserel cache.

Fixes #4795
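
Continuing the hypothetical names from the earlier sketch, the fix could look roughly like this; the reset condition and function names are assumptions, and the actual implementation is in #5086:

/* Drop all baserel cache entries (they may point into a cache that is
 * about to be destroyed). Hypothetical name, continuing the sketch above. */
static void
baserel_cache_reset(void)
{
	baserel_entry = NULL;
}

static void
planner_hcache_pop(void)
{
	Cache *popped = hcache_stack[--hcache_depth];

	/*
	 * If an invalidation replaced the hypertable cache between the push
	 * and this pop, the baserel cache may contain entries pointing into
	 * the cache being destroyed, so reset it first. Comparing against
	 * the cache below on the stack is one way to detect the change; the
	 * real check in #5086 may differ.
	 */
	if (hcache_depth == 0 || hcache_stack[hcache_depth - 1] != popped)
		baserel_cache_reset();

	free(popped);
}

Always resetting the baserel cache on pop would also be safe, at the cost of rebuilding it more often.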
svenklemm pushed a commit that referenced this issue Dec 15, 2022