Calling refresh_continuous_aggregate under a User-Defined-Action causes segmentation fault #3145

hardikm10 · 2021-04-21T12:42:12Z

Relevant system information:
OS: [e.g. Ubuntu 16.04, Windows 10 x64, etc]: Docker / timescale/timescaledb:2.2.0-pg11

Describe the bug
Executing CALL refresh_continuous_aggregate() under a user defined action causes segmentation fault.

To Reproduce
Steps to reproduce the behavior:
drop table sensor_data;

create table sensor_data( time timestamptz not null, sensor_id integer not null, cpu double precision null, temperature double precision null );

select from create_hypertable('sensor_data','time');

insert into sensor_data select time + (interval '1 minute' * random()) as time, sensor_id, random() as cpu, random()*100 as temperature from generate_series(now() - interval '1 months', now() - interval '1 week', interval '10 minute') as g1(time), generate_series(1, 100, 1 ) as g2(sensor_id) order by time;

create materialized view sensor_summary_hourly with (timescaledb.continuous) as select time_bucket('1 hour', time) one_hour, sensor_id, avg(cpu) as avg_cpu from sensor_data group by one_hour, sensor_id with no data;

CREATE OR REPLACE PROCEDURE refresh_via_job(job_id int, config jsonb) LANGUAGE PLPGSQL AS $$ BEGIN CALL refresh_continuous_aggregate('sensor_summary_hourly', null,null); END $$;
SELECT add_job('refresh_via_job','2 day', config => '{}');

CALL run_job(1000); -- -- or whatever the job id returned but it crashes during here

Expected behavior
It should execute under the User Defined Action and the CAGG should be updated

Actual behavior
Server crashes with a segfault when the job is called.

Additional context
The problem I am trying to solve is to write a UDA which refreshes a cagg on a large hypertable in smaller batches, programmatically since it couldn't be written into .

The text was updated successfully, but these errors were encountered:

mkindahl · 2021-04-26T06:46:01Z

Stack trace:

#0  0x0000558a61f48aeb in _SPI_execute_plan (plan=<optimized out>, paramLI=0x0, snapshot=<optimized out>, crosscheck_snapshot=<optimized out>, read_only=<optimized out>, fire_triggers=<optimized out>, tcount=<optimized out>)
    at spi.c:2326
2326                                    if (_SPI_current->tuptable)
(gdb) bt
#0  0x0000558a61f48aeb in _SPI_execute_plan (plan=<optimized out>, paramLI=0x0, snapshot=<optimized out>, crosscheck_snapshot=<optimized out>, read_only=<optimized out>, fire_triggers=<optimized out>, tcount=<optimized out>)
    at spi.c:2326
#1  0x0000558a61f495ef in SPI_execute_plan_with_paramlist (plan=0x558a6361a528, params=params@entry=0x0, read_only=<optimized out>, tcount=tcount@entry=0) at spi.c:577
#2  0x00007f58d530cc6b in exec_stmt_call (estate=0x7ffee509df00, stmt=0x558a63609c28) at pl_exec.c:2329
#3  0x00007f58d530f0a3 in exec_stmt (estate=estate@entry=0x7ffee509df00, stmt=0x558a63609c28) at pl_exec.c:1976
#4  0x00007f58d53126b4 in exec_stmts (estate=0x7ffee509df00, stmts=<optimized out>) at pl_exec.c:1931
#5  0x00007f58d5312aa0 in exec_stmt_block (estate=estate@entry=0x7ffee509df00, block=block@entry=0x558a6360a1a0) at pl_exec.c:1872
#6  0x00007f58d530fb53 in exec_stmt (estate=estate@entry=0x7ffee509df00, stmt=0x558a6360a1a0) at pl_exec.c:1964
#7  0x00007f58d5312275 in plpgsql_exec_function (func=func@entry=0x558a635f76d8, fcinfo=fcinfo@entry=0x7ffee509e200, simple_eval_estate=simple_eval_estate@entry=0x0, atomic=<optimized out>) at pl_exec.c:596
#8  0x00007f58d5304b49 in plpgsql_call_handler (fcinfo=0x7ffee509e200) at pl_handler.c:267
#9  0x0000558a61ead0df in ExecuteCallStmt (stmt=<optimized out>, params=<optimized out>, atomic=<optimized out>, dest=0x558a624ba060 <donothingDR>) at functioncmds.c:2410
#10 0x00007f58d53a8360 in job_execute_procedure (funcexpr=0x558a635f65d0) at /home/mats/repos/timescaledb-2.0/tsl/src/bgw_policy/job.c:454
#11 0x00007f58d53a85c1 in job_execute (job=0x558a635f62e0) at /home/mats/repos/timescaledb-2.0/tsl/src/bgw_policy/job.c:530
#12 0x00007f58d5516d0a in ts_bgw_job_execute (job=0x558a635f62e0) at /home/mats/repos/timescaledb-2.0/src/bgw/job.c:647
#13 0x00007f58d55171a9 in ts_bgw_job_entrypoint (fcinfo=0x31a7) at /home/mats/repos/timescaledb-2.0/src/bgw/job.c:756
#14 0x0000558a61ff44d2 in StartBackgroundWorker () at bgworker.c:900
#15 0x0000558a62001d73 in do_start_bgworker (rw=<optimized out>) at postmaster.c:5850
#16 maybe_start_bgworkers () at postmaster.c:6075
#17 0x0000558a62002732 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5212
#18 <signal handler called>
#19 0x00007f58ded630da in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffee509f3e0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffee509f340) at ../sysdeps/unix/sysv/linux/select.c:41
#20 0x0000558a62002f8e in ServerLoop () at postmaster.c:1689
#21 0x0000558a62004bbc in PostmasterMain (argc=6, argv=<optimized out>) at postmaster.c:1398
#22 0x0000558a61d2b02d in main (argc=6, argv=0x558a634f8770) at main.c:228

NunoFilipeSantos · 2021-09-21T14:24:35Z

@hardikm10 can you please verify against 2.4.2?

fabriziomello · 2021-09-21T14:56:08Z

@NunoFilipeSantos it's not working yet... checked now and still failing. Assigning myself and will have a look.

Logs:

2021-09-21 11:51:01.777 -03 [165009] WARNING:  transaction left non-empty SPI stack
2021-09-21 11:51:01.777 -03 [165009] HINT:  Check for missing "SPI_finish" calls.
2021-09-21 11:51:01.777 -03 [165009] CONTEXT:  SQL statement "CALL refresh_continuous_aggregate('sensor_summary_hourly', null,null)"
	PL/pgSQL function refresh_via_job(integer,jsonb) line 1 at CALL
2021-09-21 11:51:02.525 -03 [164654] LOG:  background worker "User-Defined Action [1033]" (PID 165009) was terminated by signal 11: Segmentation fault
2021-09-21 11:51:02.525 -03 [164654] DETAIL:  Failed process was running: CALL public.refresh_via_job()
2021-09-21 11:51:02.525 -03 [164654] LOG:  terminating any other active server processes
2021-09-21 11:51:02.526 -03 [165006] WARNING:  terminating connection because of crash of another server process
2021-09-21 11:51:02.526 -03 [165006] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-09-21 11:51:02.526 -03 [165006] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2021-09-21 11:51:02.526 -03 [164659] WARNING:  terminating connection because of crash of another server process
2021-09-21 11:51:02.526 -03 [164659] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-09-21 11:51:02.526 -03 [164659] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2021-09-21 11:51:02.526 -03 [164654] LOG:  all server processes terminated; reinitializing
2021-09-21 11:51:02.673 -03 [165010] LOG:  database system was interrupted; last known up at 2021-09-21 11:50:46 -03
2021-09-21 11:51:02.728 -03 [165010] LOG:  database system was not properly shut down; automatic recovery in progress
2021-09-21 11:51:02.732 -03 [165010] LOG:  redo starts at 0/6C00138
2021-09-21 11:51:02.819 -03 [165010] LOG:  redo done at 0/7BFFFA8
2021-09-21 11:51:02.856 -03 [164654] LOG:  database system is ready to accept connections
2021-09-21 11:51:02.856 -03 [165016] LOG:  TimescaleDB background worker launcher connected to shared catalogs

Backtrace

timescaledb on  master [$?] 
➜ gdb -q -c /tmp/main5432/core.165009.sig11.1632235862s /home/fabrizio/pgsql/current/bin/postgres
Reading symbols from /home/fabrizio/pgsql/current/bin/postgres...
[New LWP 165009]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: User-Defined Action [1033]                            '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005578926b3339 in _SPI_execute_plan (plan=plan@entry=0x5578934cdcf0, paramLI=paramLI@entry=0x0, snapshot=snapshot@entry=0x0, 
    crosscheck_snapshot=crosscheck_snapshot@entry=0x0, read_only=read_only@entry=false, fire_triggers=fire_triggers@entry=true, 
    tcount=<optimized out>) at spi.c:2344
2344					if (_SPI_current->tuptable)
(gdb) bt
#0  0x00005578926b3339 in _SPI_execute_plan (plan=plan@entry=0x5578934cdcf0, paramLI=paramLI@entry=0x0, snapshot=snapshot@entry=0x0, 
    crosscheck_snapshot=crosscheck_snapshot@entry=0x0, read_only=read_only@entry=false, fire_triggers=fire_triggers@entry=true, 
    tcount=<optimized out>) at spi.c:2344
#1  0x00005578926b3cbf in SPI_execute_plan_with_paramlist (plan=0x5578934cdcf0, params=0x0, read_only=<optimized out>, 
    tcount=tcount@entry=0) at spi.c:576
#2  0x00007f465ade14de in exec_stmt_call (estate=0x7ffe4b6ec740, stmt=0x5578934e9310) at pl_exec.c:2330
#3  0x00007f465ade3c23 in exec_stmt (estate=estate@entry=0x7ffe4b6ec740, stmt=0x5578934e9310) at pl_exec.c:1989
#4  0x00007f465ade747e in exec_stmts (estate=0x7ffe4b6ec740, stmts=0x5578934e9850) at pl_exec.c:1944
#5  0x00007f465ade787e in exec_stmt_block (estate=estate@entry=0x7ffe4b6ec740, block=block@entry=0x5578934e98a8) at pl_exec.c:1885
#6  0x00007f465ade478b in exec_stmt (estate=estate@entry=0x7ffe4b6ec740, stmt=0x5578934e98a8) at pl_exec.c:1977
#7  0x00007f465ade7005 in plpgsql_exec_function (func=func@entry=0x557893430d10, fcinfo=fcinfo@entry=0x7ffe4b6eca40, 
    simple_eval_estate=simple_eval_estate@entry=0x0, simple_eval_resowner=simple_eval_resowner@entry=0x0, atomic=<optimized out>)
    at pl_exec.c:611
#8  0x00007f465adf27ee in plpgsql_call_handler (fcinfo=0x7ffe4b6eca40) at pl_handler.c:265
#9  0x000055789260c330 in ExecuteCallStmt (stmt=stmt@entry=0x557893430088, params=<optimized out>, atomic=atomic@entry=false, 
    dest=dest@entry=0x557892bf85a0 <donothingDR>) at functioncmds.c:2232
#10 0x00007f465ae8401d in job_execute_procedure (funcexpr=0x55789342fff8)
    at /d/github.com/fabriziomello/timescaledb/tsl/src/bgw_policy/job.c:751
#11 job_execute (job=<optimized out>) at /d/github.com/fabriziomello/timescaledb/tsl/src/bgw_policy/job.c:832
#12 0x00007f465b14a5cb in ts_bgw_job_execute (job=0x55789342faa0) at /d/github.com/fabriziomello/timescaledb/src/bgw/job.c:628
#13 0x00007f465b14a7b4 in ts_bgw_job_entrypoint (fcinfo=<optimized out>) at /d/github.com/fabriziomello/timescaledb/src/bgw/job.c:737
#14 0x000055789276ad45 in StartBackgroundWorker () at bgworker.c:890
#15 0x0000557892778bd3 in do_start_bgworker (rw=<optimized out>) at postmaster.c:5870
#16 maybe_start_bgworkers () at postmaster.c:6095
#17 0x0000557892779632 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5255
#18 <signal handler called>
#19 0x00007f46663850da in __GI___select (nfds=nfds@entry=7, readfds=readfds@entry=0x7ffe4b6edbc0, writefds=writefds@entry=0x0, 
    exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffe4b6edb20) at ../sysdeps/unix/sysv/linux/select.c:41
#20 0x0000557892779e6e in ServerLoop () at postmaster.c:1703
#21 0x000055789277bbc4 in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1412
#22 0x0000557892474593 in main (argc=3, argv=0x5578933d3c00) at main.c:210
(gdb)

NunoFilipeSantos · 2021-09-22T09:55:49Z

Thank you @fabriziomello . :)

Segmentation fault was ocurring when calling the procedure `refresh_continous_aggregate` from an user defined action (job). Fixed it by adding the `SPI_connect_ext/SPI_finish` during the execution because there are underlying SPI calls that was leading us to an invalid SPI state (nonexistent `_SPI_current` global). Fixes timescale#3145

mkindahl added the segfault Segmentation fault label Apr 23, 2021

mkindahl self-assigned this Apr 23, 2021

mkindahl added bug bgw The background worker subsystem, including the scheduler labels Apr 26, 2021

mkindahl removed their assignment May 4, 2021

NunoFilipeSantos added the continuous_aggregate label Aug 24, 2021

fabriziomello self-assigned this Sep 21, 2021

fabriziomello closed this as completed in b3380e8 Oct 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Calling refresh_continuous_aggregate under a User-Defined-Action causes segmentation fault #3145

Calling refresh_continuous_aggregate under a User-Defined-Action causes segmentation fault #3145

hardikm10 commented Apr 21, 2021

mkindahl commented Apr 26, 2021

NunoFilipeSantos commented Sep 21, 2021

fabriziomello commented Sep 21, 2021

NunoFilipeSantos commented Sep 22, 2021

Calling refresh_continuous_aggregate under a User-Defined-Action causes segmentation fault #3145

Calling refresh_continuous_aggregate under a User-Defined-Action causes segmentation fault #3145

Comments

hardikm10 commented Apr 21, 2021

mkindahl commented Apr 26, 2021

NunoFilipeSantos commented Sep 21, 2021

fabriziomello commented Sep 21, 2021

NunoFilipeSantos commented Sep 22, 2021