[Bug]: time_bucket_gapfill grouped by int[] field causes database to go into recovery mode #5981

Closed
willsbit opened this issue Aug 17, 2023 · 1 comment · Fixed by #5991

willsbit commented Aug 17, 2023

What type of bug is this?

Crash

What subsystems and features are affected?

Gapfill, Query executor, Query planner

What happened?

When using the time_bucket_gapfill() function, grouping by an int[] type column causes a crash.
In the example script, simply replacing int_arr with array_to_string(int_arr, ',') avoids the issue.

Expected: the query should work even when grouping by the original array type column.

TimescaleDB version affected

2.10.3

PostgreSQL version used

15.3

What operating system did you use?

(Ubuntu 15.3-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit

What installation method did you use?

Docker

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

Explain output:
[
  {
    "Plan": {
      "Node Type": "Custom Scan",
      "Custom Plan Provider": "GapFill",
      "Parallel Aware": false,
      "Async Capable": false,
      "Plans": [
        {
          "Node Type": "Aggregate",
          "Strategy": "Sorted",
          "Partial Mode": "Simple",
          "Parent Relationship": "child",
          "Parallel Aware": false,
          "Async Capable": false,
          "Group Key": ["(time_bucket_gapfill(5, x.x, 1, 1000))", "'{1,2,3,4}'::integer[]"],
          "Plans": [
            {
              "Node Type": "Sort",
              "Parent Relationship": "Outer",
              "Parallel Aware": false,
              "Async Capable": false,
              "Sort Key": ["(time_bucket_gapfill(5, x.x, 1, 1000))"],
              "Plans": [
                {
                  "Node Type": "Function Scan",
                  "Parent Relationship": "Outer",
                  "Parallel Aware": false,
                  "Async Capable": false,
                  "Function Name": "generate_series",
                  "Alias": "x"
                }
              ]
            }
          ]
        }
      ]
    }
  }
]


Error: 'NoneType' object has no attribute 'pgconn'
connection failed: FATAL:  the database system is in recovery mode

How can we reproduce the bug?

Minimal reproducible example I set up:


SELECT time_bucket_gapfill(5, ts, 1, 1000) as ts, int_arr, locf(last(value, ts))
FROM (
    SELECT ARRAY[1,2,3,4]::int[] as int_arr, x as ts, x+500000 as value
    FROM generate_series(1, 1000, 10) as x
    ) t
GROUP BY 1, 2
willsbit added the bug label Aug 17, 2023
@mkindahl
Contributor

@willsbit Thank you for the bug report. It is trivial to reproduce. The stack trace after the crash is:

(gdb) bt
#0  0x0000564b02bd40d6 in array_eq (fcinfo=fcinfo@entry=0x7ffd289b78e0) at arrayfuncs.c:3644
#1  0x0000564b02cf7413 in DirectFunctionCall2Coll (func=0x564b02bd3e93 <array_eq>, collation=<optimized out>, arg1=<optimized out>, arg2=<optimized out>) at fmgr.c:809
#2  0x00007f08a42742a3 in gapfill_state_is_new_group (state=state@entry=0x564b04a066a0, slot=slot@entry=0x564b04a07730) at /home/mats/work/timescale/timescaledb+hacking/tsl/src/nodes/gapfill/gapfill_exec.c:1043
#3  0x00007f08a42755f7 in gapfill_exec (node=0x564b04a066a0) at /home/mats/work/timescale/timescaledb+hacking/tsl/src/nodes/gapfill/gapfill_exec.c:822
#4  0x0000564b029f2e4d in ExecCustomScan (pstate=0x564b04a066a0) at nodeCustom.c:115
#5  0x0000564b029d53ce in ExecProcNode (node=0x564b04a066a0) at ../../../src/include/executor/executor.h:259
#6  ExecutePlan (estate=estate@entry=0x564b04a06460, planstate=0x564b04a066a0, use_parallel_mode=<optimized out>, operation=operation@entry=CMD_SELECT, sendTuples=sendTuples@entry=true, numberTuples=numberTuples@entry=0, 
    direction=ForwardScanDirection, dest=0x564b04c24ce8, execute_once=true) at execMain.c:1636
#7  0x0000564b029d55a7 in standard_ExecutorRun (queryDesc=0x564b04bfa620, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:363
#8  0x0000564b029d5671 in ExecutorRun (queryDesc=queryDesc@entry=0x564b04bfa620, direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=<optimized out>) at execMain.c:307
#9  0x0000564b02ba26fd in PortalRunSelect (portal=portal@entry=0x564b0491d230, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x564b04c24ce8) at pquery.c:924
#10 0x0000564b02ba41fe in PortalRun (portal=portal@entry=0x564b0491d230, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x564b04c24ce8, 
    altdest=altdest@entry=0x564b04c24ce8, qc=0x7ffd289b7cb0) at pquery.c:768
#11 0x0000564b02ba0068 in exec_simple_query (
    query_string=query_string@entry=0x564b048a9fe0 "SELECT time_bucket_gapfill(5, ts, 1, 1000) as ts, int_arr, locf(last(value, ts))\nFROM (\n    SELECT ARRAY[1,2,3,4]::int[] as int_arr, x as ts, x+500000 as value\n    FROM generate_series(1, 1000, 10) as"...) at postgres.c:1250
#12 0x0000564b02ba1ff4 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4593
#13 0x0000564b02af6610 in BackendRun (port=port@entry=0x564b048cf3e0) at postmaster.c:4511
#14 0x0000564b02af9794 in BackendStartup (port=port@entry=0x564b048cf3e0) at postmaster.c:4239
#15 0x0000564b02af99cf in ServerLoop () at postmaster.c:1806
#16 0x0000564b02afafa7 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x564b04811980) at postmaster.c:1478
#17 0x0000564b02a3b3a3 in main (argc=1, argv=0x564b04811980) at main.c:202

mkindahl self-assigned this Aug 18, 2023
mkindahl added a commit to mkindahl/timescaledb that referenced this issue Aug 21, 2023
The equality comparison function is called using
`DirectFunctionCall2Coll`, which does not set `fcinfo->flinfo` when
calling the function.  Since `array_eq` uses `fcinfo->flinfo->fn_extra`
for caching, and `flinfo` is null, this causes a crash.

Fix this issue by using `FunctionCall2Coll` instead, which sets
`fcinfo->flinfo` before calling the function.

Fixes timescale#5981
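
The mechanism described in the commit message above can be illustrated with a small standalone model. This is only a sketch, not PostgreSQL or TimescaleDB code: `MockFmgrInfo`, `MockCallInfo`, `mock_cached_eq`, `mock_direct_call2`, `mock_function_call2`, and `MOCK_NO_FLINFO` are hypothetical names invented for illustration, and plain `int` arguments stand in for `Datum` values. The one deliberate difference from the real behavior is that the mock returns an error code where the real function would dereference a null `flinfo`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's FmgrInfo: only the cache slot is modeled. */
typedef struct MockFmgrInfo
{
	void *fn_extra; /* per-call-site scratch space, NULL until first use */
} MockFmgrInfo;

/* Hypothetical stand-in for the call-info struct: two ints instead of Datums. */
typedef struct MockCallInfo
{
	MockFmgrInfo *flinfo; /* NULL when the function is invoked "directly" */
	int args[2];
} MockCallInfo;

typedef int (*MockFunc)(MockCallInfo *fcinfo);

/* Returned by the mock at the point where a null flinfo would be dereferenced. */
#define MOCK_NO_FLINFO (-1)

/*
 * Equality function that keeps a cache in flinfo->fn_extra, as the commit
 * message says array_eq does. The "cache" here is just a call counter.
 */
static int
mock_cached_eq(MockCallInfo *fcinfo)
{
	int *calls;

	if (fcinfo->flinfo == NULL)
		return MOCK_NO_FLINFO; /* assumption: guard added so the sketch does not crash */

	if (fcinfo->flinfo->fn_extra == NULL)
		fcinfo->flinfo->fn_extra = calloc(1, sizeof(int));

	calls = fcinfo->flinfo->fn_extra;
	if (calls != NULL)
		(*calls)++;

	return fcinfo->args[0] == fcinfo->args[1];
}

/* Models a "direct" call: no FmgrInfo is supplied, so flinfo stays NULL. */
static int
mock_direct_call2(MockFunc fn, int a, int b)
{
	MockCallInfo fcinfo = { NULL, { a, b } };

	return fn(&fcinfo);
}

/* Models a call through a caller-owned FmgrInfo, which is passed along as flinfo. */
static int
mock_function_call2(MockFmgrInfo *flinfo, MockFunc fn, int a, int b)
{
	MockCallInfo fcinfo = { flinfo, { a, b } };

	assert(flinfo != NULL);
	return fn(&fcinfo);
}
```

With these definitions, `mock_direct_call2(mock_cached_eq, 1, 1)` returns `MOCK_NO_FLINFO`, while `mock_function_call2(&flinfo, mock_cached_eq, 1, 1)` with a zero-initialized `flinfo` returns 1 and leaves the cache in `flinfo.fn_extra` for later calls from the same call site.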
mkindahl added a commit to mkindahl/timescaledb that referenced this issue Aug 21, 2023
The equality comparison function is called using
`DirectFunctionCall2Coll`, which does not set `fcinfo->flinfo` when
calling the PostgreSQL function.  Since `array_eq` uses
`fcinfo->flinfo->fn_extra` for caching, and `flinfo` is null, this
causes a crash.

Fix this issue by using `FunctionCall2Coll` instead, which sets
`fcinfo->flinfo` before calling the PostgreSQL function.

Fixes timescale#5981

Update .unreleased/bugfix_5991

Co-authored-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
Signed-off-by: Mats Kindahl <mats.kindahl@gmail.com>
mkindahl added a commit to mkindahl/timescaledb that referenced this issue Aug 22, 2023
mkindahl added a commit that referenced this issue Aug 22, 2023
The equality comparison function is called using
`DirectFunctionCall2Coll`, which does not set `fcinfo->flinfo` when
calling the PostgreSQL function.  Since `array_eq` uses
`fcinfo->flinfo->fn_extra` for caching, and `flinfo` is null, this
causes a crash.

Fix this issue by using `FunctionCall2Coll` instead, which sets
`fcinfo->flinfo` before calling the PostgreSQL function.

Fixes #5981
github-actions bot pushed a commit that referenced this issue Aug 22, 2023
(cherry picked from commit 3db6922)
mkindahl added a commit that referenced this issue Aug 24, 2023
(cherry picked from commit 3db6922)
timescale-automation pushed a commit that referenced this issue Aug 24, 2023
(cherry picked from commit 3db6922)