Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timescale/Postgres Grants Issue with Extraneous relations #4936

Closed
udesaiitrs opened this issue Nov 8, 2022 · 15 comments
Closed

Timescale/Postgres Grants Issue with Extraneous relations #4936

udesaiitrs opened this issue Nov 8, 2022 · 15 comments

Comments

@udesaiitrs
Copy link

udesaiitrs commented Nov 8, 2022

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Other

What happened?

We are running into a weird permissions error that possibly seems to be related to timescale more than postgres, but its not clear what exactly is causing the issue. We run a script which grants the permissions below to a user, and for some reason, there are a many extraneous tables or relations that we have not created. When GRANT commands are run, does timescale do any extra permission changes? We only have 10 tables created in the public schema, and 6 of those are hypertables.

TimescaleDB version affected

2.7.2

PostgreSQL version used

14

What operating system did you use?

docker image: timescale-ha-base:pg14.5-ts2.7.2-p0-r0

What installation method did you use?

Docker, Other

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

postgres@timescale-0:~$ psql -q -v ON_ERROR_STOP=1 -d mydatabase <<-EOF
      set client_min_messages=DEBUG;
      REVOKE ALL ON DATABASE mydatabase FROM PUBLIC;
      GRANT CONNECT ON DATABASE mydatabase TO sinkd;
      GRANT USAGE ON SCHEMA public TO sinkd;
      GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO sinkd;
> EOF
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public.@" does not exist
DEBUG:  relation "public.@" does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public.8_19?]?" does not exist
DEBUG:  relation "public.8_19?]?" does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public.@" does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public.8_19?]?" does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  rehashing catalog cache id 50 for pg_class; 257 tups, 128 buckets
DEBUG:  rehashing catalog cache id 51 for pg_class; 257 tups, 128 buckets
DEBUG:  rehashing catalog cache id 50 for pg_class; 513 tups, 256 buckets
DEBUG:  rehashing catalog cache id 51 for pg_class; 513 tups, 256 buckets
DEBUG:  rehashing catalog cache id 50 for pg_class; 1025 tups, 512 buckets
DEBUG:  rehashing catalog cache id 51 for pg_class; 1025 tups, 512 buckets
DEBUG:  rehashing catalog cache id 50 for pg_class; 2049 tups, 1024 buckets
DEBUG:  rehashing catalog cache id 51 for pg_class; 2049 tups, 1024 buckets
DEBUG:  relation "public." does not exist
DEBUG:  relation "public." does not exist
DEBUG:  rehashing catalog cache id 50 for pg_class; 4097 tups, 2048 buckets
DEBUG:  rehashing catalog cache id 51 for pg_class; 4097 tups, 2048 buckets
ERROR:  relation "public." does not exist

We also saw this error once when running the same job, but then the error doesn’t happen on the next run:

ERROR:  relation "public.k__compressed_hypertable_20_tid__ts" does not exist

How can we reproduce the bug?

Not sure if its reproduce-able on demand, but we have not cleared this environment and can run further tests if needed. This is only happening in one of our environments, and has not happened prior to this occurrence.
@udesaiitrs udesaiitrs added the bug label Nov 8, 2022
@udesaiitrs udesaiitrs changed the title [Bug]: <Title> Timescale/Postgres Grants Issue with Extraneous relations Nov 8, 2022
@konskov
Copy link
Contributor

konskov commented Nov 8, 2022

Hi @udesaiitrs , thank you for reaching out!

I also tried the steps above in a database with timescaledb 2.7.2 and pg14, where I created some tables and hypertables in the public schema, everything works as expected.

Might it be possible for you to provide answers to a few questions that will hopefully help debug this?

  • You mention running this job several times. Do you still see the debug messages (relation does not exist) each time or was it only the first time?
  • You mention several environments. Are these identical?
  • Did you run VACUUM?
  • Would it be possible to run each command above sequentially to determine when these messages are printed? Meaning, after which command’s execution?

Thank you!

@udesaiitrs
Copy link
Author

udesaiitrs commented Nov 8, 2022

Hi @konskov , thanks for the quick reply. Yes we see this happening each time the job runs.

We do have several environments, but they are not perfectly identical. Some are running on AWS, some are on-prem. Some are with > 1 HA nodes, but this one is with a single HA node.

We do run VACUUM, and have set the following default settings:

        autovacuum_analyze_scale_factor: 0.02
        autovacuum_max_workers: 10
        autovacuum_naptime: 120s
        autovacuum_vacuum_cost_limit: 500
        autovacuum_vacuum_scale_factor: 0.2
        log_autovacuum_min_duration: 1000

We pinpointed that the error occurs after this command:
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO sinkd;

So it does seem as though there are some phantom tables or relations created? But we cannot seem to find where these are stored as records so that when the GRANT command runs, they are referenced.

@udesaiitrs
Copy link
Author

udesaiitrs commented Nov 8, 2022

The resulting error is also not consistent, this last time I ran the job, the error that was shown was:
ERROR: relation "public.+2022-09-17T03:00:00Z" does not exist

One other point to consider is the command succeeds if run interactively from psql, but not if the command comes from stdin exactly like written above.

@konskov
Copy link
Contributor

konskov commented Nov 9, 2022

One other point to consider is the command succeeds if run interactively from psql, but not if the command comes from stdin exactly like written above

Thank you for pointing that out, useful to know.

A few more things that would help us investigate this:

  • could you also add \set VERBOSITY verbose at the beginning of the command? It would allow us to see the specific line causing the error.
  • Are you using compressed tables?
    Actually if you could share your schema and compression settings that would be very helpful as it would bring us closer to some reproduction steps.
  • We would like to confirm that permissions are consistent for hypertables and their chunks. A query you can use for that purpose is the following:
WITH
    hypertable_chunks AS (
        SELECT format('%1$I.%2$I', ch.schema_name, ch.table_name)::regclass AS chunk
             , format('%1$I.%2$I', ht.schema_name, ht.table_name)::regclass AS hypertable
	  FROM _timescaledb_catalog.chunk ch
	  JOIN _timescaledb_catalog.hypertable ht ON ch.hypertable_id = ht.id
    ),
    perms AS (
        SELECT format('%1$I.%2$I', table_schema, table_name)::regclass AS relation
	     , grantee
	     , array_agg(privilege_type order by privilege_type) AS grants
	  FROM information_schema.role_table_grants
	GROUP BY relation, grantee
    )
SELECT hypertable
     , chunk
     , ht.grantee
     , ht.grants AS hypertable_grants
     , ch.grants AS chunk_grants
FROM hypertable_chunks
     JOIN perms AS ht ON ht.relation = hypertable
     JOIN perms AS ch ON ch.relation = chunk AND ht.grantee = ch.grantee
WHERE ht.grants != ch.grants;

@udesaiitrs
Copy link
Author

Adding the verbose setting shows this:

postgres@timescale-0:~$ psql -q -v ON_ERROR_STOP=1 -d obcerv <<-EOF
-- sinkd
\set VERBOSITY verbose
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO sinkd;
> EOF
ERROR: 42P01: relation "public." does not exist
LOCATION: RangeVarGetRelidExtended, namespace.c:428

We are using compressed hypertables, yes. Unfortunately we can't share the full schema. I can provide more details without the full schema if that will help, just let me know what details to include.

Running that query you provided returns no results, which I believe is a good thing if you can confirm?

@konskov
Copy link
Contributor

konskov commented Nov 10, 2022

Yes, that is good, permissions are consistent on hypertables and their chunks.

The function RangeVarGetRelidExtended is not triggered by TimescaleDB code when executing GRANT commands. So it is probably called by some other extension... Do you have other extensions installed, and if so, can you check that you are still able to reproduce this issue with only TimescaleDB installed?

@udesaiitrs
Copy link
Author

We do have other extensions installed:

Screen Shot 2022-11-10 at 11 17 53 AM

We cannot reproduce this issue with only Timescaledb installed, or actually in other environments with the exact same schema/permissions. This is leading us to believe that this environment is corrupt, but we'd like to confirm all other possibilities first.

@konskov
Copy link
Contributor

konskov commented Nov 11, 2022

hi @udesaiitrs, given the information you have shared and that we do not trigger this from our code, it seems unlikely that this is a TimescaleDB issue... If you are able to get a stacktrace for this error then it would help determine which functions (and from what extensions) are involved

@udesaiitrs
Copy link
Author

udesaiitrs commented Nov 14, 2022

@konskov Understood thanks, do you know the easiest way to get a stacktrace for PG or increase the logging to show more details on the error?

@konskov
Copy link
Contributor

konskov commented Nov 15, 2022

To get a stacktrace using gdb, you can attach to the pid that is running the statements above and set a breakpoint at errfinish, and see the trace when the breakpoint is hit.

@konskov
Copy link
Contributor

konskov commented Nov 15, 2022

Actually in this case the location of the error is already known after doing \set verbosity verbose so instead of setting the breakpoint on errfinish, you can set it on RangeVarGetRelidExtended too.
You can also increase the log detail by setting for example log_min_messages = debug1 but that might make it harder to see what the relevant logs are since it is probably going to increase the log lines significantly

@mrkwtz
Copy link

mrkwtz commented Dec 8, 2022

We experience the same issue after upgrading from version 2.5.x to 2.8.1 of the TimescaleDB extension. We got four clusters running (2x staging, 2x prod) but only three (1x staging, 2x prod) are throwing these errors. We use Vault to create ephemeral roles for our developers and services to access the database. Vault additionally executes a GRANT ON SCHEMA statement, to give the role the correct permissions, that's when the error occurs.

2022-12-08 16:28:19 UTC [1383]: [6392106d.567-15] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.\" does not exist
2022-12-08 16:28:07 UTC [1383]: [6392106d.567-12] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.�D\" does not exist
2022-12-08 16:27:55 UTC [1383]: [6392106d.567-9] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.\u0002@\" does not exist
2022-12-08 16:27:29 UTC [1383]: [6392106d.567-3] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.30-85f8-b2641a6737eb_1000ms\" does not exist
2022-12-08 16:26:53 UTC [1318]: [6392102f.526-8] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.73_chunk\" does not exist
2022-12-08 16:26:42 UTC [1318]: [6392102f.526-5] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.-eb15-45d1-9637-456cfb377b17_60000ms\" does not exist
2022-12-08 16:26:09 UTC [1150]: [63920f44.47e-79] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.\u0002\" does not exist
2022-12-08 16:25:48 UTC [1150]: [63920f44.47e-74] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.�D\" does not exist
2022-12-08 16:25:43 UTC [1150]: [63920f44.47e-72] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.�D\" does not exist
2022-12-08 16:25:34 UTC [1150]: [63920f44.47e-68] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data._hyper_301_330545_chunk\" does not exist
2022-12-08 16:25:29 UTC [1150]: [63920f44.47e-66] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data._hyper_233_394577_chunk\" does not exist
2022-12-08 16:25:24 UTC [1150]: [63920f44.47e-64] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.\u0002\" does not exist
2022-12-08 16:22:51 UTC [1150]: [63920f44.47e-9] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.pg_toast_8353112\" does not exist
2022-12-08 16:22:45 UTC [1150]: [63920f44.47e-7] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data._hyper_378_777643_chunk\" does not exist
2022-12-08 16:22:38 UTC [1150]: [63920f44.47e-5] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.\u000f\" does not exist
2022-12-08 16:22:33 UTC [1150]: [63920f44.47e-3] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"data.785934_785933_13-14-373-462_100ms_pkey\" does not exist
2022-12-08 16:22:12 UTC [998]: [63920e5a.3e6-355] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"config.\" does not exist
2022-12-08 16:22:10 UTC [998]: [63920e5a.3e6-351] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"config.35_chunk\" does not exist
2022-12-08 16:22:06 UTC [998]: [63920e5a.3e6-339] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"config.@�=\" does not exist
2022-12-08 16:22:05 UTC [998]: [63920e5a.3e6-337] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"config..�=\" does not exist
2022-12-08 16:22:02 UTC [998]: [63920e5a.3e6-327] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"config.x�d\" does not exist
2022-12-08 16:21:57 UTC [998]: [63920e5a.3e6-317] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"config.t�F\" does not exist
2022-12-08 16:21:48 UTC [998]: [63920e5a.3e6-303] vault-ROLE@DATABASE,app=vault [42P01] ERROR:  relation \"config._hyper_378_467511_chunk\" does not exist

We observed this behavior immediately after the update extension command. TimescaleDB and plpgsql are also our only extensions enabled on that database.

DATABASE=# \dx
                                      List of installed extensions
    Name     | Version |   Schema   |                            Description                            
-------------+---------+------------+-------------------------------------------------------------------
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
 timescaledb | 2.8.1   | public     | Enables scalable inserts and complex queries for time-series data
(2 rows)

It happens sporadically but often. It's bad because it prevents our services at startup from connecting to their databases. Or at least it takes a long time until Vault can create a role and issue the GRANT statement without an error.

We're running this image timescale/timescaledb-ha:pg13-ts2.8-latest on a Kubernetes cluster in EKS, I don't know if that info is helpful.

@jnidzwetzki
Copy link
Contributor

Hello @mrkwtz,

Thank you for sharing the log file. I have tried to reproduce the issue in my local environment, but so far, I was not able to reproduce it.

You mention that a GRANT ON SCHEMA statement is executed. Can you see in the logfile what the full statement is?

Would it be possible for you to provide the schema of the database (the CREATE table / hypertable / schema statements)? This would help me to create a local environment similar to yours.

@github-actions
Copy link

github-actions bot commented Feb 8, 2023

Dear Author,

This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days.

Thank you!

@github-actions
Copy link

Dear Author,

We are closing this issue due to lack of activity. Feel free to add a comment to this issue if you can provide more information and we will re-open it.

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants