Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incomplete PostgreSQL schema when syncing from a user with privs on specific tables only #105

Closed
RostislavKonstantinov opened this issue Jun 24, 2021 · 16 comments

Comments

@RostislavKonstantinov
Copy link

I have PG->PG synchronization process. Some of my tables use varchar as a primary key. When I run synchronization process through the kitchen-sync I have no error but:

  • the process hangs for a long time and is not finished
  • rows data duplication
  • tables structure is different than source structure
  • in the logs I see the message: Clearing and reloading orchestra_vm, can't efficiently detect differences because it has no primary key and no other suitable keys.

Source table:

-- auto-generated definition
create table orchestra_vm
(
    id              varchar(100)             not null
        constraint orchestra_vm_pkey
            primary key,
    created_at      timestamp with time zone not null,
    updated_at      timestamp with time zone not null,
    provider        varchar(15)              not null,
    region          varchar(15)              not null,
    status          varchar(15)              not null,
    meta            jsonb                    not null,
    image_id        varchar(100)
        constraint orchestra_vm_image_id_82c7076c_fk_orchestra_image_id
            references orchestra_image
            deferrable initially deferred,
    organization_id varchar(100)             not null
        constraint orchestra_vm_organization_id_19aa5fa5_fk_organizat
            references organizations_organization
            deferrable initially deferred,
    created_by_id   integer
        constraint orchestra_vm_created_by_id_e896b646_fk_users_user_id
            references users_user
            deferrable initially deferred,
    component_id    varchar(100)
        constraint orchestra_vm_component_id_92c3e7bc_fk_components_component_id
            references components_component
            deferrable initially deferred
);

alter table orchestra_vm
    owner to cloudsqlsuperuser;

create index orchestra_vm_created_by_id_e896b646
    on orchestra_vm (created_by_id);

create index orchestra_vm_id_e28bbb9d_like
    on orchestra_vm (id);

create index orchestra_vm_image_id_82c7076c
    on orchestra_vm (image_id);

create index orchestra_vm_image_id_82c7076c_like
    on orchestra_vm (image_id);

create index orchestra_vm_organization_id_19aa5fa5
    on orchestra_vm (organization_id);

create index orchestra_vm_organization_id_19aa5fa5_like
    on orchestra_vm (organization_id);

create index orchestra_vm_component_id_92c3e7bc
    on orchestra_vm (component_id);

create index orchestra_vm_component_id_92c3e7bc_like
    on orchestra_vm (component_id);

Destination table (created by kitchen-sync):

-- auto-generated definition
create table orchestra_vm
(
    id              varchar(100)             not null,
    created_at      timestamp with time zone not null,
    updated_at      timestamp with time zone not null,
    provider        varchar(15)              not null,
    region          varchar(15)              not null,
    status          varchar(15)              not null,
    meta            jsonb                    not null,
    image_id        varchar(100),
    organization_id varchar(100)             not null,
    created_by_id   integer,
    component_id    varchar(100)
);

alter table orchestra_vm
    owner to warehouse;

create index orchestra_vm_component_id_92c3e7bc
    on orchestra_vm (component_id);

create index orchestra_vm_component_id_92c3e7bc_like
    on orchestra_vm (component_id);

create index orchestra_vm_created_by_id_e896b646
    on orchestra_vm (created_by_id);

create index orchestra_vm_id_e28bbb9d_like
    on orchestra_vm (id);

create index orchestra_vm_image_id_82c7076c
    on orchestra_vm (image_id);

create index orchestra_vm_image_id_82c7076c_like
    on orchestra_vm (image_id);

create index orchestra_vm_organization_id_19aa5fa5
    on orchestra_vm (organization_id);

create index orchestra_vm_organization_id_19aa5fa5_like
    on orchestra_vm (organization_id);

Logs:

-- orchestra_vm: columns match, primary key columns don't match, keys match 
CREATE TABLE "virtual_lab_db_prod"."orchestra_vm" ( 
 "id" character varying(100) NOT NULL,
 "created_at" timestamp with time zone NOT NULL, 
 "updated_at" timestamp with time zone NOT NULL, 
 "provider" character varying(15) NOT NULL, 
 "region" character varying(15) NOT NULL, 
 "status" character varying(15) NOT NULL, 
 "meta" jsonb NOT NULL, 
 "image_id" character varying(100), 
 "organization_id" character varying(100) NOT NULL, 
 "created_by_id" integer, 
 "component_id" character varying(100)) 

CREATE INDEX "orchestra_vm_component_id_92c3e7bc" ON "virtual_lab_db_prod"."orchestra_vm" ("component_id") 
CREATE INDEX "orchestra_vm_component_id_92c3e7bc_like" ON "virtual_lab_db_prod"."orchestra_vm" ("component_id") 
CREATE INDEX "orchestra_vm_created_by_id_e896b646" ON "virtual_lab_db_prod"."orchestra_vm" ("created_by_id") 
CREATE INDEX "orchestra_vm_id_e28bbb9d_like" ON "virtual_lab_db_prod"."orchestra_vm" ("id") 
CREATE INDEX "orchestra_vm_image_id_82c7076c" ON "virtual_lab_db_prod"."orchestra_vm" ("image_id")
CREATE INDEX "orchestra_vm_image_id_82c7076c_like" ON "virtual_lab_db_prod"."orchestra_vm" ("image_id") 
CREATE INDEX "orchestra_vm_organization_id_19aa5fa5" ON "virtual_lab_db_prod"."orchestra_vm" ("organization_id") 
CREATE INDEX "orchestra_vm_organization_id_19aa5fa5_like" ON "virtual_lab_db_prod"."orchestra_vm" ("organization_id") 


starting orchestra_vm
Clearing and reloading orchestra_vm, can't efficiently detect differences because it has no primary key and no other suitable keys. 
@RostislavKonstantinov
Copy link
Author

Looks like the same problem with PK serial fields.

Source:

-- auto-generated definition
create table auth_group
(
    id   serial       not null
        constraint auth_group_pkey
            primary key,
    name varchar(150) not null
        constraint auth_group_name_key
            unique
);

alter table auth_group
    owner to cloudsqlsuperuser;

create index auth_group_name_a6ea08ec_like
    on auth_group (name);

Target:

-- auto-generated definition
create table auth_group
(
    id   serial       not null,
    name varchar(150) not null
);

alter table auth_group
    owner to warehouse;

create unique index auth_group_name_key
    on auth_group (name);

create index auth_group_name_a6ea08ec_like
    on auth_group (name);

@willbryant
Copy link
Owner

What KS version, PG version, and OS?

@RostislavKonstantinov
Copy link
Author

KS v2.11, PG source v11 (google managed), PG target v12 (google managed), OS: debian:10 (docker)

@willbryant
Copy link
Owner

When I run the above statements in psql and then look at the output, only the source has a primary key; the target doesn't. So the two schemas don't match, and KS recreates the table to make them match.

You need to have the same schema at each end to use KS to sync data. If you don't, it changes the schema first to make them match. If the PK doesn't match, this involves recreating the table.

It only needs to do this once; after that, it syncs correctly.

eg. for the second examples you gave above:

$ psql source
psql (13.3)
Type "help" for help.

source=# \d auth_group
                                    Table "public.auth_group"
 Column |          Type          | Collation | Nullable |                Default                 
--------+------------------------+-----------+----------+----------------------------------------
 id     | integer                |           | not null | nextval('auth_group_id_seq'::regclass)
 name   | character varying(150) |           | not null | 
Indexes:
    "auth_group_pkey" PRIMARY KEY, btree (id)
    "auth_group_name_a6ea08ec_like" btree (name)
    "auth_group_name_key" UNIQUE CONSTRAINT, btree (name)

source=# 
\q
$ psql target
psql (13.3)
Type "help" for help.

target=# \d auth_group
                                    Table "public.auth_group"
 Column |          Type          | Collation | Nullable |                Default                 
--------+------------------------+-----------+----------+----------------------------------------
 id     | integer                |           | not null | nextval('auth_group_id_seq'::regclass)
 name   | character varying(150) |           | not null | 
Indexes:
    "auth_group_name_a6ea08ec_like" btree (name)
    "auth_group_name_key" UNIQUE, btree (name)

$ ./ks --from postgresql://will@localhost/source --to postgresql://will@localhost/target
Kitchen Sync
The database schema doesn't match.  Use the --alter option if you would like to automatically apply the following schema changes:

DROP SEQUENCE IF EXISTS "public"."auth_group_id_seq";
CREATE SEQUENCE "public"."auth_group_id_seq";
CREATE TABLE "public"."auth_group" (
  "id" integer NOT NULL DEFAULT nextval('"public"."auth_group_id_seq"'::regclass),
  "name" character varying(150) NOT NULL,
  PRIMARY KEY("id"));
CREATE UNIQUE INDEX "auth_group_name_key" ON "public"."auth_group" ("name");
CREATE INDEX "auth_group_name_a6ea08ec_like" ON "public"."auth_group" ("name");
ALTER SEQUENCE "public"."auth_group_id_seq" OWNED BY "public"."auth_group"."id";

Database schema needs migration.
Kitchen Syncing failed.
$ ./ks --from postgresql://will@localhost/source --to postgresql://will@localhost/target --alter
Kitchen Sync
Finished Kitchen Syncing.
$ ./ks --from postgresql://will@localhost/source --to postgresql://will@localhost/target
Kitchen Sync
Finished Kitchen Syncing.

@willbryant
Copy link
Owner

Are you seeing different behavior to the above with Google Cloud SQL?

@willbryant
Copy link
Owner

Wait, I get this for the first example - note that my output does have a PRIMARY KEY, whereas yours doesn't.

My source table doesn't have the other constraints (because I don't have those tables), though.

$ ./ks --from postgresql://will@localhost/source --to postgresql://will@localhost/target
Kitchen Sync
The database schema doesn't match.  Use the --alter option if you would like to automatically apply the following schema changes:

CREATE TABLE "public"."orchestra_vm" (
  "id" character varying(100) NOT NULL,
  "created_at" timestamp with time zone NOT NULL,
  "updated_at" timestamp with time zone NOT NULL,
  "provider" character varying(15) NOT NULL,
  "region" character varying(15) NOT NULL,
  "status" character varying(15) NOT NULL,
  "meta" jsonb NOT NULL,
  "image_id" character varying(100),
  "organization_id" character varying(100) NOT NULL,
  "created_by_id" integer,
  "component_id" character varying(100),
  PRIMARY KEY("id"));
CREATE INDEX "orchestra_vm_component_id_92c3e7bc" ON "public"."orchestra_vm" ("component_id");
CREATE INDEX "orchestra_vm_component_id_92c3e7bc_like" ON "public"."orchestra_vm" ("component_id");
CREATE INDEX "orchestra_vm_created_by_id_e896b646" ON "public"."orchestra_vm" ("created_by_id");
CREATE INDEX "orchestra_vm_id_e28bbb9d_like" ON "public"."orchestra_vm" ("id");
CREATE INDEX "orchestra_vm_image_id_82c7076c" ON "public"."orchestra_vm" ("image_id");
CREATE INDEX "orchestra_vm_image_id_82c7076c_like" ON "public"."orchestra_vm" ("image_id");
CREATE INDEX "orchestra_vm_organization_id_19aa5fa5" ON "public"."orchestra_vm" ("organization_id");
CREATE INDEX "orchestra_vm_organization_id_19aa5fa5_like" ON "public"."orchestra_vm" ("organization_id");

@willbryant
Copy link
Owner

Added the other tables (just dummy tables with the id column) so that I can create the other constraints. It still works OK locally (pg 13.3). I guess I'll try and create some Cloud SQL instances.

-- auto-generated definition
create table orchestra_image
(
    id              varchar(100)             not null
        constraint orchestra_image_pkey
            primary key
);

create table organizations_organization
(
    id              varchar(100)             not null
        constraint organizations_organization_pkey
            primary key
);

create table users_user
(
    id              int                      not null
        constraint users_user_pkey
            primary key
);

create table components_component
(
    id              varchar(100)             not null
        constraint components_component_pkey
            primary key
);

create table orchestra_vm
(
    id              varchar(100)             not null
        constraint orchestra_vm_pkey
            primary key,
    created_at      timestamp with time zone not null,
    updated_at      timestamp with time zone not null,
    provider        varchar(15)              not null,
    region          varchar(15)              not null,
    status          varchar(15)              not null,
    meta            jsonb                    not null,
    image_id        varchar(100)
        constraint orchestra_vm_image_id_82c7076c_fk_orchestra_image_id
            references orchestra_image
            deferrable initially deferred,
    organization_id varchar(100)             not null
        constraint orchestra_vm_organization_id_19aa5fa5_fk_organizat
            references organizations_organization
            deferrable initially deferred,
    created_by_id   integer
        constraint orchestra_vm_created_by_id_e896b646_fk_users_user_id
            references users_user
            deferrable initially deferred,
    component_id    varchar(100)
        constraint orchestra_vm_component_id_92c3e7bc_fk_components_component_id
            references components_component
            deferrable initially deferred
);

alter table orchestra_vm
    owner to cloudsqlsuperuser;

create index orchestra_vm_created_by_id_e896b646
    on orchestra_vm (created_by_id);

create index orchestra_vm_id_e28bbb9d_like
    on orchestra_vm (id);

create index orchestra_vm_image_id_82c7076c
    on orchestra_vm (image_id);

create index orchestra_vm_image_id_82c7076c_like
    on orchestra_vm (image_id);

create index orchestra_vm_organization_id_19aa5fa5
    on orchestra_vm (organization_id);

create index orchestra_vm_organization_id_19aa5fa5_like
    on orchestra_vm (organization_id);

create index orchestra_vm_component_id_92c3e7bc
    on orchestra_vm (component_id);

create index orchestra_vm_component_id_92c3e7bc_like
    on orchestra_vm (component_id);```

@willbryant
Copy link
Owner

That works fine for me too, so I can't reproduce the issue @RostislavKonstantinov :|. Created these two instances:

Screen Shot 2021-06-28 at 21 40 21

Pasted the above SQL into the first, then ran:

$ ./ks --from postgresql://postgres:<password>@34.135.188.0/ --to postgresql://postgres:<password>@35.225.237.255/
Kitchen Sync
The database schema doesn't match.  Use the --alter option if you would like to automatically apply the following schema changes:

CREATE TABLE "public"."components_component" (
  "id" character varying(100) NOT NULL,
  PRIMARY KEY("id"));
CREATE TABLE "public"."orchestra_image" (
  "id" character varying(100) NOT NULL,
  PRIMARY KEY("id"));
CREATE TABLE "public"."orchestra_vm" (
  "id" character varying(100) NOT NULL,
  "created_at" timestamp with time zone NOT NULL,
  "updated_at" timestamp with time zone NOT NULL,
  "provider" character varying(15) NOT NULL,
  "region" character varying(15) NOT NULL,
  "status" character varying(15) NOT NULL,
  "meta" jsonb NOT NULL,
  "image_id" character varying(100),
  "organization_id" character varying(100) NOT NULL,
  "created_by_id" integer,
  "component_id" character varying(100),
  PRIMARY KEY("id"));
CREATE INDEX "orchestra_vm_component_id_92c3e7bc" ON "public"."orchestra_vm" ("component_id");
CREATE INDEX "orchestra_vm_component_id_92c3e7bc_like" ON "public"."orchestra_vm" ("component_id");
CREATE INDEX "orchestra_vm_created_by_id_e896b646" ON "public"."orchestra_vm" ("created_by_id");
CREATE INDEX "orchestra_vm_id_e28bbb9d_like" ON "public"."orchestra_vm" ("id");
CREATE INDEX "orchestra_vm_image_id_82c7076c" ON "public"."orchestra_vm" ("image_id");
CREATE INDEX "orchestra_vm_image_id_82c7076c_like" ON "public"."orchestra_vm" ("image_id");
CREATE INDEX "orchestra_vm_organization_id_19aa5fa5" ON "public"."orchestra_vm" ("organization_id");
CREATE INDEX "orchestra_vm_organization_id_19aa5fa5_like" ON "public"."orchestra_vm" ("organization_id");
CREATE TABLE "public"."organizations_organization" (
  "id" character varying(100) NOT NULL,
  PRIMARY KEY("id"));
CREATE TABLE "public"."users_user" (
  "id" integer NOT NULL,
  PRIMARY KEY("id"));

Database schema needs migration.
Kitchen Syncing failed.
$ ./ks --from postgresql://postgres:<password>@34.135.188.0/ --to postgresql://postgres:<password>@35.225.237.255/ --alter
Kitchen Sync
Finished Kitchen Syncing.
$ ./ks --from postgresql://postgres:<password>@34.135.188.0/ --to postgresql://postgres:<password>@35.225.237.255/
Kitchen Sync
Finished Kitchen Syncing.
$ ./ks --from postgresql://postgres:<password>@34.135.188.0/ --to postgresql://postgres:<password>@35.225.237.255/ --verbose
Kitchen Sync
starting components_component
finished components_component in 1s using 0 hash commands and 0 rows commands changing 0 rows
starting orchestra_image
finished orchestra_image in 1s using 0 hash commands and 0 rows commands changing 0 rows
starting orchestra_vm
finished orchestra_vm in 2s using 0 hash commands and 0 rows commands changing 0 rows
starting organizations_organization
finished organizations_organization in 1s using 0 hash commands and 0 rows commands changing 0 rows
starting users_user
finished users_user in 1s using 0 hash commands and 0 rows commands changing 0 rows
Finished Kitchen Syncing.

@RostislavKonstantinov
Copy link
Author

Hi @willbryant. Thank you for your investigation. I didn't have time today for an answer, sorry.
Could you please check when the table with varchar PK has more than 10000 rows? Thanks!

@RostislavKonstantinov
Copy link
Author

Also, I will try collecting more information for you, it needs time.

@RostislavKonstantinov
Copy link
Author

@willbryant I understood that I used read-only user in source DB and it was a problem. When I changed the user everything is working fine.

Outcome: wrong permissions for user in the source-db.

Feel free to close the issue and thank you for your time and responsiveness!

As a little improvement, KS could have been informing about not enough permissions.)

@willbryant
Copy link
Owner

Ah! That's very interesting. I don't know why that happens. Maybe it does not have enough permissions to read some of the schema information tables, so it misses some schema objects like keys?

How do I create a read-only user to match yours?

@RostislavKonstantinov
Copy link
Author

RostislavKonstantinov commented Jun 28, 2021

My user have only SELECT privelegies for all tables in source schema.

ace=> SELECT table_catalog, table_schema, table_name, privilege_type
FROM   information_schema.table_privileges
WHERE  grantee = 'readonly_user';
 table_catalog | table_schema |         table_name          | privilege_type
---------------+--------------+-----------------------------+----------------
 ace           | public       | ...           | SELECT
 ace           | public       | ...   | SELECT
 ace           | public       | ...          | SELECT
 ace           | public       | orchestra_vm                | SELECT
......
(38 rows)
ace=> grant select on some_table to readonly_user;

@RostislavKonstantinov
Copy link
Author

Looks like this user was created mannualy from psql (by my colleague) and was granted only to existing tables in source schema.

@RostislavKonstantinov
Copy link
Author

https://cloud.google.com/sql/docs/postgres/create-manage-users#creating
After that, change role/permissions manually in the psql client.

@willbryant willbryant changed the title PG PK varchar is not recognized Incomplete schema syncing from a user with privs on specific tables only Jun 29, 2021
@willbryant willbryant changed the title Incomplete schema syncing from a user with privs on specific tables only Incomplete PostgreSQL schema when syncing from a user with privs on specific tables only Jun 29, 2021
@willbryant
Copy link
Owner

I've never been able to figure out how to reproduce this one. If I revoke default permissions eg. revoke all on pg_catalog.pg_index from PUBLIC; then KS hits proper permission errors like ERROR: permission denied for table pg_index rather than not "seeing" the indexes for the table.

It must be possible somehow, I'm just not sure how. Perhaps there's something special about the Google Cloud SQL permission model, but nothing sticks out to me in the docs.

No-one else has run into this in the meantime, so I'll close it for now as suggested. If anyone has exact steps to set up the situation, definitely happy to reopen and work on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants