Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #36238 - drop all data owned by the DB user, not only tables #844

Merged
merged 1 commit into from Mar 28, 2023

Conversation

evgeni
Copy link
Member

@evgeni evgeni commented Mar 28, 2023

The old logic (introduced all the way back in [1]) took a list of all tables and executed a DROP … CASCADE on them. While this is sufficient for tables and the associated sequences, triggers etc [2][3], it will not delete objects that have no direct dependency on tables, like types or functions.

However, we might have DDLs that do create such objects and then after performing a database reset (thus assuming the DB is empty) the DDL will fail to re-run with an error like:

psycopg2.errors.DuplicateObject: type "pulp_evr_array_item" already exists

In a perferct world, those DDLs would be written in a way that they detect that a certain object already exists and replace it instead.
This is what the problematic pulp DDL [5] does for functions, but PostgreSQL doesn't support CREATE OR REPLACE for types [6].

Instead of trying to obtain a list of all tables and drop those, we can instruct PostgreSQL to drop all objects owned by a certain user [4]. This has the benefit that it will drop all types of objects, so also types and functions (but not databases, which is good as we cannot recreate those without superuser permissions).
The downside is that this will not delete things not owned by the user, but in our setups I would expect everything to be owned by the application user as that is the one we use for setup etc and if things would not be owned by the user, the user would not have been able to DROP it with the old code either.

[1] Katello/katello-installer#550
[2] https://www.postgresql.org/docs/12/sql-droptable.html
[3] https://www.postgresql.org/docs/12/ddl-depend.html
[4] https://www.postgresql.org/docs/12/sql-drop-owned.html
[5] https://github.com/pulp/pulp_rpm/blob/06ed3c2b214684b06186d32b6a7c4ff018efa2c1/pulp_rpm/app/migrations/0013_RAW_rpm_evr_extension.py
[6] https://www.postgresql.org/docs/12/sql-createtype.html

The old logic (introduced all the way back in [1]) took a list of all
tables and executed a `DROP … CASCADE` on them. While this is sufficient
for tables and the *associated* sequences, triggers etc [2][3], it will
not delete objects that have no direct dependency on tables, like types
or functions.

However, we might have DDLs that do create such objects and then after
performing a database reset (thus assuming the DB is empty) the DDL will
fail to re-run with an error like:

    psycopg2.errors.DuplicateObject: type "pulp_evr_array_item" already exists

In a perferct world, those DDLs would be written in a way that they
detect that a certain object already exists and replace it instead.
This is what the problematic pulp DDL [5] does for functions, but
PostgreSQL doesn't support `CREATE OR REPLACE` for types [6].

Instead of trying to obtain a list of all tables and drop those, we can
instruct PostgreSQL to drop all objects owned by a certain user [4].
This has the benefit that it will drop all types of objects, so also
types and functions (but not databases, which is good as we cannot
recreate those without superuser permissions).
The downside is that this will not delete things not owned by the user,
but in our setups I would expect everything to be owned by the
application user as that is the one we use for setup etc and if things
would not be owned by the user, the user would not have been able to
`DROP` it with the old code either.

[1] Katello/katello-installer#550
[2] https://www.postgresql.org/docs/12/sql-droptable.html
[3] https://www.postgresql.org/docs/12/ddl-depend.html
[4] https://www.postgresql.org/docs/12/sql-drop-owned.html
[5] https://github.com/pulp/pulp_rpm/blob/06ed3c2b214684b06186d32b6a7c4ff018efa2c1/pulp_rpm/app/migrations/0013_RAW_rpm_evr_extension.py
[6] https://www.postgresql.org/docs/12/sql-createtype.html
Copy link
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only wonder about remote DBs where a user may be shared. It's something we don't recommend, but on the other hand, users will find ways to break in unexpected ways.

We don't document --reset-data so I think it's fine and we don't need to document anything in terms of release notes, but I'm sure at some point I'll link back to this and say "hmm, yes, that's what I was afraid of".

@evgeni
Copy link
Member Author

evgeni commented Mar 28, 2023

The DROP happens in the context if the database you're connecting to. So unless people add stuff to our database, we're safe. And that problem already existed before (and was a really bad idea)

@ekohl
Copy link
Member

ekohl commented Mar 28, 2023

Ah, that addresses my issues. I was worried it was global.

@ekohl ekohl merged commit bb67be5 into develop Mar 28, 2023
1 check passed
@ekohl ekohl deleted the i36238 branch March 28, 2023 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants