Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement adding uniqueness constraints to columns #53

Merged
merged 3 commits into from
Aug 18, 2023
Merged

Conversation

andrew-farries
Copy link
Collaborator

@andrew-farries andrew-farries commented Aug 15, 2023

Add support for adding uniqueness constraints to columns. Such a migration looks like this:

{
  "name": "15_set_columns_unique",
  "operations": [
    {
      "set_unique": {
        "name": "reviews_username_product_unique",
        "table": "reviews",
        "columns": [
          "username",
          "product"
        ]
      }
    }
  ]
}

This migration adds a unique constraint spanning the username and product columns in the reviews table.

  • On Start a unique index is created concurrently.
  • On Rollback the unique index is removed.
  • On Complete a unique constraint is added to the column using the index.

Creating a unique constraint directly requires a full table exclusive lock. By first creating a unique index concurrently and then adding a constraint using the index the need for the lock is avoided.

@exekias
Copy link
Member

exekias commented Aug 17, 2023

I've one question here, when will the constraint start being enforced? If I understand correctly, it would only be after the call to complete. Ideally, we offer the guarantee that each version of the schema fully respects it. Meaning that accessing through the new version enforces the unique constraint while accessing the old one doesn't.

If I'm not mistaken, it almost feels like we need to create a new column and copy all the data (optionally applying an up function), then enforce the constraint in the new column, but not in the old one. Does this make sense?

@exekias
Copy link
Member

exekias commented Aug 17, 2023

Another though, I go back and forth with doing this as its own op or having it as part of the alter column operation. I start to see that alter column may be better, but we can decide on this later.

@andrew-farries andrew-farries force-pushed the rename-column branch 2 times, most recently from f23568f to c712013 Compare August 18, 2023 05:45
Base automatically changed from rename-column to main August 18, 2023 05:49
@andrew-farries andrew-farries force-pushed the set-unique branch 2 times, most recently from 75bbe73 to 9d50a22 Compare August 18, 2023 07:25
@andrew-farries andrew-farries changed the base branch from main to mandatory-index-name August 18, 2023 07:28
Base automatically changed from mandatory-index-name to main August 18, 2023 07:55
Create a new table and then add a unique constraint on a pair of columns
in a separate migration.
@andrew-farries
Copy link
Collaborator Author

I've one question here, when will the constraint start being enforced? If I understand correctly, it would only be after the call to complete. Ideally, we offer the guarantee that each version of the schema fully respects it. Meaning that accessing through the new version enforces the unique constraint while accessing the old one doesn't.

Uniqueness is enforced after start, when the UNIQUE index is created:

https://github.com/xataio/pg-roll/blob/5d64ac47d65b6a365d0372d210674d2d6e98fcb8/pkg/migrations/op_set_unique.go#L23-L28

We have a test that ensures that inserts to both the old and new schema that violate the constraint will fail:

https://github.com/xataio/pg-roll/blob/5d64ac47d65b6a365d0372d210674d2d6e98fcb8/pkg/migrations/op_set_unique_test.go#L58-L75

The only thing that Complete does is convert the index into a constraint using the unique index. We don't create the constraint directly on Start to avoid taking an exclusive lock on the table.

@andrew-farries
Copy link
Collaborator Author

andrew-farries commented Aug 18, 2023

Another though, I go back and forth with doing this as its own op or having it as part of the alter column operation. I start to see that alter column may be better, but we can decide on this later.

Yes, I was thinking similarly. A single 'alter table' operation might be the best approach.

@exekias
Copy link
Member

exekias commented Aug 18, 2023

We talked about this offline, and the outcome was:

  • We will keep separate operations, and probably merge them later in a single alter column migration
  • This operation is ok as it is. We are breaking the promise of keeping the same constraints in the old schema (as null will be enforced after start), but:
    • We have ideas to improve this in the future if we find the need, like for instance allowing the user to define an up migration that works as a fallback for non-unique values
    • Since the migration will fail to start if there are repeated values, it may be fine for the clients to get the constraint enforced later

@andrew-farries andrew-farries merged commit a91efac into main Aug 18, 2023
3 checks passed
@andrew-farries andrew-farries deleted the set-unique branch August 18, 2023 11:24
andrew-farries added a commit that referenced this pull request Sep 22, 2023
…ions (#118)

When `pg-roll` makes a schema change, the contract with the user is that
it will leave the old version of schema unchanged.

When the operation to add a `UNIQUE` constraint was implemented
(#53), this contract was not
respected. The operation adds a unique index to the existing column,
changing the behaviour for users of the old schema.

This PR changes the operation so that it follows a similar pattern to
other operations that were implemented later:
* On `Start`:
  * Duplicate the column.
  * Add a `UNIQUE` index concurrently
* Create `up` and `down` triggers to copy values between the old and new
columns.
  * Backfill values from the old column into the new using `up` SQL
* On `Complete`
  * Create a unique constraint on the new column using the unique index.
  * Drop the old column
  * Rename the column to its old name.
  * Remove `up` and `down` triggers.

Writing correct `up` SQL for the operation is a little more difficult
than for other operations (eg set `NOT NULL`) as it is up to the user to
ensure uniqueness of values. The example migration in this PR appends a
random suffix to each value.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants