
Perform backfill operations in batches #127

Closed
andrew-farries opened this issue Sep 22, 2023 · 0 comments · Fixed by #137
Labels
enhancement New feature or request

The current implementation of backfilling for operations that create a new column is naive:

```go
func backFill(ctx context.Context, conn *sql.DB, table, column string) error {
	// Touch rows without changing them in order to have the trigger fire
	// and set the value using the `up` SQL.
	// TODO: this should be done in batches in case of large tables.
	_, err := conn.ExecContext(ctx, fmt.Sprintf("UPDATE %s SET %s = %s",
		pq.QuoteIdentifier(table),
		pq.QuoteIdentifier(column),
		pq.QuoteIdentifier(column)))
	return err
}
```

This operation should instead be done in batches, so that row locks are held only briefly and lock contention with concurrent writers is reduced.

@andrew-farries andrew-farries added the enhancement New feature or request label Sep 22, 2023
andrew-farries added a commit that referenced this issue Sep 26, 2023
Add a `primaryKey` field to each table in the internal schema store to
record the column(s) that make up a table's primary key.

This will be used when backfilling rows
(#127)

An example schema now looks like:

```json
{
  "name": "public",
  "tables": {
    "users": {
      "oid": "16412",
      "name": "users",
      "columns": {
        "id": {
          "name": "id",
          "type": "integer",
          "comment": null,
          "default": "nextval('users_id_seq'::regclass)",
          "nullable": false
        },
        "name": {
          "name": "name",
          "type": "text",
          "comment": null,
          "default": null,
          "nullable": true
        }
      },
      "comment": null,
      "indexes": {
        "users_pkey": {
          "name": "users_pkey"
        }
      },
      "primaryKey": [
        "id"
      ]
    }
  }
}
```
Here the `primaryKey` field is the new addition.
andrew-farries added a commit that referenced this issue Sep 26, 2023
The previous approach to backfilling new columns with values from
existing columns applied one `UPDATE` statement to the whole table,
taking row locks on the entire table in one long-running transaction.

This PR changes backfills to work in batches, copying 1000 records at a
time, each in a separate short transaction. This reduces contention for
row locks and ensures that long-running backfills don't lock out other
updates to the same table for the duration of the backfill.

Fixes #127
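The control flow described above (repeat batches of 1000 until a batch comes up short) can be sketched independently of the database layer. This is an illustrative outline, not pgroll's actual code: `backfillInBatches` and its `runBatch` callback are hypothetical names, with the callback standing in for the per-batch `UPDATE` committed in its own transaction.

```go
package main

import "fmt"

// backfillInBatches runs runBatch repeatedly, each call representing
// one short transaction that updates up to batchSize rows. It stops
// when a batch updates fewer rows than the batch size, which means
// the table has been fully backfilled.
func backfillInBatches(batchSize int, runBatch func(limit int) (int, error)) (int, error) {
	total := 0
	for {
		n, err := runBatch(batchSize)
		if err != nil {
			return total, err
		}
		total += n
		if n < batchSize {
			return total, nil
		}
	}
}

func main() {
	// Simulate a table of 2500 rows backfilled 1000 at a time.
	remaining := 2500
	total, _ := backfillInBatches(1000, func(limit int) (int, error) {
		n := limit
		if remaining < n {
			n = remaining
		}
		remaining -= n
		return n, nil
	})
	fmt.Println(total) // prints 2500 (three batches: 1000, 1000, 500)
}
```

Because each batch commits separately, row locks are released between batches and concurrent writers are never blocked for longer than a single batch.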