This repository has been archived by the owner on May 2, 2023. It is now read-only.

upsert performance improvement #9

Merged: 1 commit, Jan 21, 2015

Conversation

lucagrulla
Contributor

The current approach for upsert is to delete the existing rows that are about to be reimported, using a join between the staging table and the target table to identify which rows have to be deleted:

DELETE
FROM
    my_table USING my_table_staging
WHERE
    my_table.id=my_table_staging.id

This does not perform well: with a destination table of ~4GB and ~60MB of import data, the query takes ~255 seconds to run.

When we changed the delete query from a join to a sub-select:

DELETE
FROM
    my_table
WHERE
    id IN (SELECT id FROM my_table_staging)

with the same volume of data (~4GB destination table and ~60MB file), the query time dropped to ~1.20s.
When deleting from a ~4GB target table using a ~4GB staging table, we observed a ~1.53s execution time.
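
For anyone who wants to check the behaviour of the sub-select form, here is a minimal sketch of the whole delete-then-insert upsert. It is an illustration under assumptions, not the code in this PR: it runs against an in-memory SQLite database purely so it is self-contained, the `value` column and the sample rows are made up, and the final `INSERT ... SELECT` step is my assumption of how the reimport follows the delete. It only shows that the result is correct; it says nothing about the timings above, which depend on the real database.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE my_table_staging (id INTEGER PRIMARY KEY, value TEXT);
    -- Hypothetical sample data: ids 2 and 3 exist in both tables, 4 is new.
    INSERT INTO my_table VALUES (1, 'old-1'), (2, 'old-2'), (3, 'old-3');
    INSERT INTO my_table_staging VALUES (2, 'new-2'), (3, 'new-3'), (4, 'new-4');
""")

# Run both steps in one transaction so readers never see the rows missing.
with conn:
    # Step 1: delete target rows that are about to be reimported (sub-select form).
    conn.execute(
        "DELETE FROM my_table WHERE id IN (SELECT id FROM my_table_staging)"
    )
    # Step 2 (assumed): copy every staged row into the target table.
    conn.execute("INSERT INTO my_table SELECT id, value FROM my_table_staging")

rows = conn.execute("SELECT id, value FROM my_table ORDER BY id").fetchall()
print(rows)
```

Row 1 is untouched, rows 2 and 3 are replaced by their staged versions, and row 4 is added, which is the same end state the join-based delete would produce.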

…ables to find the rows to be deleted we use subqueries to select the ids of the rows we want to cancel
@pingles pingles merged commit b7c9a70 into uswitch:master Jan 21, 2015