New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Postgres' RETURNING capability to optimize `insertMany`. #407

Merged
merged 9 commits into from Jun 8, 2015

Conversation

Projects
None yet
3 participants
@MaxGabriel
Member

MaxGabriel commented Jun 2, 2015

Closes #365.

This PR uses Postgres' RETURNING capability to optimize bulk-inserts. Instead of inserting one-at-a-time, the rows can be inserted all at once and have their IDs returned. Based on my comment here, I don't think this is possible for SQLite/MySQL in a reliable way.

I believe test coverage is already pretty good on this, because several tests use insertMany, get back IDs, and then use those IDs to lookup the inserted rows.

Things I'm unsure about:

  • The different InsertSqlResult types. Is ISRManyKeys supposed to be used when a composite primary key is defined?
  • Version bumps. Is this considered a breaking change because it adds a new record field to public API?
@gregwebs

This comment has been minimized.

Show comment
Hide comment
@gregwebs

gregwebs Jun 2, 2015

Member

This should re-use functionality from Sql/Util.hs. In particular, this will fail at least for composite keys and should use dbIdColumns for the returning clause. The test case for this should be added, perhaps in CompositeTest or PrimaryTest.

Member

gregwebs commented Jun 2, 2015

This should re-use functionality from Sql/Util.hs. In particular, this will fail at least for composite keys and should use dbIdColumns for the returning clause. The test case for this should be added, perhaps in CompositeTest or PrimaryTest.

@MaxGabriel

This comment has been minimized.

Show comment
Hide comment
@MaxGabriel

MaxGabriel Jun 5, 2015

Member

Ok I added a failing test case for the composite primary key case, and then fixed that by adding a RawSql instance for Keys and using dbIdColumnsEsc from Database.Persist.Sql.Util. Probably I should add more tests for the RawSql instance, but is this generally correct otherwise?

Member

MaxGabriel commented Jun 5, 2015

Ok I added a failing test case for the composite primary key case, and then fixed that by adding a RawSql instance for Keys and using dbIdColumnsEsc from Database.Persist.Sql.Util. Probably I should add more tests for the RawSql instance, but is this generally correct otherwise?

@gregwebs

This comment has been minimized.

Show comment
Hide comment
@gregwebs

gregwebs Jun 6, 2015

Member

I actually haven't touched RawSql before, but it looks good now. Perhaps @snoyberg can look at the RawSql instance.

Member

gregwebs commented Jun 6, 2015

I actually haven't touched RawSql before, but it looks good now. Perhaps @snoyberg can look at the RawSql instance.

@gregwebs

This comment has been minimized.

Show comment
Hide comment
@gregwebs

gregwebs Jun 6, 2015

Member

One more thing, the documentation for insertMany_ and insertMany should be updated now to reflect which SQL databases have efficient implementations.

Member

gregwebs commented Jun 6, 2015

One more thing, the documentation for insertMany_ and insertMany should be updated now to reflect which SQL databases have efficient implementations.

rawSqlCols _ key = (length $ keyToValues key, [])
rawSqlColCountReason key = "The primary key is composed of "
++ (show $ length $ keyToValues key)
++ " columns"

This comment has been minimized.

@MaxGabriel

MaxGabriel Jun 7, 2015

Member

Is there a better way to write rawSqlColCountReason? Maybe a way to list out the components of the key?

@MaxGabriel

MaxGabriel Jun 7, 2015

Member

Is there a better way to write rawSqlColCountReason? Maybe a way to list out the components of the key?

@@ -58,6 +58,13 @@ instance PersistField a => RawSql (Single a) where
rawSqlProcessRow [pv] = Single <$> fromPersistValue pv
rawSqlProcessRow _ = Left $ pack "RawSql (Single a): wrong number of columns."
instance (PersistEntity a, PersistEntityBackend a ~ SqlBackend) => RawSql (Key a) where
rawSqlCols _ key = (length $ keyToValues key, [])

This comment has been minimized.

@MaxGabriel

MaxGabriel Jun 7, 2015

Member

Is it inefficient to use keyToValues just to determine the number of columns? Didn't come up with a way to get the column count otherwise.

@MaxGabriel

MaxGabriel Jun 7, 2015

Member

Is it inefficient to use keyToValues just to determine the number of columns? Didn't come up with a way to get the column count otherwise.

This comment has been minimized.

@gregwebs

gregwebs Jun 7, 2015

Member

Laziness may save us from any inefficiencies

@gregwebs

gregwebs Jun 7, 2015

Member

Laziness may save us from any inefficiencies

Update changelogs for #407
* `RawSql` instance for `Key`s
* Optimize `insertMany` for Postgres backend

[ci skip]
@gregwebs

This comment has been minimized.

Show comment
Hide comment
@gregwebs

gregwebs Jun 7, 2015

Member

@MaxGabriel The changes look good now, thanks for the great patch with the updated ChangeLog and documentation!

@snoyberg this bumps persistent to 2.2 because the SqlBackend gets a connInsertManySql field added. However, this is only really an API breakage for someone implementing a new SQL backend, so we could probably avoid the major version bump if we wanted to. What do you think?

Member

gregwebs commented Jun 7, 2015

@MaxGabriel The changes look good now, thanks for the great patch with the updated ChangeLog and documentation!

@snoyberg this bumps persistent to 2.2 because the SqlBackend gets a connInsertManySql field added. However, this is only really an API breakage for someone implementing a new SQL backend, so we could probably avoid the major version bump if we wanted to. What do you think?

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jun 8, 2015

Member

I'm OK skipping the major version bump, we typically do so for internal-only changes like this.

Member

snoyberg commented Jun 8, 2015

I'm OK skipping the major version bump, we typically do so for internal-only changes like this.

gregwebs added a commit that referenced this pull request Jun 8, 2015

Merge pull request #407 from yesodweb/postgresInsertManyReturning3
Use Postgres' RETURNING capability to optimize `insertMany`.

@gregwebs gregwebs merged commit e944857 into master Jun 8, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@snoyberg snoyberg removed the in progress label Jun 8, 2015

@gregwebs gregwebs deleted the postgresInsertManyReturning3 branch Jun 8, 2015

@gregwebs

This comment has been minimized.

Show comment
Hide comment
@gregwebs

gregwebs Jun 8, 2015

Member

hmm, I am having second thoughts on the minor version bump. If someone pegs their backend version, but not persistent, then re-installing dependencies can cause a breakage. If we do the major bump, then at least we can say persistent should have been constrained to a major version.

@snoyberg are you ok with making this a major bump?

Member

gregwebs commented Jun 8, 2015

hmm, I am having second thoughts on the minor version bump. If someone pegs their backend version, but not persistent, then re-installing dependencies can cause a breakage. If we do the major bump, then at least we can say persistent should have been constrained to a major version.

@snoyberg are you ok with making this a major bump?

@snoyberg

This comment has been minimized.

Show comment
Hide comment
@snoyberg

snoyberg Jun 8, 2015

Member

Yes, I'm OK with it. Just to throw out one other possibility: we can add upper bounds via Hackage revisions to prevent the currently-released backends from using the newer persistent. I slightly prefer that approach, but only slightly.

Member

snoyberg commented Jun 8, 2015

Yes, I'm OK with it. Just to throw out one other possibility: we can add upper bounds via Hackage revisions to prevent the currently-released backends from using the newer persistent. I slightly prefer that approach, but only slightly.

@gregwebs

This comment has been minimized.

Show comment
Hide comment
@gregwebs

gregwebs Jun 8, 2015

Member

I tagged and released them all as version 2.2. That is enough work for me without changing a bunch of revisions on hackage.

Member

gregwebs commented Jun 8, 2015

I tagged and released them all as version 2.2. That is enough work for me without changing a bunch of revisions on hackage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment