New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance enhancements for bulk writes. #770

Merged
merged 23 commits into from Feb 4, 2018

Conversation

Projects
None yet
3 participants
@naushadh
Contributor

naushadh commented Jan 6, 2018

Motivation

Trying to make persistent capable of performing generic and/or efficient writes for various back-ends.

Reads are already efficient as it's based on a single call to the database. But many of the existing APIs for writes work on a single record/entity. With batch/list support we could make persistent more capable of supporting ETL type jobs, i.e., slap a conduit over a bulk update write function CL.chunk .| CL.mapM someDbBulkWrite .| CL.concat.

Summary of changes

  • Batching enhancements to reduce db round-trips.

    • Added getMany and repsertMany for batched get and repsert.
    • Added putMany with a default/slow implementation and native UPSERT for PGSQL and MYSQL.
    • Updated insertEntityMany to replace slow looped usage with batched execution.
  • DRYed up several util functions into Database.Persist.Sql.Util.

  • Upstreamed updatePersistValue, mkUpdateText, and commaSeparated from Database.Persist.MySQL.

  • De-duplicated updatePersistValue from various Database.Persist.Sql.Orphan.* modules.

  • Added default implementations for all new APIs to be fully backward compatible.

Failed efforts

  • Upstreaming insertManyOnDuplicateKeyUpdate from persistent-mysql. The API definition would become large enough to warrant it's own type before it's added as a field to SqlBackend.

  • Make LTS-6 happy without making LTS-2 mad re: Monoid import. Import list is near identical to master without a few additions yet there are failures. Hence the use of {-# OPTIONS_GHC -fno-warn-unused-imports #-} based on prior suggestion.

naushadh added some commits Nov 20, 2017

PersistStore batching enhacements to reduce db round-trips.
- Added `getMany` and `repsertMany` for batched `get` and `repsert`.
- Updated `insertEntityMany` to replace slow looped usage with batched execution.
DRYed up several util functions into `Database.Persist.Sql`.
- Upstreamed `updatePersistValue`, `mkUpdateText`, and `commaSeparated` from `Database.Persist.MySQL`.
- De-duplicated `updatePersistValue` from various `Database.Persist.Sql.Orphan.*` modules.
Added ability to `putMany` (insert or replace) records into DB.
- Updated SqlBackend to add ability to configure backend specific putMany.
- Added a default `putMany` so `PersistUnique` doesn't have to demand all backends to implement it.
- Updated all SqlBackends to implement `connPutManySql`.
- Added basic test to verify insertion works.
Added test for put and fixed unique diffing.
Uniqueness diffing should be done on the unique keys rather than on the whole record.
Test improvements and fixes.
- Removed the need for functor instance in `get`.
- Removed the need for conditional import on `<>`.
- Removed redundant `Map` include for NOSQL runs.
- Removed publicly exposing `defaultPutMany` as we only require it for SqlBackend instance definition.
- Updated `putMany` implementations to account for duplicates within the record set. also added test to verify "last" value takes precendence.
- Extended `mkUpdateText'` to allow injection/override of reference column. `upsert` in postgres fails with ambiguos column reference error.
@psibi

This comment has been minimized.

Member

psibi commented Jan 7, 2018

Is there a guage/criterion benchmark for before and after the patch ?

@MaxGabriel

This comment has been minimized.

Member

MaxGabriel commented Jan 7, 2018

Since the performance enhancements come from preventing additional network requests, would a criterion benchmark show that much?

-- | The _essence_ of a unique record.
-- useful for comaparing records in haskell land for uniqueness equality.
recordEssence :: PersistEntity record => record -> [PersistValue]
recordEssence r = concat $ map persistUniqueToValues $ persistUniqueKeys r

This comment has been minimized.

@MaxGabriel

MaxGabriel Jan 7, 2018

Member

Is there a better name for this? This seems like one of those function names where you'd always have to lookup the implementation to see what it does

This comment has been minimized.

@naushadh

naushadh Jan 8, 2018

Contributor

For sure, I can think of a few

  • persistUniqueKeyValues
  • persistValuesFromAllUniqueKeys
  • persistValuesAllUniqueKeys

Also, this function is entirely for internal use, it is not exposed via Database.Persist.Class.

@naushadh

This comment has been minimized.

Contributor

naushadh commented Jan 13, 2018

Test failure appears to be random.
Same commit on branch runs ok after rerun: https://travis-ci.org/naushadh/persistent/builds/328570723

@naushadh

This comment has been minimized.

Contributor

naushadh commented Jan 14, 2018

@psibi Benchmark is now available at: https://github.com/naushadh/persistent-examples/tree/etl/stream-write
TLDC: double digit fold performance gain using batching (new APIs) over looping (existing APIs).

benchmarking MyRecord/10
- time                 232.4 μs   (201.2 μs .. 281.1 μs)
+ time                 69.22 μs   (62.76 μs .. 80.15 μs)

benchmarking MyRecord/100
- time                 4.187 ms   (3.453 ms .. 6.545 ms)
+ time                 518.7 μs   (435.0 μs .. 654.4 μs)

benchmarking MyRecord/1000
- time                 47.51 ms   (35.15 ms .. NaN s)
+ time                 12.98 ms   (11.22 ms .. 20.11 ms)

benchmarking MyUniqueRecord/10
- time                 311.8 μs   (274.1 μs .. 372.5 μs)
+ time                 42.35 μs   (38.29 μs .. 49.32 μs)

benchmarking MyUniqueRecord/100
- time                 6.699 ms   (5.152 ms .. NaN s)
+ time                 210.2 μs   (184.5 μs .. 251.0 μs)

benchmarking MyUniqueRecord/1000
- time                 66.15 ms   (49.38 ms .. NaN s)
+ time                 3.981 ms   (3.169 ms .. 6.536 ms)
@psibi

This comment has been minimized.

Member

psibi commented Jan 14, 2018

@naushadh That looks amazing. Thanks!

putManySql :: EntityDef -> Int -> Text
putManySql entityDef' numRecords
| numRecords > 0 = q
| otherwise = error "putManySql: numRecords MUST be greater than 0!"

This comment has been minimized.

@MaxGabriel

MaxGabriel Jan 17, 2018

Member

What do you think about making this a no-op if == 0? On the one hand inserting zero records could be a bug, on the other hand you might have code that dynamically changes the number of records inserted, and having zero be a no-op could be useful for that case. Thoughts?

This comment has been minimized.

@naushadh

naushadh Jan 18, 2018

Contributor

I agree with the no-op strategy. The users of connPutManySql: putMany and defaultPutMany already short circuit with a return () when given 0 records. putManySql is never to be used unless there are records to be inserted (because it'd generate an invalid sql query otherwise).

naushadh added some commits Jan 18, 2018

Merge `master` into `etl2`.
Resolved conflict with 'persistent/Database/Persist/Sql/Orphan/PersistStore.hs' by accepting mine. This is because `parseEntityValues` already exposes rich error messaging the incoming change was implemeting.
Merged `master` into `etl2`.
Conflicts:
- persistent-postgresql/Database/Persist/Postgresql.hs
-
 persistent/Database/Persist/Sql/Orphan/PersistStore.hs
@naushadh

This comment has been minimized.

Contributor

naushadh commented Feb 4, 2018

Just following up on this. Anything additional needed of me?

@MaxGabriel

This comment has been minimized.

Member

MaxGabriel commented Feb 4, 2018

@naushadh Can you bump the cabal file for each package you've changed, and also update the corresponding changelog? Otherwise this is ready to merge 👍

@MaxGabriel MaxGabriel merged commit a4e95f1 into yesodweb:master Feb 4, 2018

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
@MaxGabriel

This comment has been minimized.

Member

MaxGabriel commented Feb 4, 2018

Released as 2.8.1. Thanks for this @naushadh!

@naushadh

This comment has been minimized.

Contributor

naushadh commented Feb 4, 2018

Wicked! At your service yesodweb folks :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment