large insert using persistent-sqlite has 2.5x performance penalty vs sqlite #1441

Closed
joeyh opened this issue Nov 18, 2022 · 3 comments · Fixed by #1449

joeyh commented Nov 18, 2022

I'm using persistent-sqlite to populate a table with 50,000 uuids, using the attached test.hs. This takes 1.16 seconds, which is 2.5 times slower than using the sqlite3 command-line utility to perform the same inserts. That is much more overhead than I would have expected; persistent-sqlite is building some SQL statements, while sqlite is doing a rather more complex thing involving writing to disk.

% sh -c 'for x in $(seq 1 50000); do uuidgen; done > uuidlist'
% ghc --make test -O2
% time ./test < uuidlist
Migrating: CREATE TABLE "val"("id" INTEGER PRIMARY KEY,"foo" VARCHAR NOT NULL)
1.09user 0.03system 0:01.16elapsed 97%CPU (0avgtext+0avgdata 13804maxresident)k
8inputs+9024outputs (8major+1197minor)pagefaults 0swaps
% echo '.dump' | sqlite3 sqlitedb > sql
% time sqlite3 sqliteb2 < sql
0.41user 0.02system 0:00.46elapsed 94%CPU (0avgtext+0avgdata 7016maxresident)k
16inputs+4480outputs (0major+802minor)pagefaults 0swaps

In profiling this, I noticed about 1 gigabyte of allocations. Since the list of uuids is ~2 megabytes, that's a lot of churning. I suspect that churn is responsible for a large part of the ~0.7 seconds of extra runtime.
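
One way to reproduce an allocation total like this is GHC's RTS summary (the same +RTS -s output shown later in this thread); -rtsopts must be passed at compile time:

% ghc --make test -O2 -rtsopts
% ./test +RTS -s -RTS < uuidlist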

Attachment: test.hs.txt

parsonsmatt commented

insert returns the Key a of whatever you're inserting, and for backends that don't support INSERT ... RETURNING, we do two separate calls: an INSERT and then a SELECT to fetch the key. This code taking a bit over twice as long isn't surprising, since it's doing twice as many queries.
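
Concretely, each insert costs something like the following pair of statements against the val table from the test program. This is a sketch using rawExecute and rawSql, not persistent's actual internals; the key-lookup query is the one quoted further down.

-- One logical insert, two round-trips: the INSERT itself, then a SELECT to
-- recover the generated key. Assumes the test program's imports plus
-- MonadIO from Control.Monad.IO.Class; Single and rawSql are re-exported
-- by Database.Persist.Sqlite.
insertAndFetchKey :: MonadIO m => Text -> SqlPersistT m [Single Int]
insertAndFetchKey f = do
    rawExecute "INSERT INTO val (foo) VALUES(?)" [PersistText f]
    rawSql "SELECT id FROM val WHERE _ROWID_=last_insert_rowid()" []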

insert_ :: a -> SqlPersistT m () shouldn't have this issue - but the default implementation is void $ insert e, which has the same performance issue here.

Properly fixing this adds a field to SqlBackend for insert_, which only inserts and does not attempt to return anything.

Fortunately, we have two work-arounds:

  1. Use rawExecute to insert directly - this is slightly faster than the sqlite dump, somehow.
  2. Use insertMany_ to insert all the records at once - this takes 60% of the time of the sqlite dump.

Checking this out locally: the SQL dump contains a long run of individual INSERT INTO statements; it doesn't batch them into a single INSERT call.
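
Roughly, the difference looks like this (sample values are hypothetical; the second form is the multi-row shape a batched insert can send):

-- .dump output: one statement per row
INSERT INTO val VALUES(1,'first-uuid-here');
INSERT INTO val VALUES(2,'second-uuid-here');

-- batched: many rows per statement
INSERT INTO val ("foo") VALUES (?),(?),(?);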

Test code embedded instead of attached:

{-# LANGUAGE EmptyDataDecls             #-}
{-# LANGUAGE FlexibleContexts           #-}
{-# LANGUAGE GADTs                      #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses      #-}
{-# LANGUAGE OverloadedStrings          #-}
{-# LANGUAGE QuasiQuotes                #-}
{-# LANGUAGE TemplateHaskell            #-}
{-# LANGUAGE TypeFamilies               #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE BangPatterns #-}
import           Control.Monad.IO.Class  (liftIO)
import           Database.Persist
import           Database.Persist.Sqlite
import           Database.Persist.TH
import           Data.Text
import           Data.Text.IO
import           Control.Monad

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
Val
    foo Text
    deriving Show
|]

main :: IO ()
main = runSqlite "sqlitedb" $ do
    runMigration migrateAll
    forM_ [1..50000] $ \x -> do
        f <- liftIO Data.Text.IO.getLine
        _ <- insert $ Val f
        return ()

This is going to do 50,000 single inserts, each followed by a SELECT to read the key out of the database. So we're doing 100,000 queries here, instead of the 50,000 that the SQL dump does. Twice as many queries for a ~2.5x slowdown isn't too surprising. I expect some overhead from the migrations, but relatively little - it's just a few queries, a constant factor.

Locally, I'm seeing these results:

λ ~/Projects/sqlite-speed/ rm sqlitedb && stack exec -- time sqlite-speed-exe +RTS -s -RTS < uuidlist
Migrating: CREATE TABLE "val"("id" INTEGER PRIMARY KEY,"foo" VARCHAR NOT NULL)
     954,300,304 bytes allocated in the heap
         276,512 bytes copied during GC
          76,064 bytes maximum residency (2 sample(s))
          32,736 bytes maximum slop
               6 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       227 colls,     0 par    0.001s   0.001s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.001s   0.001s     0.0007s    0.0014s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.252s  (  0.269s elapsed)
  GC      time    0.002s  (  0.002s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.254s  (  0.271s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    3,793,145,331 bytes per MUT second

  Productivity  99.0% of total user, 99.1% of total elapsed

0.23user 0.01system 0:00.27elapsed 93%CPU (0avgtext+0avgdata 13200maxresident)k
8087inputs+8995outputs (829major+2128minor)pagefaults 0swaps

Runs are pretty consistently around 0.24s.

Doing the direct import here:

λ ~/Projects/sqlite-speed/ time sqlite3 sqliteb2 < sql
sqlite3 sqliteb2 < sql  0.11s user 0.00s system 94% cpu 0.117 total

So, about twice as slow - but we're doing twice as many queries. To test this, I'll use rawExecute, instead of insert - the code is equivalent, but we won't be doing the wasteful SELECT id FROM val WHERE _ROWID_=last_insert_rowid().

If I change the code to use rawExecute and have the same structure, then we get the same basic performance as the direct sqlite:

 main :: IO ()
 main = runSqlite "sqlitedb" $ do
     runMigration migrateAll
     forM_ [1..50000] $ \x -> do
         f <- liftIO Data.Text.IO.getLine
-        _ <- insert $ Val f
+        rawExecute "INSERT INTO val (foo) VALUES(?)" [PersistText f]
         return ()
λ ~/Projects/sqlite-speed/ rm sqlitedb && stack exec -- time sqlite-speed-exe +RTS -s -RTS < uuidlist
Migrating: CREATE TABLE "val"("id" INTEGER PRIMARY KEY,"foo" VARCHAR NOT NULL)
     190,298,360 bytes allocated in the heap
          64,416 bytes copied during GC
          74,032 bytes maximum residency (2 sample(s))
          32,736 bytes maximum slop
               6 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        43 colls,     0 par    0.000s   0.000s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.001s   0.001s     0.0006s    0.0011s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.090s  (  0.101s elapsed)
  GC      time    0.001s  (  0.001s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.091s  (  0.103s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    2,122,489,275 bytes per MUT second

  Productivity  98.2% of total user, 98.4% of total elapsed

0.07user 0.01system 0:00.10elapsed 87%CPU (0avgtext+0avgdata 12892maxresident)k
8087inputs+8995outputs (732major+2134minor)pagefaults 0swaps

Well, it's slightly faster, which is weird and not what I would have expected! It's possible that GHC is really optimizing this loop.

Even faster is a batch insert:

main :: IO ()
main = runSqlite "sqlitedb" $ do
    runMigration migrateAll
    uuids <- forM [1..50000 :: Int] $ \_ ->
        liftIO Data.Text.IO.getLine
    insertMany_ $ Prelude.map Val uuids
λ ~/Projects/sqlite-speed/ rm sqlitedb && stack exec -- time sqlite-speed-exe +RTS -s -RTS < uuidlist
Migrating: CREATE TABLE "val"("id" INTEGER PRIMARY KEY,"foo" VARCHAR NOT NULL)
     111,574,176 bytes allocated in the heap
      25,129,528 bytes copied during GC
       7,102,760 bytes maximum residency (3 sample(s))
          77,528 bytes maximum slop
              18 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0        24 colls,     0 par    0.006s   0.006s     0.0003s    0.0009s
  Gen  1         3 colls,     0 par    0.006s   0.006s     0.0019s    0.0028s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    0.045s  (  0.057s elapsed)
  GC      time    0.012s  (  0.012s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    0.057s  (  0.069s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    2,477,288,618 bytes per MUT second

  Productivity  78.8% of total user, 82.3% of total elapsed

0.03user 0.02system 0:00.07elapsed 82%CPU (0avgtext+0avgdata 26020maxresident)k
8087inputs+9819outputs (826major+5293minor)pagefaults 0swaps

Much faster, though we've traded memory for the time gain: 18 MiB total memory in use, though fewer total allocations. bench even shows it taking 60% of the time of the sql import!

λ ~/Projects/sqlite-speed/ bench --after "rm sqliteb2" -- "sqlite3 sqliteb2 < sql" 
benchmarking sqlite3 sqliteb2 < sql
time                 105.2 ms   (104.7 ms .. 105.9 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 105.5 ms   (105.1 ms .. 106.4 ms)
std dev              945.2 μs   (336.3 μs .. 1.558 ms)

λ ~/Projects/sqlite-speed/ stack exec -- bench --before "rm sqlitedb" "sqlite-speed-exe < uuidlist"
benchmarking sqlite-speed-exe < uuidlist
time                 60.05 ms   (57.56 ms .. 61.72 ms)
                     0.998 R²   (0.995 R² .. 0.999 R²)
mean                 61.57 ms   (60.72 ms .. 63.18 ms)
std dev              2.026 ms   (1.082 ms .. 3.179 ms)
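
If holding all 50,000 rows in memory is a concern, a chunked variant keeps most of the batching win while bounding residency. A sketch, not from the benchmarks above; the 1,000-row chunk size is arbitrary and assumes the input length divides evenly:

main :: IO ()
main = runSqlite "sqlitedb" $ do
    runMigration migrateAll
    -- 50 batches of 1,000 rows: read a chunk, insert it, move on, so only
    -- one chunk is resident at a time. replicateM is from Control.Monad,
    -- which the test program already imports.
    forM_ [1 .. 50 :: Int] $ \_ -> do
        chunk <- replicateM 1000 (liftIO Data.Text.IO.getLine)
        insertMany_ (Prelude.map Val chunk)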


joeyh commented Dec 5, 2022

Thanks for looking into this @parsonsmatt.

In my situation, using insertMany_ would entail some significant complications beyond memory use. Basically, the code is also doing deletes, so order of operations matters; batching inserts like that would need a Set or something to keep track of what's pending, and the extra bookkeeping would probably eat into the speed gain as well.

rawExecute is looking appealing, but writing my own SQL feels wrong when using persistent.

So is there any impediment to adding a field to SqlBackend for insert_, beyond it needing a version bump and affecting other backends? If the idea is acceptable, I'd rather spend my time on that.


parsonsmatt commented Dec 5, 2022

Fortunately, I think we can do this with a patch bump: insert_'s default implementation can be changed to simply not run the second query and return immediately. #1449 does this, and it halves the execution time of the example script on my machine.
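
With that change, the original loop gets the one-query behavior just by swapping insert for insert_ (a sketch of the upshot, not code from the patch):

    forM_ [1..50000] $ \_ -> do
        f <- liftIO Data.Text.IO.getLine
        insert_ (Val f)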
