
Postgres connections are returned to Pool too quickly #1199

Closed
codygman opened this issue Mar 5, 2021 · 35 comments · Fixed by #1207

Comments

@codygman
Copy link
Contributor

codygman commented Mar 5, 2021

When a query (especially a long-running one) is interrupted while using a connection from a Pool, that connection is almost immediately returned to the pool, even if the query is still running on that connection.

Another async consumer picks up the connection out of the pool to do its work and you get:

libpq: failed (another command is already in progress

The gist of it is:

  • Create a connection pool of 1
  • In another thread, acquire the connection from the pool and run some long-running query like 10000 inserts
  • Kill that thread before it finishes
  • Immediately try doing some other query using that same pool

I've created a reproducible repo here that just requires stack and docker (for the postgres server):

https://github.com/codygman/persistent-postgresql-query-in-progress-repro/blob/master/src/Main.hs

@codygman
Copy link
Contributor Author

codygman commented Mar 5, 2021

I originally thought this was haskellari/postgresql-simple#69 but after testing that doesn't seem to be the case.

Also related is #981 but the conclusions there seem to indicate you should be safe if you're using a pool.

@codygman
Copy link
Contributor Author

codygman commented Mar 5, 2021

It seems that ReleaseException does come up, after adding Debug.trace in unsafeAcquireSqlConnFromPool in this commit:

relType: ReleaseNormal
relType: ReleaseNormal
killed thread
relType: ReleaseException
result: Left libpq: failed (another command is already in progress

@codygman
Copy link
Contributor Author

codygman commented Mar 5, 2021

And now for some real confusion... The issue still happens after calling Pool.destroyAllResources. I'm either really misunderstanding something or this is a pretty big issue.

This commit: https://github.com/codygman/persistent-postgresql-query-in-progress-repro/blob/958e66963c0ad9761149f7ef97fd5ec0d7322f73/src/Main.hs#L29

@codygman
Copy link
Contributor Author

codygman commented Mar 6, 2021

I'm wondering if this could be related to bos/pool#32.

@codygman
Copy link
Contributor Author

codygman commented Mar 6, 2021

I'm fairly confident this issue is something to do with persistent-postgresql, since this equivalent code using only Data.Pool and postgresql-simple doesn't have the same issue:

{-# LANGUAGE OverloadedStrings #-}

import qualified Control.Concurrent as Concurrent
import qualified Control.Monad as Monad
import qualified Data.Pool as Pool
import qualified Database.PostgreSQL.Simple as PGS

main :: IO ()
main = do

  -- I started a postgres server with:
  -- docker run --rm --name some-postgres -p 5432:5432 -e POSTGRES_PASSWORD=secret postgres

  pool <- Pool.createPool (PGS.connect PGS.defaultConnectInfo { PGS.connectPassword = "secret" }) PGS.close 1 10 1

  threadId <- Concurrent.forkIO $ do
      Pool.withResource pool $ \conn -> do
        let numThings = 100000000 :: Int
        putStrLn $ "start inserting " <> show numThings <> " things"
        Monad.forM_ [1 .. numThings] $ \_ -> do
          PGS.execute_ conn "insert into foo values(1);"

  putStrLn "waiting for insert thread to make progress"
  Concurrent.threadDelay 5000000
  Monad.void $ Concurrent.killThread threadId
  putStrLn "killing insert thread"

  Pool.withResource pool $ \conn -> do
    PGS.execute_ conn "insert into foo values(1);"

  putStrLn "done"

https://github.com/codygman/persistent-postgresql-query-in-progress-repro/blob/3e84e282ccda34f4ca465a8605706254ca323234/src/Main.hs#L37

@codygman
Copy link
Contributor Author

codygman commented Mar 6, 2021

It looks like Persistent does the right thing here by catching (and logging) any exceptions when destroying the pool, yet this is the only place I could imagine being wrong that would exhibit this behavior:

createSqlPoolWithConfig mkConn config = do
    logFunc <- askLoggerIO
    -- Resource pool will swallow any exceptions from close. We want to log
    -- them instead.
    let loggedClose :: backend -> IO ()
        loggedClose backend = close' backend `UE.catchAny` \e -> runLoggingT
          (logError $ T.pack $ "Error closing database connection in pool: " ++ show e)
          logFunc
    liftIO $ createPool 
        (mkConn logFunc) 
        loggedClose 
        (connectionPoolConfigStripes config)
        (connectionPoolConfigIdleTimeout config)
        (connectionPoolConfigSize config)


Especially since postgresql-simple doesn't seem affected.

@codygman
Copy link
Contributor Author

codygman commented Mar 6, 2021

~~Wow... I just verified that the use of UE.catchAny is causing this issue locally by removing it from the local copy of persistent I'm using with my test code. Now to figure out why in the world that could be.~~

Edit: Never mind... I had left in the Pool.destroyAllResources call from earlier.

However, as I noted above, calling Pool.destroyAllResources didn't prevent the error. Removing the UE.catchAny call did make Pool.destroyAllResources work as expected.

@codygman
Copy link
Contributor Author

codygman commented Mar 9, 2021

This very much seems like it could be related:

snoyberg/conduit#425

Also today I tried layering on some of the extra exception handling that Persistent does to see if I could get to the error from my postgresql-simple base:

https://github.com/codygman/persistent-postgresql-query-in-progress-repro/blob/41ebcea31c2caf3e7e40d57f9385a80f5982f071/src/Main.hs#L33

I wasn't able to, though... The above issue makes me wonder if adding in the resourceT machinery would make the issue show up.

@tfausak
Copy link
Contributor

tfausak commented Mar 10, 2021

I work with @codygman and am also looking into this issue. I was able to create a slightly smaller reproduction of the bug, which you can find here: https://gist.github.com/tfausak/ee019365d628a36fc3480d30d38bed0c

{-# language OverloadedStrings #-}

import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Logger (runStdoutLoggingT)
import Database.Persist.Postgresql (rawExecute, runSqlPool, withPostgresqlPool)

main :: IO ()
main = runStdoutLoggingT . withPostgresqlPool "" 1 $ \ pool -> liftIO $ do
  threadId <- forkIO
    . runStdoutLoggingT
    $ runSqlPool (rawExecute "select pg_sleep(1)" []) pool

  putStrLn "Waiting for the thread to start ..."
  threadDelay 500000

  putStrLn "Killing the thread ..."
  killThread threadId

  -- Wait around a little bit for the error to be output.
  threadDelay 500000
  putStrLn "Done."

I do not see the same behavior when using withPostgresqlConn, which suggests the bug is somewhere in the handling of connection pools.

Also if I switch from forkIO to async, the problem goes away.

@codygman
Copy link
Contributor Author

Also if I switch from forkIO to async, the problem goes away.

It actually even works if you call async within the forkIO, which I found surprising:

https://gist.github.com/codygman/d38a049092301ade8e8e1bb362cac778
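For reference, a minimal sketch of what the plain forkIO-to-async swap could look like, modeled on the repro above (my own illustration of the idea, not the contents of the gist):

{-# language OverloadedStrings #-}

import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (cancel, withAsync)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Logger (runStdoutLoggingT)
import Database.Persist.Postgresql (rawExecute, runSqlPool, withPostgresqlPool)

main :: IO ()
main = runStdoutLoggingT . withPostgresqlPool "" 1 $ \ pool -> liftIO $
  -- the query runs under withAsync instead of a bare forkIO
  withAsync (runStdoutLoggingT $ runSqlPool (rawExecute "select pg_sleep(1)" []) pool) $ \ worker -> do
    putStrLn "Waiting for the thread to start ..."
    threadDelay 500000

    putStrLn "Cancelling the worker ..."
    cancel worker

    putStrLn "Done."

One relevant difference is that cancel waits for the cancelled thread to actually finish (including its cleanup) before returning, whereas killThread returns as soon as the exception is delivered.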

@codygman
Copy link
Contributor Author

codygman commented Mar 10, 2021

Perusing the issues in conduit (the resourcet repo lives in there), I also found a comment from @snoyberg that might be helpful here:

The issue is that, if you use forkIO, the child thread isn't registered as using the internal state. Then, when the parent thread stops using the internal state, it will automatically clean up all of the resources still in the state, even though the child thread may be using them.

When you use resourceForkIO instead, it increments a counter to let the parent thread know that someone else is using the state. Only when the last thread finally stops using the state will the counter go down to 0 and the resources be cleaned up, when we know that no one is using them.

snoyberg/conduit#441 (comment)

This makes me wonder how much Persistent could do to prevent such an issue from happening though... or if consumers just always have to be aware of this issue.
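To make that concrete, here's a tiny standalone sketch (my own, not from the linked thread) of the resourceForkIO behavior described in that comment: the release action only runs once both the parent and the forked child are done with the ResourceT state.

import Control.Concurrent (threadDelay)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (allocate, resourceForkIO, runResourceT)

main :: IO ()
main = runResourceT $ do
  _ <- allocate (putStrLn "acquired") (\_ -> putStrLn "released")
  -- resourceForkIO bumps the reference count on the internal state, so
  -- "released" is only printed after the child finishes, not when the
  -- parent's runResourceT block ends.
  _tid <- resourceForkIO . liftIO $ do
    threadDelay 1000000
    putStrLn "child done"
  liftIO $ putStrLn "parent done"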

@codygman
Copy link
Contributor Author

The more I think about this the more I think about masking. That led me to try:

runStdoutLoggingT . withPostgresqlPool "" 1 $ \ pool -> liftIO $ do
  threadId <- uninterruptibleMask_ $ forkIOWithUnmask $ \unmask ->
    unmask (void . runStdoutLoggingT $ runSqlPool (rawExecute "select pg_sleep(1)" []) pool)
      `finally` putStrLn "cleanup in child thread"

That didn't resolve the issue, but it was inspired by revisiting the forkIOWithUnmask section of Asynchronous Exception Handling in Haskell, which says:

This unmask action does not restore a previous masking state. Instead, it ensures that all masking is disabled. This is usually the behavior desired in the forked thread, since we want the forked thread to respond to async exceptions we send it, even if the parent thread is in a masked state.

@codygman
Copy link
Contributor Author

codygman commented Mar 10, 2021

uninterruptibleMask_ does seem to fix it, though the docs say to use this with care. Condensing everything down:

Working code with uninterruptibleMask:

{-# language OverloadedStrings #-}

import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Logger (runStdoutLoggingT)
import Database.Persist.Postgresql (rawExecute, runSqlPool, withPostgresqlPool)
import Control.Exception

main :: IO ()
main = uninterruptibleMask_ . runStdoutLoggingT . withPostgresqlPool "" 1 $ \ pool -> liftIO $ do
  threadId <- forkIO
    . runStdoutLoggingT
    $ runSqlPool (rawExecute "select pg_sleep(1)" []) pool

  putStrLn "Waiting for the thread to start ..."
  threadDelay 500000

  putStrLn "Killing the thread ..."
  killThread threadId

  -- Wait around a little bit for the error to be output.
  threadDelay 500000
  putStrLn "Done."

This uncovers an error where the query returns a result the code wasn't expecting:

libpq-error: QueryError {qeMessage = "execute resulted in Col 1-column result", qeQuery = "select pg_sleep(1)"}

@codygman
Copy link
Contributor Author

A lot of related discussion here: fpco/safe-exceptions#3

@codygman
Copy link
Contributor Author

codygman commented Mar 11, 2021

I also want to note here that you'd likely want a timeout in the uninterruptibleMask above in any production code.

@codygman
Copy link
Contributor Author

codygman commented Mar 11, 2021

A recent blog post by @joeyh mentions this issue and might have an explanation for why mask alone doesn't work... an MVar could be empty:

(screenshot of the relevant passage from the blog post)

http://joeyh.name/blog/entry/bracketing_and_async_exceptions_in_haskell/
https://old.reddit.com/r/haskell/comments/h0wv25/bracketing_and_async_exceptions_in_haskell/

Edit: Down towards the end he makes some recommendations and points out that you need to use uninterruptibleMask around async itself, in case async performs an interruptible operation 🤣 🤣 :

bracketInsulated :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
bracketInsulated a b = bracket
  (uninterruptibleMask $ \u -> async (u a) >>= u . wait)
  (\v -> uninterruptibleMask $ \u -> async (u (b v)) >>= u . wait)

(Note use of uninterruptibleMask here in case async itself does an interruptable operation. My first version got that wrong.. This is hard!)
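A quick usage sketch of that bracketInsulated (mine, assuming the definition quoted above plus imports of bracket/uninterruptibleMask from Control.Exception and async/wait from Control.Concurrent.Async), just to show how the three arguments line up:

example :: IO ()
example = bracketInsulated
  (putStrLn "acquire" >> pure "the resource")   -- acquire
  (\r -> putStrLn ("release " <> r))            -- release, insulated from async exceptions
  (\r -> putStrLn ("use " <> r))                -- the action in between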

@codygman
Copy link
Contributor Author

I'm too tired to know if it's possible or a good idea, but this piece of the docs keeps catching my eye:

Sometimes it is too onerous to handle exceptions in the middle of a critical piece of stateful code. There are three ways to handle this kind of situation:

Use STM. Since a transaction is always either completely executed or not at all, transactions are a good way to maintain invariants over state in the presence of asynchronous (and indeed synchronous) exceptions.

Would it be possible or even desirable to use STM throughout the Persistent module to prevent this issue?
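For what it's worth, here's a tiny sketch (my own, nothing to do with Persistent's internals) of the STM property the docs are describing: the two writes either both commit or neither does, so the invariant can't be observed half-updated even if the thread is killed mid-transaction.

import Control.Concurrent.STM

-- invariant: the two counters always hold the same value
bumpBoth :: TVar Int -> TVar Int -> STM ()
bumpBoth a b = do
  modifyTVar' a (+ 1)
  modifyTVar' b (+ 1)  -- an async exception here discards the whole transaction

main :: IO ()
main = do
  a <- newTVarIO 0
  b <- newTVarIO 0
  atomically (bumpBoth a b)
  (,) <$> readTVarIO a <*> readTVarIO b >>= print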

@codygman
Copy link
Contributor Author

It seems another fix, which I believe works on its own without any of the other fixes, is using uninterruptibleMask_ on the finalizeForeignPtr call in postgresql-libpq, like so:

finish :: Connection
       -> IO ()
finish (Conn fp _) = do
    putStrLn "hi from libpq.finish"
    uninterruptibleMask_ $ finalizeForeignPtr fp

I'll verify whether this by itself fixes things and then ask about it in the related postgresql-libpq issue.

@parsonsmatt
Copy link
Collaborator

Yeah, I was wondering if we could switch to safe-exceptions in the library, but persistent doesn't seem to do any internal exception handling at all. Everything is deferred to underlying libraries.

@codygman
Copy link
Contributor Author

Yeah, I was wondering if we could switch to safe-exceptions in the library, but persistent doesn't seem to do any internal exception handling at all. Everything is deferred to underlying libraries.

I think either could work, but if you defer to the underlying libraries you're at the mercy of whether they handle async exceptions correctly. Maybe this is the only way it can be... IIRC that was the point being made in:

https://www.fpcomplete.com/blog/2018/04/async-exception-handling-haskell/

I still feel a little lost with all of this, but I'm pretty sure something is wrong somewhere 😄

@codygman
Copy link
Contributor Author

PG.close succeeds on a connection with a running query... but should it? If the connection still being busy triggered a ReleaseException here, I think the Data.Pool machinery would all work out.
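A hypothetical sketch of that idea (closeUnlessBusy is my own name, not a persistent or postgresql-simple API; it uses LibPQ.isBusy as a rough proxy for "a command is still in flight"):

import Control.Exception (ErrorCall (..), throwIO)
import qualified Database.PostgreSQL.LibPQ as LibPQ
import qualified Database.PostgreSQL.Simple as PGS
import qualified Database.PostgreSQL.Simple.Internal as Internal

-- Refuse to close a connection that still has a command in flight; the thrown
-- exception is then visible to whatever is managing the connection instead of
-- the busy connection being silently recycled.
closeUnlessBusy :: PGS.Connection -> IO ()
closeUnlessBusy conn = do
  busy <- Internal.withConnection conn LibPQ.isBusy
  if busy
    then throwIO (ErrorCall "connection still has a command in progress")
    else PGS.close conn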

@codygman
Copy link
Contributor Author

Perhaps this is just a very long way for me to figure out I'm affected by #981?

This means that all queries with the same text will map to a single Statement that is used for stepping and preparing. This means if any threads ever run the same query they will interfere with each other's preparation and stepping and generally just cause arbitrary failures due to racing each other.

I thought that by having a Pool.Pool Persist.SqlBackend I wouldn't be affected by that issue though.

@codygman
Copy link
Contributor Author

This bug in pool makes it kind of difficult to isolate the problem: bos/pool#32

destroyResource definitely gets called more than once and the best way to ensure the tainted connection is actually gone is to only allow the Pool to have 1 stripe and 1 connection.

@codygman
Copy link
Contributor Author

codygman commented Mar 15, 2021

I thought that by cancelling pending queries in connRollback (since it's called on ReleaseException) I could prevent the bad Postgres connection, which still has a running query, from re-entering the pool, but it seems not.

I verified with print debugging that it does actually get destroyed... or at least Pool.destroyResource is called and no error is reported, giving this output:

killed thread
release exception
1. pool exhausted
start destroy resource
finish destroy resource
2. pool could be borrowed

That makes me wonder if either:

  1. there is something wrong with Data.Pool and it's not actually destroying the connection
  2. Pool.takeResource never actually gets called and only the old reference to (Pool backend) is ever used for some reason

I can't think of anything else.

I'm afraid my team might have to move away from Persistent due to this issue :sad:

@parsonsmatt
Copy link
Collaborator

oh, huh

-- | Creates a pool of connections to a SQL database.
--
-- @since 2.11.0.0
createSqlPoolWithConfig
    :: forall m backend. (MonadLoggerIO m, MonadUnliftIO m, BackendCompatible SqlBackend backend)
    => (LogFunc -> IO backend) -- ^ Function to create a new connection
    -> ConnectionPoolConfig
    -> m (Pool backend)
createSqlPoolWithConfig mkConn config = do
    logFunc <- askLoggerIO
    -- Resource pool will swallow any exceptions from close. We want to log
    -- them instead.
    let loggedClose :: backend -> IO ()
        loggedClose backend = close' backend `UE.catchAny` \e -> runLoggingT
          (logError $ T.pack $ "Error closing database connection in pool: " ++ show e)
          logFunc
    liftIO $ createPool 
        (mkConn logFunc) 
        loggedClose 
        (connectionPoolConfigStripes config)
        (connectionPoolConfigIdleTimeout config)
        (connectionPoolConfigSize config)

we do catch stuff! Right here, we do UE.catchAny and swallow the exception if one is thrown in close' backend. I missed this one in my earlier search. This is using UnliftIO.Exception, which is itself defined as a specialization of catch:

catch
  :: (MonadUnliftIO m, Exception e)
  => m a -- ^ action
  -> (e -> m a) -- ^ handler
  -> m a
catch f g = withRunInIO $ \run -> run f `EUnsafe.catch` \e ->
  if isSyncException e
    then run (g e)
    -- intentionally rethrowing an async exception synchronously,
    -- since we want to preserve async behavior
    else EUnsafe.throwIO e

🤔

The catchAny behavior is also present in 2.10.5.3, which predates the @since annotation from the other one. So it isn't likely to be fixed simply by using an earlier version.
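For reference, the sync/async check that catch relies on is (as far as I know) defined roughly like this in safe-exceptions/unliftio - an exception only counts as async if it is wrapped in SomeAsyncException, which ThreadKilled from killThread is, via base's AsyncException instance:

import Control.Exception (Exception, SomeAsyncException (..), fromException, toException)

isSyncException :: Exception e => e -> Bool
isSyncException e =
  case fromException (toException e) of
    Just (SomeAsyncException _) -> False  -- async: catch rethrows it
    Nothing                     -> True   -- sync: catch runs the handler

So the killThread in the repro should be rethrown by loggedClose's catchAny rather than swallowed.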

@parsonsmatt
Copy link
Collaborator

OK, got it running locally, and I'm reproducing it. As is usually the case with concurrent behavior, compiler and runtime flags matter!

Building with no GHC options

λ ~/Projects/persistent/ master* stack exec -- conn-kill   
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left libpq: failed (another command is already in progress
)

This happens reliably - every single time we get a Left libpq: failed.

Building with -threaded

This gets us different behavior, right off the bat! Fortunately, it's not reliable.

λ ~/Projects/persistent/ master* stack exec -- conn-kill
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left Unexpected null
λ ~/Projects/persistent/ master* stack exec -- conn-kill
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left libpq: failed (another command is already in progress
)
λ ~/Projects/persistent/ master* stack exec -- conn-kill
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left Unexpected null
λ ~/Projects/persistent/ master* stack exec -- conn-kill
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
conn-kill: libpq: failed (another command is already in progress
)
result: Left Unexpected null

Running with a single thread (e.g. +RTS -N1) doesn't appear to affect behavior. I still see the error occur every once in a while. However, with +RTS -N4, it's another story - I've run it several times and I never get the error!

λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N4
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left Unexpected null
λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N4
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left Unexpected null
λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N4
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left Unexpected null
λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N4
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left Unexpected null
λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N4
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left Unexpected null
λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N4
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left Unexpected null
λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N4
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left Unexpected null

(At this point, I fixed it to be Persist.Single (Maybe String), since the pg_sleep(5) call was returning NULL.)

I tried running the executable with -N1 again to see what would happen.

λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N1
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
conn-kill: libpq: failed (another command is already in progress
)
result: Right [Single {unSingle = Nothing}]
λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N1
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left libpq: failed (another command is already in progress
)
λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N1
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left libpq: failed (another command is already in progress
)
λ ~/Projects/persistent/ master* stack exec -- conn-kill +RTS -N1
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left libpq: failed (another command is already in progress
)

Consistently failing with Left libpq - makes me think an imprecise exception was at play with the earlier successes? Or with the earlier Left Unexpected null. Let's try it again without -threaded.

λ ~/Projects/persistent/ master* stack exec -- conn-kill         
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left libpq: failed (another command is already in progress
)
λ ~/Projects/persistent/ master* stack exec -- conn-kill
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left libpq: failed (another command is already in progress
)
λ ~/Projects/persistent/ master* stack exec -- conn-kill
NOTICE:  relation "foo" already exists, skipping
start inserting 100000000 things
killed thread
result: Left libpq: failed (another command is already in progress
)

OK, so it was an imprecise exception! Neat. Lol.


I feel awfully stupid about saying earlier that we don't catch any exceptions. We import Control.Exception (bracket, mask, onException) in Database.Persist.Sql.Run. `bracket` is used in public-facing functions. Switching to `UE.bracket` doesn't affect anything, as far as I can tell.

@parsonsmatt
Copy link
Collaborator

Throwing an uninterruptibleMask_ on the runSqlPool thing fixes it, in the sense that no bad connections are returned. So something Weird is happening with async exceptions here.

@parsonsmatt
Copy link
Collaborator

I can make it reliably/deterministically fail by having the loop occur inside a SQL transaction - just put the Monad.forM_ inside the runSqlPool instead of around it. The output is pretty consistently failing now:

Thing #: 63666
Thing #: 63667
killed thread
got a release exception
close' called
statements finalized boss
connection closed, boss
conn-kill: libpq: failed (another command is already in progress
)
creating conn
result: Right [Single {unSingle = Nothing}]

I wish the Acquire interface allowed us to get our hands on the caught exception. Gonna file an issue there.

@codygman
Copy link
Contributor Author

Switching to `UE.bracket` doesn't affect anything, as far as I can tell.

That's also what I saw. I also tried using bos/pool#37. Then again I'm not sure if I ever removed UE.catchAny and used that fork.

@codygman
Copy link
Contributor Author

codygman commented Mar 16, 2021

@merijn suggested in IRC that the RTS could be sending a signal, and said that could be tested by compiling with -debug and running inside GDB. I don't see any "interruptible" in postgresql-simple, but this comment comes to mind:

As detailed in Interruptible operations, beginMode, rollback_ and commit all perform network IO and as such can be interrupted even when inside mask.

As a result, this function does not guarantee that a rollback was issued. So if you continue using the conn somewhere else after an async exception was thrown, you will unknowingly still be in an open transaction. (Though not really fully unknowingly, because the exception will have bubbled up to you, since as I mentioned, mask doesn't really prevent it. So you'll have handled it somewhere, e.g. in some catch-all exception handler of a web server, and if you put the connection for which rollback or commit were interrupted back into some pool, then the next user of that connection will unknowingly be in the middle of a transaction.)
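A tiny standalone sketch (mine, not from that comment) of the "interruptible even inside mask" behavior being described - blocking operations like takeMVar on an empty MVar are interruptible, so the kill still lands:

import Control.Concurrent (MVar, forkIO, killThread, newEmptyMVar, takeMVar, threadDelay)
import Control.Exception (mask_)

main :: IO ()
main = do
  gate <- newEmptyMVar :: IO (MVar ())
  tid <- forkIO . mask_ $ do
    putStrLn "inside mask_, about to block"
    () <- takeMVar gate   -- interruptible point: the ThreadKilled is delivered here
    putStrLn "never reached"
  threadDelay 100000
  killThread tid
  threadDelay 100000
  putStrLn "child was interrupted despite mask_"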

@codygman
Copy link
Contributor Author

Also related:

see withPG and withTransaction in snaplet-postgresql-simple; otherwise every database action in a single thread can potentially go out on a different connection, and single connections can bounce around between threads.

lpsmith/postgresql-simple#202 (comment)

withPG from snaplet-postgresql-simple: https://github.com/mightybyte/snaplet-postgresql-simple/blob/98739b6cdd8a4f97506b80de2ce102d8501975b0/src/Snap/Snaplet/PostgresqlSimple/Internal.hs#L127

-- | Function that reserves a single connection for the duration of the given
-- action. Nested calls to withPG will only reserve one connection. For example,
-- the following code calls withPG twice in a nested way yet only results in a single
-- connection being reserved:
--
-- > myHandler = withPG $ do
-- >    queryTheDatabase
-- >    commonDatabaseMethod
-- >
-- > commonDatabaseMethod = withPG $ do
-- >    moreDatabaseActions
-- >    evenMoreDatabaseActions
--
-- This is useful in a practical setting because you may often find yourself in a situation
-- where you have common code (that requires a database connection) that you wish to call from
-- other blocks of code that may require a database connection and you still want to make sure
-- that you are only using one connection through all of your nested methods.
withPG :: (HasPostgres m)
       => m b -> m b
withPG f = do
    s <- getPostgresState
    case s of
      (PostgresPool p) -> withResource p (\c -> setLocalPostgresState (PostgresConn c) f)
      (PostgresConn _) -> f

Maybe I'm missing something, but this doesn't seem a lot different from what persistent does.

@codygman
Copy link
Contributor Author

Can more easily make it reliable/deterministic by having the loop occur inside a SQL transaction - just put the Monad.forM_ inside the runSqlPool instead of around it.

Won't that happen on the same connection though? I'd expect an error on the same connection. Keeping Monad.forM_ on the outside was an attempt at getting Data.Pool to block until the connection was in a valid state again or a new one was created.

@parsonsmatt
Copy link
Collaborator

My understanding is that the problem happens roughly like this:

  1. Thread A grabs a SqlBackend from the Pool SqlBackend and starts running a transaction
  2. Thread B sends a killThread threadA, which is an async exception.
  3. something happens and another command is run on libpq (which command?) on the connection, even though the current command is in progress.

I consistently see that a new SqlBackend is created when I go to run the pg_sleep command in the test. I need to go to the store and get some more putStrLns 😂

@parsonsmatt mentioned this issue Mar 16, 2021
@codygman
Copy link
Contributor Author

codygman commented Mar 16, 2021

My understanding is that the problem happens roughly like this:

1. Thread A grabs a `SqlBackend` from the `Pool SqlBackend` and starts running a transaction

2. Thread B sends a `killThread threadA`, which is an async exception.

3. _something_ happens and another command is run on `libpq` (which command?) on the connection, even though the current command is in progress.

I wonder: does PG.close/LibPQ.finish fail with 'query in progress' if the connection has a query in progress?
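A quick experiment sketch to answer that (my own, using the raw postgresql-libpq API and the same docker postgres credentials as the repro): start a query and call finish while it's still in flight.

{-# language OverloadedStrings #-}

import qualified Database.PostgreSQL.LibPQ as LibPQ

main :: IO ()
main = do
  conn <- LibPQ.connectdb "host=localhost user=postgres password=secret"
  _sent <- LibPQ.sendQuery conn "select pg_sleep(5)"  -- returns immediately, query now in flight
  busy <- LibPQ.isBusy conn
  putStrLn $ "busy before finish: " <> show busy
  LibPQ.finish conn                                    -- does this complain about the in-progress query?
  putStrLn "finish returned"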

@codygman
Copy link
Contributor Author

@parsonsmatt FYI, just 30 minutes ago someone posted a patch for postgresql-simple which could tie into all of this:

haskellari/postgresql-simple#69 (comment)
