Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async exception issue #1207

Merged
merged 13 commits into from
Mar 17, 2021
Merged

async exception issue #1207

merged 13 commits into from
Mar 17, 2021

Conversation

parsonsmatt
Copy link
Collaborator

@parsonsmatt parsonsmatt commented Mar 16, 2021

Before submitting your PR, check that you've:

After submitting your PR:

  • Update the Changelog.md file with a link to your PR
  • Bumped the version number if there isn't an (unreleased) on the Changelog
  • Check that CI passes (or if it fails, for reasons unrelated to your change, like CI timeouts)

test initially thanks to @codygman

investigating #1199
maybe even fixes #1199

@parsonsmatt
Copy link
Collaborator Author

As of 9d07728, I've investigated and found something interesting:

rawAcquireSqlConn
    :: forall backend m
     . (MonadReader backend m, BackendCompatible SqlBackend backend)
    => Maybe IsolationLevel -> m (Acquire backend)
rawAcquireSqlConn isolation = do
    conn <- MonadReader.ask
    let rawConn :: SqlBackend
        rawConn = projectBackend conn

        getter :: T.Text -> IO Statement
        getter = getStmtConn rawConn

        beginTransaction :: IO backend
        beginTransaction = conn <$ connBegin rawConn getter isolation

        finishTransaction :: backend -> ReleaseType -> IO ()
        finishTransaction _ relType = case relType of
            ReleaseExceptionWith e -> do
                putStrLn $ "got a release exception: " <> show e
                connRollback rawConn getter
                putStrLn "rolled back transaction"
            _ -> connCommit rawConn getter

    return $ mkAcquireType beginTransaction finishTransaction

Note the two putStrLn calls in finishTransaction. Now, look at this ioutput:

λ ~/Projects/persistent/ master* stack exec -- conn-kill
[Info] [main thread] creating table...
creating conn
NOTICE:  relation "foo" already exists, skipping
[Info] [main thread] [ThreadId 3] start inserting 100000000 things
[Info] [main thread] [ThreadId 3] inside of thing
[Info] [main thread] [ThreadId 3] Thing #: 1000
[Info] [main thread] [ThreadId 3] Thing #: 2000
[Info] [main thread] [ThreadId 3] Thing #: 3000
[Info] [main thread] [ThreadId 3] Thing #: 4000
[Info] [main thread] [ThreadId 3] Thing #: 5000
[Info] [main thread] [ThreadId 3] Thing #: 6000
[Info] [main thread] [ThreadId 3] Thing #: 7000
[Info] [main thread] [ThreadId 3] Thing #: 8000
[Info] [main thread] [ThreadId 3] Thing #: 9000
[Info] [main thread] [ThreadId 3] Thing #: 10000
[Info] [main thread] [ThreadId 3] Thing #: 11000
[Info] [main thread] [ThreadId 3] Thing #: 12000
[Info] [main thread] [ThreadId 3] Thing #: 13000
[Info] [main thread] [ThreadId 3] Thing #: 14000
[Info] [main thread] killed thread
[Info] [main thread] destroying resources
[Info] [main thread] pg_sleep
got a release exception: thread killed
close' called
statements finalized boss
connection closed, boss
conn-kill: libpq: failed (another command is already in progress
)
creating conn
[Info] [main thread] result: Right [Single {unSingle = Nothing}]

We get connection closed, boss and then the exception - libpq: failed (another command is already in progress). We don't ever emit rolled back transaction. This means the exception is occuring when we try to roll back the transaction..

This is running with the non-threaded runtime. Compiling with -threaded, I get the same behavior. Consistently, we get connection closed, boss and then an exception. Now... Let's run with some threads. With two threads, no change.

``` λ ~/Projects/persistent/ matt/test-async-exception-issue* stack exec -- conn-kill +RTS -N2 [Info] [main thread] creating table... creating conn NOTICE: relation "foo" already exists, skipping [Info] [main thread] [ThreadId 7] start inserting 100000000 things [Info] [main thread] [ThreadId 7] inside of thing [Info] [main thread] [ThreadId 7] Thing #: 1000 [Info] [main thread] [ThreadId 7] Thing #: 2000 [Info] [main thread] [ThreadId 7] Thing #: 3000 [Info] [main thread] [ThreadId 7] Thing #: 4000 [Info] [main thread] [ThreadId 7] Thing #: 5000 [Info] [main thread] [ThreadId 7] Thing #: 6000 [Info] [main thread] [ThreadId 7] Thing #: 7000 [Info] [main thread] [ThreadId 7] Thing #: 8000 [Info] [main thread] [ThreadId 7] Thing #: 9000 [Info] [main thread] [ThreadId 7] Thing #: 10000 [Info] [main thread] [ThreadId 7] Thing #: 11000 [Info] [main thread] killed thread [Info] [main thread] destroying resources [Info] [main thread] pg_sleep got a release exception: thread killed close' called statements finalized boss connection closed, boss conn-kill: libpq: failed (another command is already in progress ) creating conn [Info] [main thread] result: Right [Single {unSingle = Nothing}] λ ~/Projects/persistent/ matt/test-async-exception-issue* stack exec -- conn-kill +RTS -N2 [Info] [main thread] creating table... creating conn NOTICE: relation "foo" already exists, skipping [Info] [main thread] [ThreadId 7] start inserting 100000000 things [Info] [main thread] [ThreadId 7] inside of thing [Info] [main thread] [ThreadId 7] Thing #: 1000 [Info] [main thread] [ThreadId 7] Thing #: 2000 [Info] [main thread] [ThreadId 7] Thing #: 3000 [Info] [main thread] [ThreadId 7] Thing #: 4000 [Info] [main thread] [ThreadId 7] Thing #: 5000 [Info] [main thread] [ThreadId 7] Thing #: 6000 [Info] [main thread] [ThreadId 7] Thing #: 7000 [Info] [main thread] [ThreadId 7] Thing #: 8000 [Info] [main thread] [ThreadId 7] Thing #: 9000 [Info] [main thread] [ThreadId 7] Thing #: 10000 [Info] [main thread] [ThreadId 7] Thing #: 11000 [Info] [main thread] [ThreadId 7] Thing #: 12000 [Info] [main thread] killed thread [Info] [main thread] destroying resources [Info] [main thread] pg_sleep got a release exception: thread killed close' called statements finalized boss connection closed, boss creating conn conn-kill: libpq: failed (another command is already in progress ) [Info] [main thread] result: Right [Single {unSingle = Nothing}] λ ~/Projects/persistent/ matt/test-async-exception-issue* stack exec -- conn-kill +RTS -N2 [Info] [main thread] creating table... creating conn NOTICE: relation "foo" already exists, skipping [Info] [main thread] [ThreadId 7] start inserting 100000000 things [Info] [main thread] [ThreadId 7] inside of thing [Info] [main thread] [ThreadId 7] Thing #: 1000 [Info] [main thread] [ThreadId 7] Thing #: 2000 [Info] [main thread] [ThreadId 7] Thing #: 3000 [Info] [main thread] [ThreadId 7] Thing #: 4000 [Info] [main thread] [ThreadId 7] Thing #: 5000 [Info] [main thread] [ThreadId 7] Thing #: 6000 [Info] [main thread] [ThreadId 7] Thing #: 7000 [Info] [main thread] [ThreadId 7] Thing #: 8000 [Info] [main thread] [ThreadId 7] Thing #: 9000 [Info] [main thread] [ThreadId 7] Thing #: 10000 [Info] [main thread] [ThreadId 7] Thing #: 11000 [Info] [main thread] killed thread [Info] [main thread] destroying resources [Info] [main thread] pg_sleep got a release exception: thread killed close' called statements finalized boss connection closed, boss creating conn conn-kill: libpq: failed (another command is already in progress ) [Info] [main thread] result: Right [Single {unSingle = Nothing}] ```

With four threads?

``` λ ~/Projects/persistent/ matt/test-async-exception-issue* stack exec -- conn-kill +RTS -N4 [Info] [main thread] creating table... creating conn NOTICE: relation "foo" already exists, skipping [Info] [main thread] [ThreadId 9] start inserting 100000000 things [Info] [main thread] [ThreadId 9] inside of thing [Info] [main thread] [ThreadId 9] Thing #: 1000 [Info] [main thread] [ThreadId 9] Thing #: 2000 [Info] [main thread] [ThreadId 9] Thing #: 3000 [Info] [main thread] [ThreadId 9] Thing #: 4000 [Info] [main thread] [ThreadId 9] Thing #: 5000 [Info] [main thread] [ThreadId 9] Thing #: 6000 [Info] [main thread] [ThreadId 9] Thing #: 7000 [Info] [main thread] [ThreadId 9] Thing #: 8000 [Info] [main thread] [ThreadId 9] Thing #: 9000 [Info] [main thread] [ThreadId 9] Thing #: 10000 [Info] [main thread] [ThreadId 9] Thing #: 11000 [Info] [main thread] killed thread [Info] [main thread] destroying resources [Info] [main thread] pg_sleep got a release exception: thread killed close' called statements finalized boss connection closed, boss conn-kill: creating conn libpq: failed (another command is already in progress ) [Info] [main thread] result: Right [Single {unSingle = Nothing}] λ ~/Projects/persistent/ matt/test-async-exception-issue* stack exec -- conn-kill +RTS -N4 [Info] [main thread] creating table... creating conn NOTICE: relation "foo" already exists, skipping [Info] [main thread] [ThreadId 9] start inserting 100000000 things [Info] [main thread] [ThreadId 9] inside of thing [Info] [main thread] [ThreadId 9] Thing #: 1000 [Info] [main thread] [ThreadId 9] Thing #: 2000 [Info] [main thread] [ThreadId 9] Thing #: 3000 [Info] [main thread] [ThreadId 9] Thing #: 4000 [Info] [main thread] [ThreadId 9] Thing #: 5000 [Info] [main thread] [ThreadId 9] Thing #: 6000 [Info] [main thread] [ThreadId 9] Thing #: 7000 [Info] [main thread] [ThreadId 9] Thing #: 8000 [Info] [main thread] [ThreadId 9] Thing #: 9000 [Info] [main thread] [ThreadId 9] Thing #: 10000 [Info] [main thread] [ThreadId 9] Thing #: 11000 [Info] [main thread] killed thread [Info] [main thread] destroying resources [Info] [main thread] pg_sleep got a release exception: thread killed close' called statements finalized boss connection closed, boss creating conn conn-kill: libpq: failed (another command is already in progress ) [Info] [main thread] result: Right [Single {unSingle = Nothing}] λ ~/Projects/persistent/ matt/test-async-exception-issue* stack exec -- conn-kill +RTS -N4 [Info] [main thread] creating table... creating conn NOTICE: relation "foo" already exists, skipping [Info] [main thread] [ThreadId 9] start inserting 100000000 things [Info] [main thread] [ThreadId 9] inside of thing [Info] [main thread] [ThreadId 9] Thing #: 1000 [Info] [main thread] [ThreadId 9] Thing #: 2000 [Info] [main thread] [ThreadId 9] Thing #: 3000 [Info] [main thread] [ThreadId 9] Thing #: 4000 [Info] [main thread] [ThreadId 9] Thing #: 5000 [Info] [main thread] [ThreadId 9] Thing #: 6000 [Info] [main thread] [ThreadId 9] Thing #: 7000 [Info] [main thread] [ThreadId 9] Thing #: 8000 [Info] [main thread] [ThreadId 9] Thing #: 9000 [Info] [main thread] [ThreadId 9] Thing #: 10000 [Info] [main thread] [ThreadId 9] Thing #: 11000 got a release exception: thread killed [Info] [main thread] killed thread [Info] [main thread] destroying resources [Info] [main thread] pg_sleep rolled back transaction close' called statements finalized boss connection closed, boss creating conn [Info] [main thread] result: Right [Single {unSingle = Nothing}] λ ~/Projects/persistent/ matt/test-async-exception-issue* stack exec -- conn-kill +RTS -N4 [Info] [main thread] creating table... creating conn NOTICE: relation "foo" already exists, skipping [Info] [main thread] [ThreadId 9] start inserting 100000000 things [Info] [main thread] [ThreadId 9] inside of thing [Info] [main thread] [ThreadId 9] Thing #: 1000 [Info] [main thread] [ThreadId 9] Thing #: 2000 [Info] [main thread] [ThreadId 9] Thing #: 3000 [Info] [main thread] [ThreadId 9] Thing #: 4000 [Info] [main thread] [ThreadId 9] Thing #: 5000 [Info] [main thread] [ThreadId 9] Thing #: 6000 [Info] [main thread] [ThreadId 9] Thing #: 7000 [Info] [main thread] [ThreadId 9] Thing #: 8000 [Info] [main thread] [ThreadId 9] Thing #: 9000 [Info] [main thread] [ThreadId 9] Thing #: 10000 [Info] [main thread] [ThreadId 9] Thing #: 11000 [Info] [main thread] killed thread [Info] [main thread] destroying resources [Info] [main thread] pg_sleep got a release exception: thread killed close' called statements finalized boss connection closed, boss creating conn conn-kill: libpq: failed (another command is already in progress ) [Info] [main thread] result: Right [Single {unSingle = Nothing}] ```

It's inconsistent - I've seen two runs where it didn't happen, and a few more where it did.

🤔

On a "successful" run, we have this log of events:

got a release exception: thread killed
[Info] [main thread] killed thread
[Info] [main thread] destroying resources
[Info] [main thread] pg_sleep
rolled back transaction
close' called
statements finalized boss
connection closed, boss
creating conn
[Info] [main thread] result: Right [Single {unSingle = Nothing}]

On a failed run, it looks like this:

[Info] [main thread] [ThreadId 9] Thing #: 11000
[Info] [main thread] killed thread
[Info] [main thread] destroying resources
[Info] [main thread] pg_sleep
got a release exception: thread killed
close' called
statements finalized boss
connection closed, boss
conn-kill: libpq: failed (another command is already in progress
)
creating conn
[Info] [main thread] result: Right [Single {unSingle = Nothing}]

So the exact sequence appears to matter. I'm noticing that we call destroying resources before the close' is called on the bad thread. The good thread has got a release exception running before the destroy resources is called.

In e03e81693d555f3606c0f73ed61d2f87f087ccee I remove the call. The behavior is unchanged. So that's not it.

(Note to self: investigate resourcet-pool as a potential problem here)

@parsonsmatt parsonsmatt marked this pull request as draft March 16, 2021 16:58
@parsonsmatt
Copy link
Collaborator Author

So here's resourcet-pool, which was pulled out from persistent. I think it's buggy.

-- | Convert a 'Pool' into an 'Acquire'.
poolToAcquire :: Pool a -> Acquire a
poolToAcquire pool = fst <$> mkAcquireType getResource freeResource
  where
    getResource = takeResource pool
    freeResource (resource, localPool) = \case
      ReleaseException -> destroyResource pool localPool resource
      _ -> putResource localPool resource

The docs on takeResource say:

Note that this function should be used with caution, as improper exception handling can lead to leaked resources

Which, uh, does sound like a problem!

@parsonsmatt
Copy link
Collaborator Author

parsonsmatt commented Mar 16, 2021

And, yeah, avoiding it solves the problem.

runSqlPool
    :: forall backend m a. (MonadUnliftIO m, BackendCompatible SqlBackend backend)
    => ReaderT backend m a -> Pool backend -> m a
runSqlPool r pconn =
    -- with (acquireSqlConnFromPool pconn) $ runReaderT r
    withRunInIO $ \runInIO ->
        withResource pconn $ \backend ->
            runInIO $ runReaderT r backend

With this, I don't ever see the problem happening.

However, this also doesn't attempt to rollback the transaction. So it's not the same, really. I added transaction support in 7d67dac and it worked. Hm.

@parsonsmatt
Copy link
Collaborator Author

Let's compare Data.Acquire.with and Data.Pool.withResource:

with :: MonadUnliftIO m
     => Acquire a
     -> (a -> m b)
     -> m b
with (Acquire f) g = withRunInIO $ \run -> E.mask $ \restore -> do
    Allocated x free <- f restore
    res <- restore (run (g x)) `E.onException` free ReleaseException
    free ReleaseNormal
    return res

withResource ::
    (MonadBaseControl IO m)
  => Pool a -> (a -> m b) -> m b
withResource pool act = control $ \runInIO -> mask $ \restore -> do
  (resource, local) <- takeResource pool
  ret <- restore (runInIO (act resource)) `onException`
            destroyResource pool local resource
  putResource local resource
  return ret

This is damn near identitcal. So what's wrong?

@parsonsmatt
Copy link
Collaborator Author

@codygman want to test this out on your codebase? i'm going to need to backport it probably too

runLoggingT
(logError $ T.pack $ "Error closing database connection in pool: " ++ show e)
logFunc
UE.throwIO e
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a somewhat semantic change here. Instead of swallowing the exception, we allow resource-pool to handle it somehow.

@@ -294,7 +245,7 @@ withSqlConn
=> (LogFunc -> IO backend) -> (backend -> m a) -> m a
withSqlConn open f = do
logFunc <- askLoggerIO
withRunInIO $ \run -> bracket
withRunInIO $ \run -> UE.bracket
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly semantic change - bringing it in line with UnliftIO.Exception semantics should make things easier in the future.

:: forall backend m
. (MonadReader (Pool backend) m, BackendCompatible SqlBackend backend)
=> m (Acquire backend)
unsafeAcquireSqlConnFromPool = MonadReader.asks poolToAcquire
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait... These things are definitely part of the public API. So I can't just remove them without pointing users to a better thing! Damn.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'll deprecate them when I backport this fix.

-- - initialize, afterwards, and onException.
--
-- @since 2.12.0.0
runSqlPoolWithHooks
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shiny new functionality! should be equivalent in terms of power.

@merijn I think you wrote the original Acquire-based stuff, do you mind trying this out? Specifically the runSqlPoolNoTransaction stuff should be what you need.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is an extremely late reaction, but I finally got around to (trying to) bump my code to GHC 9.2 and 9.4.

runSqlPoolNoTransaction is fundamentally incompatible in doing what I was using unsafeAcquire for, as it incurs a MonadUnliftIO constraint. The Acquire interface (like the surviving acquireSqlConn) only requires MonadResource. There is no MonadUnliftIO instance for ConduitT, which means that runSqlPoolNoTransaction is unusable in large parts of my code.

@parsonsmatt parsonsmatt marked this pull request as ready for review March 17, 2021 00:02
parsonsmatt added a commit that referenced this pull request Mar 17, 2021
@parsonsmatt
Copy link
Collaborator Author

@codygman has told me on Slack that this fixed his problem! 😂 🎉

Going to merge these PRs once CI passes here, and then release the backported fixes.

@parsonsmatt parsonsmatt merged commit a5867d4 into master Mar 17, 2021
parsonsmatt added a commit that referenced this pull request Mar 17, 2021
* Backport #1207 to persistent-2.10.5

* bump persistent version

* sigh

* fix ci

* restore masking
@merijn
Copy link
Contributor

merijn commented Mar 18, 2021

I know what's going on! The problem isn't acquire API I added in 2.10.5. The problem is that the code mixes forkIO and ResourceT, while relying on MonadUnliftIO to run code in the forked thread. Which you can't do.

I can't believe I didn't notice that before in @codygman's code. I opened an issue about this on resourcet ages ago, but solving the problem was deemed to costly to do. So, yes, any MonadResource usage that mixes with forkIO is likely broken. See: snoyberg/conduit#425

@parsonsmatt
Copy link
Collaborator Author

I'm not sure that ResourceT is at play here, though. There's no ResourceT in the relevant problem code.

@parsonsmatt parsonsmatt deleted the matt/test-async-exception-issue branch April 1, 2021 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Postgres connections are returned to Pool too quickly
2 participants