Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests crashing with Assertion failed: PostgresConnection deinitialized before being closed. #2810

Open
finestructure opened this issue Apr 20, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@finestructure
Copy link
Sponsor Member

We've started tracking an issue in the Swift Package Index about test flakiness around four weeks ago. When running tests under certain conditions, we're hitting the assert

Assertion failed: PostgresConnection deinitialized before being closed.

in PostgresNIO/PostgresConnection.swift:153. This seems to be happening when the Application is shutdown after a test, which we do as follows in each test:

    func test() async throws {
        let app = Application(.testing)
        try configure(app)
        defer { app.shutdown() }
        ...

It's important to note that we originally saw this issue triggering the assert

Application.shutdown() was not called before Application deinitialized.

in Vapor/Application.swift:164. This changed to the PostgresConnection assert over the course of debugging this issue but I suspect both of these have the same underlying cause. Since I can't be sure this is the case, I wanted to raise this issue at the API level where we first observed it and where more people would see it.

I've prepared a standalone repo "DebugTestRace" which is a minimally modified template Vapor project exposing the issue.

The file AppTests.swift contains the reproducer with an enum Variants controlling which core test to run.

This is relevant, because there seem to be a number of conditions that need to be met in order to trigger the issue and I wanted to make it easy to observe them side by side with some passing variants. The conditions are:

  • At least two EventLoopFutures need to be in flight.
  • At least one of them needs to throw a .notFound exception via unwrap (or perhaps any other exception needs to be thrown via some mechanism).
  • The EventLoopFuture is resolved in an async/await context via try await elf.get().
  • The test must have a certain timing behaviour. In particular, adding a 100 millisecond usleep before the app.shutdown avoids triggering the assert regardless of any or all of the above conditions being met.

The way to expose the issue is to run all the tests in AppTests which is just running the same test function _test over and over 50 times. This should trigger the assertion for the two variants triggerAssertWithAnotherELF and triggerAssertWithELFAnd.

Make sure to run make db-up and make migrate to launch and set up the dockerised test db.

For what it's worth, there's some evidence that the speed of the machine executing the tests is relevant. I can typically trigger the assertion via the DebugTestRace project on my M1 Max MacBook Pro within a single run of all tests for the two failing variants. Only rarely do all tests for these pass.

We're currently working around these crashes by inserting strategic usleep calls but we'll probably be able to work around it by converting the underlying functions to async.

Any help tracking this down is greatly appreciated, in particular if there's anything wrong with our setup (which is of course entirely possible!).

@finestructure finestructure added the bug Something isn't working label Apr 20, 2022
@arroz
Copy link

arroz commented May 14, 2023

I'm seeing this as well. The conditions in my case are simpler, though: during a test, I was creating a very large number of connections intentionally, by running a bunch of transactions on tasks created inside a withThrowingTaskGroup. Something like this:

let app = Application(.testing)
defer { app.shutdown() }
try await configure(app)

await withThrowingTaskGroup(of: Void.self, body: { group in
  for _ in 0..<500 {
    group.addTask {
      try await app.db.transaction {  }
    }
  }
}

The relevant part is I forgot to raise the connection limit on the PSQL configuration file, so when I reached the limit, PSQL started rejecting new connections. It seems every time Fluent fails to open a database connection, it will eventually trigger this error when the application shuts down.

I'm not familiar with swift-NIO (nor Vapor, started using it just 3 days ago) so the stack trace doesn't mean anything to me at this point. But I believe this can be easily reproducible by running a similar test above (make sure the max_connections setting in postgresql.conf is set to something way lower than 500).

See next comment.

@arroz
Copy link

arroz commented May 14, 2023

Nevermind, by logging things at trace level, I realized it's my mistake. 🤦‍♂️

So I was removing something from the database as the last step of the test. However, I did it on a defer block, like this:

defer { let _ = x.delete(force: true, on: app.db) }

What I didn't realize is this is not the async call, but the one that returns the EventLoopFuture. So the race condition was actually in my code, as that statement was still executing during the application shutdown.

Replacing that by a proper async call fixed the issue. Apologies for all the noise. 😔

@cshadek
Copy link

cshadek commented Jun 22, 2023

So I'm able to repeat the same error when I test an async function that includes a defer with a Task. (See below)

For some reason, when I use the async version of app.test or app.sendRequest there is no longer an error. Xcode still throws a warning because it doesn't think I need the await.

Function being tested:

private func myFunction(req: Request) async throws -> MyResponse {

// Do some work

    defer {
         Task {
             // Some async work that might not be critical (i.e., deleting a row in a database, writing a log record to a database)
         }
    }

    return MyResponse()
}

Non Working Test Version:

let res = try app.sendRequest(.POST, "some/path", beforeRequest: { req in
  try req.content.encode([....])
})
let response = try res.content.decode(MyResponse.self)

In the above version I get Assertion failed: PostgresConnection deinitialized before being closed.

Working Test Version (the only difference is the await):

let res = try await app.sendRequest(.POST, "some/path", beforeRequest: { req in
  try req.content.encode([....])
})
let response = try res.content.decode(MyResponse.self)

I think the issue is that I'm shutting down the application in teardown() async throws {}, and the application is shutting down before defer is called.

Is using the async version of test or sendRequest a valid solution, or are there still potentially errors here? If this is a solution, maybe it would be possible to stop Xcode from throwing a warning?

@finestructure
Copy link
Sponsor Member Author

I think the problem in this case is that you're launching an unstructured concurrency Task at the end of your myFunction with no means to actually properly await its completion before shutting down the db. I'm no expert but I suspect you're just lucky that using the async sendRequest happens to schedule the task in a way that makes it work (for now).

If you need an "async defer", maybe this here can help: https://gist.github.com/finestructure/500d9fda7dfc99400e8ef5bddb38776b. This will tack the defer on to the end of your function and part of its "await", so to speak. You can then leave it up to the caller if you want to "fire and forget" it by wrapping it in an unstructured Task { } rather than baking it into the function itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants