Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple Connections #564

Closed
proggeramlug opened this issue Aug 24, 2018 · 27 comments
Closed

Multiple Connections #564

proggeramlug opened this issue Aug 24, 2018 · 27 comments

Comments

@proggeramlug
Copy link
Contributor

Disclaimer: This may not just be a fluent thing but also DatabaseKit and all respective drivers.

When trying to run multiple Database requests asynchronously on the same connection a number of problems can/will occur:

  1. The application freezes
  2. The application spits out a rather strange error (State of connection is still callback I think)

I'm going to put together a project that will illustrate the problem but I think this is something that should be solved since the whole charm of asynchronous-ness rests on this.

If it is not possible to have more than one query simultouznious.on a given connection (I'm not sure how each database is handling this) than could a potential solution be to have multiple connections open at the same time in order for this to work well? If so, how could this be abstracted out of the business logic?

@jdmcd
Copy link
Member

jdmcd commented Aug 24, 2018

Quite a few people have observed this and I don’t think anyone has been able to produce a reliable test case. I think the community would be very appreciative if you are able to make one :)

@MrMage
Copy link
Contributor

MrMage commented Aug 24, 2018

If it is not possible to have more than one query simultouznious.on a given connection (I'm not sure how each database is handling this) than could a potential solution be to have multiple connections open at the same time in order for this to work well? If so, how could this be abstracted out of the business logic?

I think Vapor has some classes to pool connections. Maybe requesting a connection from the pool for the duration of just that single request would work? (Did not look into this in detail, though.)

@proggeramlug
Copy link
Contributor Author

@mcdappdev I actually have one and it is very reliable - however it is part of a big closed-source project, so I need to see how to extract it ;)

@MrMage I have not looked into that quite yet - I'm more interested in confirming what the actual problem is. However it may be a work around.

I also noticed today that this problem seems to happen more on linux than on mac for some reason.

@jdmcd
Copy link
Member

jdmcd commented Aug 24, 2018

@proggeramlug Cool! Out of curiosity, how are you getting the connection?

@proggeramlug
Copy link
Contributor Author

@mcdappdev What do you mean? Which connection?

@jdmcd
Copy link
Member

jdmcd commented Aug 24, 2018

The connection you're using to run the database requests concurrently:

When trying to run multiple Database requests asynchronously on the same connection

@proggeramlug
Copy link
Contributor Author

proggeramlug commented Aug 24, 2018

@mcdappdev So far I've been using the standard one coming from Request, like I have not (had to) mess(ed) with that yet.

This problem really only came into existence rather recently.

@jdmcd
Copy link
Member

jdmcd commented Aug 24, 2018

Ok cool, that's good to know. I know some people have run into problems in the past while manually getting connections with newConnection.

@proggeramlug
Copy link
Contributor Author

@mcdappdev (and others)

Okay, so I have finally found the (root) problem of all of this.

The problem is that some machines only have one logical core. The number of max connections per pool is defined as System.coreCount (https://github.com/vapor/database-kit/blob/master/Sources/DatabaseKit/ConnectionPool/DatabaseConnectionPoolConfig.swift#L5).

Most Macs have Quad core processors, meaning 4 cores (if not more), the linux machine I was testing on however only has 1 logical core. However, it is a pretty powerful machine from AWS, like I would not say that this machine is "incapable" of running a complex vapor app.

However, with only 1 logical core it is impossible to have 2 asynchronous database calls. Because the first one gets send and before it returns (and releases the connection) the second one wants to send. But it then doesn't get one and just waits. However, the other connection does not get properly released because it is waiting for the whole operation to be finished (due to flatten).

Consequently the only allowed and active connection is idling while the other requests are waiting for it to be finish: Never-ending loop and the requests timeout.

I am currently testing how multiple connections on the same core work out. I would imagine that there could be some threading complications but my first initial tests all worked out.
Another solution could be to release connections sooner than just to wait for auto-release (which flatten does not trigger afaik).

@tanner0101 your input? :)

@proggeramlug
Copy link
Contributor Author

proggeramlug commented Aug 27, 2018

@mcdappdev This also explains why a newConnection would have the same problem - because if only 1 connection is allowed you always run into this.

@vzsg
Copy link
Member

vzsg commented Aug 27, 2018

So it seems the max number of connections being the coreCount takes another casualty. Can vapor/database-kit#47 get finally merged and tagged @tanner0101?


But then, there's this other issue with the second connection attempt not waiting properly, so given a high enough load (N * M + 1 parallel requests where N is the number of cores and M is the max connection limit), the whole thing will fall apart again...

@MrMage
Copy link
Contributor

MrMage commented Aug 27, 2018 via email

@jdmcd
Copy link
Member

jdmcd commented Aug 27, 2018

@proggeramlug Yeah, I should have noticed this was a result of that logical core issue. Sorry about that - the root of the problem will be fixed when that PR is merged and tagged

@proggeramlug
Copy link
Contributor Author

I'm just glad I found the issue guys :D 2 sleepless nights pretty much :D

@MrMage I tend to agree with you. There is value in considering core-count for total number of connections but at the same time connections are not known to be the busiest threads - so in that sense it makes no sense. Variable would be my preference as well.

@mcdappdev No worries. :)

@MrMage
Copy link
Contributor

MrMage commented Aug 27, 2018

@proggeramlug note that the total number of connections would still scale with core count, because the number of event loops does. But I don’t think that we need to scale the number of connections quadratically with core count — having a constant number of connections available per event loop would be more natural.

@proggeramlug
Copy link
Contributor Author

@MrMage you know more about the event loops than I do so I take your word for it. ;)

@MrMage
Copy link
Contributor

MrMage commented Aug 27, 2018

MrMage added a commit to MrMage/database-kit that referenced this issue Aug 27, 2018
See vapor/fluent#564 (comment) for discussion. Open to changing the exact value of the constant.
@MrMage
Copy link
Contributor

MrMage commented Aug 27, 2018

Filed
vapor/database-kit#48.

@MrMage
Copy link
Contributor

MrMage commented Aug 27, 2018

Duh, I should have looked at
vapor/database-kit#47 first, sorry.

@tanner0101
Copy link
Member

But then, there's this other issue with the second connection attempt not waiting properly, so given a high enough load (N * M + 1 parallel requests where N is the number of cores and M is the max connection limit), the whole thing will fall apart again...

Yeah, I think this will be a problem. Although I'm not sure what the best solution is.

/cc @vzsg

@MrMage
Copy link
Contributor

MrMage commented Aug 27, 2018

Sorry, but could you guys elaborate why exactly the first connection waits for the second one to finish before getting released (and thus returned to the pool). @proggeramlug, you mentioned something about `flatten, how does that come into play here?

@proggeramlug
Copy link
Contributor Author

@MrMage I'm not the expert here and this is mostly based on me looking for this error, so @tanner0101 may need to correct me. But for what I understand the connection is automatically released when the connection instance is no longer needed.

In a flatten Future multiple Futures are combined in a way so that they are executed simultaneously (which I think is up to NIO). So as long as all of them are not done yet they are also not released .. and therefor it blocks.

That's my understanding, again, not 100% sure about it.

@MrMage
Copy link
Contributor

MrMage commented Aug 27, 2018

To me it looks like the connection should be released right once the DB call has finished, but I might be missing something here:

https://github.com/vapor/database-kit/blob/7a01659316b9f033fa2150d5cd5e9d3c3e46c2e3/Sources/DatabaseKit/ConnectionPool/DatabaseConnectionPool.swift#L61

@proggeramlug
Copy link
Contributor Author

@MrMage I'm not sure that this is the call that is being used. Because Fluent will create connections by demand, so currently there is always at least one connection. That connection is not released until the flatten-method is finished for what I gather.

@tanner0101
Copy link
Member

tanner0101 commented Aug 27, 2018

Spent some time digging into this and I think I've found the root cause. The problem happens when withPooledConnection is used alongside requestCachedConnection. Some middlewares, like SessionMiddleware, get their connections using withPooledConnection. On the other hand, all Fluent queries created with Model.query(on: req) use requestCachedConnection. The problem is that once requestCachedConnection gets a connection from the pool, it does not release it until the request deinits. So, if your connection pool has a max of 1 connection, and your route closure performs a Fluent query (calls requestCachedConnection) your connection pool is now exhausted. This is fine if any other calls to requestCachedConnection are made, since the connection is cached. However, if you make a call to withPooledConnection, it will hang forever since the pool is exhausted. Until the request deinits and releases the connection, which will never happen if the request is waiting on that call to withPooledConnection.

This is exactly what happens if you use SessionMiddleware + a call to Fluent using Model.query(on: req). The Fluent call requests and caches a connection from the pool. Then, when the request is "leaving" your app through the middlewares, SessionMiddleware calls withPooledConnection to get a connection required for figuring out if a cookie should be left on the response. If max connections is set to 1, this will create an infinite hang.

I discussed some solutions in Discord, and it's seeming like the best option is to stop using requestCachedConnection since it doesn't play nicely with the other means of using connection pools. The problem is that changing how Fluent deals with connections would definitely be a breaking change (even if it doesn't break any code on the users' end, it could break things that integrate tightly with Fluent). Because of this, we likely need to wait until Fluent 4 to apply a proper fix to this problem. In the meantime, ensuring that connection pools have more than 1 max connections seems to be the best solution.

@proggeramlug
Copy link
Contributor Author

@tanner0101 Thanks for digging into this - I think it would be well worth it to find a proper solution though. Like maybe if you (as the main architect) lay out a process that would fix this for a Fluent 4 version and then I'm sure some community members (myself including) would be willing to work on this.

That being said, I know how limited time is ... all too well!

@Joebayld
Copy link

Joebayld commented Aug 29, 2018

I'm noticing that I'm still able to cause my app to hang even when setting a higher number of maxConnections. After doing a bit of digging - I found something that could be related. In the DatabaseConnectionPool, I added some print commands to see whats going on and discovered that the 'waiters' array isn't always getting the missing connections.

The function below isn't always called.. somehow the waiters array is getting emptied before this.

if let waiter = waiters.popLast() {
  print("Removed Waiter:", waiter); // not always called..
  requestConnection().cascade(promise: waiter);
}

To really test this, I'm firing off a bunch of requests at once to my app to see how it handles connections rapidly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants