Reconnecting after database server disconnect/reconnect #1833
+1 to implementing an autoreconnect feature: it's really inconvenient not having it out of the box.
I'm personally not convinced there isn't a way. I have seen tests in external projects that suggest that. I am asking this question for these reasons:
@ty10r Well, given there are absolutely zero results when searching for this, I wonder how it is even possible not to build (and then document) such a crucial piece of functionality into a library that is advertised as being made for production use.
Let's try to keep future discussion of knex reconnecting to the database here. Now that we are using generic-pool, it is fair to look again at how this works.

It would be great if knex were able to connect to the DB again after the server was closed and restarted (or the network was broken for a while, or anything like that). It would be awesome if we were able to add some automatic testing for this. Any ideas on how to do that would be valuable (or do we just have to stop/restart db servers in CI? A separate test script that is run after all the others could be a viable solution for that).
Hi @elhigu, I leave you some comments:

**Reconnection**

According to my tests, reconnection is almost working. I've only found one weird scenario (see this issue). If what I say is correct, maybe you can save this case by correcting the implementation.

**Testing**

In my tests I am using dockerode to start and stop postgres containers directly from the tests.

Let me know what you think and maybe I'll make myself some time to help you with a PR.
@smulesoft if you are able to add a PR which can actually test this somehow on our CI, it would be very valuable. The problem is that if there are no tests, it will break again in the future. I currently don't have the resources to write PRs for this stuff myself.
Have the same issue. Tested with the postgres dialect.
Another solution is to check for
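The exact condition the commenter checks for was elided above, so the sketch below is an assumption: a helper that classifies an error as a transient connection failure by its Node.js error code, the kind of check such a workaround typically uses.

```javascript
// Hypothetical helper -- the specific error code from the comment was lost,
// so these Node.js network error codes are illustrative assumptions.
function isConnectionError(err) {
  const transientCodes = ['ECONNREFUSED', 'ECONNRESET', 'EPIPE', 'ETIMEDOUT'];
  // Node's network errors carry a string `code` property
  return Boolean(err) && transientCodes.includes(err.code);
}

module.exports = { isConnectionError };
```

A caller could use this to decide whether a failed query is worth retrying or should be surfaced immediately.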
As someone who is seeing this thread for the first time after running into rejected connection promises, could someone please do me a quick favour and summarise the status of the fix? I'm seeing this in my error logs:

I assumed that specifying a

Is there some extra configuration that is needed, or something else I'm missing? Thanks in advance.
@leebenson I believe there is a new bug in 0.14 which is causing that error. Maybe related to this: #2320, and here is another issue about your problem: #2321.
Just as an FYI: I saw an infinite loop of trying to reconnect in 0.14 when the db didn't exist, but it is now fixed in 0.14.2.
Removed an unrelated question, which was added to this issue by accident.
This isn't resolved for me in 0.14.2. When I kill a database in between making knex queries, everything works. When I make queries, then scale the database to zero replicas, attempt to make more queries, I get the following error - and I get the same error even when I scale the database back up and it's accepting connections and everything.
So, steps to reproduce:
My connection settings are
**EDIT 1:** I've tried setting pool.idleTimeoutMillis to 25s, pool.acquireTimeoutMillis to 4s, and evictionRunIntervalMillis to 20s, and it still took around 90 seconds until the queries started succeeding. I will test this more.

**EDIT 2:** With timeouts set to 5 seconds and 10 connections in the pool, it took 40 seconds from the time the database could be queried until the first query succeeded. With 2 connections in the pool, it still took 40-60 seconds until the first query succeeded, from the time the database could be queried from another tool. Just to be clear, that's with these settings:
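For illustration, the pool settings described in the edits above might be written in a knex config like the following sketch. The option names are the generic-pool-era ones the comment mentions; the connection details and exact values are assumptions, not taken from the thread.

```javascript
// Sketch of a knex configuration using the pool timeouts discussed above.
// Host/user/database values are placeholders, not from the original comment.
const knexConfig = {
  client: 'pg',
  connection: { host: 'localhost', user: 'app', database: 'app' },
  pool: {
    min: 0,                          // let the pool drain completely when idle
    max: 2,                          // the small pool size tried in EDIT 2
    acquireTimeoutMillis: 5000,      // give up acquiring a connection after 5s
    idleTimeoutMillis: 5000,         // evict connections idle longer than 5s
    evictionRunIntervalMillis: 5000  // run the evictor every 5s
  }
};

module.exports = knexConfig;
```

Note that the long delays reported above suggest the evictor was not reaping dead connections as quickly as these settings imply it should.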
The default is 1000 though, which should be fine.
@kibertoad Maybe, but since the Knex docs didn't mention anything about tarn.js I didn't look at tarn.js. I looked at generic-pool, since that's what the Knex docs mention. Yeah, I doubt it's the cause of this error.
Yes, I am very well aware now that Knex uses tarn. I knew as soon as @ChieveiT pointed it out. I'm not convinced it's related to this issue, but others did have theirs fixed by downgrading, so that's interesting. I still haven't seen the issue again since I last reported it happened. I'm not even 100% convinced it's Knex's fault.
I was testing this because we had the same problem in prod. In my case, I came up with an integration test that could reproduce the problem:

I did the steps
@smulesoft Would you consider contributing a test script that does that?
Yeah, the PR I mentioned: #2017.
Yeah, AFAIK that shouldn't be the problem anymore.
@kibertoad I think it has been in since 0.14 or something like that; long enough that it was already broken at some point... I wouldn't implement it the way it was anymore, and I haven't tried it recently.
@kibertoad
@smulesoft CockroachDB is actually using the pg driver.
Thanks for the info. Then we should wait until we have a way to reproduce this consistently, and then debug the problem...
I think this issue already has too much clutter from ancient history and different problems; it should be closed, and new issues added when something reproducible happens. Closing.
As a bit of closure: yesterday I finally worked out what the problem was in my case. It was actually a problem with the database, which has now been resolved in the latest version of CockroachDB that was just released.
@jazoom Glad to hear that! Would you consider providing a bit more detail (e.g. a link to the resolved issue in the CockroachDB tracker) for people with the same issue who stumble upon this ticket?
I actually don't know which issue it was fixed in, or even if it had an issue. I noticed the metrics would sometimes catch a spike in RAM and CPU usage, and the CockroachDB logs said something about a heartbeat timeout. I updated to version 2.1.0 (a major update with a great many fixes and changes) and the problem hasn't happened since. But it's only been 1 day, so I can't be too sure.
@jazoom I would appreciate it if you reported back after it has run for a while, whether or not the problem reappears :)
No probs. Will do.
It's been 3 days now and I still haven't seen the problem. Hopefully that means it has been resolved, at least in my case.
@jazoom Given that it's been a week by now, is it still going strong?
Actually, it happened once, but I'm pretty sure it was CockroachDB's fault, or perhaps a brief network outage. I don't think it had anything to do with Knex.
Hey guys,

You can see some of my troubleshooting process over here:

Generally I was able to solve this previously (in the last iteration of my modified Ghost docker image) by rebuilding / rolling back to Knex 0.12.6.

Since Ghost uses MySQL (by default) but some of my services on the cluster also need to connect to Postgres (Rails in this case), as an added solution / redundancy I added a PgBouncer instance to separate any services connecting to external DBs from the connection pooling issue, by enforcing a DB connection drop on the PgBouncer instance (prior to Azure's 4 minute Load Balancer automatic connection severing).

If you are using Postgres and not MySQL, you can solve the problem by adding PgBouncer in the middle and telling it to cycle connections frequently.

I don't know if the issue is still occurring, but I am open to testing it with the latest version used by the official Ghost repo (Knex 0.14.6) if you guys have a test procedure!
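A PgBouncer setup along the lines described might look like the following `pgbouncer.ini` fragment. This is only a sketch: the parameter values are assumptions chosen to recycle server connections well before Azure's roughly 4 minute load-balancer cutoff, not values taken from the comment.

```ini
; pgbouncer.ini sketch -- values are illustrative assumptions
[pgbouncer]
pool_mode = transaction
server_lifetime = 180       ; recycle server connections after 3 minutes
server_idle_timeout = 60    ; close server connections idle for 60 seconds
```

With connections recycled on a schedule shorter than the load balancer's idle timeout, the application never sees a connection that the network has silently severed.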
I have

My pool options are:
It seems that if the connection in the pool is constantly kept alive through querying activity, it is fine. However, if it sits idle... there is no keep-alive of any kind? Any advice on how to solve this issue? A keep-alive setting? Retry on
@brandonros if it is idle, then connections may be closed by the server or by the pool, and when a new query is made, a new connection is fetched into the pool if necessary... So there is no need for keep-alive... you should never get
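If stale connections do slip through, one application-level workaround is a retry wrapper around queries. The sketch below is an assumption, not knex API: the query function, attempt count, and delay are all placeholders, and it relies on the pool behavior described above (a later attempt fetching a fresh connection).

```javascript
// Retry an async operation a few times before giving up. If the first
// attempt hits a dead connection, a later attempt can get a fresh one
// from the pool.
async function withRetry(runQuery, attempts = 3, delayMs = 500) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await runQuery();
    } catch (err) {
      lastError = err;
      // wait briefly so the pool has a chance to evict the dead connection
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

module.exports = { withRetry };
```

Usage would be something like `withRetry(() => knex('users').select('id'))`, with the caller deciding which errors are worth retrying.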
@brandonros did you ever solve the issue?
@brandonros what was your solution?
I searched through the issues to find out whether others are having trouble attempting to reconnect. It's been a very long time since these two:

#1341
#936

Most of the suggestions I have seen in the issues so far rely on `pool2` and include configuration for `requestTimeout`. I am on version `0.12.2` of knex, which no longer uses `pool2`. I also can no longer see an implementation of `ping` in `src/client.js`.

I'm running tests where I use docker to run these steps:

Step 7 is failing in the same way step 5 does.

I assumed knex would automatically attempt to reconnect. Is there a way I can cause a reconnection?

If you'd like, I can post this question instead at one of the past conversations on reconnection. I thought it was worth having a new conversation, since most of the other conversations only include answers from before the switch to `generic-pool`.