Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting Request Timeouts when in reality, queries are run instantly when run directly on the Database #1407

Closed
Blutude opened this issue Jul 7, 2022 · 14 comments

Comments

@Blutude
Copy link

Blutude commented Jul 7, 2022

We are getting intermittent database errors in PROD. We are able to reproduce them in different environments and here it the full stack trace.

RequestError: Timeout: Request failed to complete in 15000ms
 at Request.userCallback (/home/node/app/node_modules/@hobt/database-client/node_modules/mssql/lib/tedious/request.js:437:19)
 at Request.callback (/home/node/app/node_modules/tedious/lib/request.js:56:14)
 at Connection.endOfMessageMarkerReceived (/home/node/app/node_modules/tedious/lib/connection.js:2407:20)
 at Connection.dispatchEvent (/home/node/app/node_modules/tedious/lib/connection.js:1279:15)
 at Parser.<anonymous> (/home/node/app/node_modules/tedious/lib/connection.js:1072:14)
 at Parser.emit (events.js:314:20)\n at Parser.<anonymous> (/home/node/app/node_modules/tedious/lib/token/token-stream-parser.js:37:14)
 at Parser.emit (events.js:314:20)
 at addChunk (/home/node/app/node_modules/readable-stream/lib/_stream_readable.js:298:12)
 at readableAddChunk (/home/node/app/node_modules/readable-stream/lib/_stream_readable.js:280:11)

This INSERT request gets retried for 24 hours and does not succeed. We tried running our exact query straight into the Database Query Editor and it gets executed quickly, way under 15 seconds. We analyzed the query performance and it all looks normal, no indication to why it would take over 15 seconds when getting run from the Tedious package. We got support from Microsoft to look into the Database Server and everything seems to look healthy. They could not pinpoint any issue on the Database server that would cause this issue. We are not exhausting any of the connections, the database is not under any stress. Microsoft Support were suggesting that the problems resided on the Tedious package that we are using to connect to MSSQL.

While those database requests were getting retried and failing with those timeouts, we tried restarting the entire service. As soon as the service starts up, it retries those database requests and we get the same failures. So that is not too clear why those issues are also happening during startups, this is leading me to think it might not be an issue with the tedious package (because I would have expected things to work on startup otherwise). So I would really appreciate support in figuring out why we could be getting these request timeouts.

Expected behaviour:

I would expect the intermittent errors to eventually succeed on retries.

Actual behaviour:

The requests keep failing and do not eventually work. Although on other days, this request works with no issues.

Configuration:

max pool size: 50
request timeout: 15000

Software versions

  • NodeJS: 12.22.12
  • tedious: 6.6.5
  • mssql: 6.0.1
  • node-mssql:
  • SQL Server: SQL Azure 12.0.2000.8
@dhensby
Copy link
Collaborator

dhensby commented Jul 7, 2022

You're using some old versions of MSSQL/tedious so my first recommendation is to upgrade the package and see if you still have the problem. I'm not providing support for v6 of the library as it stands.

In terms of digging deeper, I think I'll need to see some application code/query examples as that may well be fundamental to the problem.

If the errors with the queries happen repeatedly with the same query, even after application restarts, that doesn't seem intermittent but systemic to the query being run. Can you give an example?

When one runs queries directly against a server using a query editor, the set up is not always identical, there can be settings for the connections that are made which aren't set in tedious. I suspect there's something about the query that isn't playing nice (maybe unescaped quote marks, or some syntax that's not compliant unless in ANSI mode, etc).

@Blutude
Copy link
Author

Blutude commented Jul 8, 2022

I tried upgrading mssql/tedious but that requires us to upgrade node, typescript and so many other packages, which is in turn even requiring a lot of code changes. I tried doing that quickly yesterday but after I saw the impact would be quite big, I decided to leave this option for last resort. We are currently in production and cannot afford to do that many upgrades at this point just resolving this P1 issue we have. If we are confident that this is the issue and that it cannot be fixed otherwise, we will go that route but I am trying to avoid this for now.

We have tried enabling an SQL Server Profiler while having this query being executed through the tedious package, and it seems that the Profiler is not even seeing these queries make it to the SQL Server. We are running the same submission with test data in a test environment multiple times and it seems that the same query sometimes succeeds and sometimes fails. When it fails, we retry for 24 hours with no success. On other days, it starts working again so I feel it is not an issue with the query. I am deploying a log to print the full query details whenever this issue happens again so we will keep an eye on it but as mentioned, our indications seem to show it is not an issue related to the query.

Something is preventing the query to even make it to the SQL Server. So I am wondering what are the possible reasons for this package to throw this RequestError "Timeout: Request failed to complete in" without even making a request to the actual database server.

@dhensby
Copy link
Collaborator

dhensby commented Jul 9, 2022

Interesting that the request isn't even making it through. I'd normally suspect that the pool was depleted, but if the requests still fail after a restart, then the pool isn't the problem (which is why I still suspect a query based problem, or possibly code).

I can't provide anything but speculation without seeing code and/or some reproduction steps.

You could turn the debug logging on (set env var DEBUG=* to see all the debug logs, which might reveal where the requests are stalling.

If you can't upgrade to the latest MSSQL version, can you upgrade to the latest v6? v6.0.1...v6.4.1

@Blutude
Copy link
Author

Blutude commented Jul 13, 2022

I do have a query that did fail. I tried running it locally through the tedious package but it was failing on a separate error, that I was using too many request input parameters. The query is a bulk insert where we insert 500 records. So I split up the query into chunks of 100 and it worked locally. However I was not able to reproduce the exact same error of the Request Timeout locally, even though I am pointing to the deployed database that was throwing this error. So this tells me the issue might not be related to the query.

I thought maybe the package is just not displaying the right error in a message, and that the true issue is that we are using too many request input parameters. So I figured we just need to split of our large bulk query in chunks.

Then this Request Timeout error happened on another separate query that is only updating a couple fields from 3 records (so not a bulk query) and I ran the exact same query locally, pointing to the same database but was not able to reproduce it. It is consistently failing on the deployed instance.

So for our next steps, I did upgrade mssql to v6.4.1 and I did add some logs on the query for whenever it fails so we can take a look at the failing query as well as the full stack trace. I'm just not confident that I would be able to reproduce it locally.

Again the queries when being run straight on the database (without going through node) execute instantly and the performance analyzation of those queries look good. Most of what we're seeing seems to indicate that there is something wrong on the Database Server (especially since when we redeploy, this timeout still happens), but when we run the profiler we do not see those requests even make it to the Server.

@dhensby
Copy link
Collaborator

dhensby commented Aug 8, 2022

@Blutude did you get to the bottom of your problem? If you're still experiencing it, it may be worth not only debugging the queries that are failing, but also the state of the pool at that point too (is it depleted? healthy? are the connections being released? etc). Also, is there a mismatch between the number of connections the pool is allowed to create and the servers maximum number of connections?

You can inspect the tarn pool directly or use some helpers on the pool: https://github.com/tediousjs/node-mssql/blob/v6.4.1/lib/base/connection-pool.js#L255-L269 and https://github.com/tediousjs/node-mssql/blob/v6.4.1/lib/base/connection-pool.js#L74-L84

@MarcoEstevez

This comment was marked as off-topic.

@MarcoEstevez

This comment was marked as off-topic.

@dhensby

This comment was marked as off-topic.

@dhensby

This comment was marked as off-topic.

@SteadEXE
Copy link

SteadEXE commented Apr 5, 2023

I don't know if it's related to the same issue, but we also have request timeout while it works in direct with SQL Server Manager.
We are using MSSQL 9.1.1.

When we got this issue, it blocks the whole event loop and the whole application becomes unresponsive (CPU usage rise up to 100%).

@dhensby
Copy link
Collaborator

dhensby commented Apr 5, 2023

@SteadEXE - as this issue relates to a much older version of the library, I'd recommend you open a new issue. Please can you include any debugging and where you've determined the hanging is happening. It's very unusual that the event loop would be blocked, but it would be good to get to the bottom of if it's being caused by this library.

@SteadEXE
Copy link

SteadEXE commented Apr 5, 2023

Sure, I'll try to investigate, and open an issue if I can get the needed informations :)
thanks for the reactivity

@oliveiracdz
Copy link

Hi there! I got the same behavior on my next.js 14 with mssql 10.0.2 application.
Do you have any advices? Thank you!

@oliveiracdz
Copy link

In my case the problem was that SQL Server was choosing bad plans for query execution.
We fixed it forcing the plan on Query Store: https://stackoverflow.com/a/56774964/1440095

@dhensby dhensby closed this as completed Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants