
Request timed out causing meteor instances to be restarted #753

Closed
kaihendry opened this issue Apr 9, 2019 · 6 comments
Comments

@kaihendry
Contributor

https://media.dev.unee-t.com/2019-04-09/request-timed-out.mp4

Meteor instances in production seem to get stuck and fail to respond to health checks.

@kaihendry
Contributor Author

kaihendry commented Apr 9, 2019

@nbiton I'm fairly confident it goes down (becomes unresponsive) during one of those repeated Bugzilla AJAX retries, as mentioned in #699.

Actually, I am not sure anymore, since the log times and the event don't line up exactly.

@kaihendry
Contributor Author

We need to improve logging, since production doesn't appear to log requests right now. Did we pull #631 out of prod? I'm wondering why the logs are not on one line.

https://media.dev.unee-t.com/2019-04-09/case-dev-vs-prod-logging.mp4
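For reference, one-record-per-line logging usually comes down to emitting each request as a single JSON object. This is a minimal sketch with a hypothetical helper name; the real MEFE logger from #631 isn't shown in this thread:

```javascript
// Hypothetical helper -- not the actual MEFE/#631 logger.
function formatLogLine(fields) {
  // JSON.stringify without an indent argument keeps the whole record on
  // one line, which is what log aggregators expect.
  return JSON.stringify(fields);
}

const line = formatLogLine({
  method: 'get',
  endpoint: '/rest/bug/73397',
  statusCode: 200,
  duration: 74517,
});
console.log(line);
// {"method":"get","endpoint":"/rest/bug/73397","statusCode":200,"duration":74517}
```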

@kaihendry
Contributor Author

If NodeJS locks up and gets killed, there is some downtime before it comes back online. This could be expedited by #750, since that proposal would avoid the npm install stage in our current Docker image.

@kaihendry
Contributor Author

We reproduced the problem locally, in the sense that we see Meteor taking a very long time to process an HTTP.call:

I20190412-10:35:46.717(8)? { statusCode: 200,
I20190412-10:35:46.717(8)?   method: 'get',
I20190412-10:35:46.718(8)?   endpoint: '/rest/bug/73397/comment',
I20190412-10:35:46.718(8)?   payload: { api_key: 'secret' },
I20190412-10:35:46.718(8)?   duration: 74517 }
I20190412-10:35:46.722(8)? { statusCode: 200,
I20190412-10:35:46.722(8)?   method: 'get',
I20190412-10:35:46.723(8)?   endpoint: '/rest/bug/73397',
I20190412-10:35:46.723(8)?   payload: { api_key: 'secret' },
I20190412-10:35:46.723(8)?   duration: 74525 }

MEFE reports 74 s while Bugzilla (below) reports about half a second. This is the crux of the problem:

bugzilla_1   | 172.24.0.1 - - [12/Apr/2019:02:25:05 +0000] "GET /rest/bug/70142/comment?api_key=secret HTTP/1.1" 200 1352 500 "-" "-"
bugzilla_1   | 172.24.0.1 - - [12/Apr/2019:02:25:05 +0000] "GET /rest/bug/70142?api_key=secret HTTP/1.1" 200 3239 585 "-" "-"

@kaihendry
Contributor Author

Now that we have things running locally, we know it's not a DevOps issue. It appears to be related to how caseNotifications are read, which causes a huge delay. @nbiton is working on a fix.
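The thread doesn't show the actual caseNotifications code, but a common cause of this kind of read-side delay is re-scanning the whole notification set once per case. A hypothetical sketch (invented data shapes and names) contrasting that pattern with a single grouped pass:

```javascript
// Invented shapes -- the real caseNotifications schema isn't shown in this thread.
const notifications = [...Array(5000)].map((_, i) => ({ caseId: i % 500, read: false }));
const cases = [...Array(500)].map((_, i) => ({ id: i }));

// Slow pattern: re-scan every notification for every case,
// O(cases * notifications).
const slow = cases.map(
  (c) => notifications.filter((n) => n.caseId === c.id && !n.read).length
);

// Faster pattern: group once, then do a constant-time lookup per case,
// O(cases + notifications).
const unreadByCase = new Map();
for (const n of notifications) {
  if (!n.read) unreadByCase.set(n.caseId, (unreadByCase.get(n.caseId) || 0) + 1);
}
const fast = cases.map((c) => unreadByCase.get(c.id) || 0);

console.log(slow.every((v, i) => v === fast[i])); // true
```

The same idea applies to MongoDB queries: one indexed, grouped read instead of one collection scan per case.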

@franck-boullier
Member

This should now be fixed thanks to throttling and the implementation of unee-t/bz-database#129.
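For reference, the throttling idea can be sketched as a small wrapper (hypothetical names; the actual fix lives in unee-t/bz-database#129 and isn't reproduced here):

```javascript
// Minimal throttle sketch: at most one underlying call per `intervalMs`;
// extra calls inside the window are dropped (variants could queue instead).
function throttle(fn, intervalMs) {
  let last = 0;
  return (...args) => {
    const now = Date.now();
    if (now - last >= intervalMs) {
      last = now;
      return fn(...args);
    }
    return undefined; // call suppressed
  };
}

// Hypothetical usage: wrap the sync trigger so a burst of case updates
// collapses into a single request per second.
let calls = 0;
const syncCase = throttle(() => { calls += 1; }, 1000);
for (let i = 0; i < 100; i++) syncCase();
console.log(calls); // 1
```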
