Description
Some background
I am not sure whether the following details are needed to answer my question, but I am providing them for reference. I am developing a FastAPI POST endpoint which does the following:
Receives an image file (multipart/form-data).
Converts the file stream to a NumPy array.
Processes the image using OpenCV and TensorFlow.
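The conversion step in the list above can be sketched roughly as follows. This is only an illustration: `stream_to_array` is a hypothetical helper name, and the actual endpoint code is not shown in this post.

```python
import numpy as np


def stream_to_array(raw: bytes) -> np.ndarray:
    """Expose the uploaded bytes as a 1-D uint8 array (hypothetical helper).

    np.frombuffer creates a read-only view over the bytes, so nothing is copied.
    """
    return np.frombuffer(raw, dtype=np.uint8)


# In an endpoint, `raw` would be the bytes read from the uploaded file.
# The encoded buffer would then typically be decoded into pixels, for
# example with cv2.imdecode(buffer, cv2.IMREAD_COLOR), before any
# OpenCV or TensorFlow processing.
buffer = stream_to_array(b"\x00\x01\xff")
print(buffer.dtype, buffer.shape)
```

The helper only wraps the raw bytes; decoding and model inference are the CPU-heavy parts of such a pipeline.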
The API is deployed on Google App Engine Flex (1-4 cores, 1-4 workers), using gunicorn and uvicorn. When I call the endpoint on its own, the average response time is ~1 second. However, when I load test the API with 5 concurrent users, the average response time rises to ~4-5 seconds per request, and then some uvicorn workers start to time out and get killed. As a result, I receive 502 responses from the nginx server.
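For reference, a gunicorn plus uvicorn setup like the one described is commonly written as a `gunicorn.conf.py` file. The sketch below is an assumption about what such a file might look like, not the actual deployment config; the bind address and worker count are placeholders. Note that 30 seconds is gunicorn's documented default for its `timeout` setting.

```python
# gunicorn.conf.py (illustrative values only, not the real deployment)

# Address the server listens on (placeholder).
bind = "0.0.0.0:8080"

# Number of worker processes (the post mentions 1-4).
workers = 4

# Run each worker with uvicorn's gunicorn-compatible worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Workers that stay silent for longer than this many seconds are
# killed and restarted by gunicorn. 30 is gunicorn's default.
timeout = 30

# Time a worker gets to finish in-flight requests after a restart signal.
graceful_timeout = 30
```

Such a file would typically be passed with `gunicorn -c gunicorn.conf.py main:app`, where `main:app` is a placeholder for the module and application object.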
My question is the following:
Since the default timeout of uvicorn is 30 seconds, I don't understand why the workers are being terminated before any request takes ~30 seconds to respond. I have read that the 30-second timeout is not per request. However, I do not understand under which circumstances a worker is supposed to time out. For instance, if all workers manage to respond within 5-10 seconds, is it normal to see workers time out while the timeout threshold is 30 seconds?
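One scenario that may be relevant to this question, offered as a hedged sketch rather than a diagnosis: gunicorn's timeout applies to a worker staying silent, and with an async worker the liveness signal is, as far as I understand, sent from the event loop. If blocking work runs directly on the event loop, nothing else on that loop can run in the meantime. The standard-library example below only demonstrates that general effect. `cpu_bound_work` uses `time.sleep` as a stand-in for image processing, and all names are hypothetical.

```python
import asyncio
import time


def cpu_bound_work(seconds: float) -> None:
    """Stand-in for image processing: blocks the calling thread."""
    time.sleep(seconds)


async def heartbeat(ticks: list, interval: float) -> None:
    """Record a timestamp every `interval` seconds while the loop is free."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def max_heartbeat_gap(offload: bool, work_seconds: float = 0.3) -> float:
    """Return the longest pause between heartbeats while the work runs."""
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks, 0.02))
    await asyncio.sleep(0.05)  # let a few heartbeats happen first

    if offload:
        # Run the blocking call in a thread so the event loop stays responsive.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, cpu_bound_work, work_seconds)
    else:
        # Run the blocking call on the event loop itself.
        cpu_bound_work(work_seconds)

    await asyncio.sleep(0.05)  # let the heartbeat resume
    beat.cancel()
    try:
        await beat
    except asyncio.CancelledError:
        pass
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print("blocking on the loop:", round(asyncio.run(max_heartbeat_gap(False)), 3))
    print("offloaded to a thread:", round(asyncio.run(max_heartbeat_gap(True)), 3))
```

When the work runs on the loop, the heartbeat pauses for at least the full duration of the work; when it is offloaded, the heartbeat keeps its short interval. Real CPU-bound code in a thread can still contend for the GIL, so a process pool may be a better fit in some cases.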