Description
Hi there,
I have recently been using FastAPI to serve machine learning predictions.
The model is loaded once when the server starts, and I talk to the database only twice, once before and once after the prediction, yet there seems to be a bottleneck in my code. I tested the robustness of the server by sending requests from multiple threads. With fewer than 10 concurrent requests everything is fine and the maximum response time is acceptable. But when I increase the number of threads to about 100, the response time grows sharply, and a client may wait as long as 200 seconds for a response.
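The load test described above can be sketched with the standard library alone. `send_request` is a hypothetical stand-in for one HTTP call to the server (in the real test it would be e.g. a `requests.post` to the prediction endpoint); here it just sleeps to simulate network plus server time:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one HTTP request to the server; in the
# real test this would be an actual call to the prediction endpoint.
def send_request():
    time.sleep(0.05)  # simulate network + server time

def max_response_time(n_threads):
    """Fire n_threads concurrent "requests" and report the slowest one."""
    def timed_call(_):
        start = time.perf_counter()
        send_request()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        latencies = list(pool.map(timed_call, range(n_threads)))
    return max(latencies)

print(f"max latency at 10 threads: {max_response_time(10):.3f}s")
```

Comparing `max_response_time(10)` against `max_response_time(100)` against a real server is how the slowdown above was observed.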
Does anybody have any idea?
I do use the async and await keywords both when accessing the database and in the prediction function.
The sequential response time is less than 1 second.
How can I improve the performance of my server when it has to handle a large number of requests?
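One thing worth checking: `await`-ing a call does not help if the function's body itself blocks the event loop, and an `async def` endpoint that calls blocking code (such as CPU-bound model inference) serializes all concurrent requests. A minimal stdlib-only sketch of the difference, not FastAPI-specific; `predict_blocking` is a hypothetical stand-in for the model call, simulated here with `time.sleep`:

```python
import asyncio
import time

# Hypothetical stand-in for the model's prediction call; time.sleep
# simulates a call that blocks the calling thread for ~50 ms.
def predict_blocking():
    time.sleep(0.05)
    return 1

async def handler_blocking():
    # Calling blocking code directly inside an async handler stalls
    # the event loop, so concurrent requests run one after another.
    return predict_blocking()

async def handler_offloaded():
    # Offloading to a worker thread keeps the event loop free;
    # asyncio.to_thread is available in Python 3.9+.
    return await asyncio.to_thread(predict_blocking)

async def timed(handler, n=8):
    # Run n concurrent "requests" and measure total wall time.
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start

async def main():
    t_block = await timed(handler_blocking)
    t_thread = await timed(handler_offloaded)
    print(f"blocking: {t_block:.2f}s, offloaded: {t_thread:.2f}s")
    return t_block, t_thread

if __name__ == "__main__":
    asyncio.run(main())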