Description
Hi there,
I have recently been using FastAPI to serve machine learning predictions.
The model is loaded once when the server starts, and I talk to the database only twice, once before and once after the prediction, yet there seems to be a bottleneck in my code. I tested the robustness of the server by sending requests from multiple threads. With fewer than 10 concurrent requests everything is fine and the maximum response time is acceptable. But when I increase the number of threads to about 100, the response time grows sharply, and a client may wait as long as 200 seconds for a response.
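The load test described above can be sketched with the standard library alone. `send_request` is a hypothetical stand-in for one HTTP call to the server (in the real test it would be e.g. a `requests.post` to the prediction endpoint); here it just sleeps to simulate network plus server time:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one HTTP request to the server; in the
# real test this would be an actual call to the prediction endpoint.
def send_request():
    time.sleep(0.05)  # simulate network + server time

def max_response_time(n_threads):
    """Fire n_threads concurrent "requests" and report the slowest one."""
    def timed_call(_):
        start = time.perf_counter()
        send_request()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        latencies = list(pool.map(timed_call, range(n_threads)))
    return max(latencies)

print(f"max latency at 10 threads: {max_response_time(10):.3f}s")
```

Comparing `max_response_time(10)` against `max_response_time(100)` against a real server is how the slowdown above was observed.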
Does anybody have any idea?
I do use the async and await keywords both when accessing the database and in the prediction function.
The sequential response time is less than 1 second.
How can I improve the performance of my server when it has to handle a large number of requests?
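One thing worth checking: `await`-ing a call does not help if the function's body itself blocks the event loop, and an `async def` endpoint that calls blocking code (such as CPU-bound model inference) serializes all concurrent requests. A minimal stdlib-only sketch of the difference, not FastAPI-specific; `predict_blocking` is a hypothetical stand-in for the model call, simulated here with `time.sleep`:

```python
import asyncio
import time

# Hypothetical stand-in for the model's prediction call; time.sleep
# simulates a call that blocks the calling thread for ~50 ms.
def predict_blocking():
    time.sleep(0.05)
    return 1

async def handler_blocking():
    # Calling blocking code directly inside an async handler stalls
    # the event loop, so concurrent requests run one after another.
    return predict_blocking()

async def handler_offloaded():
    # Offloading to a worker thread keeps the event loop free;
    # asyncio.to_thread is available in Python 3.9+.
    return await asyncio.to_thread(predict_blocking)

async def timed(handler, n=8):
    # Run n concurrent "requests" and measure total wall time.
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start

async def main():
    t_block = await timed(handler_blocking)
    t_thread = await timed(handler_offloaded)
    print(f"blocking: {t_block:.2f}s, offloaded: {t_thread:.2f}s")
    return t_block, t_thread

if __name__ == "__main__":
    asyncio.run(main())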