This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Scale FastAPI with sync endpoints #5759
Comments
Notes for the reader:
Concurrency: multiple tasks make progress by interleaving on the same resources (e.g. threads sharing one interpreter under the GIL).
Parallelism: multiple tasks execute at the same time on separate resources (e.g. separate worker processes on separate cores).
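To make the concurrency/parallelism distinction concrete for this thread, here is a small stdlib-only sketch (illustrative numbers): several threads running a CPU-bound function all finish correctly, but under the GIL their bytecode is interleaved rather than executed in parallel — which is why CPU-bound work wants worker processes, not threads.

```python
import threading

# A CPU-bound task: with the GIL, threads give concurrency (interleaved
# execution) but not parallelism, so many threads doing this kind of
# work on one worker contend for the interpreter.
def cpu_bound(n):
    total = 0
    for i in range(n):
        total += i
    return total

results = []
lock = threading.Lock()

def run(n):
    r = cpu_bound(n)
    with lock:
        results.append(r)

threads = [threading.Thread(target=run, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All four threads produce the right answer, but they shared one GIL;
# separate processes (Gunicorn workers) would run truly in parallel.
print(results)
```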
In your example you are using workers, and the issue that is occurring is that the work is not balanced across the workers. According to the documentation:
And:
if an application is CPU bound ("resource bound") then workers are ideal; if an application is I/O bound, threads or pseudo-threads are ideal.
@extreme4all because we're using a library that is not asyncio-compatible (a Snowflake driver for SQLAlchemy)
In addition to what @extreme4all posted
This is not always true. Uvicorn, as the name implies, will use uvloop depending on how it is configured/installed. Another thing to consider is that FastAPI/Starlette spawn those def endpoints onto a threadpool; depending on its implementation, that could be where you are seeing a bottleneck. Python's threads/processes story can be difficult to reason about. In the short term, depending on the machine you have deployed this on, Gunicorn recommends 2 workers per core. If the cores are not busy you can probably raise the worker count, and that may solve your problem / give you some breathing room while you investigate further.
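The worker-count heuristic above can be expressed as a one-liner; Gunicorn's own design docs suggest `(2 x $num_cores) + 1` as a starting point, which is in the same spirit. Treat it as a tuning baseline, not a hard rule:

```python
import multiprocessing

# Starting point from Gunicorn's design docs: (2 x num_cores) + 1.
# Tune up or down based on observed CPU utilization and latency.
workers = multiprocessing.cpu_count() * 2 + 1
print(workers)
```

You could then pass this value via `--workers` or a Gunicorn config file.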
@jgould22 I know that def endpoints are spawned onto threadpool threads; that's why I suspect we experience high latencies due to GIL limitations with many threads on the same worker (the number of threads is not limited on each worker)
The best approach for a scalable deployment (in containerized environments) is using barebones Uvicorn with a single worker and scaling your containers horizontally. You can have health-check APIs to ensure your containers don't go stale.
@iudeen I'd love to hear where you found this to be the best approach
@talboren based on hands-on experience with a product that has hundreds of thousands of concurrent users, and in a way from the FastAPI docs as well: https://fastapi.tiangolo.com/deployment/docker/#replication-number-of-processes
This is not true. The question was already answered in the first comment: "Gunicorn relies on the operating system to provide all of the load balancing when handling requests." Ref.: https://docs.gunicorn.org/en/latest/design.html?highlight=operating%20system#how-many-workers
@Kludex The uvicorn workers don't limit the threads that the FastAPI app creates when using def endpoints. As I stated at the beginning of the thread: when a new connection is accepted, it's passed to the FastAPI app, which creates a new thread using the AnyIO library, and then the event loop of the uvicorn worker becomes available right away to accept new connections.
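The hand-off described above — the event loop dispatching a blocking def endpoint to a thread and staying free to accept more connections — can be sketched with the stdlib equivalent. `asyncio.to_thread` stands in here for Starlette's AnyIO threadpool; the blocking function is a made-up placeholder:

```python
import asyncio
import time

# Placeholder for a blocking `def` endpoint (e.g. a sync DB driver call).
def blocking_endpoint(x):
    time.sleep(0.05)  # stands in for blocking I/O
    return x * 2

async def main():
    # Both "requests" run on worker threads; the event loop itself is
    # not blocked and could keep accepting new connections meanwhile.
    results = await asyncio.gather(
        asyncio.to_thread(blocking_endpoint, 1),
        asyncio.to_thread(blocking_endpoint, 2),
    )
    return results

print(asyncio.run(main()))  # [2, 4]
```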
Where does the uvicorn worker limit the threads? Also, I didn't get the point of the previous message. Can you rephrase it?
It doesn't limit, forgot a word 😅 |
Uvicorn doesn't limit, but AnyIO does. There's a limit of 40 threads by default. ...but the above is not relevant to this discussion. Gunicorn is in charge of the load balancing, as mentioned above - and this is the relevant part here.
But gunicorn doesn't do it. It's documented -
So, with def endpoints, thread creation allows the same worker to take all the connections, and I suspect that this is what causes the workers to be unbalanced
It is still in charge of load balancing among its workers; it is just that the implementation chooses to delegate that to the OS (if the docs are correct). @Kludex If AnyIO sets a default limit on the thread pool, is there a mechanism to adjust it? Or does that AnyIO limit effectively cap FastAPI/Starlette at handling 40 requests (assuming threads are being spawned for nothing else, I guess) per FastAPI process/application, with any following requests queuing up in the thread pool implementation?
@jgould22 Yes: encode/starlette#1724 (comment). btw, thanks for helping on the FastAPI issues. 🙏 In any case, I've already shared my thoughts here: the problem, if there is one, is not something FastAPI can do anything about. The description of the issue shows a problem in the load balancing, and the tool in charge of that is Gunicorn.
@Kludex I already tried using this method to limit the thread count - it didn't help spread the requests across the other uvicorn workers.
The first two requests arrived at the same worker, and the third wasn't handled since it also arrived at the first worker, where it got stuck until one of the first requests finished. That's also the behavior when running uvicorn directly with 4 workers.
Adjusting the number of workers etc. is not going to have an effect on this. The way Gunicorn works is by opening a socket and then loading the application. It then forks the process, which produces one of its workers; to start more workers it forks again until it has the desired number. This produces a set of processes that are all blocking on `accept()` on the same shared socket, leaving the OS to decide which worker wakes up for each incoming connection.
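The shared-socket accept model described above can be demonstrated with a small stdlib sketch. Threads stand in for Gunicorn's forked workers purely for portability; the key point is that all "workers" block in `accept()` on the same listening socket and the OS picks which one wakes up per connection:

```python
import socket
import threading

# One listening socket, shared by all workers (Gunicorn binds it in the
# parent before forking; here we use threads instead of fork).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen()
port = listener.getsockname()[1]

accepted = []
lock = threading.Lock()

def worker(worker_id):
    # Every worker blocks here on the SAME socket; the OS decides
    # which blocked accept() call gets each incoming connection.
    conn, _ = listener.accept()
    with lock:
        accepted.append(worker_id)
    conn.close()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()

# Two client connections; each wakes up exactly one blocked worker.
for _ in range(2):
    client = socket.create_connection(("127.0.0.1", port))
    client.close()

for t in threads:
    t.join()
listener.close()
print(sorted(accepted))  # [0, 1]
```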
First Check
Commit to Help
Example Code
Description
I've noticed lately that we have some latency problems when the servers are busy. I dived into it and found that if, for example, I have four uvicorn workers, one worker can be very busy while the other three are significantly less busy. That has two problems -
In the example code, if I run it with

```
gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

the first five requests will result in the following output -

I investigated the uvicorn workers and found out they use the asyncio server interface, which accepts new connections on Python's event loop. This means that as long as we're using non-async endpoints (which run on AnyIO threads), there is no limit on the number of new connections a worker accepts, which again takes us back to the GIL problem.
To sum it all up, I'm a little confused about my best option for using FastAPI with non-async endpoints while getting better performance. Should I use more workers? Should I use more container instances?
Maybe we should implement a new worker class that is able to better control multithreading in such cases? I have some implementation ideas for this purpose, but I'd like to have your advice here.
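One possible shape for such control, sketched as a hypothetical ASGI middleware (all names here, like `ConcurrencyLimitMiddleware`, are made up for illustration): cap in-flight requests per worker so that a busy worker stops draining the shared accept queue, giving the OS a chance to hand new connections to the other workers.

```python
import asyncio

class ConcurrencyLimitMiddleware:
    """Hypothetical sketch: bound in-flight requests per worker process."""

    def __init__(self, app, max_concurrent: int = 10):
        self.app = app
        self.max_concurrent = max_concurrent
        self._semaphore = None  # created lazily on the running event loop

    async def __call__(self, scope, receive, send):
        if self._semaphore is None:
            self._semaphore = asyncio.Semaphore(self.max_concurrent)
        async with self._semaphore:  # excess requests wait here
            await self.app(scope, receive, send)

# Minimal dummy ASGI app to exercise the middleware without a server.
async def dummy_app(scope, receive, send):
    await send({"type": "http.response.start", "status": 200})

async def demo():
    sent = []

    async def send(message):
        sent.append(message)

    app = ConcurrencyLimitMiddleware(dummy_app, max_concurrent=2)
    # Five concurrent "requests", at most two in flight at once.
    await asyncio.gather(*(app({"type": "http"}, None, send) for _ in range(5)))
    return len(sent)

print(asyncio.run(demo()))  # 5
```

Whether backpressure at this layer actually rebalances accepts across workers would still need measuring; it does not change how the OS distributes `accept()` wake-ups.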
Operating System
Linux
Operating System Details
FastAPI Version
0.88.0
Python Version
3.9.4
Additional Context