Scale FastApi with sync endpoints #5759

Closed
9 tasks done
dapollak opened this issue Dec 8, 2022 · 19 comments
Labels
question (Question or problem), question-migrate

Comments

@dapollak

dapollak commented Dec 8, 2022

First Check

  • I added a very descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the FastAPI documentation, with the integrated search.
  • I already searched in Google "How to X in FastAPI" and didn't find any information.
  • I already read and followed all the tutorials in the docs and didn't find an answer.
  • I already checked if it is not related to FastAPI but to Pydantic.
  • I already checked if it is not related to FastAPI but to Swagger UI.
  • I already checked if it is not related to FastAPI but to ReDoc.

Commit to Help

  • I commit to help with one of those options 👆

Example Code

import logging
import os
import time

from fastapi import FastAPI

app = FastAPI()
logger = logging.getLogger()

@app.get("/")
def root():
    logger.info(f"Running on {os.getpid()}")
    time.sleep(3600)  # long blocking call so the worker thread stays busy
    return {"message": "Hello World"}

Description

I've noticed lately that we have some latency problems when the servers are busy. I dug into it and found that if, for example, I have four uvicorn workers, then while one worker is very busy, the other three are significantly less busy. That has two problems -

  1. We're not taking advantage of all our parallelism power
  2. We're suffering more from Python's GIL problems on the same worker.

In the example code, if I run it with gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000, the first five requests will result in the following output -

INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 642

I investigated the uvicorn workers and found that they use asyncio's server interface, which accepts new connections on Python's event loop. This means that as long as we're using non-async endpoints (which run on AnyIO threads), there is no limit on the number of new connections a worker accepts, which again takes us back to the GIL problem.
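As far as I can tell, the hand-off for a def endpoint looks roughly like this (a simplified sketch of the idea, not the actual Starlette/FastAPI source):

# Simplified sketch (not the real Starlette code): a blocking "def" endpoint is
# handed to AnyIO's default thread pool, so the event loop stays free to accept
# new connections while the thread runs.
import anyio.to_thread

async def run_sync_endpoint(endpoint, *args):
    return await anyio.to_thread.run_sync(endpoint, *args)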

To sum it all up, I'm a little confused about what my best option is for using FastAPI with non-async endpoints while getting better performance. Should I use more workers? Should I use more container instances?
Maybe we should implement a new worker that is better able to control multithreading in such cases? I have some implementation ideas for this purpose, but I'd like to have your advice here.

Operating System

Linux

Operating System Details

FastAPI Version

0.88.0

Python Version

3.9.4

Additional Context

@dapollak dapollak added the question Question or problem label Dec 8, 2022
@extreme4all

Notes for the reader:

  • Concurrency is when 2 or more tasks are in progress over the same time period, which might mean that only 1 of them is actively being worked on while the others are paused.

  • Parallelism is when 2 or more tasks are executing at the same time.

Concurrency:

  • gunicorn: --workers, spawn different processes
  • gunicorn: --threads, per worker spawn threads, threads share memory in the same worker
  • Async: pseudo threads, asyncio uses one thread & one process, and allows concurrency by jumping between coroutines (async functions)

Parallelism:

  • gunicorn: --workers
  • gunicorn: --threads

In your example you are using workers, and the issue that is occurring is that the work is not balanced across the workers.

According to the documentation:

The default synchronous workers assume that your application is resource-bound in terms of CPU and network bandwidth.

And:

Gunicorn relies on the operating system to provide all of the load balancing when handling requests.

If an application is CPU bound ("resource bound"), then workers are ideal; if an application is I/O bound, threads or pseudo-threads (async) are ideal.
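For illustration, these knobs could live in a gunicorn.conf.py like the sketch below (example values, not tuned recommendations; the threads setting only applies to the threaded "gthread" sync worker, not to UvicornWorker):

# gunicorn.conf.py - illustrative sketch of the options discussed above
bind = "0.0.0.0:8000"
workers = 4               # processes: real parallelism, suits CPU-bound apps
worker_class = "gthread"  # threaded sync worker
threads = 8               # threads per worker: concurrency for I/O-bound apps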

@extreme4all

  • is there any reason why async is not an option?

@dapollak
Author

dapollak commented Dec 9, 2022

@extreme4all because we're using a library that isn't asyncio-compatible (the Snowflake driver for SQLAlchemy)

@jgould22

jgould22 commented Dec 9, 2022

In addition to what @extreme4all posted

I investigated uvicorn workers and found out they use asyncio server interface, which accepts new connections on python's event loop

This is not always true; Uvicorn, as the name implies, will use uvloop depending on how it is configured / installed.
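For example, the loop implementation can also be pinned explicitly instead of relying on the default "auto" detection (illustrative only):

# Uvicorn picks uvloop automatically when it's installed (loop="auto" is the
# default), but it can also be selected explicitly.
import uvicorn

uvicorn.run("main:app", host="0.0.0.0", port=8000, loop="uvloop")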

Another thing to consider is that FastAPI/Starlette spawn those def endpoints onto a threadpool; depending on its implementation, that could be where you are seeing a bottleneck.

Python's threads/processes story can be difficult to reason about.

In the short term, depending on the machine you have deployed this on, Gunicorn's docs recommend roughly 2 workers per core ((2 x num_cores) + 1). However, if the cores are not busy you can probably raise the worker count, and that may solve your problem / give you some breathing room while you investigate further.
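In a gunicorn.conf.py, that starting point might look like this (illustrative sketch only):

# gunicorn.conf.py - the (2 x num_cores) + 1 starting point from gunicorn's docs
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"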

@dapollak
Author

@jgould22 I know that def endpoints are spawned on threadpool threads; that's why I suspect we experience high latency due to GIL contention with many threads on the same worker (the number of threads is not limited on each worker)

@iudeen
Contributor

iudeen commented Dec 11, 2022

The best approach for a scalable deployment (in containerized environments) is to use barebones Uvicorn with a single worker and scale your containers horizontally.

You can have health check APIs to ensure your containers don't go stale.
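Something along these lines is usually enough (the /healthz path is just an example, not a FastAPI convention):

# Minimal liveness endpoint an orchestrator (e.g. Kubernetes) can probe so that
# stale containers get restarted; the path name is only an example.
from fastapi import FastAPI

app = FastAPI()

@app.get("/healthz")
async def healthz():
    return {"status": "ok"}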

@talboren

The best approach for a scalable deployment (in containerized environments) is to use barebones Uvicorn with a single worker and scale your containers horizontally.

You can have health check APIs to ensure your containers don't go stale.

@iudeen I'd love to hear where you found this to be the best approach

@iudeen
Contributor

iudeen commented Dec 11, 2022

@talboren based on hands-on experience with a product that has hundreds of thousands of concurrent users, and in a way from the FastAPI docs as well: https://fastapi.tiangolo.com/deployment/docker/#replication-number-of-processes

@Kludex
Sponsor Collaborator

Kludex commented Dec 11, 2022

there is no limit on the number of new connections a worker gets

This is not true.

The question was already answered in the first comment:

"Gunicorn relies on the operating system to provide all of the load balancing when handling requests."

This is on gunicorn.

Ref.: https://docs.gunicorn.org/en/latest/design.html?highlight=operating%20system#how-many-workers

@dapollak
Author

dapollak commented Dec 11, 2022

@Kludex The uvicorn workers don't limit the threads that the FastAPI app creates when using def endpoints. As I stated at the beginning of the thread, when a new connection is accepted it's passed to the FastAPI app, which creates a new thread using the AnyIO library, and then the event loop of the uvicorn worker becomes available right away to accept new connections.

@Kludex
Sponsor Collaborator

Kludex commented Dec 11, 2022

Where does the uvicorn worker limit the threads?

Also, I didn't get the point of the previous message. Can you rephrase it?

@dapollak
Author

dapollak commented Dec 11, 2022

Where does the uvicorn worker limit the threads?

Also, I didn't get the point of the previous message. Can you rephrase it?

It doesn't limit them - I forgot a word 😅
The point is that the combination of unbalanced load across the workers and "unlimited" threads per worker is bad, due to both GIL problems and not using all the workers.

@Kludex
Sponsor Collaborator

Kludex commented Dec 11, 2022

Uvicorn doesn't limit, but AnyIO does. There's a limit of 40 threads by default.
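For reference, that default can be inspected from async code running under the event loop (for example, inside a startup handler):

# Inspect AnyIO's default thread limiter; must be called from async code.
import anyio.to_thread

async def show_thread_limit():
    limiter = anyio.to_thread.current_default_thread_limiter()
    print(limiter.total_tokens)  # 40 by default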

...but the above is not relevant to this discussion.

Gunicorn is in charge of the load balancing, as mentioned above - and this is the relevant part here.

@dapollak
Author

But gunicorn doesn't do it. It's documented -

Gunicorn relies on the operating system to provide all of the load balancing when handling requests.

So, with def endpoints, thread creation allows the same worker to take all the connections, and I suspect that this is what causes the workers to be unbalanced

@jgould22

It is still in charge of load balancing among its workers.

It is just that the implementation chooses to delegate that to the OS (if the docs are correct)

@Kludex If AnyIO sets a default limit on the thread pool, is there a mechanism to adjust that? Or does that AnyIO limit effectively cap FastAPI/Starlette at handling 40 requests (assuming threads are being spawned for nothing else, I guess) per FastAPI process / application, with any following requests queuing up in the thread pool implementation?

@Kludex
Sponsor Collaborator

Kludex commented Dec 11, 2022

If AnyIO sets a default limit on the thread pool, is there a mechanism to adjust that?

@jgould22 Yes: encode/starlette#1724 (comment).

btw, thanks for helping on the FastAPI issues. 🙏

In any case, I've already shared my thoughts here: the problem, if there is one, is not something FastAPI can do anything about. The description of the issue shows a problem with load balancing, and the tool in charge of that is gunicorn.

@dapollak
Author

dapollak commented Dec 12, 2022

@Kludex I already tried using this method to limit the number of threads - it didn't help spread the requests across the other uvicorn workers.
I added this to my example -

import anyio.to_thread

@app.on_event("startup")
async def startup_event():
    # Shrink AnyIO's default thread limiter so at most 2 sync endpoints run concurrently
    limiter = anyio.to_thread.current_default_thread_limiter()
    limiter.total_tokens = 2

The first two requests arrived at the same worker, and the third wasn't handled since it also arrived at the first worker; it got stuck until one of the first requests finished.

That's also the behavior when running uvicorn directly with 4 workers.

@jgould22

@Kludex I already tried using this method to limit the number of threads - it didn't help spread the requests across the other uvicorn workers. I added this to my example -

import anyio.to_thread

@app.on_event("startup")
async def startup_event():
    # Shrink AnyIO's default thread limiter so at most 2 sync endpoints run concurrently
    limiter = anyio.to_thread.current_default_thread_limiter()
    limiter.total_tokens = 2

The first two requests arrived at the same worker, and the third wasn't handled since it also arrived at the first worker; it got stuck until one of the first requests finished.

That's also the behavior when running uvicorn directly with 4 workers.

Adjusting the number of workers etc. is not going to have an effect on this.

The way Gunicorn works is by opening a socket and then loading the application. It then forks the process, which produces one of its workers. To start more workers it forks again until it has the desired number. This produces a set of processes that are all blocking on accept() on the same socket. When a request comes in, the Linux scheduler decides which process gets to wake up and serve the request. How it decides that is somewhat outside the scope of FastAPI.
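A toy illustration of that pre-fork pattern (not gunicorn's actual code):

# Toy pre-fork server: the parent opens one listening socket, forks N children,
# and every child blocks on accept() on that same socket. The OS scheduler
# decides which child wakes up for each incoming connection.
import os
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("0.0.0.0", 8000))
sock.listen(128)

def worker():
    while True:
        conn, _addr = sock.accept()   # all workers compete on the same socket
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        conn.close()

for _ in range(4):
    if os.fork() == 0:   # child process becomes a worker and never returns
        worker()

for _ in range(4):       # parent just waits on its workers
    os.wait()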

@raphaelauv
Contributor

-> #3091 (comment)

Repository owner locked and limited conversation to collaborators Feb 28, 2023
@tiangolo tiangolo converted this issue into discussion #8433 Feb 28, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
