Scale FastApi with sync endpoints #5759

Closed
9 tasks done
dapollak opened this issue Dec 8, 2022 · 19 comments
Labels
question (Question or problem), question-migrate

Comments

@dapollak

dapollak commented Dec 8, 2022

First Check

  • I added a very descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the FastAPI documentation, with the integrated search.
  • I already searched in Google "How to X in FastAPI" and didn't find any information.
  • I already read and followed all the tutorials in the docs and didn't find an answer.
  • I already checked if it is not related to FastAPI but to Pydantic.
  • I already checked if it is not related to FastAPI but to Swagger UI.
  • I already checked if it is not related to FastAPI but to ReDoc.

Commit to Help

  • I commit to help with one of those options 👆

Example Code

import logging
import os
import time

from fastapi import FastAPI

app = FastAPI()
logger = logging.getLogger()

@app.get("/")
def root():
    logger.info(f"Running on {os.getpid()}")
    time.sleep(3600)  # long blocking call so the worker thread stays busy
    return {"message": "Hello World"}

Description

I've noticed lately that we have some latency problems when the servers are busy. I dug into it and found that if, for example, I have four uvicorn workers, then while one worker is very busy, the other three are significantly less busy. That has two problems -

  1. We're not taking advantage of all our parallelism power
  2. We're suffering more from Python's GIL problems on the same worker.

In the example code, if I run it with gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000, the first five requests will result in the following output -

INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 642

I investigated the uvicorn workers and found that they use asyncio's server interface, which accepts new connections on Python's event loop. This means that as long as we're using non-async endpoints (which run on AnyIO threads), there is no limit on the number of new connections a worker accepts, which again takes us back to the GIL problem.
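As far as I can tell, the hand-off for a def endpoint looks roughly like this (a simplified sketch of the idea, not the actual Starlette/FastAPI source):

# Simplified sketch (not the real Starlette code): a blocking "def" endpoint is
# handed to AnyIO's default thread pool, so the event loop stays free to accept
# new connections while the thread runs.
import anyio.to_thread

async def run_sync_endpoint(endpoint, *args):
    return await anyio.to_thread.run_sync(endpoint, *args)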

To sum it all up, I'm a little confused about what my best option is for using FastAPI with non-async endpoints while getting better performance. Should I use more workers? Should I use more container instances?
Maybe we should implement a new worker that is better able to control multithreading in such cases? I have some implementation ideas for this purpose, but I'd like to have your advice here.

Operating System

Linux

Operating System Details

FastAPI Version

0.88.0

Python Version

3.9.4

Additional Context

@dapollak dapollak added the question Question or problem label Dec 8, 2022
@extreme4all

Notes for the reader:

  • Concurrency is when 2 or more tasks are in progress over the same time period, which might mean that only 1 of them is actively being worked on while the others are paused.

  • Parallelism is when 2 or more tasks are executing at the same time.

Concurrency:

  • gunicorn: --workers, spawn different processes
  • gunicorn: --threads, per worker spawn threads, threads share memory in the same worker
  • Async: pseudo threads, asyncio uses one thread & one process, and allows concurrency by jumping between coroutines (async functions)

Parallelism:

  • gunicorn: --workers
  • gunicorn: --threads

In your example you are using workers, and the issue that is occurring is that the work is not balanced across the workers.

According to the documentation:

The default synchronous workers assume that your application is resource-bound in terms of CPU and network bandwidth.

And:

Gunicorn relies on the operating system to provide all of the load balancing when handling requests.

If an application is CPU bound ("resource bound"), then workers are ideal; if an application is I/O bound, threads or pseudo-threads (async) are ideal.
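For illustration, these knobs could live in a gunicorn.conf.py like the sketch below (example values, not tuned recommendations; the threads setting only applies to the threaded "gthread" sync worker, not to UvicornWorker):

# gunicorn.conf.py - illustrative sketch of the options discussed above
bind = "0.0.0.0:8000"
workers = 4               # processes: real parallelism, suits CPU-bound apps
worker_class = "gthread"  # threaded sync worker
threads = 8               # threads per worker: concurrency for I/O-bound apps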

@extreme4all

  • is there any reason why async is not an option?

@dapollak
Author

dapollak commented Dec 9, 2022

@extreme4all because we're using a library that isn't asyncio-compatible (the Snowflake driver for SQLAlchemy)

@jgould22

jgould22 commented Dec 9, 2022

In addition to what @extreme4all posted

I investigated uvicorn workers and found out they use asyncio server interface, which accepts new connections on python's event loop

This is not always true; Uvicorn, as the name implies, will use uvloop depending on how it is configured / installed.
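For example, the loop implementation can also be pinned explicitly instead of relying on the default "auto" detection (illustrative only):

# Uvicorn picks uvloop automatically when it's installed (loop="auto" is the
# default), but it can also be selected explicitly.
import uvicorn

uvicorn.run("main:app", host="0.0.0.0", port=8000, loop="uvloop")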

Another thing to consider is that FastAPI/Starlette spawn those def endpoints onto a threadpool; depending on its implementation, that could be where you are seeing a bottleneck.

Python's threads/processes story can be difficult to reason about.

In the short term, depending on the machine you have deployed this on, Gunicorn's docs recommend roughly 2 workers per core ((2 x num_cores) + 1). However, if the cores are not busy you can probably raise the worker count, and that may solve your problem / give you some breathing room while you investigate further.
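In a gunicorn.conf.py, that starting point might look like this (illustrative sketch only):

# gunicorn.conf.py - the (2 x num_cores) + 1 starting point from gunicorn's docs
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"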

@dapollak
Author

@jgould22 I know that def endpoints are spawned on threadpool threads; that's why I suspect we experience high latency due to GIL contention with many threads on the same worker (the number of threads is not limited on each worker)

@iudeen
Contributor

iudeen commented Dec 11, 2022

The best approach for a scalable deployment (in containerized environments) is to use barebones Uvicorn with a single worker and scale your containers horizontally.

You can have health check APIs to ensure your containers don't go stale.
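Something along these lines is usually enough (the /healthz path is just an example, not a FastAPI convention):

# Minimal liveness endpoint an orchestrator (e.g. Kubernetes) can probe so that
# stale containers get restarted; the path name is only an example.
from fastapi import FastAPI

app = FastAPI()

@app.get("/healthz")
async def healthz():
    return {"status": "ok"}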

@talboren

The best approach for a scalable deployment (in containerized environments) is to use barebones Uvicorn with a single worker and scale your containers horizontally.

You can have health check APIs to ensure your containers don't go stale.

@iudeen I'd love to hear where you found this to be the best approach

@iudeen
Contributor

iudeen commented Dec 11, 2022

@talboren based on hands-on experience with a product that has hundreds of thousands of concurrent users, and in a way from the FastAPI docs as well: https://fastapi.tiangolo.com/deployment/docker/#replication-number-of-processes

@Kludex
Sponsor Collaborator

Kludex commented Dec 11, 2022

there is no limit on the number of new connections a worker gets

This is not true.

The question was already answered in the first comment:

"Gunicorn relies on the operating system to provide all of the load balancing when handling requests."

This is on gunicorn.

Ref.: https://docs.gunicorn.org/en/latest/design.html?highlight=operating%20system#how-many-workers

@dapollak
Author

dapollak commented Dec 11, 2022

@Kludex The uvicorn workers don't limit the threads that the FastAPI app creates when using def endpoints. As I stated at the beginning of the thread, when a new connection is accepted it's passed to the FastAPI app, which creates a new thread using the AnyIO library, and then the event loop of the uvicorn worker becomes available right away to accept new connections.

@Kludex
Sponsor Collaborator

Kludex commented Dec 11, 2022

Where does the uvicorn worker limit the threads?

Also, I didn't get the point of the previous message. Can you rephrase it?

@dapollak
Author

dapollak commented Dec 11, 2022

Where does the uvicorn worker limit the threads?

Also, I didn't get the point of the previous message. Can you rephrase it?

It doesn't limit them - I forgot a word 😅
The point is that the combination of unbalanced load across the workers and "unlimited" threads per worker is bad, due to both GIL problems and not using all the workers.

@Kludex
Sponsor Collaborator

Kludex commented Dec 11, 2022

Uvicorn doesn't limit, but AnyIO does. There's a limit of 40 threads by default.
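For reference, that default can be inspected from async code running under the event loop (for example, inside a startup handler):

# Inspect AnyIO's default thread limiter; must be called from async code.
import anyio.to_thread

async def show_thread_limit():
    limiter = anyio.to_thread.current_default_thread_limiter()
    print(limiter.total_tokens)  # 40 by default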

...but the above is not relevant to this discussion.

Gunicorn is in charge of the load balancing, as mentioned above - and this is the relevant part here.

@dapollak
Author

But gunicorn doesn't do it. It's documented -

Gunicorn relies on the operating system to provide all of the load balancing when handling requests.

So, with def endpoints, thread creation allows the same worker to take all the connections, and I suspect that this is what causes the workers to be unbalanced

@jgould22

It is still in charge of load balancing among its workers.

It is just that the implementation chooses to delegate that to the OS (if the docs are correct)

@Kludex If AnyIO sets a default limit on the thread pool, is there a mechanism to adjust that? Or does that AnyIO limit effectively cap FastAPI/Starlette at handling 40 requests (assuming threads are being spawned for nothing else, I guess) per FastAPI process / application, with any following requests queuing up in the thread pool implementation?

@Kludex
Sponsor Collaborator

Kludex commented Dec 11, 2022

If AnyIO sets a default limit on the thread pool, is there a mechanism to adjust that?

@jgould22 Yes: encode/starlette#1724 (comment).

btw, thanks for helping on the FastAPI issues. 🙏

In any case, I've already shared my thoughts here: the problem, if there is one, is not something FastAPI can do anything about. The description of the issue shows a problem with load balancing, and the tool in charge of that is gunicorn.

@dapollak
Author

dapollak commented Dec 12, 2022

@Kludex I already tried using this method to limit the number of threads - it didn't help spread the requests across the other uvicorn workers.
I added this to my example -

import anyio.to_thread

@app.on_event("startup")
async def startup_event():
    # Shrink AnyIO's default thread limiter so at most 2 sync endpoints run concurrently
    limiter = anyio.to_thread.current_default_thread_limiter()
    limiter.total_tokens = 2

The first two requests arrived at the same worker, and the third wasn't handled since it also arrived at the first worker; it got stuck until one of the first requests finished.

That's also the behavior when running uvicorn directly with 4 workers.

@jgould22

@Kludex I already tried using this method to limit the number of threads - it didn't help spread the requests across the other uvicorn workers. I added this to my example -

import anyio.to_thread

@app.on_event("startup")
async def startup_event():
    # Shrink AnyIO's default thread limiter so at most 2 sync endpoints run concurrently
    limiter = anyio.to_thread.current_default_thread_limiter()
    limiter.total_tokens = 2

The first two requests arrived at the same worker, and the third wasn't handled since it also arrived at the first worker; it got stuck until one of the first requests finished.

That's also the behavior when running uvicorn directly with 4 workers.

Adjusting the number of workers etc. is not going to have an effect on this.

The way Gunicorn works is by opening a socket and then loading the application. It then forks the process, which produces one of its workers. To start more workers it forks again until it has the desired number. This produces a set of processes that are all blocking on accept() on the same socket. When a request comes in, the Linux scheduler decides which process gets to wake up and serve the request. How it decides that is somewhat outside the scope of FastAPI.
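A toy illustration of that pre-fork pattern (not gunicorn's actual code):

# Toy pre-fork server: the parent opens one listening socket, forks N children,
# and every child blocks on accept() on that same socket. The OS scheduler
# decides which child wakes up for each incoming connection.
import os
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("0.0.0.0", 8000))
sock.listen(128)

def worker():
    while True:
        conn, _addr = sock.accept()   # all workers compete on the same socket
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        conn.close()

for _ in range(4):
    if os.fork() == 0:   # child process becomes a worker and never returns
        worker()

for _ in range(4):       # parent just waits on its workers
    os.wait()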

@raphaelauv
Contributor

-> #3091 (comment)

Repository owner locked and limited conversation to collaborators Feb 28, 2023
@tiangolo tiangolo converted this issue into discussion #8433 Feb 28, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
