
Scale FastAPI with sync endpoints #5759

@dapollak


First Check

  • I added a very descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the FastAPI documentation, with the integrated search.
  • I already searched in Google "How to X in FastAPI" and didn't find any information.
  • I already read and followed all the tutorial in the docs and didn't find an answer.
  • I already checked if it is not related to FastAPI but to Pydantic.
  • I already checked if it is not related to FastAPI but to Swagger UI.
  • I already checked if it is not related to FastAPI but to ReDoc.

Commit to Help

  • I commit to help with one of those options 👆

Example Code

import logging
import os
import time

from fastapi import FastAPI

app = FastAPI()
logger = logging.getLogger()

@app.get("/")
def root():
    logger.info(f"Running on {os.getpid()}")
    time.sleep(3600)
    return {"message": "Hello World"}

Description

I've noticed lately that we have some latency problems when the servers are busy. I dug into it and found that if, for example, I have four uvicorn workers, one worker can be very busy while the other three are significantly less loaded. That causes two problems -

  1. We're not taking advantage of all our parallel processing capacity.
  2. We suffer more from Python's GIL contention within the single busy worker.
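To make problem 2 concrete, here is a stdlib-only sketch (not part of the issue) that runs the same CPU-bound function in a thread pool and in a process pool; on CPython the thread version cannot execute the work in parallel because of the GIL, while processes can. The function names are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n: int) -> int:
    # Pure-Python CPU-bound loop that holds the GIL while it runs
    total = 0
    for i in range(n):
        total += i
    return total

def timed(executor_cls, n: int = 2_000_000, jobs: int = 4) -> float:
    # Run `jobs` copies of the work concurrently and time the wall clock
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as ex:
        list(ex.map(burn, [n] * jobs))
    return time.perf_counter() - start

if __name__ == "__main__":
    # On a multi-core machine, threads are typically slower here because
    # only one thread can execute Python bytecode at a time
    print(f"threads:   {timed(ThreadPoolExecutor):.2f}s")
    print(f"processes: {timed(ProcessPoolExecutor):.2f}s")
```

The exact timings depend on the machine, but the gap between the two is the GIL cost the issue describes: when many sync-endpoint requests land on one worker, they all contend for that worker's single interpreter lock.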

In the example code, if I run it with gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000, the first five requests produce the following output -

INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 642

I investigated the uvicorn workers and found that they use the asyncio server interface, which accepts new connections on Python's event loop. This means that as long as we're using non-async endpoints (which run on AnyIO worker threads), there is no limit on the number of new connections a single worker accepts, which brings us back to the GIL problem.

To sum it all up, I'm a little confused about what my best option is for using FastAPI with non-async endpoints while getting better performance. Should I use more workers? Should I use more container instances?
Or maybe we should implement a new worker that can better control multithreading in such cases? I have some implementation ideas for this, but I'd like to have your advice here.
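One more option worth weighing (a sketch, not something FastAPI prescribes): keep the endpoint async and offload the CPU-bound part to a ProcessPoolExecutor, so heavy requests don't pile up behind one worker's GIL. The names here (POOL, heavy, handler) are illustrative:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

# Hypothetical module-level pool; in FastAPI you would create and shut it
# down in startup/shutdown events and call run_in_executor from an async
# endpoint instead of a plain function
POOL = ProcessPoolExecutor(max_workers=2)

def heavy(n: int) -> int:
    # Stand-in for CPU-bound work that would otherwise hold the GIL
    return sum(range(n))

async def handler() -> dict:
    # Await the result without blocking the event loop or a pool thread
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(POOL, heavy, 1_000_000)
    return {"result": result}

if __name__ == "__main__":
    print(asyncio.run(handler()))
```

The trade-off is inter-process serialization overhead per request, so this only pays off when the per-request work is genuinely CPU-heavy.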

Operating System

Linux

Operating System Details

FastAPI Version

0.88.0

Python Version

3.9.4

Additional Context
