
Gunicorn Workers Hangs And Consumes Memory Forever #596

Closed
MeteHanC opened this issue Oct 7, 2019 · 63 comments

Comments

@MeteHanC

MeteHanC commented Oct 7, 2019

Describe the bug
I have deployed a FastAPI app which queries the database and returns the results. I made sure to close the DB connection and so on. I'm running gunicorn with this line:
gunicorn -w 8 -k uvicorn.workers.UvicornH11Worker -b 0.0.0.0 app:app --timeout 10
After exposing it to the web, I ran a load test that makes 30-40 parallel requests to the FastAPI app, and that's where the problem starts. Watching htop in the meantime, I see that RAM usage keeps growing; it seems no task is killed after completing its job. The task count behaves the same way: it looks like the gunicorn workers never get killed. After some time RAM usage reaches its maximum and the app starts to throw errors. So I killed the gunicorn app, but the processes spawned by the main gunicorn process did not get killed and kept using all the memory.

Environment:

  • OS: Ubuntu 18.04

  • FastAPI Version : 0.38.1

  • Python version : 3.7.4

@MeteHanC MeteHanC added the bug (Something isn't working) label Oct 7, 2019
@wanaryytel

You're using the PyPy compliant uvicorn worker class - is your system based on PyPy? If you're running on cpython then I suggest you try out the cpython implementation uvicorn.workers.UvicornWorker.
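
For example, keeping the rest of the original command unchanged (an illustration, not tested against this setup):
gunicorn -w 8 -k uvicorn.workers.UvicornWorker -b 0.0.0.0 app:app --timeout 10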

But in other news, I'm seeing something similar. I just run uvicorn with this:
uvicorn --host 0.0.0.0 --port 7001 app:api --reload but in some cases the memory is never freed up.

For example this function:

@api.post("/randompath")
def get_xoxo(file: UploadFile = File(...)):
    k = []
    for i in range(10):
        k.append('gab' * 9999999)

When I hit the endpoint once, the memory is cleared up, but when I hit it 10x, some of the memory is left allocated and when I hit it another 10x, even more memory is left allocated. This continues until I run out of memory or restart the process.
If I change the get_xoxo function to be async, then the memory is always cleared up, but the function also blocks much more (which makes sense since I'm not taking advantage of any awaits in there).

So - is there a memory leak? I'm not sure, but something is handled incorrectly.

My system is running in a python:3.7 Docker container. Basically the same problem occurs in production, where uvicorn is run with uvicorn --host 0.0.0.0 --port %(ENV_UVICORN_PORT)s --workers %(ENV_UVICORN_WORKERS)s --timeout-keep-alive %(ENV_UVICORN_KEEPELIVE)s --log-level %(ENV_UVICORN_LOGLEVEL)s app:api.

@dmontagu
Collaborator

@wanaryytel

This is probably an issue with starlette's run_in_threadpool, or maybe even the python ThreadPoolExecutor. If you port that endpoint to starlette, I expect you'll get the same behavior.

Recently the starlette and uvicorn teams have been pretty good about addressing issues; if you can reproduce the memory leak in starlette, I'd recommend creating an issue demonstrating it in the starlette (and possible uvicorn?) repos.
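
For context, a minimal sketch (not the actual framework source) of how a plain def endpoint ends up going through starlette's run_in_threadpool; the endpoint body is a stand-in based on the earlier allocation example:

import asyncio
from starlette.concurrency import run_in_threadpool

def blocking_endpoint():
    # stand-in for a plain `def` path operation
    return len('gab' * 9999999)

async def dispatch():
    # FastAPI/Starlette dispatch `def` endpoints roughly like this,
    # on a shared thread pool, instead of running them on the event loop
    return await run_in_threadpool(blocking_endpoint)

asyncio.run(dispatch())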

@MeteHanC
Author

You're using the PyPy compliant uvicorn worker class - is your system based on PyPy? If you're running on cpython then I suggest you try out the cpython implementation uvicorn.workers.UvicornWorker.

I have noticed that too: under high load memory is left allocated, but for single requests the memory gets cleared up. I already tried making it async, but it does not deallocate the memory either.

@wanaryytel

This is probably an issue with starlette's run_in_threadpool, or maybe even the python ThreadPoolExecutor. If you port that endpoint to starlette, I expect you'll get the same behavior.

Recently the starlette and uvicorn teams have been pretty good about addressing issues; if you can reproduce the memory leak in starlette, I'd recommend creating an issue demonstrating it in the starlette (and possible uvicorn?) repos.

Hmm, reproducing it in Starlette makes sense. I will reproduce the issue and open an issue on the Starlette repo. Thanks for the idea.

@euri10
Contributor

euri10 commented Oct 15, 2019

@wanaryytel out of curiosity, why can't you use async with an await asyncio.sleep(0) thrown in there?

@wanaryytel

@euri10 I could, but I fail to see the benefit of that. It would still be blocking, which is what I'm trying to avoid?

@euri10
Contributor

euri10 commented Oct 16, 2019

If I change the get_xoxo function to be async, then the memory is always cleared up, but the function also blocks much more (which makes sense since I'm not taking advantage of any awaits in there).

Reading your comment @wanaryytel, I thought the only thing preventing you from turning your def into an async def was just a missing await, which I suggested could be that one await asyncio.sleep(0) 👍

But if there are other parts of your def get_xoxo that block, then indeed you won't benefit much from it. You can nonetheless try to run the code via loop.run_in_executor, but I'm not sure how well that plays with the current loop.
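
A rough sketch of that loop.run_in_executor suggestion, applied to the earlier get_xoxo example (names and app setup are illustrative, not the original poster's code):

import asyncio
from fastapi import FastAPI

api = FastAPI()

def blocking_work():
    k = []
    for i in range(10):
        k.append('gab' * 9999999)
    return len(k)

@api.post("/randompath")
async def get_xoxo():
    loop = asyncio.get_running_loop()
    # None -> run on the event loop's default ThreadPoolExecutor
    return await loop.run_in_executor(None, blocking_work)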

@wanaryytel

@euri10 AFAIK you can have an async function without any awaits, but there's no point, because it will still be essentially a blocking function. Await is what enables async processing; without it, it's basically just a regular function. Correct me if I'm wrong.

@dmontagu
Collaborator

dmontagu commented Oct 16, 2019

@wanaryytel It looks like this might actually be related to how python manages memory -- it's not guaranteed to release memory back to the os.

The top two answers to this stack overflow question have a lot of good info on this topic, and might point in the right direction.

That said, given you are just executing the same call over and over, it's not clear to me why it wouldn't reuse the memory -- there could be something leaking here (possibly related to the ThreadPoolExecutor...). You could check if it was related to the ThreadPoolExecutor by checking if you got the same behavior with an async def endpoint, which would not run in the ThreadPoolExecutor.

If the requests were being made concurrently, I suppose that could explain the use of more memory, which would then feed into the above stack overflow answer's explanation. But if you just kept calling the endpoint one call at a time in sequence, I think it's harder to explain.

If you really wanted to dig into this, it might be worth looking at the gc module and seeing if manually calling the garbage collector helps at all.

@MeteHanC whether this explains/addresses your issue or not definitely depends on what your endpoints are doing.

@madkote

madkote commented Dec 4, 2019

hi all,
I have noticed the same issue with fastapi.
The function is async def - inside it I load a resource of ~200 MB, do something with it, and return the response. The memory is not freed.

Example:

import gc

@router_dev.post(...)
async def endpoint(...):
    model = Model(...)  # load model from file
    results = []
    try:
        ...  # do something with the model
        ...  # alternative - also do something with the model in a thread pool
        ...  # working with the model is some computation; each step increases memory by ~1 MB (this is expected and gets freed once done)
        ...  # the above is tested in the library itself (normal function, exact same way) and there are no mem leaks - memory gets freed as expected
        results.append(...)  # append a string here
    finally:
        # model = None  # this also does not work well
        del model
        gc.collect()
    return dict(results=results)

This occurs with:

  • gunicorn with uvicorn.workers.UvicornWorker
  • uvicorn
  • hypercorn
  • a simple setup, one request after another... and the memory keeps growing

So to me it seems to be a bug in starlette or uvicorn...

@euri10
Contributor

euri10 commented Dec 4, 2019

Would be happy to take a look at it @madkote if you get a simple reproducible example.

@dmontagu
Collaborator

dmontagu commented Dec 4, 2019

@madkote I’m not 100% confident this explains the issue, but I think this may actually just be how python works. I think this article does a good job explaining it:

You’ll notice that I’ve been saying “free” in quotes quite a bit. The reason is that when a block is deemed “free”, that memory is not actually freed back to the operating system. The Python process keeps it allocated and will use it later for new data. Truly freeing memory returns it to the operating system to use.

Edit: just realized I already replied in this issue with a link to similar discussions 😄.

@dmontagu
Collaborator

dmontagu commented Dec 4, 2019

@madkote this starlette issue includes a good discussion of the problem (and some patterns to avoid that might be exacerbating it): encode/starlette#667

@madkote

madkote commented Dec 4, 2019

@dmontagu @euri10 I still suspect two things:

  • my model library - I will do even more testing and profiling
  • I found out that blocking code performs better in terms of memory. With async, every request increases memory by ~5 MB (some model operation and data processing). But in blocking mode it is about ~1 MB every 2-3 requests.

I use the same model everywhere; I even tried with many models (custom and freely available)... the result is always the same.

So, please give me some more time to make a reasonable example to reproduce the issue.

@madkote

madkote commented Dec 4, 2019

Just for info: I ran the same function under flask -> the memory is constant (!) -> so for me there is something wrong with async...

@euri10
Contributor

euri10 commented Dec 4, 2019 via email

@madkote

madkote commented Dec 5, 2019

@euri10 so I have tried to eat memory and do some computations in a function called by the endpoint (async || executor || normally) -> seems to be alright.

I guess my use case is a bit more specific (using ctypes objects, numpy and a lot of CPU). As I mentioned above, I am still profiling the custom lib once again - in the first experiments there were no leaks... if there are none this time either -> then I will try to mock the whole chain... or I will need to do everything in C++, to avoid the mess with ctypes.

@dmontagu
Collaborator

dmontagu commented Dec 6, 2019

Wouldn't surprise me if it was the combination of async and ctypes together causing issues.

I'm also interested if you can get a reproducible snippet demonstrating the issue.

@madkote

madkote commented Dec 11, 2019

@euri10 @dmontagu thanks for comments and hints.

I have spent ~3 days looking at the custom library with ctypes code - actually it is a mix of a huge 3rd-party C++ (community) codebase and my ctypes code. It is difficult to say where the issue is. I do suspect ctypes and async, since the issue occurs only when using the library in async methods.
I have also tested my C++ code on a simple web service in C - and was not able to reproduce memory leaks with >100 concurrent users...

So for me this is NOT a gunicorn and NOT a fastapi issue. (Actually, similar tests were also done with aiohttp.)

Many Thanks for support!

@dmontagu
Collaborator

@madkote sorry to hear that you haven't been able to find the source of the issue, even if it isn't an issue with Gunicorn/Uvicorn/FastAPI.

If you are able to simply reproduce the issue in code you are comfortable sharing, I think it would be worth a post to bugs.python.org.

@madkote

madkote commented Dec 12, 2019

@dmontagu it is hard to reproduce with 3rd-party C++ code (and it is a monster). So I decided to apply a small and brutal fix for now; over the Christmas break I can provide a new implementation of the lib. If another implementation hits the same issue, then it makes sense to spend time on reproducing it.

For now, lesson learned (IMHO) - avoid mixing ctypes and async. I am also not sure who owns the memory in the ctypes case and how to control it. Example: pass a Python list to a ctypes object (e.g. for processing) -> who is responsible for this list? The gc, the ctypes object...

Reporting a bug in Python would only make sense if I had full control over what the 3rd-party code does - and that is not simple, unfortunately.

As I said above, I have no issues with guni/uvi/fastapi - the issue is only in my lib.

@dmontagu
Collaborator

dmontagu commented Dec 13, 2019

Yeah, I've never had a good experience with ctypes -- I now go straight to Cython/PyBind11 for similar applications, and just deal with the slightly increased amount of boilerplate while writing bindings.

@FLming

FLming commented Dec 26, 2019

I encountered the same problem: an endpoint defined with def, hit again and again, will leak memory, but defined with async def it will not. Looking forward to the outcome of the discussion.

@dmontagu
Collaborator

@FLming Are you doing anything special with ctypes or other similar low level functionality?

Do you have a reproducible example we can play with to try to fix the issue? That would help a lot.

@FLming

FLming commented Jan 4, 2020

@dmontagu I found one of the reasons: detectAndCompute. After investigation, I find that calling this function in a def endpoint instead of an async def one causes a memory leak.
BTW, I find that only the AKAZE detector leaks memory. The code looks like this:

detector = cv.AKAZE_create()
kpts, desc = detector.detectAndCompute(gray, mask)

Maybe this bug belongs to OpenCV.

A small example is as follows:

import cv2 as cv
from fastapi import FastAPI

app = FastAPI()

@app.get("/test")
def feature_detect():
    img = cv.imread("cap.jpg", 0)
    detector = cv.AKAZE_create()
    kpts, desc = detector.detectAndCompute(img, None)
    return None

and I use ab to send requests.

@dmontagu
Collaborator

dmontagu commented Jan 4, 2020

Yeah, hard to say, it wouldn't surprise me at all if OpenCV didn't play perfectly nice with threadpool calls.

You might try running the function in a subprocess or a process pool (instead of the thread pool that fastapi uses automatically for def functions) -- that should ensure memory gets cleaned up properly (and will do a better job of not blocking the server to boot, since I assume the detectAndCompute call is compute-bound).
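
A hedged sketch of that suggestion, adapted from the snippet above (the pool size and returned value are illustrative, not a tested fix):

import asyncio
from concurrent.futures import ProcessPoolExecutor

import cv2 as cv
from fastapi import FastAPI

app = FastAPI()
pool = ProcessPoolExecutor(max_workers=2)

def detect(path):
    # runs in a worker process, so its memory stays out of the server process
    img = cv.imread(path, 0)
    detector = cv.AKAZE_create()
    kpts, desc = detector.detectAndCompute(img, None)
    return len(kpts)

@app.get("/test")
async def feature_detect():
    loop = asyncio.get_running_loop()
    return {"keypoints": await loop.run_in_executor(pool, detect, "cap.jpg")}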

@madkote

madkote commented Jan 7, 2020

@FLming if possible, create detectors (or similar objects) outside of endpoints.
@dmontagu running in a separate process does not always help, because not everything is picklable (or only with great difficulty) - it depends on the details. Outsourcing to a process helps, but is not ideal - either you have to spin up a new process per request or manage a queue for a worker process - both have pros and cons.

PS: Happy New Year!

@dmontagu
Collaborator

dmontagu commented Jan 8, 2020

Yeah, in this case I suggested it specifically because it looked like the function involved would be compatible with a subprocess call.

There is a ProcessPool interface similar to ThreadPool that you can use very similarly to how run_in_threadpool works in starlette. I think that mostly solves the problems related to managing the worker processes. But yes, the arguments/return types need to be pickleable, so there are many cases where you wouldn't want to take this approach.

@tiangolo
Member

Thanks for all the discussion here everyone! Thanks for the help @euri10 , @dmontagu for all the analysis, thanks @madkote for reporting back after your investigation. 🍰

If anyone still has an issue, the best way to check it would be with a small self-contained example that replicates it.

@MeteHanC, in the end, were you able to solve your problem?

@delijati

This comment #596 (comment)

A tensorflow example. Without setting max_workers=1 it grows up to 750 MB; with it, it stays around 370 MB. (Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz, 4 x CPU)

import asyncio
import tensorflow as tf
import os
import gc
import psutil

from concurrent import futures

# you can change worker number here
executor = futures.ThreadPoolExecutor(max_workers=1)

tf_array = tf.zeros((1, 1024))
input = tf.keras.Input((1, 1024))
dense1 = tf.keras.layers.Dense(1024)(input)
dense2 = tf.keras.layers.Dense(1024)(dense1)
dense2 = tf.keras.layers.BatchNormalization()(dense2)
dense2 = tf.keras.layers.LeakyReLU()(dense2)
output = tf.keras.layers.Dense(1)(dense2)
model = tf.keras.Model(inputs=[input], outputs=[output])

export_path = "temp_export.h5"

model.save(export_path)


print(tf.__version__)
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)


def mm():
    model = tf.keras.models.load_model(export_path)
    del model
    gc.collect()
    tf.keras.backend.clear_session()


async def main():
    for i in range(1000):
        if i % 10 == 0:
            process = psutil.Process(os.getpid())
            print("used ", process.memory_info().rss / (1024.0 ** 2), "Mb")
        loop = asyncio.get_event_loop()
        # XXX use this and we have no "memory leak"
        await loop.run_in_executor(executor, mm)
        # await loop.run_in_executor(None, mm)


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

@ty2009137128

I also found this problem. When I use gunicorn+flask, memory increases quickly, and my application on the k8s platform handles 1,000,000 requests. How can I solve this problem?

@myufa

myufa commented Feb 1, 2021

Any fixes to this? Got the same error when running on google app engine with gunicorn and uvloop. Are there any alternatives that work?

@HsengivS

Hi all

In my Flask RESTful web application I tried setting the max_requests option in the gunicorn config file to 500 requests, so after every 500 requests the worker reboots. This helped me reclaim some memory, but I still face the increasing memory issue.
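
For reference, a minimal sketch of such a gunicorn config file (the jitter value is an added illustration, not from the original comment):

# gunicorn.conf.py
max_requests = 500        # recycle each worker after it has handled 500 requests
max_requests_jitter = 50  # add randomness so workers don't all restart at once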

@vdwees

vdwees commented Mar 17, 2021

@ty2009137128, @HsengivS : seems like you are working with flask, which is a different project from fastapi. I do hope you can solve your memory leak bugs, but they are certainly unrelated to the issue reported here. Good luck!

@vdwees

vdwees commented Mar 17, 2021

@tiangolo would it be appropriate to close this issue with a wont-fix? It seems like the memory leak bug arises from uvloop on python versions older than python3.8, so it is unlikely to be fixed. Regardless, it is external to fastapi.

Anecdotally, I have encountered memory leaks when mixing uvloop and code with a lot of c extensions in python3.8+ outside of the context of a web service like fastapi/uvicorn/gunicorn. In light of this, perhaps an example of how to run fastapi without uvloop would be appropriate.
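
For instance, a hedged example of running without uvloop using uvicorn's --loop option (host and port are illustrative):
uvicorn app:app --host 0.0.0.0 --port 8000 --loop asyncio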

@Rokfordchez

Still got this problem; memory does not get deallocated. My conditions:

python 3.8.9

fastapi==0.63.0
gunicorn==20.0.4
uvicorn==0.11.8

["gunicorn", "-b", "0.0.0.0:8080", "-w", "3",'-k', 'uvicorn.workers.UvicornWorker', "palette.main:app", '--timeout', '0', "--graceful-timeout", "5", '--access-logfile', '-', '--error-logfile', '-', '--log-level', 'error']

events

def create_start_app_handler(app: FastAPI) -> Callable:  # type: ignore
    async def start_app() -> None:
        app.state.executor = ProcessPoolExecutor(max_workers=max(cpu_count()-1, 1))

    return start_app


def create_stop_app_handler(app: FastAPI) -> Callable:  # type: ignore
    @logger.catch
    async def stop_app() -> None:
        app.state.executor.shutdown()

    return stop_app

executor

async def run_fn_in_executor(request: Request, fn, *args):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(request.app.state.executor, fn, *args)

routes

@router.post("/image", status_code=200, response_model=ImageResponse)
async def extract_color_palette(request: Request, image: ImageRequest):
    file = await fetch_content(image.url)
    colors = await run_fn_in_executor(request, process_image_to_palette, file)
    return ImageResponse(url=image.url, colors=colors)

@daggy1234

I'm also having ghastly memory issues.

My code is fully open source, so feel free to peek: https://github.com/daggy1234/dagpi-image. I'm using gunicorn in a Docker container with uvicorn. There are no sync functions; it's async with multiprocessing.

@Trinkes

Trinkes commented Aug 27, 2021

@tiangolo would it be appropriate to close this issue with a wont-fix? It seems like the memory leak bug arises from uvloop on python versions older than python3.8, so it is unlikely to be fixed. Regardless, it is external to fastapi.

Anecdotally, I have encountered memory leaks when mixing uvloop and code with a lot of c extensions in python3.8+ outside of the context of a web service like fastapi/uvicorn/gunicorn. In light of this, perhaps an example of how to run fastapi without uvloop would be appropriate.

I'm using python 3.9 and still have the issue. The memory consumption is always >90%.

@vdwees

vdwees commented Aug 27, 2021

Are you using uvicorn? If so, do you still have the issue if you disable uvloop?

@vollcheck

@vdwees

Are you using uvicorn? If so, do you still have the issue if you disable uvloop?

Yeah, using the other option, e.g. loop="asyncio" (documentation), results in the same memory leak.

@Trinkes

Trinkes commented Sep 8, 2021

@tiangolo would it be appropriate to close this issue with a wont-fix? It seems like the memory leak bug arises from uvloop on python versions older than python3.8, so it is unlikely to be fixed. Regardless, it is external to fastapi.
Anecdotally, I have encountered memory leaks when mixing uvloop and code with a lot of c extensions in python3.8+ outside of the context of a web service like fastapi/uvicorn/gunicorn. In light of this, perhaps an example of how to run fastapi without uvloop would be appropriate.

I'm using python 3.9 and still have the issue. The memory consumption is always >90%.

After further investigation, in my case the memory consumption was normal. For testing, I added more RAM and saw that the memory consumption stayed pretty stable.
I ended up migrating to uvicorn and lowering the number of workers from 4 to 3, which in my case was enough.

Thanks for the help!

@boy-be-ambitious

boy-be-ambitious commented Dec 6, 2021

Hi all

In my Flask RESTful web application I tried setting the max_requests option in the gunicorn config file to 500 requests, so after every 500 requests the worker reboots. This helped me reclaim some memory, but I still face the increasing memory issue.

Hi @HsengivS, you can add threaded=False for Flask, like app.run(host='0.0.0.0', port=9333, threaded=False). I used it successfully to avoid the Flask memory leak issue.

@yusufcakmakk

I have solved this issue with the following settings:

  • python=3.8.9
  • fastapi=0.63.0
  • uvicorn=0.17.6
  • uvloop=0.16.0

@nikhilkharade

@yusufcakmakk can you share your entire requirements.txt? I am still facing the same issue after moving to the versions you mentioned.

@yusufcakmakk

@yusufcakmakk can you share your entire requirements.txt? I am still facing the same issue after moving to the versions you mentioned.

Here is my requirements file:

numpy~=1.22.2
scikit-learn==1.0.2
pandas~=1.1.5
fastapi~=0.63.0
pydantic~=1.7.3
loguru~=0.5.3
uvicorn~=0.17.6
click~=7.1.2
uvloop==0.16.0
async-exit-stack~=1.0.1
async-generator~=1.10
httptools~=0.1.1
SQLAlchemy~=1.3.22
python-multipart~=0.0.5
xlrd~=2.0.1
openpyxl~=3.0.7
requests~=2.25.1
psutil
scipy==1.8.0

@joshlincoln

joshlincoln commented Jun 2, 2022

I'm seeing the same issue with:

fastapi==0.74.1
uvicorn[standard]==0.17.5
isort==5.10.1
black==22.3.0
flake8==4.0.1
pandas==1.4.1
numpy==1.22.2
pymysql==1.0.2
pytest==7.0.1
requests==2.27.1
coverage==6.3.2
pytest-cov==3.0.0
python-configuration==0.8.2
google-cloud-secret-manager==2.9.2
SQLAlchemy==1.4.32
cryptography==36.0.2
tenacity==8.0.0
httpx==0.22.0
pytest-asyncio==0.18.3
ddtrace==0.60.1
alembic==1.7.7
sqlalchemy[asyncio]==1.4.32
aiomysql==0.1.0
pytest-env==0.6.2
pytest-mock==3.7.0
datadog>=0.42.0
pyhumps==3.5.3
dave-metrics>=0.5.1
hypercorn[uvloop]==0.13.2
gunicorn==20.1.0

Running python:3.10.1-slim-buster in k8s with gunicorn -k uvicorn.workers.UvicornWorker app.api.server:app --workers 1 --bind 0.0.0.0:8000

@tiangolo tiangolo added the question (Question or problem) and reviewed labels and removed the bug (Something isn't working) label Feb 23, 2023
@tiangolo tiangolo changed the title [BUG] Gunicorn Workers Hangs And Consumes Memory Forever Gunicorn Workers Hangs And Consumes Memory Forever Feb 24, 2023
@fastapi fastapi locked and limited conversation to collaborators Feb 28, 2023
@tiangolo tiangolo converted this issue into discussion #9145 Feb 28, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
