Gunicorn Workers Hang and Consume Memory Forever #596
Comments
You're using the PyPy-compatible uvicorn worker class (`UvicornH11Worker`) - is your system based on PyPy? If you're running on CPython, then I suggest you try out the CPython implementation (`uvicorn.workers.UvicornWorker`) instead.

But in other news, I'm seeing something similar. I just run uvicorn with this: […] For example, this function: […]

When I hit the endpoint once, the memory is cleared up, but when I hit it 10x, some of the memory is left allocated, and when I hit it another 10x, even more memory is left allocated. This continues until I run out of memory or restart the process. So - is there a memory leak? I'm not sure, but something is handled incorrectly. My system is running on […]
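For concreteness, a minimal sketch of the kind of setup being described (the endpoint name and allocation sizes are illustrative, not the original code):

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/leaky")
def leaky():
    # A plain `def` endpoint runs in the threadpool. Allocate ~100 MB per
    # request and watch the process RSS (e.g. in htop) across repeated hits.
    data = [b"x" * 1_000_000 for _ in range(100)]
    return {"allocated_chunks": len(data)}
```

Run with e.g. `uvicorn app:app` and hit the endpoint repeatedly to check whether RSS returns to its baseline.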
This is probably an issue with starlette's […]. Recently the starlette and uvicorn teams have been pretty good about addressing issues; if you can reproduce the memory leak in starlette, I'd recommend creating an issue demonstrating it in the starlette (and possibly uvicorn?) repos.
I have noticed that too: under high load memory is left allocated, but for single requests memory gets cleared up. And I already tried making it async, but it is not deallocating the memory either.
Hmm, reproducing it in Starlette makes sense. I will reproduce the issue and open an issue on the Starlette repo. Thanks for the idea.
@wanaryytel out of curiosity, why can't you use async with an `await asyncio.sleep(0)` thrown in there?
@euri10 I could, but I fail to see the benefit of that. It would still be blocking, which I'm trying to avoid?
Reading your comment @wanaryytel, I thought the only thing preventing you from transforming your […] But if there are other parts in your […]
@euri10 AFAIK you can have an async function without any awaits, but there's no point, because it will still be a semi-regular blocking function. Await is the thing that enables async processing; without that, it's basically just a regular function. Correct me if I'm wrong.
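To illustrate the point being debated, a minimal sketch (function names and workloads hypothetical): an `async def` with no `await` still blocks the event loop for its entire duration, while `await asyncio.sleep(0)` only yields control between chunks of work; the CPU-bound work itself remains blocking.

```python
import asyncio

async def no_awaits():
    # Runs start to finish on the event loop thread,
    # blocking every other task while it executes.
    return sum(range(10_000_000))

async def with_sleep_zero():
    total = 0
    for _ in range(10):
        total += sum(range(1_000_000))
        # Yield to the event loop so other tasks get a turn between chunks;
        # each chunk still blocks the loop while it runs.
        await asyncio.sleep(0)
    return total
```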
@wanaryytel It looks like this might actually be related to how Python manages memory -- it's not guaranteed to release memory back to the OS. The top two answers to this Stack Overflow question have a lot of good info on this topic, and might point in the right direction.

That said, given you are just executing the same call over and over, it's not clear to me why it wouldn't reuse the memory -- there could be something leaking here (possibly related to the ThreadPoolExecutor...). You could check whether it is related to the ThreadPoolExecutor by seeing if you get the same behavior with an `async def` endpoint (which runs on the event loop rather than in the threadpool) - see the sketch below.

If the requests were being made concurrently, I suppose that could explain the use of more memory, which would then feed into the above Stack Overflow answer's explanation. But if you just kept calling the endpoint one call at a time in sequence, I think it's harder to explain. If you really wanted to dig into this, it might be worth looking at the […]

@MeteHanC whether this explains/addresses your issue or not definitely depends on what your endpoints are doing.
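A minimal sketch of the comparison suggested above (endpoint names and workload are illustrative): if memory grows when repeatedly hitting the plain `def` endpoint but not the `async def` one, the threadpool is implicated.

```python
from fastapi import FastAPI

app = FastAPI()

def allocate():
    # Stand-in for the real workload.
    return [b"x" * 1024 for _ in range(10_000)]

@app.get("/sync")
def sync_endpoint():
    # Plain `def`: FastAPI runs this in a ThreadPoolExecutor.
    allocate()
    return {"ok": True}

@app.get("/async")
async def async_endpoint():
    # `async def`: runs directly on the event loop, no threadpool.
    allocate()
    return {"ok": True}
```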
hi all, an example: […]

This occurs with: […]

So to me it seems to be a bug in starlette or uvicorn...
would be happy to take a look at it @madkote if you get a simple reproducible example
@madkote I'm not 100% confident this explains the issue, but I think this may actually just be how Python works. I think this article does a good job explaining it: […]

Edit: just realized I already replied in this issue with a link to similar discussions 😄.
@madkote this starlette issue includes a good discussion of the problem (and some patterns to avoid that might be exacerbating it): encode/starlette#667
@dmontagu @euri10 I still suspect two aspects: […]

Everywhere I use the same model; I even tried with many (custom and freely available)... the result is always the same. So please give me some more time to make a reasonable example to reproduce the issue.
just for info: I run the same function under flask -> and the memory is constant (!) -> so for me there is something wrong with async... |
that's interesting, please send a little snippet - not saying I will find the root of the problem, but it's a good start!
@euri10 so I have tried eating memory and doing some computations in a function called by the endpoint (async || executor || normally) -> seems to be alright. I guess my use case is a bit more specific (using ctypes objects, numpy, and a lot of CPU). As I mentioned above, I am still profiling the custom lib once again - in the first experiments there were no leaks... if now also none -> then I will try to mock the whole chain... or I will need to do everything in C++, to avoid the mess with ctypes.
Wouldn't surprise me if it was the combination of async and ctypes together causing issues. I'm also interested to see if you can get a reproducible snippet demonstrating the issue.
@euri10 @dmontagu thanks for the comments and hints. I have spent ~3 days looking at the custom library with ctypes code - actually it is a mix of a huge 3rd-party C++ (community) library and my ctypes. Difficult to say where the issue is. I do suspect ctypes and async, since the issue occurs only when using the library in async methods. So for me this is NOT a gunicorn and NOT a fastapi issue. (Actually, similar tests were done with aiohttp as well.) Many thanks for the support!
@madkote sorry to hear that you haven't been able to find the source of the issue, even if it isn't an issue with Gunicorn/Uvicorn/FastAPI. If you are able to reproduce the issue simply, in code you are comfortable sharing, I think it would be worth a post to bugs.python.org.
@dmontagu hard to reproduce with 3rd-party C++ code (and it is a monster). So I decided on a small and brutal fix, and during Xmas time I can provide a new implementation of the lib. If another implementation hits the same issue, then it makes sense to spend time on reproducing. For now, lesson learned (IMHO) - avoid mixing ctypes and async. I am also not sure who owns memory in the ctypes case and how to control it. For example: pass a Python list to a ctypes object (e.g. for processing) -> who is responsible for this list? - the GC, the ctypes object... A bug in Python would make sense only if I had full control over what the 3rd-party code does - and that is not simple, unfortunately. As I said above, I have no issues with guni/uvi/fastapi - the issue is only in my lib.
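On the ownership question above: a Python list can't be handed to C directly; it has to be copied into a ctypes array first, and that array is an ordinary Python object owned by the GC. It must stay referenced on the Python side for as long as any C code holds the pointer. A minimal sketch (the library and its function are hypothetical):

```python
import ctypes

# lib = ctypes.CDLL("libthirdparty.so")  # hypothetical 3rd-party library

def process(values):
    n = len(values)
    # The list contents are *copied* into a C double array; the original
    # list is untouched, and the array is owned by Python's GC.
    buf = (ctypes.c_double * n)(*values)
    # lib.process(buf, ctypes.c_size_t(n))
    # Safe only while `buf` stays referenced. If the C side stores the
    # pointer beyond this call, the caller must keep `buf` alive that long.
    return list(buf)
```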
Yeah, I've never had a good experience with ctypes -- I now go straight to Cython/PyBind11 for similar applications, and just deal with the slightly increased amount of boilerplate while writing bindings.
I encountered the same problem: an endpoint defined with plain `def` […]
@FLming Are you doing anything special with ctypes or other similar low-level functionality? Do you have a reproducible example we can play with to try to fix the issue? That would help a lot.
@dmontagu I found one of the causes: detectAndCompute. After investigation, I traced it to this function:

```python
detector = cv.AKAZE_create()
kpts, desc = detector.detectAndCompute(gray, mask)
```

Maybe this bug belongs to opencv. A small example is as follows:

```python
import cv2 as cv
from fastapi import FastAPI

app = FastAPI()

@app.get("/test")
def feature_detect():
    img = cv.imread("cap.jpg", 0)
    detector = cv.AKAZE_create()
    kpts, desc = detector.detectAndCompute(img, None)
    return None
```

and I use ab to send requests.
Yeah, hard to say; it wouldn't surprise me at all if OpenCV didn't play perfectly nicely with threadpool calls. You might try running the function in a subprocess or processpool (instead of the threadpool that fastapi uses automatically for plain `def` endpoints).
@FLming if possible, create detectors (or similar) outside of endpoints. PS: Happy New Year!
Yeah, in this case I suggested it specifically because it looked like the function involved would be compatible with a subprocess call. There is a `ProcessPoolExecutor` interface similar to `ThreadPoolExecutor` that you can use in much the same way - see the sketch below.
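A minimal sketch of that pattern (names are illustrative), assuming the function's arguments and return value are picklable:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

from fastapi import FastAPI

app = FastAPI()
pool = ProcessPoolExecutor(max_workers=2)

def detect_features(path: str):
    # CPU-heavy / C-extension work runs in a separate process, so any memory
    # it pins is returned to the OS when the worker process is recycled.
    return len(path)  # stand-in for the real OpenCV work

@app.get("/detect")
async def detect(path: str):
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(pool, detect_features, path)
    return {"result": result}
```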
Thanks for all the discussion here, everyone! Thanks for the help @euri10 and @dmontagu with all the analysis, and thanks @madkote for reporting back after your investigation. 🍰 If anyone still has an issue, the best way to check it would be with a small, self-contained example that replicates it. @MeteHanC, in the end, were you able to solve your problem?
Regarding this earlier comment (#596 (comment)): a tensorflow example. With the dedicated single-worker executor below there is no "memory leak"; switch to the default executor (`None`, the commented-out line) and memory keeps growing:

```python
import asyncio
import tensorflow as tf
import os
import gc
import psutil
from concurrent import futures

# you can change worker number here
executor = futures.ThreadPoolExecutor(max_workers=1)

tf_array = tf.zeros((1, 1024))
input = tf.keras.Input((1, 1024))
dense1 = tf.keras.layers.Dense(1024)(input)
dense2 = tf.keras.layers.Dense(1024)(dense1)
dense2 = tf.keras.layers.BatchNormalization()(dense2)
dense2 = tf.keras.layers.LeakyReLU()(dense2)
output = tf.keras.layers.Dense(1)(dense2)
model = tf.keras.Model(inputs=[input], outputs=[output])

export_path = "temp_export.h5"
model.save(export_path)
print(tf.__version__)
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)


def mm():
    model = tf.keras.models.load_model(export_path)
    del model
    gc.collect()
    tf.keras.backend.clear_session()


async def main():
    for i in range(1000):
        if i % 10 == 0:
            process = psutil.Process(os.getpid())
            print("used ", process.memory_info().rss / (1024.0 ** 2), "Mb")
        loop = asyncio.get_event_loop()
        # XXX use this and we have no "memory leak"
        await loop.run_in_executor(executor, mm)
        # await loop.run_in_executor(None, mm)


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
```
I also found this problem: when I use gunicorn+flask, memory increases quickly, and my application on the k8s platform has to handle 1000000 requests. How can I solve this problem?
Any fixes for this? I got the same error when running on Google App Engine with gunicorn and uvloop. Are there any alternatives that work?
Hi all, in my Flask-RESTful web application I tried setting the max_requests option in the gunicorn config file to 500, so after every 500 requests the worker reboots. This helped me reduce some amount of memory, but I still face the increasing-memory issue. (A sketch of the config is below.)
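For reference, a minimal sketch of that worker-recycling setup as a gunicorn config file (values illustrative; `max_requests_jitter` staggers the restarts so workers don't all recycle at once):

```python
# gunicorn.conf.py -- run with: gunicorn -c gunicorn.conf.py app:app
workers = 4
max_requests = 500        # recycle each worker after it serves 500 requests
max_requests_jitter = 50  # randomize the threshold to stagger restarts
timeout = 30
```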
@ty2009137128, @HsengivS: it seems like you are working with Flask […]
@tiangolo would it be appropriate to close this issue with a […]? Anecdotally, I have encountered memory leaks when mixing uvloop and code with a lot of C extensions in Python 3.8+, outside the context of a web service like fastapi/uvicorn/gunicorn. In light of this, perhaps an example of how to run fastapi without uvloop would be appropriate.
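A sketch of that suggestion: with uvicorn on the command line, `--loop asyncio` selects the stdlib event loop instead of uvloop; under gunicorn, a custom worker class can pass the same setting (the worker class name here is made up):

```python
# no_uvloop_worker.py -- use with:
#   gunicorn -w 4 -k no_uvloop_worker.NoUvloopWorker app:app
from uvicorn.workers import UvicornWorker

class NoUvloopWorker(UvicornWorker):
    # Ask uvicorn for the stdlib asyncio loop (and the pure-Python h11
    # HTTP implementation) instead of uvloop/httptools.
    CONFIG_KWARGS = {"loop": "asyncio", "http": "h11"}
```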
Still got this problem; memory does not get deallocated. My conditions:

```python
def create_start_app_handler(app: FastAPI) -> Callable:  # type: ignore
    async def start_app() -> None:
        app.state.executor = ProcessPoolExecutor(max_workers=max(cpu_count() - 1, 1))
    return start_app


def create_stop_app_handler(app: FastAPI) -> Callable:  # type: ignore
    @logger.catch
    async def stop_app() -> None:
        app.state.executor.shutdown()
    return stop_app


async def run_fn_in_executor(request: Request, fn, *args):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(request.app.state.executor, fn, *args)


@router.post("/image", status_code=200, response_model=ImageResponse)
async def extract_color_palette(request: Request, image: ImageRequest):
    file = await fetch_content(image.url)
    colors = await run_fn_in_executor(request, process_image_to_palette, file)
    return ImageResponse(url=image.url, colors=colors)
```
I'm also having ghastly memory issues. My code is fully open source, so feel free to peek: https://github.com/daggy1234/dagpi-image. I'm using gunicorn in a docker container with uvicorn. There are no sync functions; it's async with multiprocessing.
I'm using […]
Are you using uvicorn? If so, do you still have the issue if you disable uvloop?
Yeah, using another option, e.g. […]
After further investigation, in my case the memory consumption was normal. For testing, I added more RAM and saw that the memory consumption was pretty stable. Thanks for the help!
hi @HsengivS, you can add threaded=False for Flask, like `app.run(threaded=False)`.
I have solved this issue with the following settings: […]
@yusufcakmakk can you share your entire requirements.txt? I am still facing the same issue after moving to the versions you mentioned.
Here is my requirements file: numpy~=1.22.2 […]
I'm seeing the same issue with […]

Running […]
This issue was moved to a discussion. You can continue the conversation there.
Describe the bug
I have deployed a FastAPI app that queries the database and returns the results. I made sure to close the DB connection and everything. I'm running gunicorn with this line:

```
gunicorn -w 8 -k uvicorn.workers.UvicornH11Worker -b 0.0.0.0 app:app --timeout 10
```

So after exposing it to the web, I ran a load test that makes 30-40 parallel requests to the fastapi app, and the problem starts here. Watching htop in the meantime, I saw that RAM usage kept growing; it seems like no task is killed after completing its job. Then I checked the task count; same thing there, it seems like gunicorn workers do not get killed. After some time RAM usage reaches its maximum and the app starts to throw errors. So I killed the gunicorn app, but the processes spawned by the main gunicorn process did not get killed and kept using all the memory.
Environment:

- OS: Ubuntu 18.04
- FastAPI version: 0.38.1
- Python version: 3.7.4