This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


interplay with multiprocessing #841

Closed
ananis25 opened this issue Jan 9, 2020 · 8 comments

Comments

@ananis25
Contributor

ananis25 commented Jan 9, 2020

Describe the bug

I'm trying to run parallel tasks with a per-task timeout (using multiprocessing) inside an API method. When I try to terminate the child processes after the time limit, the server process shuts down and disconnects.

To Reproduce

  1. Create a file: repro.py
import os
import time
import uvicorn
from fastapi import FastAPI
from concurrent.futures import ProcessPoolExecutor

app = FastAPI(title='multiprocessing-issue')

def simple_routine(sleep_for):
    print(f"PID {os.getpid()} has run time: {sleep_for}")
    time.sleep(sleep_for)
    return "done"

@app.post("/test-endpoint")
def test_endpoint():
    print(f"main process: {os.getpid()}")

    START_TIME = time.time()
    with ProcessPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(simple_routine, 1), 
            pool.submit(simple_routine, 1),
            pool.submit(simple_routine, 20), 
        ]

        results = []
        for fut in futures:
            try:
                results.append(fut.result(timeout=2))
            except Exception:
                results.append("not done")

        # terminate the processes which are still running
        for pid, proc in pool._processes.items():
            print("terminating pid ", pid)
            proc.terminate()
    
    print("\n", "exiting at: ", int(time.time() - START_TIME))
    return True

if __name__=="__main__":
    uvicorn.run(app, host="0.0.0.0", port=5000)
  2. Run it as python repro.py.
  3. Open another Python interpreter and make this web request:
import requests
for _ in range(20):
    print(requests.post("http://localhost:5000/test-endpoint").text)
  4. The server process shuts down after the first request.

Expected behavior

The server shouldn't shut down; it should keep serving requests. Interestingly, the server logs a shutdown signal when I try to terminate the subprocesses, but it doesn't actually exit until the long-running process is complete.

INFO:     Started server process [615]
INFO:     Uvicorn running on http://0.0.0.0:5900 (Press CTRL+C to quit)
INFO:     Waiting for application startup.
INFO:     Application startup complete.
main process: 615

PID 637 has run time: 1
PID 638 has run time: 1
PID 639 has run time: 10
terminating pid  637
terminating pid  638
terminating pid  639
INFO:     Shutting down
INFO:     Finished server process [615]

exiting at:  10

Environment

  • OS: [Ubuntu 18.04.1 LTS]
  • FastAPI Version: 0.45.0
  • Python version: 3.6.8

Additional context

I'm moving a working WSGI API from Flask to FastAPI, mainly for the brevity and automatic documentation. The parallel tasks are CPU-intensive methods, and terminating them works as expected with Flask. For context, here is the output from the Flask version, which shows the expected behavior.

The issue likely has something to do with how FastAPI runs sync methods, but I can't quite figure it out.

main process: 1015
PID 1035 has run time: 1
PID 1039 has run time: 1
PID 1038 has run time: 10
terminating pid  1035
terminating pid  1038
terminating pid  1039
exiting at:  3

127.0.0.1 - - [09/Jan/2020 08:51:37] "POST /test-endpoint HTTP/1.1" 200 -

Thank you for looking.

@ananis25 ananis25 added the bug Something isn't working label Jan 9, 2020
@dmontagu
Collaborator

dmontagu commented Jan 9, 2020

Does it also exit if you do it in an async def endpoint?

Also, my understanding was that you don’t need to explicitly terminate the processes in a process pool — when the pool’s context block is exited, the processes are automatically terminated. So I would think you could just safely drop the loop over pool._processes. Is that wrong?

@ananis25
Contributor Author

ananis25 commented Jan 9, 2020

Yeah, an async def endpoint gives the same behavior. One difference: Uvicorn logs the shutdown signal after completing the running request, whereas with a def endpoint it logged it while the request was still running.

with async def

INFO:     Application startup complete.
main process: 6058
PID 6065 has sleep time: 1
PID 6064 has sleep time: 1
PID 6066 has sleep time: 10
terminating pid  6064
terminating pid  6065
terminating pid  6066
exiting at:  10
INFO:     127.0.0.1:49702 - "POST /test-endpoint HTTP/1.1" 200 OK
INFO:     Shutting down
INFO:     Finished server process [6058]

with a def endpoint

INFO:     Application startup complete.
main process: 5880
PID 5901 has sleep time: 1
PID 5902 has sleep time: 1
PID 5903 has sleep time: 10
terminating pid  5901
terminating pid  5902
terminating pid  5903
INFO:     Shutting down
INFO:     Finished server process [5880]
exiting at:  10

You're right that the process pool terminates its worker processes when exiting the context block, but it waits for them to finish first. We want to return as soon as the time limit is exceeded, hence the code to terminate the workers manually.
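To illustrate the difference with a stdlib-only sketch (the nap helper is just a toy function for this example): exiting the with block waits for in-flight tasks, while shutdown(wait=False) returns immediately, although the worker process still runs its current task to completion.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def nap(seconds):
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    # Exiting the `with` block calls shutdown(wait=True) under the hood,
    # so this takes about 1 second even though we never read the result.
    start = time.time()
    with ProcessPoolExecutor(max_workers=1) as pool:
        pool.submit(nap, 1)
    print(f"with-block exit blocked for ~{time.time() - start:.1f}s")

    # shutdown(wait=False) returns right away, but the worker process
    # still runs until its in-flight task completes.
    pool = ProcessPoolExecutor(max_workers=1)
    pool.submit(nap, 1)
    start = time.time()
    pool.shutdown(wait=False)
    print(f"shutdown(wait=False) returned after {time.time() - start:.3f}s")
```

Neither path actually kills a running worker, which is why the manual proc.terminate() loop is needed to return early.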

@dmontagu
Collaborator

dmontagu commented Jan 9, 2020

I see, that makes sense. I didn't realize it waits for tasks to complete, but of course it does.

Hmm, it sounds like the issue can probably be traced back to Starlette or, more likely, Uvicorn. You might try running under the Hypercorn server and seeing whether the same error happens; that would help pin the problem on Uvicorn. But I wouldn't necessarily ditch Uvicorn even in that case, since it is very fast.

Have you tried calling pool.terminate() (on a multiprocessing.Pool; ProcessPoolExecutor has no terminate method) instead of calling terminate on each process? If the problem is the with block, maybe try dropping it and cleaning up manually with pool.terminate(), just to see if it works? (If it does, we can go back and worry about recovering the safety the with block provides.)

Also, reading through the Python docs, it seems like the apply_async method might be better suited to what you are trying to do (your current approach waits for each future in order, so later elements of the list effectively get longer timeouts). I'd be curious to hear whether a similarly structured endpoint using apply_async runs into the same issues.
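To illustrate the timeout point with a sketch (the names here are made up for the example, not from your code): compute a single deadline up front and give each fut.result call only the remaining budget, so the whole batch shares one time limit regardless of iteration order.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def work(seconds):
    time.sleep(seconds)
    return "done"


def gather_with_deadline(futures, budget):
    """Collect results, giving the whole batch one shared time budget."""
    deadline = time.time() + budget
    results = []
    for fut in futures:
        remaining = max(deadline - time.time(), 0)
        try:
            results.append(fut.result(timeout=remaining))
        except Exception:
            results.append("not done")
    return results


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(work, s) for s in (0.1, 0.1, 2)]
        print(gather_with_deadline(futures, budget=1.0))
        # note: the `with` block still waits for the 2s task on exit,
        # so this alone doesn't solve the early-return problem
```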

@ananis25
Contributor Author

ananis25 commented Jan 9, 2020

Thank you for the pointer. I can replicate this issue in Starlette too, so it is not a FastAPI-specific issue. Would you recommend I close this issue and open one in Starlette's repo?

Hypercorn doesn't seem to run a starlette/FastAPI application out of the box (despite the ASGI promise?). I'll try to debug it and post.

Good spot! I had left out the global timer since it added noise. Probably useful to add it back.

(your current approach waits for each future in order, so later elements of the list have longer timeouts, I think?)

Using the apply_async API from multiprocessing and then calling terminate on the Pool object leaves the server process stuck. I'd imagine the with context block does other OS-related work that prevents this.

import os
import time
import uvicorn
from multiprocessing import Pool
from fastapi import FastAPI


def simple_routine(sleep_for):
    print(f"PID {os.getpid()} has sleep time: {sleep_for}")
    time.sleep(sleep_for)
    return "done"

app = FastAPI()

@app.post("/test-endpoint")
async def test_endpoint():
    print(f"main process: {os.getpid()}")

    START_TIME = time.time()
    STOP_TIME = START_TIME + 2

    pool = Pool(processes=3)
    futures = [
        pool.apply_async(simple_routine, [1]), 
        pool.apply_async(simple_routine, [1]),
        pool.apply_async(simple_routine, [10]), 
    ]

    results = []
    for fut in futures:
        remains = max(STOP_TIME - time.time(), 0)
        try:
            results.append(fut.get(timeout=remains))
        except Exception:
            results.append("not done")

    # terminate the entire pool
    pool.terminate()
    print("exiting at: ", int(time.time() - START_TIME))
    return "True"


if __name__=="__main__":
    uvicorn.run(app, host="0.0.0.0", port=5900)

@dmontagu
Collaborator

dmontagu commented Jan 9, 2020

Okay, this version (using ProcessPoolExecutor) at least works for me (the server remains responsive and returns valid responses), though some ugly exceptions get raised inside the executor:

import os
import time
from concurrent.futures.process import ProcessPoolExecutor

import uvicorn
from fastapi import FastAPI


def simple_routine(sleep_for):
    print(f"PID {os.getpid()} has sleep time: {sleep_for}")
    time.sleep(sleep_for)
    return "done"

app = FastAPI()

@app.post("/test-endpoint")
async def test_endpoint():
    print(f"main process: {os.getpid()}")

    START_TIME = time.time()
    STOP_TIME = START_TIME + 2

    pool = ProcessPoolExecutor(max_workers=3)
    futures = [
        pool.submit(simple_routine, 1),
        pool.submit(simple_routine, 1),
        pool.submit(simple_routine, 10),
    ]
    results = []
    for fut in futures:
        remains = max(STOP_TIME - time.time(), 0)
        try:
            results.append(fut.result(timeout=remains))
        except Exception:
            results.append("not done")

    # terminate the entire pool
    pool.shutdown(wait=False)
    print("exiting at: ", int(time.time() - START_TIME))
    return "True"


if __name__=="__main__":
    uvicorn.run(app, host="0.0.0.0", port=5900)

Might be worth raising an issue in starlette/uvicorn if you can throw together a minimal example.

@dmontagu
Collaborator

dmontagu commented Jan 9, 2020

Just kidding, the processes were being created but not cleaned up. Ugh. Yeah, I'd probably make a starlette issue.

@ananis25
Contributor Author

Thank you @dmontagu for looking. I created an issue in the uvicorn repository, will close this one.

@tiangolo
Member

tiangolo commented Apr 8, 2020

Thanks for the help here @dmontagu ! 🍰 🙇‍♂️

Thanks @ananis25 for reporting back and closing the issue. 👍

@tiangolo tiangolo added question Question or problem answered reviewed and removed bug Something isn't working labels Feb 22, 2023
@tiangolo tiangolo changed the title [QUESTION] interplay with multiprocessing interplay with multiprocessing Feb 24, 2023
@tiangolo tiangolo reopened this Feb 28, 2023
@fastapi fastapi locked and limited conversation to collaborators Feb 28, 2023
@tiangolo tiangolo converted this issue into discussion #7889 Feb 28, 2023

