This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


interplay with multiprocessing #841

Closed
ananis25 opened this issue Jan 9, 2020 · 8 comments

Comments

@ananis25
Contributor

ananis25 commented Jan 9, 2020

Describe the bug

I'm trying to run parallel tasks with a per-task timeout (using multiprocessing) inside an API method. When I try to terminate the child processes after the time limit, the server process shuts down and disconnects.

To Reproduce

  1. Create a file: repro.py
import os
import time
import uvicorn
from fastapi import FastAPI
from concurrent.futures import ProcessPoolExecutor

app = FastAPI(title='multiprocessing-issue')

def simple_routine(sleep_for):
    print(f"PID {os.getpid()} has run time: {sleep_for}")
    time.sleep(sleep_for)
    return "done"

@app.post("/test-endpoint")
def test_endpoint():
    print(f"main process: {os.getpid()}")

    START_TIME = time.time()
    with ProcessPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(simple_routine, 1), 
            pool.submit(simple_routine, 1),
            pool.submit(simple_routine, 20), 
        ]

        results = []
        for fut in futures:
            try:
                results.append(fut.result(timeout=2))
            except Exception:
                results.append("not done")

        # terminate the processes which are still running
        for pid, proc in pool._processes.items():
            print("terminating pid ", pid)
            proc.terminate()
    
    print("\n", "exiting at: ", int(time.time() - START_TIME))
    return True

if __name__=="__main__":
    uvicorn.run(app, host="0.0.0.0", port=5000)
  2. Run it as python repro.py.
  3. Open another Python interpreter and make this web request:
import requests
for _ in range(20):
    print(requests.post("http://localhost:5000/test-endpoint").text)
  4. The server process shuts down after the first request.

Expected behavior

The server shouldn't shut down; it should keep serving requests. Interestingly, the server logs a shutdown signal when I try to terminate the subprocesses, but it doesn't actually exit until the long-running process is complete.

INFO:     Started server process [615]
INFO:     Uvicorn running on http://0.0.0.0:5900 (Press CTRL+C to quit)
INFO:     Waiting for application startup.
INFO:     Application startup complete.
main process: 615

PID 637 has run time: 1
PID 638 has run time: 1
PID 639 has run time: 10
terminating pid  637
terminating pid  638
terminating pid  639
INFO:     Shutting down
INFO:     Finished server process [615]

exiting at:  10

Environment

  • OS: [Ubuntu 18.04.1 LTS]
  • FastAPI Version: 0.45.0
  • Python version: 3.6.8

Additional context

I'm moving a working WSGI API from Flask to FastAPI, mainly for the brevity and automatic documentation. The parallel tasks are CPU-intensive methods, and terminating them works as expected with Flask. For context, here is the output from the Flask version, which shows the expected behavior.

The issue likely has something to do with how FastAPI runs sync methods, but I can't quite figure it out.

main process: 1015
PID 1035 has run time: 1
PID 1039 has run time: 1
PID 1038 has run time: 10
terminating pid  1035
terminating pid  1038
terminating pid  1039
exiting at:  3

127.0.0.1 - - [09/Jan/2020 08:51:37] "POST /test-endpoint HTTP/1.1" 200 -

Thank you for looking.

@ananis25 ananis25 added the bug Something isn't working label Jan 9, 2020
@dmontagu
Collaborator

dmontagu commented Jan 9, 2020

Does it also exit if you do it in an async def endpoint?

Also, my understanding was that you don’t need to explicitly terminate the processes in a process pool — when the pool’s context block is exited, the processes are automatically terminated. So I would think you could just safely drop the loop over pool._processes. Is that wrong?

@ananis25
Contributor Author

ananis25 commented Jan 9, 2020

Yeah, an async def endpoint gives the same behavior. One difference: Uvicorn logs the shutdown signal after completing the running request, whereas with a def endpoint it logged it while the request was still running.

with async def

INFO:     Application startup complete.
main process: 6058
PID 6065 has sleep time: 1
PID 6064 has sleep time: 1
PID 6066 has sleep time: 10
terminating pid  6064
terminating pid  6065
terminating pid  6066
exiting at:  10
INFO:     127.0.0.1:49702 - "POST /test-endpoint HTTP/1.1" 200 OK
INFO:     Shutting down
INFO:     Finished server process [6058]

with a def endpoint

INFO:     Application startup complete.
main process: 5880
PID 5901 has sleep time: 1
PID 5902 has sleep time: 1
PID 5903 has sleep time: 10
terminating pid  5901
terminating pid  5902
terminating pid  5903
INFO:     Shutting down
INFO:     Finished server process [5880]
exiting at:  10

You're right that the process pool terminates its worker processes when exiting the context block, but it waits for them to finish first. We want to return as soon as the time limit is exceeded, hence the code to terminate the workers manually.
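To illustrate the difference with a stdlib-only sketch (the nap helper is just a toy function for this example): exiting the with block waits for in-flight tasks, while shutdown(wait=False) returns immediately, although the worker process still runs its current task to completion.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def nap(seconds):
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    # Exiting the `with` block calls shutdown(wait=True) under the hood,
    # so this takes about 1 second even though we never read the result.
    start = time.time()
    with ProcessPoolExecutor(max_workers=1) as pool:
        pool.submit(nap, 1)
    print(f"with-block exit blocked for ~{time.time() - start:.1f}s")

    # shutdown(wait=False) returns right away, but the worker process
    # still runs until its in-flight task completes.
    pool = ProcessPoolExecutor(max_workers=1)
    pool.submit(nap, 1)
    start = time.time()
    pool.shutdown(wait=False)
    print(f"shutdown(wait=False) returned after {time.time() - start:.3f}s")
```

Neither path actually kills a running worker, which is why the manual proc.terminate() loop is needed to return early.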

@dmontagu
Collaborator

dmontagu commented Jan 9, 2020

I see, that makes sense. I didn't realize it waits for tasks to complete, but of course it does.

Hmm, it sounds like the issue can probably be traced back to Starlette or, more likely, Uvicorn. You might try running under the Hypercorn server and seeing whether the same error happens; that would help pin the problem on Uvicorn. But I wouldn't necessarily ditch Uvicorn even in that case, since it is very fast.

Have you tried calling pool.terminate() (on a multiprocessing.Pool; ProcessPoolExecutor has no terminate method) instead of calling terminate on each process? If the problem is the with block, maybe try dropping it and cleaning up manually with pool.terminate(), just to see if it works? (If it does, we can go back and worry about recovering the safety the with block provides.)

Also, reading through the Python docs, it seems like the apply_async method might be better suited to what you are trying to do (your current approach waits for each future in order, so later elements of the list effectively get longer timeouts). I'd be curious to hear whether a similarly structured endpoint using apply_async runs into the same issues.
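To illustrate the timeout point with a sketch (the names here are made up for the example, not from your code): compute a single deadline up front and give each fut.result call only the remaining budget, so the whole batch shares one time limit regardless of iteration order.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def work(seconds):
    time.sleep(seconds)
    return "done"


def gather_with_deadline(futures, budget):
    """Collect results, giving the whole batch one shared time budget."""
    deadline = time.time() + budget
    results = []
    for fut in futures:
        remaining = max(deadline - time.time(), 0)
        try:
            results.append(fut.result(timeout=remaining))
        except Exception:
            results.append("not done")
    return results


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(work, s) for s in (0.1, 0.1, 2)]
        print(gather_with_deadline(futures, budget=1.0))
        # note: the `with` block still waits for the 2s task on exit,
        # so this alone doesn't solve the early-return problem
```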

@ananis25
Contributor Author

ananis25 commented Jan 9, 2020

Thank you for the pointer. I can replicate this issue in Starlette too, so it is not a FastAPI-specific issue. Would you recommend I close this issue and open one in Starlette's repo?

Hypercorn doesn't seem to run a starlette/FastAPI application out of the box (despite the ASGI promise?). I'll try to debug it and post.

Good spot! I had left out the global timer since it added noise. Probably useful to add it back.

(your current approach waits for each future in order, so later elements of the list have longer timeouts, I think?)

Using the apply_async API from multiprocessing and then calling terminate on the Pool object leaves the server process stuck. I'd imagine the with context block does other OS-related work that prevents this.

import os
import time
import uvicorn
from multiprocessing import Pool
from fastapi import FastAPI


def simple_routine(sleep_for):
    print(f"PID {os.getpid()} has sleep time: {sleep_for}")
    time.sleep(sleep_for)
    return "done"

app = FastAPI()

@app.post("/test-endpoint")
async def test_endpoint():
    print(f"main process: {os.getpid()}")

    START_TIME = time.time()
    STOP_TIME = START_TIME + 2

    pool = Pool(processes=3)
    futures = [
        pool.apply_async(simple_routine, [1]), 
        pool.apply_async(simple_routine, [1]),
        pool.apply_async(simple_routine, [10]), 
    ]

    results = []
    for fut in futures:
        remains = max(STOP_TIME - time.time(), 0)
        try:
            results.append(fut.get(timeout=remains))
        except Exception:
            results.append("not done")

    # terminate the entire pool
    pool.terminate()
    print("exiting at: ", int(time.time() - START_TIME))
    return "True"


if __name__=="__main__":
    uvicorn.run(app, host="0.0.0.0", port=5900)

@dmontagu
Collaborator

dmontagu commented Jan 9, 2020

Okay, this version (using ProcessPoolExecutor) at least works for me (the server remains responsive and returns valid responses), though some ugly exceptions get raised inside the executor:

import os
import time
from concurrent.futures.process import ProcessPoolExecutor

import uvicorn
from fastapi import FastAPI


def simple_routine(sleep_for):
    print(f"PID {os.getpid()} has sleep time: {sleep_for}")
    time.sleep(sleep_for)
    return "done"

app = FastAPI()

@app.post("/test-endpoint")
async def test_endpoint():
    print(f"main process: {os.getpid()}")

    START_TIME = time.time()
    STOP_TIME = START_TIME + 2

    pool = ProcessPoolExecutor(max_workers=3)
    futures = [
        pool.submit(simple_routine, 1),
        pool.submit(simple_routine, 1),
        pool.submit(simple_routine, 10),
    ]
    results = []
    for fut in futures:
        remains = max(STOP_TIME - time.time(), 0)
        try:
            results.append(fut.result(timeout=remains))
        except Exception:
            results.append("not done")

    # terminate the entire pool
    pool.shutdown(wait=False)
    print("exiting at: ", int(time.time() - START_TIME))
    return "True"


if __name__=="__main__":
    uvicorn.run(app, host="0.0.0.0", port=5900)

Might be worth raising an issue in starlette/uvicorn if you can throw together a minimal example.

@dmontagu
Collaborator

dmontagu commented Jan 9, 2020

Just kidding, the processes were being created but not cleaned up. Ugh. Yeah, I'd probably make a starlette issue.

@ananis25
Contributor Author

Thank you @dmontagu for looking. I created an issue in the uvicorn repository, will close this one.

@tiangolo
Member

tiangolo commented Apr 8, 2020

Thanks for the help here @dmontagu ! 🍰 🙇‍♂️

Thanks @ananis25 for reporting back and closing the issue. 👍

@tiangolo tiangolo added question Question or problem answered reviewed and removed bug Something isn't working labels Feb 22, 2023
@tiangolo tiangolo changed the title [QUESTION] interplay with multiprocessing interplay with multiprocessing Feb 24, 2023
@tiangolo tiangolo reopened this Feb 28, 2023
@fastapi fastapi locked and limited conversation to collaborators Feb 28, 2023
@tiangolo tiangolo converted this issue into discussion #7889 Feb 28, 2023

