streaming uploaded file not working #2578

Closed
dfroger opened this issue Dec 29, 2020 · 17 comments
Labels: question, question-migrate

Comments

@dfroger

dfroger commented Dec 29, 2020

Hi,

I'm deploying my application on an OVH VPS and I'm having an issue with streaming a file that I upload.

I can reproduce the issue with a minimal example: the streaming works when using uvicorn only, but fails when using uvicorn + fastapi.

Here is the uvicorn-only code:

import time


async def read_body(receive):
    start = time.time()
    more_body = True
    i = 0
    total = 0
    while more_body:
        message = await receive()
        part = message.get('body', b'')
        total += len(part)
        i += 1
        print(f"[{i:5}] {time.time() - start} {total} bytes")
        more_body = message.get('more_body', False)
    return total


async def app(scope, receive, send):
    total = await read_body(receive)
    await send({
        'type': 'http.response.start',
        'status': 200,
        'headers': [
            [b'content-type', b'text/plain'],
        ]
    })
    await send({
        'type': 'http.response.body',
        'body': f"Got {total} bytes\n".encode("utf-8")
    })

I run it with:

sudo /path/to/uvicorn stream_req:app --host 0.0.0.0 --port 80 --workers 1

The client command to upload a file, from my laptop is:

curl -X POST -F file=@medium.mp3 http://<VPS_IP>

I get the following output:

[    1] 0.030229806900024414 1448 bytes
[    2] 0.04207563400268555 2896 bytes
[    3] 0.0538487434387207 4344 bytes
(...)
[  200] 2.6474087238311768 291048 bytes
[  201] 2.6665892601013184 292496 bytes
[  202] 2.6782875061035156 293944 bytes
(...)
[  400] 6.049436092376709 663184 bytes
[  401] 6.062062501907349 664632 bytes
[  402] 6.073895692825317 666080 bytes
(...)
[  600] 9.408700466156006 1036768 bytes
[  601] 9.42050576210022 1038216 bytes
[  602] 9.432065963745117 1039664 bytes
(...)
[  800] 12.018092632293701 1326368 bytes
[  801] 12.029752254486084 1327816 bytes
[  802] 12.041809797286987 1329264 bytes
(...)
[ 1000] 14.651684761047363 1615968 bytes
[ 1001] 14.660185813903809 1617416 bytes
[ 1002] 14.672219514846802 1618864 bytes
(...)
[ 1200] 18.46268582344055 2050368 bytes
[ 1201] 18.47444438934326 2051816 bytes
[ 1202] 18.490278959274292 2053264 bytes
(...)
[ 1400] 21.79162621498108 2428296 bytes
[ 1401] 21.803564071655273 2429744 bytes
[ 1402] 21.819947242736816 2431192 bytes
(...)
[ 1600] 24.478782415390015 2717896 bytes
[ 1601] 24.490610361099243 2719344 bytes
[ 1602] 24.50354790687561 2720792 bytes
(...)
[ 1727] 26.618733406066895 2961160 bytes
[ 1728] 26.630817651748657 2962608 bytes
[ 1729] 26.63267707824707 2962801 bytes

Here is the FastAPI code:

import time

from fastapi import FastAPI, UploadFile, File


app = FastAPI()


@app.post("/upload")
async def upload_file(file: UploadFile = File(...)):
    start = time.time()
    total = 0
    i = 0
    while True:
        content = await file.read(128*1024)
        if content == b"":
            break
        total += len(content)
        i += 1
        print(f"[{i:5}] {time.time() - start} {total} bytes")

I run it with:

sudo /path/to/uvicorn upload:app --host 0.0.0.0 --port 80 --workers 1

The client command is:

curl -X POST -F file=@medium.mp3 http://<VPS_IP>/upload

And the output, which appears after a moment (about 26 seconds, I guess), is:

[    1] 0.00036787986755371094 131072 bytes
[    2] 0.0009517669677734375 262144 bytes
[    3] 0.0015041828155517578 393216 bytes
[    4] 0.001993894577026367 524288 bytes
[    5] 0.0025484561920166016 655360 bytes
[    6] 0.0030825138092041016 786432 bytes
[    7] 0.0034346580505371094 917504 bytes
[    8] 0.0038461685180664062 1048576 bytes
[    9] 0.00414276123046875 1179648 bytes
[   10] 0.004529476165771484 1310720 bytes
[   11] 0.00487208366394043 1441792 bytes
[   12] 0.005192995071411133 1572864 bytes
[   13] 0.0055615901947021484 1703936 bytes
[   14] 0.00597381591796875 1835008 bytes
[   15] 0.0063097476959228516 1966080 bytes
[   16] 0.0067272186279296875 2097152 bytes
[   17] 0.00703740119934082 2228224 bytes
[   18] 0.007361412048339844 2359296 bytes
[   19] 0.007636308670043945 2490368 bytes
[   20] 0.007856369018554688 2621440 bytes
[   21] 0.008106231689453125 2752512 bytes
[   22] 0.008345603942871094 2883584 bytes
[   23] 0.008610725402832031 2962599 bytes

So the uploaded file seems to be buffered somewhere, and then passed to the upload_file function.

Both programs run on the same machine (Debian 10), with Python 3.8.6 (installed with pyenv).

My virtualenv contains:

Package          Version
---------------- -------
aiofiles         0.6.0
bcrypt           3.2.0
berger           1.0.0
cffi             1.14.4
click            7.1.2
cryptography     3.3.1
ecdsa            0.14.1
fastapi          0.63.0
gunicorn         20.0.4
h11              0.11.0
httptools        0.1.1
passlib          1.7.4
pip              20.2.1
pyasn1           0.4.8
pycparser        2.20
pydantic         1.7.3
python-dateutil  2.8.1
python-dotenv    0.15.0
python-jose      3.2.0
python-multipart 0.0.5
rsa              4.6
setuptools       49.2.1
six              1.15.0
starlette        0.13.6
uvicorn          0.13.2
uvloop           0.14.0
wheel            0.36.2

I'll add more info if I think of anything else, and will continue debugging...

Thanks!

@dfroger added the question label Dec 29, 2020
@Kludex
Collaborator

Kludex commented Dec 29, 2020

Does your issue happen with the exact application you have in the snippet? Or is that the endpoint where you observe this behavior in your whole application?

@dfroger
Author

dfroger commented Dec 29, 2020

The issue occurs in the exact application that I copy-pasted above (and also in my whole application, of course!)

@Kludex
Collaborator

Kludex commented Dec 29, 2020

I don't have any clue. It works here.

Could you try to change the --http parameter?

sudo /path/to/uvicorn upload:app --host 0.0.0.0 --port 80 --workers 1 --http h11 (and --http httptools)

@dfroger
Author

dfroger commented Dec 29, 2020

Same behavior with --http h11 and --http httptools (streaming not working).

Do you have enough network latency to reproduce the problem? If I run the server on the same machine as the client, I can observe the issue with curl by adding --limit-rate (just found the option...), for example:

curl -X POST -F file=@medium.mp3 --limit-rate 200K http://127.0.0.1:8000/upload

@falkben
Contributor

falkben commented Dec 30, 2020

Maybe I missed it, but I feel like I'm not seeing the actual problem you are experiencing.

My understanding is that the issue you are facing is that FastAPI doesn't appear to consume the uploaded file as a stream? But you are able to receive the uploaded file in FastAPI, right?

I didn't fully understand how the stream got consumed in this case, so I attempted to trace it through. The stream seems to be consumed by the form parser by the time it gets to your path operation, since the parser is called prior to any of the code in your path operation.

Specifically, the body is processed as a form here, which calls the Starlette form function here, which in turn calls the form parser. In this case it's a multipart form, so it uses the MultiPartParser, which consumes the stream in this for loop by iterating over its chunks. Because it's an upload file, it creates an UploadFile object and adds content to the (in-memory) file as it consumes the stream. If it overflows the memory buffer (which it would with a large file), it writes the data out to a tempfile, I believe.

If you are looking to iterate over the chunks of the stream directly in your route for some reason, I guess you would have to override this behavior of Starlette's form parser. You could do that in a couple of ways, I suppose; one is to create a custom request and route, documented here. In that case, you'd replace the form method with whatever you'd like to do with the file.

I'm curious: why do you need the file as a stream inside your route in the first place? It seems as though Starlette/FastAPI have handled this upload stream, so all you should have to do is await file.read() to get the content.
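
To make that concrete, here's a rough, untested sketch of the custom request/route pattern from the docs (ChunkCountingRequest and the print are my own hypothetical names). Instead of replacing form() outright, it wraps stream() so each chunk can be observed while the form parser consumes it:

from fastapi import FastAPI, Request, Response
from fastapi.routing import APIRoute


class ChunkCountingRequest(Request):
    async def stream(self):
        # Observe each chunk as Starlette's form parser consumes the body,
        # before the path operation function is ever called.
        total = 0
        async for chunk in super().stream():
            total += len(chunk)
            print(f"received {total} bytes so far")
            yield chunk


class ChunkCountingRoute(APIRoute):
    def get_route_handler(self):
        original_route_handler = super().get_route_handler()

        async def custom_route_handler(request: Request) -> Response:
            request = ChunkCountingRequest(request.scope, request.receive)
            return await original_route_handler(request)

        return custom_route_handler


app = FastAPI()
app.router.route_class = ChunkCountingRoute  # applies to routes added after this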

@dfroger
Author

dfroger commented Dec 30, 2020

In my application, the user can upload a large file, for example 70 MB, by POSTing to a URL. The FastAPI function (the one decorated with @app.post) does an await file.read(), then an await outfile.write() (with aiofiles).

To provide feedback to the user (a progress bar), the FastAPI function also stores in a global dict the number of bytes it has read, associated with an ID that the client knows and can use to GET this byte count, so it can update its progress bar, for example every 100 ms.
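
In code, the idea is roughly this (a minimal sketch with hypothetical names and paths; it only gives real-time progress if the bytes actually arrive incrementally):

import aiofiles
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

progress = {}  # hypothetical global dict: upload ID -> bytes read so far


@app.post("/upload/{upload_id}")
async def upload_file(upload_id: str, file: UploadFile = File(...)):
    progress[upload_id] = 0
    async with aiofiles.open(f"/tmp/{upload_id}.part", "wb") as outfile:
        while True:
            chunk = await file.read(128 * 1024)
            if not chunk:
                break
            await outfile.write(chunk)
            progress[upload_id] += len(chunk)
    return {"received": progress[upload_id]}


@app.get("/progress/{upload_id}")
async def get_progress(upload_id: str):
    # The client polls this (for example every 100 ms) to update its progress bar.
    return {"bytes_read": progress.get(upload_id, 0)}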

But with the issue, the FastAPI function is only called once the whole file has been transferred, so the progress shows 0%, and some minutes later it jumps to 100%...

The upload progress could also be managed by nginx (which I use), but I would be more comfortable doing it with FastAPI. Also, the issue makes FastAPI consume 70 MB of memory, while maybe 1 MB or less would suffice if bytes were written to disk as they are received and then replaced by the next bytes (OK, as long as writing to disk is faster than the network...).

I would really expect an await file.read() on an UploadFile to return bytes as soon as they are available, not only once the full file has been uploaded; am I wrong?

Thanks for tracing the code! I'll have a look at it. (This is a "side project", so I may not reply quickly during the day...)

EDIT: my first post is maybe not clear about what the issue is; I'll try to update it later today!

@dfroger
Author

dfroger commented Dec 30, 2020

@falkben I've read your answer again; you are right, the body = await request.form() call is blocking.

I'll try to make a custom request and route, as you suggest, to solve my issue. If it works and is useful, I could then make a PR for the code, or at least add a note to the docs that await myfile.read(1024) is executed only after the full body has been received.

@falkben
Contributor

falkben commented Dec 30, 2020

Yeah, perhaps the docs could be more explicit that the file received in the path operation is not a stream.

@dfroger
Author

dfroger commented Dec 30, 2020

I understand it better now: the file cannot be a stream, as the multipart/form-data body may contain another field after it that needs to be validated before the FastAPI function can be executed.

I'll look at python-multipart to see if its "callbacks" are something that can be used to implement upload progress. (I'm also wondering if it's possible, for example, to check a content-type header and stop the upload if it doesn't have a correct value.)

@dfroger
Author

dfroger commented Dec 30, 2020

So python-multipart seems to have all the callbacks needed to do whatever we want, but these callbacks cannot be set directly in Starlette.

I think I have to choose between monkey-patching MultiPartParser.on_part_data and the other callbacks (there is another issue about customizing that class, for example), or managing big file uploads with nginx.

I'll experiment with both...
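
For reference, my understanding is that driving python-multipart directly would look roughly like this (an untested sketch, assuming the MultipartParser callback API that Starlette itself uses internally; the boundary extraction is simplified):

import multipart  # python-multipart
from fastapi import FastAPI, Request

app = FastAPI()


@app.post("/upload")
async def upload(request: Request):
    received = 0

    def on_part_data(data: bytes, start: int, end: int) -> None:
        nonlocal received
        received += end - start  # the place to update an upload-progress counter

    # Simplified boundary extraction; Starlette uses parse_options_header() here.
    boundary = request.headers["content-type"].split("boundary=")[1].encode()
    parser = multipart.MultipartParser(boundary, {"on_part_data": on_part_data})

    async for chunk in request.stream():  # feed the parser as chunks arrive
        parser.write(chunk)
    parser.finalize()

    return {"received": received}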

@dfroger
Author

dfroger commented Dec 31, 2020

Actually, I'll just use the XMLHttpRequest progress event on the client and keep the file unstreamed in the FastAPI function: not enough advantage to code it (a custom Request and APIRoute class using python-multipart directly should be a good solution if needed).

I close the issue, thanks!

@dfroger dfroger closed this as completed Dec 31, 2020
@falkben
Contributor

falkben commented Dec 31, 2020

Yes, I was thinking this problem would be better solved on the client... Glad you found a solution!

@antonwnk

I was searching for some information about whether one could process a large file in FastAPI, part by part, while it's being beamed over the network. Am I understanding correctly that the conversation above was specifically about this? Thanks!

The specific use-case I had in mind was starting to validate a (potentially large) file being received before the upload finishes.

@dfroger
Author

dfroger commented Feb 19, 2021

@antonwnk yes, this conversation was about exactly that!

My understanding is that, by default, the whole file received by the FastAPI application is buffered:

The file is parsed by python-multipart (from the multipart/form-data HTTP body). Only once the full body has been parsed, i.e. when the full file has been uploaded and stored in a buffer, can FastAPI build and validate the path operation function parameters and then run the path operation function. The path operation function can process the UploadFile parameter asynchronously, but as we said, by then the file has already been fully uploaded to the buffer.

I closed the issue because, for my use case, I managed it on the client side (monitoring the file upload progress there).

If I had to process an uploaded file in a FastAPI application while it's being beamed over the network, my conclusion was to make a special case of that particular path operation, with a custom Request and APIRoute class using python-multipart directly.

@antonwnk

Thank you very much for the summary! This issue is really a treasure trove of information.
If I end up going for this method of "incremental processing on FastAPI" for my problem, I'll be sure to try your approach and report back with my findings!

@antonwnk

I think this was actually supported via Starlette requests #58 (comment)

This does indeed start to print as soon as I send the large file with curl:

import logging
from fastapi import FastAPI, Request

logger = logging.getLogger("uvicorn")
app = FastAPI()

@app.post("/")
async def stream_file(request: Request):
    chunks = 0
    logger.info("Starting to stream")
    async for _ in request.stream():
        chunks += 1
        logger.info(f"got chunk {chunks}")

@tiangolo
Owner

Thanks for the help here everyone! 👏 🙇

Thanks for reporting back and closing the issue @dfroger 👍

Sorry for the long delay! 🙈 I wanted to personally address each issue/PR and they piled up through time, but now I'm checking each one in order.

@tiangolo tiangolo reopened this Feb 27, 2023
Repository owner locked and limited conversation to collaborators Feb 27, 2023
@tiangolo tiangolo converted this issue into discussion #6934 Feb 27, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
