Streaming POST body uploads #262

Closed
luite opened this Issue Feb 10, 2012 · 20 comments

Comments

Projects
None yet
3 participants
@luite
Member

luite commented Feb 10, 2012

tsuraan mentioned the following on reddit:

it would be really nice if yesod had equivalents to Snap's handleMultipart, 
runRequestBody, and setResponseBody. These three functions make it
easy to write a site that can handle arbitrary amounts of data without doing
any writing to the local filesystem, which is a really nice thing to be able to do.
Right now, runRequestBody and setResponseBody can be done at the wai level,
but I couldn't find an equivalent to handleMultipart, and besides it would be cool
if they were exposed to the yesod level anyhow."

This might be an API omission. It also looks like it's not possible to customize the FileSink in the Yesod API.

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Feb 11, 2012

Member

It's all possible via waiRequest and sendWaiResponse, though I'm not at all opposed to providing higher-level functionality. We also might want to consider abstracting the details of where an uploaded file's contents are stored.

Member

snoyberg commented Feb 11, 2012

It's all possible via waiRequest and sendWaiResponse, though I'm not at all opposed to providing higher-level functionality. We also might want to consider abstracting the details of where an uploaded file's contents are stored.

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jun 5, 2012

Member

See #358 regarding the FileSink customization (now BackEnd). Do you have any thoughts on what a higher level API would be?

Member

snoyberg commented Jun 5, 2012

See #358 regarding the FileSink customization (now BackEnd). Do you have any thoughts on what a higher level API would be?

snoyberg added a commit that referenced this issue Jul 2, 2012

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jul 2, 2012

Member

I believe FileUploadSource provides a relatively high-level streaming interface. Without specific use cases, I can't really try to improve it further. @luite Do you have any thoughts?

Member

snoyberg commented Jul 2, 2012

I believe FileUploadSource provides a relatively high-level streaming interface. Without specific use cases, I can't really try to improve it further. @luite Do you have any thoughts?

@snoyberg snoyberg closed this Jul 2, 2012

@andersk

This comment has been minimized.

Show comment Hide comment
@andersk

andersk Jul 9, 2012

How does one use FileUploadSource? It’s only defined in the hidden module Yesod.Internal.Request, only reexported in the hidden module Yesod.Internal.Core, and is never constructed anywhere.

andersk commented Jul 9, 2012

How does one use FileUploadSource? It’s only defined in the hidden module Yesod.Internal.Request, only reexported in the hidden module Yesod.Internal.Core, and is never constructed anywhere.

snoyberg added a commit that referenced this issue Jul 9, 2012

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jul 9, 2012

Member

That was a mistake, it should be exported from Yesod.Core. The newest commit fixes that.

Member

snoyberg commented Jul 9, 2012

That was a mistake, it should be exported from Yesod.Core. The newest commit fixes that.

@andersk

This comment has been minimized.

Show comment Hide comment
@andersk

andersk Jul 9, 2012

FileUploadSource :: Sink ByteString (ResourceT IO) (Source (ResourceT IO) ByteString) -> FileUpload

This API doesn’t seem to provide streaming uploads; it looks more like a generalized way to slurp in an entire file before spitting the entire thing out again. That doesn’t really help if you imagine that I want to do some kind of live processing on the file as it comes in—even something as simple as displaying an AJAXy progress meter for the uploaded file. And for some applications, I may not even want to keep the entire file around to spit back out at the end.

andersk commented Jul 9, 2012

FileUploadSource :: Sink ByteString (ResourceT IO) (Source (ResourceT IO) ByteString) -> FileUpload

This API doesn’t seem to provide streaming uploads; it looks more like a generalized way to slurp in an entire file before spitting the entire thing out again. That doesn’t really help if you imagine that I want to do some kind of live processing on the file as it comes in—even something as simple as displaying an AJAXy progress meter for the uploaded file. And for some applications, I may not even want to keep the entire file around to spit back out at the end.

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jul 9, 2012

Member

If you don't want to keep the contents of the file, then return a dummy Source. But what prevents you from being able to perform "live processing". I'd really like something more concrete for a design goal than "streaming uploads," as it's far too vague.

Member

snoyberg commented Jul 9, 2012

If you don't want to keep the contents of the file, then return a dummy Source. But what prevents you from being able to perform "live processing". I'd really like something more concrete for a design goal than "streaming uploads," as it's far too vague.

@andersk

This comment has been minimized.

Show comment Hide comment
@andersk

andersk Jul 9, 2012

That’s why I provided AJAXy progress meters as an example. In the current design, the POST handler has no opportunity to communicate with the fileUpload function until after the entire file is uploaded, so the latter can’t know where to send progress updates to.

I conjecture that a good API would be a generalization of fileAFormReq that, instead of returning an AForm sub master FileInfo, takes any FileName -> ContentType -> Sink ByteString (ResourceT IO) t and returns an AForm sub master t.

andersk commented Jul 9, 2012

That’s why I provided AJAXy progress meters as an example. In the current design, the POST handler has no opportunity to communicate with the fileUpload function until after the entire file is uploaded, so the latter can’t know where to send progress updates to.

I conjecture that a good API would be a generalization of fileAFormReq that, instead of returning an AForm sub master FileInfo, takes any FileName -> ContentType -> Sink ByteString (ResourceT IO) t and returns an AForm sub master t.

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jul 11, 2012

Member

If we're talking about having an API that allows you to produce output to the user while consuming input, I think it has to sit at the WAI level. There's no way we could return a normal Yesod response after we've already started sending data to the client. Perhaps we can still provide something to help out at the Yesod level, but I'm still not sure what the API would look like.

Member

snoyberg commented Jul 11, 2012

If we're talking about having an API that allows you to produce output to the user while consuming input, I think it has to sit at the WAI level. There's no way we could return a normal Yesod response after we've already started sending data to the client. Perhaps we can still provide something to help out at the Yesod level, but I'm still not sure what the API would look like.

@andersk

This comment has been minimized.

Show comment Hide comment
@andersk

andersk Jul 11, 2012

I don’t think sending output back on the same connection while still consuming input is a useful thing to do. Snap does actually have an API for that (transformRequestBody), but it comes with a number of safety caveats, and based on my tests, neither Firefox nor Chrome accept more than a few hundred kilobytes of that output before sending all the input (presumably my TCP congestion window size or something).

But what is useful is the ability, while consuming input, to send output somewhere else, such as on a separate connection from the same user on which we want to send back progress data, or to another server on which we’re really going to store the file. My proposal allows that via IO side effects of the sink.

andersk commented Jul 11, 2012

I don’t think sending output back on the same connection while still consuming input is a useful thing to do. Snap does actually have an API for that (transformRequestBody), but it comes with a number of safety caveats, and based on my tests, neither Firefox nor Chrome accept more than a few hundred kilobytes of that output before sending all the input (presumably my TCP congestion window size or something).

But what is useful is the ability, while consuming input, to send output somewhere else, such as on a separate connection from the same user on which we want to send back progress data, or to another server on which we’re really going to store the file. My proposal allows that via IO side effects of the sink.

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jul 11, 2012

Member

And why doesn't the FileUploadSource constructor fit the bill? You can perform any arbitrary IO action inside the Sink as well.

I have no problem adding an extra API for this feature, I just want to understand the motivation and exactly what it will look like.

Member

snoyberg commented Jul 11, 2012

And why doesn't the FileUploadSource constructor fit the bill? You can perform any arbitrary IO action inside the Sink as well.

I have no problem adding an extra API for this feature, I just want to understand the motivation and exactly what it will look like.

@andersk

This comment has been minimized.

Show comment Hide comment
@andersk

andersk Jul 11, 2012

Since the POST handler has no opportunity to communicate with the fileUpload function until after the entire file is uploaded, the fileUpload function has no way to associate the incoming file with the request. So it can’t take different actions based on the file name, content type, which form and which control on the form the file was submitted to, or which user submitted it. That’s critical in my AJAX progress bar example.

Another example of something one might want to do with this is refuse to store uploads from unauthenticated users (even temporarily), to prevent denial of service attacks.

andersk commented Jul 11, 2012

Since the POST handler has no opportunity to communicate with the fileUpload function until after the entire file is uploaded, the fileUpload function has no way to associate the incoming file with the request. So it can’t take different actions based on the file name, content type, which form and which control on the form the file was submitted to, or which user submitted it. That’s critical in my AJAX progress bar example.

Another example of something one might want to do with this is refuse to store uploads from unauthenticated users (even temporarily), to prevent denial of service attacks.

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jul 11, 2012

Member

OK, so it sounds to me like we could solve this with two changes:

  • Allow dynamically updating the fileUpload value from within a Handler.
  • Getting filename and content type is actually more involved: it requires a change to the code in wai-extra as well. But once we move it all the way down the stack, it should be trivial to make the filename and content type a parameter to FileUploadSource.

Assuming we had all that, would that solve the problem?

Member

snoyberg commented Jul 11, 2012

OK, so it sounds to me like we could solve this with two changes:

  • Allow dynamically updating the fileUpload value from within a Handler.
  • Getting filename and content type is actually more involved: it requires a change to the code in wai-extra as well. But once we move it all the way down the stack, it should be trivial to make the filename and content type a parameter to FileUploadSource.

Assuming we had all that, would that solve the problem?

@andersk

This comment has been minimized.

Show comment Hide comment
@andersk

andersk Jul 11, 2012

Almost, but that still doesn’t let me do different things to files submitted in different controls to the same form. (Presumably that also falls into the category of things that need wai-extra changes.)

andersk commented Jul 11, 2012

Almost, but that still doesn’t let me do different things to files submitted in different controls to the same form. (Presumably that also falls into the category of things that need wai-extra changes.)

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jul 11, 2012

Member

Meaning that the Sink needs the filename, content type, and parameter name, right?

Member

snoyberg commented Jul 11, 2012

Meaning that the Sink needs the filename, content type, and parameter name, right?

@andersk

This comment has been minimized.

Show comment Hide comment
@andersk

andersk Jul 11, 2012

Also, from an API cleanliness perspective (instead of a completeness perspective), I’m still not happy that FileUploadSource restricts me to returning things that can be transmitted via a Source (ResourceT IO) ByteString. I can of course return anything using an IORef and side effects, but that’s uglier.

My proposal above also lets me associate the upload action with the form control in a way that feels much more direct and safe than parsing the parameter name as a string, especially given that the name might be Yesod-generated.

andersk commented Jul 11, 2012

Also, from an API cleanliness perspective (instead of a completeness perspective), I’m still not happy that FileUploadSource restricts me to returning things that can be transmitted via a Source (ResourceT IO) ByteString. I can of course return anything using an IORef and side effects, but that’s uglier.

My proposal above also lets me associate the upload action with the form control in a way that feels much more direct and safe than parsing the parameter name as a string, especially given that the name might be Yesod-generated.

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jul 11, 2012

Member

Your proposal is very vague. Can you try implementing it and sending a pull request? I don't really understand fully what you're getting at.

Member

snoyberg commented Jul 11, 2012

Your proposal is very vague. Can you try implementing it and sending a pull request? I don't really understand fully what you're getting at.

@andersk

This comment has been minimized.

Show comment Hide comment
@andersk

andersk Jul 11, 2012

I’m additionally slightly unhappy that, even if I know my fileUpload handler returns a FileUploadSource, I need to pattern-match away the other two cases FileUploadMemory and FileUploadDisk as impossible.

andersk commented Jul 11, 2012

I’m additionally slightly unhappy that, even if I know my fileUpload handler returns a FileUploadSource, I need to pattern-match away the other two cases FileUploadMemory and FileUploadDisk as impossible.

@andersk

This comment has been minimized.

Show comment Hide comment
@andersk

andersk Jul 11, 2012

I’m willing to try implementing it. The one thing I’m not sure about is what to do with snd reqRequestBody, but I guess I’ll assume for now that I can just put [] there when the new mechanism is used.

andersk commented Jul 11, 2012

I’m willing to try implementing it. The one thing I’m not sure about is what to do with snd reqRequestBody, but I guess I’ll assume for now that I can just put [] there when the new mechanism is used.

snoyberg added a commit that referenced this issue Jul 11, 2012

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jul 11, 2012

Member

I've pushed changes to wai and yesod to provide the extra needed information to the Sinks. I believe that should be sufficient to implement this feature. If you need any other changes from wai, please let me know ASAP, as we're going to be making the next release soon (possibly in the next day).

Member

snoyberg commented Jul 11, 2012

I've pushed changes to wai and yesod to provide the extra needed information to the Sinks. I believe that should be sufficient to implement this feature. If you need any other changes from wai, please let me know ASAP, as we're going to be making the next release soon (possibly in the next day).

@lightraven24 lightraven24 referenced this issue in yesodweb/yesod-cookbook Sep 17, 2016

Open

File upload cookbook progress bar example #13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment