Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't update FileReader's result until "LOAD" #79

Closed
annevk opened this issue May 4, 2017 · 15 comments
Closed

Don't update FileReader's result until "LOAD" #79

annevk opened this issue May 4, 2017 · 15 comments

Comments

@annevk
Copy link
Member

annevk commented May 4, 2017

Otherwise you get a giant memory leak. Reportedly Firefox already does this. Chrome allocates strings/buffers and will quickly get slow.

@jakearchibald
Copy link

jakearchibald commented May 4, 2017

@jakearchibald
Copy link

We could provide incremental updates for arraybuffers by allocating new ArrayBuffer(file.size) as soon as readAsArrayBuffer is called, then update the buffer for each progress event. This would avoid the performance issues Chrome is hitting.

Unfortunately it means that reader.result.byteLength will not reflect progress, which may be considered a backwards incompatible change.

@jakearchibald
Copy link

However, XHR doesn't offer an arraybuffer until the request is complete, so aligning with that might be better. Then, create a streams-based API to do it properly.

Again, this may be backwards incompatible.

@mkruisselbrink
Copy link
Collaborator

@jakearchibald that chrome issue was actually for a completely different bug (until you added comments about this issue to it). I do agree that it might make sense to get rid of result during progress events if that is what other browsers are already doing anyway. As you mention, should be relatively easy to add a use-counter for.

@mkruisselbrink
Copy link
Collaborator

Actually, the spec already sort of implies that result shouldn't be available until LOAD time. All the various read method algorithms don't assign any value to result until the "process read EOF" steps. It's probably still a good idea to write some tests to figure out what browsers are actually doing here though (for example does result keep the result of a previous read operation until a new load has finished, or is it reset back to null when the new read starts?)

And I added the use counter to chrome to see how often this (non-spec-compliant) behavior of returning result before load is finished is actually triggered.

@mkruisselbrink
Copy link
Collaborator

@mkruisselbrink
Copy link
Collaborator

And some tests to figure out the current behavior in web-platform-tests/wpt#7494

In summary, currently Firefox seems to be the odd one out, with Chrome, Safari and Edge all having result available during progress events (at least during the last progress event; it's hard to come up with a cross-browser test that triggers multiple progress events). Firefox on the other hand seems to more closely follow the spec, with two exceptions: 1) when starting a new read it resets result to null, and 2) for readAsBinaryString Firefox actually does match the other three browsers and the result is available during the progress event. It's only for the other three read methods were it doesn't.

So maybe changing the spec to match Chrome/Safari/Edge rather than trying to change almost all the implementations to match the spec would make more sense...

(Edge does actually have slightly odd behavior: reader.result actually already contains the full result even during the loadstart event...)

MXEBot pushed a commit to mirror/chromium that referenced this issue Sep 27, 2017
We might want to stop supporting this, so see how common the behavior
is.

Bug: w3c/FileAPI#79
Change-Id: Ia01150c0288cad4efc155a780693077f4413c019
Reviewed-on: https://chromium-review.googlesource.com/683334
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Commit-Queue: Marijn Kruisselbrink <mek@chromium.org>
Cr-Commit-Position: refs/heads/master@{#504426}
@annevk
Copy link
Member Author

annevk commented Sep 27, 2017

How does https://w3c.github.io/FileAPI/#filedata-attr not return anything before LOAD? That doesn't match up with the requirements.

Also, what about the memory leak? I hope you don't want to enshrine that.

@mkruisselbrink
Copy link
Collaborator

It's definitely ambiguous in the current spec what result should return after calling a method but before the algorithm in the method sets result to something, so no matter what solution we pick the spec should be clarified.

I'm not sure what the supposed memory leak is though?

@annevk
Copy link
Member Author

annevk commented Sep 28, 2017

Allocating new strings or ArrayBuffer objects while loading, each slightly larger than the last.

@mkruisselbrink
Copy link
Collaborator

Well yeah, that could be bad, but it still wouldn't be any kind of leak unless the javascript code explicitly holds on to the created objects. And even then it's just bad javascript code, not a leak... Also (at least chrome) won't actually allocate these partial strings/arraybuffers unless javascript actually tries to access the result attribute (although I guess that might be observable somehow?)

But yeah, it definitely seems of limited usefulness to be able to inspect result before reading is complete.

@annevk
Copy link
Member Author

annevk commented Sep 28, 2017

Even if you don't hold onto the created objects you still end up allocating lots of ever increasing objects only to let them GC again shortly after. It's a bad design.

@mkruisselbrink
Copy link
Collaborator

Yeah, there definitely is little benefit in having result be usable during loading. So if it wasn't for the fact that most browsers/methods currently do so I would have no problem changing the spec and chrome implementation (well, pending outcome of the use counter I added). I'm just not sure if it's worth changing this given that most browsers actually agree on a behavior.

nareix pushed a commit to nareix/webrtc.third_party that referenced this issue Mar 5, 2018
We might want to stop supporting this, so see how common the behavior
is.

Bug: w3c/FileAPI#79
Change-Id: Ia01150c0288cad4efc155a780693077f4413c019
Reviewed-on: https://chromium-review.googlesource.com/683334
Reviewed-by: Kinuko Yasuda <kinuko@chromium.org>
Commit-Queue: Marijn Kruisselbrink <mek@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#504426}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: f109b62570d9b41e82063a98a5e61c49c24d8494
@mkruisselbrink
Copy link
Collaborator

(did some more digging into this), the chrome implementation at least seems to match the spec as it was when it was implemented (in 2010) until around 2013. https://www.w3.org/TR/2012/WD-FileAPI-20121025/#filedata-attr still explicitly says " the result attribute should return partial Blob data " for readAsText and readAsArrayBuffer (but not for readAsDataURL), which exactly matches chromes behavior. The next version of the spec no longer has that language and instead queues tasks at load complete to update the result attributes once and only once.

Also this subject was brought up in http://lists.w3.org/Archives/Public/public-webapps/2013JanMar/0289.html (as point 4), but it doesn't seem like anybody ever replied to that point.

Ah, later that year another thread was started specifically about this subject, http://lists.w3.org/Archives/Public/public-webapps/2013JulSep/0357.html, which doesn't seem to have gotten very far either, but it seems like the spec was updated to reflect that change anyway.

So yeah, it would make sense to make the spec a bit more explicit here, even though it already sort of implies that only final result should be available. Fortunately it doesn't seem like reading partial results is used a lot (https://www.chromestatus.com/metrics/feature/timeline/popularity/2158), so seems like this should be fairly safe to change.

@jakearchibald
Copy link

@mkruisselbrink good investigative work sir!

mkruisselbrink added a commit that referenced this issue Mar 12, 2019
Make algorithms more imperative to make it clearer what is going on.
No intended behavioral changes, but making it clearer that the behavior
from #79 is the expected behavior.
mkruisselbrink added a commit that referenced this issue Mar 15, 2019
* Rewrite FileReader definitions.

Make algorithms more imperative to make it clearer what is going on.
No intended behavioral changes, but making it clearer that the behavior
from #79 is the expected behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants