
upload using a ChunkReader #47

Open · wants to merge 1 commit into base: main
Conversation

oliverpool (Author)

This PR prevents the uploaded file from being completely loaded into memory.
Instead, it will be read chunk by chunk.

This can be particularly useful for uploading files larger than the available RAM.

My implementation of a ChunkReader can probably be improved; thanks in advance for the feedback!

@Acconut (Member) commented Oct 15, 2021

Thank you very much for this PR, it looks really useful! As I am not very proficient in Python (especially when it comes to async), I will try to find someone competent enough to review it.

@oliverpool (Author)

@ifedapoolarewaju do you maybe have any feedback on this?

If so, I can take the time to update the code to the 1.0.0 version.

@Acconut (Member) commented Sep 14, 2022

@oliverpool Would you mind having a look at the merge conflicts for your PR?

@oliverpool (Author)

Sure, but only if someone is willing to take a look at the changes (I don't really want to invest time if this PR is going to stay ignored for another year ;-)

@ifedapoolarewaju (Contributor) left a comment

Thank you for the work. Dearly sorry for the delayed review 🙏🏽
I have left some suggestions. I'll be sure to be timely for any follow-up reviews.

try:
    async with aiohttp.ClientSession(loop=self.io_loop) as session:
-       async with session.patch(self._url, data=chunk, headers=self._request_headers) as resp:
+       async with session.patch(self._url, data=self._chunk.reset().async_reader(8*1024), headers=self._request_headers) as resp:

Could we move 8*1024 to a constant so it can be reused everywhere else as the internal chunk size?
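For illustration, that suggestion could look roughly like the sketch below; the constant name, the stub class, and the method name are assumptions, not code from the PR:

import aiohttp

# Hypothetical: hoist the read size into a module-level constant so every
# internal read uses the same value.
INTERNAL_CHUNK_SIZE = 8 * 1024

class AsyncRequest:  # stub shown only to give the method a home
    async def perform(self):
        async with aiohttp.ClientSession(loop=self.io_loop) as session:
            async with session.patch(
                self._url,
                data=self._chunk.reset().async_reader(INTERNAL_CHUNK_SIZE),
                headers=self._request_headers,
            ) as resp:
                ...  # handle the response as before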

            uploader.get_request_length(),
        )

    def add_checksum(self, file):

Could we add types? I believe the type we need here might be io.BytesIO.
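As a minimal sketch of the annotated signature, assuming the io.BytesIO suggestion above (the enclosing class and the method body are unchanged and omitted here):

import io

def add_checksum(self, file: io.BytesIO):
    ...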



class ChunkReader(object):
    def __init__(self, file, start, length):

Could we add types for the params?
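Along these lines, assuming the same io.BytesIO suggestion for the file object (the annotations are illustrative; the constructor body, not shown in this excerpt, stays as in the PR):

import io

class ChunkReader(object):
    def __init__(self, file: io.BytesIO, start: int, length: int):
        ...  # body unchanged from the PR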

Comment on lines +133 to +141
    def read(self, size=-1):
        if self.remaining is None:
            raise Exception("reset() must be called before first read")

        if size == -1 or size > self.remaining:
            size = self.remaining
        data = self.file.read(size)
        self.remaining -= len(data)
        return data

I think it'd be better if we separate the class that mimics the File stream behaviour from every other functionality. This is important so we can easily identify that read(...) specifically exists to implement stream readers.

So what I mean is: async_reader could just be a standalone function, async_reader(chunk_reader). And reset may not be needed if we just used ChunkReader directly wherever we need it. So for every place the ChunkReader is needed, a new instance of it could be created, and ChunkReader itself would then only contain the read method.
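A rough sketch of that idea; the helper below is illustrative, and the accessors named in the usage comment are assumptions rather than code from the PR:

# Standalone async generator that streams a ChunkReader in fixed-size pieces.
async def async_reader(chunk_reader, read_size=8 * 1024):
    data = chunk_reader.read(read_size)
    while data:
        yield data
        data = chunk_reader.read(read_size)

# Usage (illustrative): build a fresh ChunkReader for every request instead of
# calling reset(), and pass the generator to aiohttp as the request body:
#     reader = ChunkReader(file_stream, offset, uploader.get_request_length())
#     session.patch(self._url, data=async_reader(reader), headers=self._request_headers)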

@peterroelants

This PR prevents the uploaded file from being completely loaded in memory.
Instead it will be read chunk-by-chunk.

Can someone point me to where the file being uploaded is completely loaded into memory?

Having the file completely loaded into memory would be a blocker for the use case I'm currently working on (and using tus-py-client for). However, from what I understand from the code, the file is already read in chunks?

try:
    chunk = self.file.read(self._content_length)


Doesn't this mean that the file is already read in chunks?

@oliverpool (Author)

Yes, each tus chunk will be completely loaded into memory (I shouldn't have used the word “chunk” in my initial comment).

Still, if your chunk is 100 MB big, it is suboptimal to load those 100 MB into memory…
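For illustration only (not code from the library), the memory difference between reading a whole tus chunk at once and reading it in small pieces:

import io

# Stand-in for a large tus chunk; in the scenario above this would be ~100 MB.
source = io.BytesIO(b"x" * (1024 * 1024))

# Current behaviour: the whole chunk is materialised at once,
# so peak memory is roughly the chunk size.
whole_chunk = source.read()

# Streamed behaviour: only one small buffer is alive per iteration,
# so peak memory is roughly the internal read size (here 8 KB).
source.seek(0)
while True:
    piece = source.read(8 * 1024)
    if not piece:
        break
    ...  # hand `piece` to the HTTP client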

@peterroelants

I see, it's only the chunks that are fully read (and which this PR tries to avoid). Thanks for clarifying.

@Acconut (Member) commented Mar 26, 2023

@oliverpool Did you see the feedback from @ifedapoolarewaju last year? Are you still interested in pushing this PR forward? From April onwards, we will have more capacity to review any changes to this PR.
