Implement storage in browser clients #86

Open
feross opened this issue Sep 12, 2014 · 21 comments
@feross (Member) commented Sep 12, 2014

Currently, when webtorrent is used in the browser, it stores files entirely in memory. We need a solution that uses disk whenever possible, to prevent excessive memory usage and tab crashes.

Possible approaches:

Are there other ways to do browser storage? Are there other approaches to consider? Are there better storage modules to use?

Feedback welcome!
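For context, webtorrent chunk stores implement the abstract-chunk-store interface (put/get/close/destroy). A minimal in-memory sketch of that interface, roughly what the browser client does today, looks like this (illustrative only, not the actual memory-chunk-store code):

```javascript
// Minimal sketch of the abstract-chunk-store interface that any
// browser-backed replacement would need to implement.
class MemoryChunkStore {
  constructor (chunkLength) {
    this.chunkLength = chunkLength
    this.chunks = []
  }

  put (index, buf, cb = () => {}) {
    if (buf.length !== this.chunkLength) {
      return queueMicrotask(() => cb(new Error('chunk length must be ' + this.chunkLength)))
    }
    this.chunks[index] = buf
    queueMicrotask(() => cb(null))
  }

  get (index, opts, cb) {
    if (typeof opts === 'function') { cb = opts; opts = {} }
    const buf = this.chunks[index]
    if (!buf) return queueMicrotask(() => cb(new Error('no chunk at index ' + index)))
    const offset = opts.offset || 0
    const length = opts.length || buf.length - offset
    queueMicrotask(() => cb(null, buf.slice(offset, offset + length)))
  }

  close (cb = () => {}) { queueMicrotask(() => cb(null)) }
  destroy (cb = () => {}) { this.chunks = []; queueMicrotask(() => cb(null)) }
}
```

A disk-backed store would keep the same contract but write each piece somewhere persistent instead of holding it in `this.chunks`.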

@Ayms commented Sep 12, 2014

Feedback from Peersm: for now, indexedDB is sufficient and probably the best approach (there is not much choice), even with large files, with some drawbacks/bugs as I wrote in #39 (and using something on top of it like filer can make things easier), assuming you store the files as chunks (Blobs) or use chunks as an intermediate step (see this thread: http://lists.w3.org/Archives/Public/public-webapps/2013OctDec/0657.html).

Unfortunately, handling partial data as described in http://lists.w3.org/Archives/Public/public-webapps/2014AprJun/0171.html and http://lists.w3.org/Archives/Public/public-webapps/2014JulSep/0332.html does not seem to be coming anytime soon.
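The "store the files by chunks (Blobs)" approach mentioned here could be sketched as follows (a hypothetical illustration; the database and store names are made up, and this is not Peersm's actual code):

```javascript
// Hypothetical sketch: each torrent piece is written as a separate Blob
// record in IndexedDB, keyed by piece index.
function openChunkDb () {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('torrent-chunks', 1)
    req.onupgradeneeded = () => req.result.createObjectStore('chunks')
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => reject(req.error)
  })
}

function putChunk (db, index, arrayBuffer) {
  // Wrapping the bytes in a Blob lets the browser keep the data on disk
  // rather than pinned in JS memory once the transaction commits.
  const blob = new Blob([arrayBuffer])
  return new Promise((resolve, reject) => {
    const tx = db.transaction('chunks', 'readwrite')
    tx.objectStore('chunks').put(blob, index)
    tx.oncomplete = () => resolve()
    tx.onerror = () => reject(tx.error)
  })
}
```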

@yciabaud (Contributor) commented Oct 11, 2015

I am doing some experiments to persist data in the browser and reduce memory consumption.
I don't know if there is a better option, but I wrote a chunk store based on localForage, in case anyone is interested: https://github.com/yciabaud/localforage-chunk-store

It basically works; I will look at performance and stability over the next few days.
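The core of a localForage-backed store like this comes down to a put/get pair (a sketch under assumptions: the `localforage` global is loaded, keys are hypothetical; this is not the linked module's actual code). localForage uses IndexedDB when available and falls back to WebSQL/localStorage otherwise:

```javascript
// Hypothetical put/get pair for a localForage-backed chunk store.
function putChunk (index, buf, cb) {
  // localforage.setItem returns a promise; keys here are made up.
  localforage.setItem('chunk-' + index, buf).then(() => cb(null), cb)
}

function getChunk (index, opts, cb) {
  localforage.getItem('chunk-' + index).then((buf) => {
    if (!buf) return cb(new Error('no chunk ' + index))
    const offset = (opts && opts.offset) || 0
    const end = opts && opts.length ? offset + opts.length : buf.length
    cb(null, buf.slice(offset, end))
  }, cb)
}
```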

@santiagogil commented Oct 19, 2015

Managed to get fs-chunk-store working in the browser over browserify-fs instead of fs.

I had to modify:

  • node-mkdirp (remove sync methods and make it use browserify-fs instead of fs).
  • rimraf (remove sync methods and make it use browserify-fs instead of fs).
  • random-access-file (make it use browserify-fs instead of fs).
  • fs-chunk-store itself (avoid using os in browser).

The only remaining problem is the fact that fs-chunk-store needs the temp path to exist in order to run.
Right now if you run my version in the browser it fails until the page is reloaded and the temp path gets created.

Some questions:

  • Which would be the best way to solve the temp path problem?
  • Is it preferable to make PRs to each of the dependencies to make them browserifiable, or just fork them in order to get this working as soon as possible?
  • Is the chunk store the only change that needs to be done?

There are some other fs replacements for the browser (some with sync methods included), but those look too big and complex.
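As a side note on the fs swaps listed above: browserify honors a `browser` field in package.json that remaps modules at bundle time, which might avoid forking every dependency just to change its `require('fs')` (a sketch; whether this takes precedence over browserify's built-in empty `fs` shim, and the sync-method removals, would still need checking):

```json
{
  "browser": {
    "fs": "browserify-fs"
  }
}
```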

@santiagogil commented Oct 19, 2015

Got fs-chunk-store working without reloading.

Even with this, it's still impossible to build webtorrent.
It seems that something I haven't found yet is trying to open a directory that doesn't exist.

Going to work on it tomorrow.
I still think it's viable to solve the browser storage problem using a leveldb version of fs-chunk-store.

@Cheedoong commented Nov 5, 2015

@feross @Ayms @yciabaud @santiagogil
In May 2014 Stone (http://code.csdn.net/news/2819989) from Tencent and I discussed this with PLH (http://www.w3.org/People/LeHegaret/) from W3C and he strongly suggested using IndexedDB in this scenario.

@yciabaud (Contributor) commented Nov 5, 2015

My storage implementation uses indexeddb when available and seems to work pretty well. Maybe I should send a PR to replace memory-chunk-store in the browser.

@hualiwang commented Dec 15, 2015

Any news? It still takes a lot of memory in the browser.

@santiagogil commented Dec 16, 2015

I've been doing a lot of research on this over the last few days.
It seems that one of the main problems is that immediate-chunk-store is pretty aggressive on memory and the garbage collector.
Identified issues:

  • this.mem is an array that constantly changes its size, forcing the VM to do a lot of memory reallocation. This could be improved by preallocating the array with `this.mem = new Array()`. It still needs to be extensively perf tested.
  • It creates lots of buffer objects that are almost immediately dereferenced, becoming candidates for garbage collection. I think we can do better here by using an object pool to preallocate a prudent number of these objects, which could be recycled once the store finishes putting the chunks.
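The object pool proposed in the second bullet could be sketched like this (a hypothetical illustration, not actual immediate-chunk-store code):

```javascript
// Hypothetical buffer pool: preallocate a small number of fixed-size
// buffers and recycle them, instead of letting short-lived buffers
// churn the garbage collector.
class BufferPool {
  constructor (size, count) {
    this.size = size
    this.free = []
    for (let i = 0; i < count; i++) this.free.push(Buffer.alloc(size))
  }

  acquire () {
    // Fall back to a fresh allocation if the pool is exhausted.
    return this.free.pop() || Buffer.alloc(this.size)
  }

  release (buf) {
    // Only recycle buffers of the pooled size.
    if (buf.length === this.size) this.free.push(buf)
  }
}
```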

Some background:

Sorry about so much commenting with no PR. I have time on my phone to read and think, but no keyboard time :(

@santiagogil commented Dec 16, 2015

this.mem = new Array(this.storage.size % this.storage.chunkSize + 1) *

@rom1504 (Member) commented Feb 27, 2016

20:41 < pmpp> you may add a link to https://github.com/jvilk/BrowserFS for reference it is quite useable

It might be a good idea. Need to check how performant that would be though.
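For reference, wiring BrowserFS up would look roughly like this (a sketch based on BrowserFS's documented `configure`/`BFSRequire` API; it assumes the BrowserFS bundle is loaded, e.g. via a script tag, and the chunk path is made up):

```javascript
// Sketch: mount BrowserFS's IndexedDB backend and obtain a Node-style
// `fs` object that a chunk store could write through.
function initBrowserFsStorage (BrowserFS, cb) {
  BrowserFS.configure({ fs: 'IndexedDB', options: {} }, (err) => {
    if (err) return cb(err)
    const fs = BrowserFS.BFSRequire('fs')
    // Example write: persist one piece under a hypothetical per-chunk path.
    fs.writeFile('/chunk-0', Buffer.from('piece data'), (writeErr) => cb(writeErr, fs))
  })
}
```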

@jimmywarting (Contributor) commented Jun 9, 2016

I'm not sure, but could my lib be of any interest? StreamSaver.js writes data to the hard drive directly (asynchronously). It's not possible to seek, it uses a service worker to fake a response, and it doesn't use any local storage, so you wouldn't need to worry about the "20% of the user's available disk space" quota, since it writes directly to the hard drive.

I was able to write 15 GB of data generated on the client side without any prompts (except for choosing where to save).

I did a hack so it also works for HTTP sites.

@xuset commented Oct 25, 2016

I would assume that if the user is downloading a torrent in the browser, quits the browser, then starts the download again, the download should resume where it left off, right? If we have some persistent storage in the browser, what are the implications for running multiple webtorrent instances that share a common persistent store? With a persistent store, there are weird conditions where multiple instances act on the same store; for example, what if a user has multiple tabs open to the same webpage, with the same torrent downloading in each?

I think making an indexdb-chunk-store wouldn't be too hard, but I don't know how this would work when there are multiple webtorrent instances using the same underlying storage. Does anyone know how these cases are handled in the desktop version, or how they should be handled in the web version?

@feross feross added accepted and removed accepted labels May 3, 2018
@jimmywarting (Contributor) commented Jul 27, 2018

I have written both an idb storage and a filesystem storage.
What I learned from building FileSaver and StreamSaver is that blobs & files are more memory-friendly than buffers.

Sure, blobs can be made out of just memory, but there is a way to move a blob out of memory and onto the HDD: write it to some web storage and replace it with a blob that is just a pointer to somewhere on the HDD.

A blob isn't made of data... look at the simplified chart of how Chrome describes it in a design document:

[screenshot: skarmavbild 2018-07-27 kl 16 35 53]

```javascript
// So if you have one 100 MB blob in memory
let blob = new Blob([new ArrayBuffer(1.049e+8)])

// then you write it to idb or the sandboxed filesystem
await storeBlob(blob)

// after that you replace the `blob` variable with the one you get from the storage
blob = await getBlob(index)
```

The result is that you have now replaced the in-memory blob with the same data. The only difference is that the blob is now a pointer to a place on the HDD.
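For what it's worth, the storeBlob/getBlob helpers used above could be backed by IndexedDB along these lines (a hypothetical sketch; the `db` handle and the 'blobs' store name are assumptions, not part of the original proposal):

```javascript
// Hypothetical IndexedDB-backed storeBlob/getBlob pair. After a round
// trip, the Blob that IndexedDB hands back is a disk-backed handle
// rather than in-memory bytes.
function storeBlob (db, index, blob) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('blobs', 'readwrite')
    tx.objectStore('blobs').put(blob, index)
    tx.oncomplete = () => resolve()
    tx.onerror = () => reject(tx.error)
  })
}

function getBlob (db, index) {
  return new Promise((resolve, reject) => {
    const req = db.transaction('blobs').objectStore('blobs').get(index)
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => reject(req.error)
  })
}
```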

So what I'm proposing is that every abstract-chunk-store can (if it wants to) return a blob instead of a buffer:

```javascript
get (index, opts, cb) {
  cb(null, blob)
}
```

This will result in faster startup time when webtorrent checks whether all pieces exist. The indexeddb layer won't read the whole buffer; it will just hand back a blob with a pointer, resulting in faster lookups.
This is also necessary if you want to assemble the final Blob when you download something large.

So instead of the maxBlobLength (default: 200 * 1000 * 1000 bytes), where you probably do something like this:

```javascript
blob = new Blob([buffer_1, buffer_2, buffer_x])
```

you could loosen up maxBlobLength by A LOT if you did it with blobs instead of buffers:

```javascript
blob = new Blob([blob_1, blob_2, blob_x])
```

since the second approach just combines blob pointers with offsets & sizes.
It's tested and confirmed; I was able to create much larger blobs.

I hope this is an easy fix, since blobs and buffers work in similar enough ways that they can be assembled into a final blob, and blobs as well as buffers support .slice(start, end).

@KayleePop (Contributor) commented Aug 11, 2018

I implemented this in a branch of idbkv-chunk-store if anyone wants to experiment. (check the readme for usage)

https://github.com/KayleePop/idbkv-chunk-store/tree/blobs

It looks like you're right about indexedDB returning a pointer.

@jimmywarting (Contributor) commented Aug 12, 2018

@KayleePop it doesn't matter much whether you store it as a blob or a buffer if you must convert it to a buffer afterwards. Webtorrent needs to be able to accept blob chunks in order to assemble them all into one large final blob.

@KayleePop (Contributor) commented Aug 12, 2018

You pass an option to get() to return it as a blob:

```javascript
chunkStore.get(0, { returnBlob: true }, (err, blob) => console.log(blob instanceof Blob)) // outputs true
```

It also means that if you get a partial chunk like this:

```javascript
chunkStore.get(0, { length: 20, offset: 100 }, (err, buffer) => console.log(buffer))
```

it will only read the actual data that's returned, because the slice happens on the blob before it's converted into a buffer.
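That partial-read path can be sketched as follows (`readPartial` is a hypothetical helper name, not part of the store's API; Blob.slice only adjusts offsets on the underlying storage, so no data is read until the slice is materialized):

```javascript
// Sketch: slice the stored Blob first, then materialize only that
// slice as a buffer.
async function readPartial (blob, offset, length) {
  const slice = blob.slice(offset, offset + length)
  return Buffer.from(await slice.arrayBuffer())
}
```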

 

If it returned blobs by default, then it would break the abstract chunk store tests.

@jimmywarting (Contributor) commented Sep 7, 2018

I think it's unfortunate that there isn't any storage.prototype.getBlob() or storage.prototype.getBlobURL() utility. If one is using the sandboxed filesystem storage in Blink (Opera & Chrome), it would be better to just return the File instead, or the filesystem link (entry.toURL()).

What is a real bottleneck is:

  1. Slicing the file
  2. Pumping it through a stream
  3. Concatenating the chunks back into a blob
  4. Creating a blob URL
  5. Returning the blob, appending it to the DOM, or downloading it

If using the IndexedDB layer, it would be better to open a cursor and concatenate all the blobs in one go, and then return the blob or object URL.

This would be far more effective.
If the torrent is done, it would be better to call these two functions instead of streaming it.
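The cursor approach described here could look roughly like this (a hypothetical sketch; the `db` handle and 'chunks' store name are assumptions):

```javascript
// Sketch: walk the object store once and hand every stored Blob
// straight to the Blob constructor, so the result is assembled from
// disk-backed pointers rather than copied bytes.
function assembleFileUrl (db, cb) {
  const parts = []
  const cursorReq = db.transaction('chunks').objectStore('chunks').openCursor()
  cursorReq.onsuccess = () => {
    const cursor = cursorReq.result
    if (cursor) {
      parts.push(cursor.value) // each value is a Blob; no bytes are read here
      cursor.continue()
    } else {
      cb(null, URL.createObjectURL(new Blob(parts)))
    }
  }
  cursorReq.onerror = () => cb(cursorReq.error)
}
```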

@Ayms commented Sep 7, 2018

See the comment above: #86 (comment)

  Slicing the file
  Pumping it through a stream
  Concatenating the chunks back into a blob
  Creating a blob URL
  Returning the blob, appending it to the DOM, or downloading it

A bit amazing that this is still an issue for webtorrent...

This is what the Peersm project has been doing for years (with the additional complexity of encryption, hashing, video conversion, etc.). You can try it: slicing into chunks, storing the same way, streaming the chunks, reconstituting the blob, and so on, all with flow control.

The Peersm/node-Tor code will become open source as soon as someone funds this work.

@wbulot commented Oct 31, 2019

What about this problem? The discussion is 5 years old, and there has been no further discussion for more than a year.

Is there anything that prevents the implementation of a storage system? Has anyone managed to implement this for webtorrent?

If I understand correctly, in the current state it is not possible to download a torrent larger than the available memory. For a tool that is supposed to be the torrent client for the browser, this seems like a priority.

@Ayms commented Oct 31, 2019

See #1767

This project has been storing files/torrents in chunks using indexedDB for years (you can try http://peersm.com/peersm2). The code is now open source in the clear, so you might use it or take inspiration from it to implement webtorrent browser storage.

@ftreesmilo commented Dec 6, 2019

I'm using this with my own store: https://web.dev/native-file-system/
However, you don't want to have to use it as a chunk store to play media; you'll want the underlying file for that. The current webtorrent API makes it really hard to remove that JS layer of file reads/writes.
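Writing a finished file out with the API linked above could be sketched like this (based on the showSaveFilePicker/createWritable surface as drafted at the time; it requires a user gesture, and the API shape changed across drafts):

```javascript
// Sketch: write a completed download straight to a user-chosen file,
// bypassing any in-memory chunk store for the final copy.
async function saveToDisk (blob, suggestedName) {
  const handle = await window.showSaveFilePicker({ suggestedName })
  const writable = await handle.createWritable()
  await writable.write(blob)
  await writable.close()
}
```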
