Save a complete Torrent with its Files for later reuse! #1293

Closed
Weedshaker opened this issue Feb 14, 2018 · 17 comments
@Weedshaker Weedshaker commented Feb 14, 2018

What version of WebTorrent?
0.98.19
What operating system and Node.js version?
Ubuntu 16.04/v9.5.0
What browser and version? (if using WebTorrent in the browser)
Chromium 64.0.3282.140
What did you expect to happen?
Saving a torrent (including all its files/blobs) and recreating it later.
What actually happened?
not possible?

Hi,
I have been working with WebTorrent for quite a while and it works great. But now I am at a point where I need to save a torrent: not just a .torrent file, but the complete torrent as it is in client.torrents[0], with all of its files/blobs, to a variable or a file, and be able to load it later. Like a snapshot!
I tried to do it with getBuffer and saving the Uint8Array, but stuff gets mixed up, and it's only possible to recreate single files, not the whole FileList.

This is what I tried: new File([uint8Array], "name.jpeg", {type: "image/jpeg"}) (the File constructor expects an array of blob parts). But even this would not properly recreate the same magnetURI when seeding...

Could you please point me in a direction for how I could save a torrent object for later use!?
(I don't want to make the user re-input all the files from the last session.)

Thank you very much in advance.

@feross feross commented Feb 16, 2018

I don't have time to give you exact code for how to accomplish this, but I'll say:

  • You can't create File objects in the browser, as far as I'm aware.

  • I'd look into idb-chunk-store. If you use that as your backing store when creating the torrent (store option), then you'll be able to get the file data out again from there, easily. By default, WebTorrent uses an in-memory store.

  • When you go to re-add the torrent, use client.add() and pass a store instance of idb-chunk-store that points to the same data in IndexedDB, along with the same torrentId (magnet link, etc.), and it should work.

@feross feross closed this Feb 16, 2018
@Weedshaker Weedshaker commented Feb 17, 2018

Okay, I will let you know how this works out, and I am going to post a code snippet here if successful. Anyway, thank you very much for your kind answer!

@Fenny Fenny commented Feb 18, 2018

@Weedshaker, any updates?

@SilentBot1 SilentBot1 commented Feb 19, 2018

Hey Jad3z,

I was interested in persistent storage between sessions myself and had a play around with what feross mentioned.
Example:

//npm i idb-chunk-store webtorrent
var idb = require('idb-chunk-store')
var WebTorrent = require('webtorrent')
var client = new WebTorrent()

//Sintel MagnetURI taken from https://webtorrent.io/
var sintel = "magnet:?xt=urn:btih:08ada5a7a6183aae1e09d831df6748d566095a10&dn=Sintel&tr=udp%3A%2F%2Fexplodie.org%3A6969&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.empire-js.us%3A1337&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.fastcast.nz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&ws=https%3A%2F%2Fwebtorrent.io%2Ftorrents%2F"

function addTorrent(info){

  var torrent = client.add(info, {"store": idb})

  torrent.on('metadata', ()=>{
    console.log(`[${torrent.infoHash}] Received metadata!`)
  })
  
  torrent.on('ready', ()=>{
    console.log(`[${torrent.infoHash}] Store ready!`)
    torrent.files.forEach((file)=>{
      file.appendTo('body')
    })
  })
  
  torrent.on('done', ()=>{
    console.log(`[${torrent.infoHash}] Downloaded!`)
  })
  
  return torrent
  
}

//Exposes the client object to the global namespace.
window.client = client;
//Adds addTorrent to the global namespace to allow adding torrents for testing.
window.addTorrent = (...args)=>{addTorrent(...args);}
//Adds a demo function to the global namespace to run a test using the Sintel MagnetURI.
window.demo = ()=>{addTorrent(sintel);}

The above code, once put through Browserify, sourced in a blank HTML page, and invoked with demo() in the development console (tested in Chrome 63), will save the downloaded torrent (Sintel) into an IndexedDB chunk store. The data persists between browser restarts, and the torrent will load back up from IndexedDB if it is added to the client again at a later date, assuming it hasn't been deleted by the browser's Quota Manager.

The only limitation I have encountered so far is that idb-chunk-store uses the same database name ("chunksDB") and table ("chunks") for every data store, meaning that each new torrent added will overwrite the previous one (useful if you don't need multiple torrents and don't want to hog disk space). This could be avoided by changing storeOpts.name to the torrent's infohash here, as idb-chunk-store uses either storeOpts.name or "chunksDB", or it could be remedied by idb-chunk-store checking for opts.torrent.infoHash here, but that seems out of scope for a chunk store.

Hope you found this useful!

@Weedshaker Weedshaker commented Feb 19, 2018

@SilentBot1 Thank you! This has been very helpful, and in such detail that I was able to easily put it into my program. However, I found that IndexedDB has quite bad performance. On my i7 I could barely tell the difference (8 sec vs. a few milliseconds), but luckily I have some crappy Intel Celeron to test low-end devices, and this was the result:

Seeding a 23.5 MB file with idb-chunk-store took 3 min 10 sec vs. 6 sec when using the default in-memory storage.

I am going to try plan B, JSON.stringify, but first I have to pull in a library that solves the "Converting circular structure" issue. I will get back here ASAP.

Cheers

@SilentBot1 SilentBot1 commented Feb 20, 2018

Hi @Weedshaker,

In my own testing I verified your claim: the IndexedDB store is indeed slower than the memory store. Some of that is to be expected from the speed difference between memory and disk, and there is added overhead from going through IndexedDB instead of having direct disk access, but this isn't something we can avoid with the currently available chunk stores.

Using a 90 MB test file, seeded on Instant.io in a separate browser, the IndexedDB store's download took roughly 18 to 20 seconds longer than the memory store's, a little over 300% of the memory store's time. This can be seen below:

[Screenshot: download timing results]

The speed of the IndexedDB store will likely be limited by disk read/write speeds and CPU power for hashing, while the memory store will be limited by memory capacity and CPU power for hashing. Take this into account when choosing which chunk store meets the requirements of your application. Hopefully you find a solution which works for you; let us know your findings if you make any progress.

All the best.

@feross feross commented Feb 20, 2018

Thanks for sharing these findings, super interesting. I wonder if there are any tricks to speed up IndexedDB, or perhaps change the way we're using it here to make it faster?

@Weedshaker Weedshaker commented Feb 20, 2018

The sledgehammer approach works:
I saved the whole client object using the json-cycle library (https://github.com/valery-barysok/json-cycle) and parse it back in after a session restart. All data persists!

Although, @feross, I would have preferred to simply grab the in-memory chunk store and later be able to re-seed it: torrents[0].store.store.chunks -> save it -> (re-)seed the chunks. But as I already mentioned, this gives me the same file but a different magnetURI, because some metadata like file.name etc. is obviously missing.

@payflix payflix commented Feb 21, 2018

@SilentBot1 Thanks for your code snippet!

@SilentBot1 SilentBot1 commented Feb 24, 2018

@feross, @Weedshaker,

After looking at performance breakdowns of idb-chunk-store, more testing, and a little further digging into IndexedDB, I was able to get near-identical performance from idb-chunk-store compared to memory-chunk-store.

The idb-chunk-store package uses JSON.stringify on the chunk (a buffer in our case) before storing it, and JSON.parse on the stored chunk when loading the data from IndexedDB. Since IndexedDB uses the structured clone algorithm, it can natively store all JavaScript variable types without the stringify/parse round trip. After finding this out, I removed both the stringify and the parse from the IndexedDB interactions and stored the chunk directly.
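The change can be sketched like this (function and object-store names are illustrative, not idb-chunk-store's actual internals):

```javascript
// Store the chunk's typed array directly and let IndexedDB's structured
// clone algorithm copy it, instead of serializing every byte to JSON.
function putChunk (objectStore, index, buf, cb) {
  // Before: objectStore.put(JSON.stringify(buf), index)
  var req = objectStore.put(buf, index) // structured clone, no JSON cost
  req.onsuccess = function () { cb(null) }
  req.onerror = function () { cb(req.error) }
}

function getChunk (objectStore, index, cb) {
  var req = objectStore.get(index)
  // Before: cb(null, JSON.parse(req.result))
  req.onsuccess = function () { cb(null, req.result) } // already a typed array
  req.onerror = function () { cb(req.error) }
}
```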

After making these changes I achieved download and storage speeds identical to memory-chunk-store, within a small margin of error, as can be seen below:

[Screenshot: benchmark results]

b240ab97e2b8cece... was a 256 MB file. Based on this and the time taken to download, both files transferred at ~15 MB/s from my server and were CPU-bound on the seeding node.

I will be submitting a pull request to idb-chunk-store, hopefully tomorrow, to add these changes so that everybody using idb-chunk-store receives the same results and improvements.

Hope everybody has found this useful!

@DiegoRBaquero DiegoRBaquero commented Feb 24, 2018

@SilentBot1 Nice findings!

@feross feross commented Feb 28, 2018

@SilentBot1 Nice work!

@Weedshaker Weedshaker commented Feb 28, 2018

@SilentBot1, removing JSON.stringify certainly improved idb performance, although it still hasn't been satisfying with large files. So I attempted to make a hybrid store. It's just a draft, but I tested it and it works:

import IndexeddbChunkStore from 'xuset/indexeddb-chunk-store/idbchunkstore.min.js';

export class HybridChunkStore {
	constructor(chunkLength, opts = {}, mdb, torrent) {
		this.chunkLength = chunkLength;
		this.length = opts.length;

		this.getCallStack = []; // catches get request before store is ready
		this.idleTime = 60000; // the store is idle after ...ms and will call through the idleCallStack
		this.idleTimeCont = null;
		this.idleCallStack = new Map();
		torrent.on('done', () => {
			this.idleCallStack.set('firstIdle',
				new Map([
					['function', function (name) {
						this.destroy(undefined, name);
						this.idleCallStack.delete('firstIdle'); // remove this from idleCallStack - only call once
					}],
					['scope', this],
					['attributes', ['mdb']],
				])
			);
			this.setIdleTimeout();
		});
		this.databaseExists(opts.name, (exists) => {
			this.dbs = new Map([ // !!! Keep this order !!!
				['mdb', mdb], // default in-memory-store
				['idb', new IndexeddbChunkStore(chunkLength, opts)] // indexedDB hd-store
			]);
			if (exists) this.destroy(undefined, 'mdb'); // idb already exists -> mdb not needed
			this.get = this._get;
			this.getCallStack.forEach(args => {
				this.get(...args);
			});
		});
	}
	put(index, buf, cb){
		let i = 0;
		this.dbs.forEach(db => {
			db.put(index, buf, i !== 0 ? null : cb);
			i++;
		});
		this.setIdleTimeout();
	}
	// temp function, gets replaced by _get as soon as the store is ready
	get(index, opts, cb){
		this.getCallStack.push([index, opts, cb]);
	}
	_get(index, opts, cb){
		this.dbs.values().next().value.get(index, opts, cb); // get first db in dbs, this should be mdb, which is faster
		this.setIdleTimeout();
	}
	close(cb){
		this.dbs.forEach(db => db.close(cb));
	}
	destroy(cb, name = false) {
		if (name) return this._destroy(cb, this.dbs.get(name), name);
		this.dbs.forEach(db => this._destroy(cb, db, name));
	}
	_destroy(cb, db, name) {
		if (!db) return false;
		db.close(() => {
			db.destroy(cb);
			if (name) this.dbs.delete(name);
		});
	}
	idle(){
		this.idleCallStack.forEach(stack => {
			stack.get('function').apply(stack.get('scope'), stack.get('attributes'));
		});
	}
	setIdleTimeout(){
		clearTimeout(this.idleTimeCont);
		this.idleTimeCont = setTimeout(() => {
			this.idle();
		}, this.idleTime);
	}
	get closed() {
		return this.dbs.values().next().value.closed;
	}
	databaseExists(dbname, cb){
		const req = window.indexedDB.open(dbname);
		let exists = true;
		req.onsuccess = () => {
			req.result.close();
			if (!exists) window.indexedDB.deleteDatabase(dbname);
			cb(exists);
		}
		req.onupgradeneeded = () => {
			exists = false;
		}
	}
}

export const setupHybridChunkStore = (torrent) => {
	Object.defineProperty(torrent, 'store', {
		get: () => {return torrent.sst_store;},
		set: (ImmediateChunkStore) => {
			if(ImmediateChunkStore){
				torrent.sst_store = ImmediateChunkStore;
				ImmediateChunkStore.store = new HybridChunkStore(ImmediateChunkStore.chunkLength, {name: torrent.infoHash, length: ImmediateChunkStore.store.length}, ImmediateChunkStore.store, torrent);
			}
		}
	});
}
  1. Puts the data into both stores, idb & mdb (default); if the idb already exists, it removes mdb instead
  2. Once done, it removes the mdb and frees RAM
  • Seeds/adds torrents as quickly as mdb (default)
  • Falls back to idb when the data already exists, so that the user doesn't have to download the files twice on a revisit
    => Hopefully this harnesses the advantages of both types of store.

@feross , thank you for pointing me in this direction... Although, this has only been half the answer; the other half of the question is:
How could I resurrect a torrent out of my chunks, stored in my IndexedDB, and start seeding with an identical magnetURI?

@SilentBot1 SilentBot1 commented Feb 28, 2018

Hey @Weedshaker,

Isn't this memory caching functionality already included with the use of immediate-chunk-store in the webtorrent library by default?

I'll have a look into resurrecting a torrent from the chunk store when I get a moment, but you'll also need the metadata beforehand, meaning it's similar to the client.add method I mentioned above: it will verify the pieces already stored in the idb store, then begin seeding.

All the best.

@Weedshaker Weedshaker commented Feb 28, 2018

Hi @SilentBot1 ,

yes, as you can see in the code above, mdb === the default immediate-chunk-store. So, this hybrid stuff is using both until idb is finished...

I have been experimenting with the chunks and the add/seed functions. I was also assuming some metadata is missing; I guess it's the stuff which also gets passed here:

self.store = new ImmediateChunkStore(
    new self._store(self.pieceLength, {
      torrent: {
        infoHash: self.infoHash
      },
      files: self.files.map(function (file) {
        return {
          path: path.join(self.path, file.path),
          length: file.length,
          offset: file.offset
        }
      }),
      length: self.length
    })
)

But I wasn't able to recreate that metadata, resulting in different magnetURIs.

client.seed can be fed Uint8Array buffers, but actual FileLists can't be recreated, since the Web API doesn't let you fake FileLists, for security reasons. I then tried to freeze torrents, or the whole client, by stringifying them, but I couldn't bring them back to life.

The client.add function doesn't check the storage for existing data until it actually gets an active peer and starts downloading.

I guess I have to dig deeper... LOL

@SilentBot1 SilentBot1 commented Mar 1, 2018

Hey @Weedshaker,

I have been experimenting with the chunks and the add/seed functions... also I was assuming some metadata is missing, guess stuff which also gets passed here

The information you are referring to is the options object passed to self._store; it is passed through to the store set in opts.store.

But I wasn't able to recreate those metadata's, resulting in different magnetURI's.

Without the files object, which states each file's length and offset from the start of the store, it is impossible to re-create the torrent from the chunk store alone, as the chunk store only stores chunks, not how the chunks are arranged. (It would be possible with single-file torrents, as you can just create a new File using all the chunks in the store.)

The client.seed can be fed with Uint8Array Buffer but actual FileLists can't be recreated, since the web api, doesn't just let you fake FileLists for some security reason.

The WebTorrent library doesn't use the FileList; it instantly converts the FileList into an array of files here. These files can be created using the File interface here, though again this requires the metadata, as the length and index of each file must be known before pulling it from the chunk store and creating the file.

The client.add function doesn't check the storage, if the data is already there until it gets actually an active peer and starts downloading.

client.add does check the storage, but only after the metadata has been received. After all, how can we verify the pieces if we don't know their hashes?

I wrote a terrible, yet functional, demo application showing how I was able to successfully add a torrent to the client, destroy the torrent, navigate away and resurrect the torrent from the chunk store.
That example can be found here with the repository being located here if you wish to pick at my code to see how it works.

Hope this helps,
All the best.

@Weedshaker Weedshaker commented Mar 1, 2018

Hi @SilentBot1,

I am speechless... This is just fantastic! A great example down to the detail!
I hope they link to it from webtorrent.io/docs. It will help people keep their torrents alive longer, which will serve this project greatly, since short-lived torrents are a bit of a disadvantage of WebTorrent...

For myself, to summarize: client.add can resurrect a torrent from idb-chunk-store when fed the torrent file and/or parsed torrent. (My mistake was expecting this behavior with only the infoHash and/or magnet URI.)

Cheers & Thanks a lot!

@lock lock bot locked as resolved and limited conversation to collaborators May 30, 2018