
Mutable torrents (BEP46) #886

Open
lmatteis opened this issue Aug 17, 2016 · 16 comments
@lmatteis lmatteis commented Aug 17, 2016

Hi guys, so I was thinking of implementing BEP46 in WebTorrent. I already implemented a simple reference implementation on top of webtorrent-cli (https://github.com/lmatteis/dmt), but I think this needs to go deeper into the modules themselves. After a couple of hours of studying the code, I think the best place to put this is in torrent-discovery. It already has an interval (using recursive timeouts) that announces to the DHT (the regular announcing), so it makes sense to also have an interval that gets()/puts() if it's a mutable torrent.

Higher up in the hierarchy, the .add() method of WebTorrent creates a new Torrent() EventEmitter, which emits _infoHash. This could instead emit the same event every time a new torrent is found (hence we'd simply change from .once to .on). The problem is whether we want to add a new torrent to the list, or update the current one when an update is found.

Then inside torrent.js (still in WebTorrent) the new Discovery() EventEmitter should probably emit an event for when a new torrent is found in the DHT, something like .on('dhtTorrent').

These seem like most of the parts I'd have to touch. Thoughts @feross @substack @mafintosh ?

@feross feross commented Aug 18, 2016

Nice! The great thing about implementing this here instead of in your fork of webtorrent-cli is that all consumers of this library, like WebTorrent Desktop, will inherit BEP46 support for free.

Problem is whether we want to add a new torrent to the list, or should we update the current one when an update is found?

When a mutable torrent gets updated, the info hash of the torrent will change. That will violate a lot of assumptions about how torrent.infoHash works. It never changes right now and apps like WebTorrent Desktop will actually store the value and look up torrents later using client.get(infoHash). Emitting an event and letting the user decide whether to update to the newer info hash could work.

Updating every place the info hash is used in a live torrent will be hard; it appears in lots of places, and this sounds tricky to get right.

Maybe the simplest way to implement this is in stages:

  • Stage 1: Resolve mutable torrent magnet links. Treat magnet:?xs=urn:btpk:[ Public Key (Hex) ]&s=[ Salt (Hex) ] magnet links as "remote torrents" that need to be resolved before their info hash is known (like http links or filesystem paths to torrent files work today). Once the info hash is resolved via the DHT, then it's basically a normal torrent without updating.
  • Stage 2: Make the torrents updatable. Periodically poll for changes to the info hash and emit an event torrent.on('mutableUpdate') to let the user know that there's a new info hash. They can switch to the new info hash explicitly by calling torrent.destroy() and client.add(torrent.magnetURI) to add the new, updated torrent.
  • Stage 3: Find a higher-level API that's easier to use.
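As a rough illustration of the Stage 1 magnet form, the public key and salt could be pulled out like this (a sketch only; parseMutableMagnet is an illustrative name, not the actual parse-torrent API):

```javascript
// Sketch only: extract the BEP46 fields from
// magnet:?xs=urn:btpk:<pubkey hex>&s=<salt hex>
function parseMutableMagnet (uri) {
  const qs = uri.slice(uri.indexOf('?') + 1)
  const params = new URLSearchParams(qs)
  const xs = params.get('xs')
  if (!xs || !xs.startsWith('urn:btpk:')) return null
  return {
    publicKey: xs.slice('urn:btpk:'.length).toLowerCase(), // 32-byte ed25519 key, hex-encoded
    salt: params.get('s') || null                          // optional, hex-encoded
  }
}

const parsed = parseMutableMagnet('magnet:?xs=urn:btpk:' + 'ab'.repeat(32) + '&s=6e616d65')
console.log(parsed.publicKey.length) // 64 hex chars = 32 bytes
console.log(parsed.salt)             // '6e616d65'
```

Unlike an xt=urn:btih: magnet, the info hash here is unknown until a DHT lookup resolves it.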

And some questions:

  • Question 1: Should data that's already on disk be re-used if the same file names are present in the new torrent? This might be impossible because if data is added to the middle of the torrent then all the piece hashes afterwards will be completely different.
  • Question 2: Should the old torrent immediately stop being shared?
@feross feross added the enhancement label Aug 18, 2016
@lmatteis lmatteis commented Aug 18, 2016

Sounds good. We should also consider BEP39 (the centralized version of BEP46), where torrent updates happen through an HTTP server; these APIs should be the same. So it makes sense to have torrent.on('update') (for both BEP 39 and 46) and then call destroy() and add() accordingly.

I'm a little bit ignorant about re-using the data. I think there's BEP47 which is about this, but I don't understand it yet.

@feross feross commented Aug 18, 2016

So it makes sense to have torrent.on('update') (for both BEP 39 and 46) and then call destroy() and add() accordingly.

Supporting BEP39 as well sounds good to me.

I'm a little bit ignorant about re-using the data.

If you specify the same path as the old torrent, and the new torrent has changed the filenames, then the old files will not be touched. This could confuse the user. For now, let's just leave this up to the user. They can delete the files if they want when they call torrent.destroy().

@lmatteis lmatteis commented Aug 19, 2016

I was thinking that most of the time publishers will apply updates to the underlying data they're sharing, so re-downloading all of it would be bad. For instance, imagine Archive.org publishing data dumps. As a user I wouldn't want to delete and replace the whole dump every time.

Does webtorrent have capabilities to re-use data if I only appended data to my torrent? Archive.org could then share append-only data structures (CSV?) and the old pieces can be reused, right?

@feross feross commented Aug 19, 2016

Does webtorrent have capabilities to re-use data if I only appended data to my torrent?

I'm not sure how the fs-chunk-store package (which handles mapping the store of pieces to files on disk) will handle the situation where you point it to an existing file on disk, but the size has changed (gotten larger or smaller).

We should ensure that it handles this situation gracefully. If the file name is the same, but it's just larger now, then the file size should be increased on disk without deleting the file. Then, the normal piece verification process can take care of figuring out whether the data is valid or not.

But as I mentioned before, if the user specifies the same path as the old torrent but the new torrent has changed the filenames, then the old files will not be touched. That will leave files from the old torrent and new one side-by-side.

The only way to make data re-use work nicely for the user is if we handle this in WebTorrent. We would need to add, say, a torrent.applyUpdate() method which switches to the new info hash and deletes whichever files are no longer present in the new torrent.
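The file-cleanup half of such a hypothetical torrent.applyUpdate() boils down to diffing the two file lists (filesToDelete is an illustrative helper; no such method exists in WebTorrent):

```javascript
// Sketch only: given file lists from the old and new metainfo,
// return the on-disk paths that are no longer part of the torrent.
function filesToDelete (oldFiles, newFiles) {
  const keep = new Set(newFiles.map(f => f.path))
  return oldFiles.filter(f => !keep.has(f.path)).map(f => f.path)
}

const oldFiles = [{ path: 'a.txt' }, { path: 'b.txt' }]
const newFiles = [{ path: 'a.txt' }, { path: 'c.txt' }]
console.log(filesToDelete(oldFiles, newFiles)) // [ 'b.txt' ]
```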

If you want to tackle this in the same PR, you can give it a go. But I think it would be easier to just punt on this for now and add it in a future PR.

As an initial PR, I would just fire an event and allow the user to call torrent.destroy() and client.add(torrent.magnetURI) manually. Does this seem reasonable to you?

@lmatteis lmatteis commented Aug 20, 2016

@feross so the first step is to parse the new magnet URI structure. I've nailed it down to this part of the code:

https://github.com/feross/webtorrent/blob/master/lib/torrent.js#L235

    parseTorrent.remote(torrentId, function (err, parsedTorrent) {
      if (self.destroyed) return
      if (err) return self._destroy(err)
      self._onParsedTorrent(parsedTorrent)
    })

So this method should do a dht.get(target). The problem is that I'd need to pass the current dht instance, and parseTorrent doesn't have that API. Perhaps parse-torrent could be an EventEmitter so we don't need to differentiate between .remote() and sync requests. It would just be

    var parsed = new ParseTorrent(opts)
    parsed.once('done', (parsedTorrent) => self._onParsedTorrent(parsedTorrent))

Otherwise, without breaking the API, I could pass the dht instance in an options object: parseTorrent.remote({ torrentId: ..., dht: ... }, function () { ... })
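The backwards-compatible option could be handled by normalizing the first argument, roughly like this (a sketch of the proposed signature, not the merged parse-torrent code):

```javascript
// Sketch only: accept both remote(torrentId, cb) and
// remote({ torrentId, dht }, cb) without breaking existing callers.
function remote (torrentId, cb) {
  let dht = null
  if (torrentId && typeof torrentId === 'object' && torrentId.torrentId) {
    dht = torrentId.dht || null   // new form carries the dht instance
    torrentId = torrentId.torrentId
  }
  // ...existing http / filesystem / magnet handling would go here, plus a
  // dht.get() path when `dht` is set and the magnet contains xs=urn:btpk:
  cb(null, { torrentId, usedDht: !!dht })
}

remote({ torrentId: 'magnet:?xs=urn:btpk:aa', dht: {} }, (err, res) => {
  console.log(res.usedDht) // true
})
```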

@lmatteis lmatteis commented Aug 21, 2016

Ok so I've added a pull request with the syntax parseTorrent.remote({ torrentId: ..., dht: ... }, function () { ... }) and also some tests: webtorrent/parse-torrent#38

The Travis tests seem to break because I'm using ed25519-supercop for the tests, and it seems to be incompatible with old Node versions?

@yciabaud yciabaud commented Aug 22, 2016

I believe this is more likely a travis env issue.

The default compiler must be too old to handle C++11, as noted in: https://travis-ci.org/feross/parse-torrent/builds/153981472#L165

To make it work in this repo, we need to add the g++ package to the Travis image, like @substack did: https://github.com/substack/ed25519-supercop/blob/master/.travis.yml#L10

@lmatteis lmatteis commented Aug 22, 2016

@yciabaud I'm really bad with Travis (in fact, I've never used it before). What line exactly should I edit to make it work? Feel free to submit a PR on my fork.

@lmatteis lmatteis commented Apr 18, 2017

To get things going again, @feross: there's also this PR on bittorrent-dht awaiting review: webtorrent/bittorrent-dht#136

@alanhoff alanhoff commented Aug 28, 2017

Any updates? :-)

@innovator256 innovator256 commented Oct 26, 2017

Seriously guys, any updates on this?

@feross feross added accepted and removed accepted labels May 3, 2018
@HostFat HostFat commented May 23, 2018

This will be very useful :)

@Ferk Ferk commented May 23, 2018

I was thinking that most of the time publishers will apply updates to the underlying data their sharing. So re-downloading the whole data would be bad. For instance, imagine Archive.org publishing data dumps. As a user I wouldn't want to delete and replace the whole dump every time.

I've been thinking for a while that the ideal approach might be to stay connected to the old swarm for the chunks that are known to hold the same unchanged data.

This way the swarm would not divide its efforts. If the publisher makes changes that you do not want to keep, or you want to access an old version of the torrent for whatever reason, you could do so without the old swarm being abandoned by all the other peers who update or who join fresh, even though most of the shared data is the same.

This would also allow someone to make a "fork" of any torrent and add or change some data to improve the content, without having to start from scratch with a new swarm and new peers for the entirety of the content, which would split the community and discourage anyone willing to improve a torrent's content.

Considering how important the popularity of content is for speed and stability in P2P data sharing, and how easy it is for less popular content or variations to die, it would be best to maximize reusability and improve redundancy by being able to rely on more than one swarm for the same data when possible.

It would be great if there were a BEP that allowed something like that. The result could behave similarly to how Git works, where you can make a commit on top of another while keeping a pointer to the old one. Unlike Git, though, there would be no need to store the commit history: as soon as a new version of the torrent changes a chunk, all of that chunk's references to other swarms could be removed.

BEP46 brings a convenient "autoupdate", but I would not mind updating manually that much and having the control to decide when I want to update. The real problem with versioning in torrents is not that people don't update; the root of the problem is that when people don't update, they don't contribute to the new swarm, and peers/seeders may be lost after every update when people don't keep up.
What if I do not want to keep up with the decisions of the publisher?
What if I'm only interested in seeding a subset of the content from Archive.org up to a certain date and I'm not interested in updating my torrent all the time?

I'm of course not opposed to BEP46; I think it's a convenient feature when you do want to be kept up to date. But the real leap to me will be when torrents can define, for particular chunks of data, to get their peers from other swarms (and only for the data of those chunks). Only then will the real obstacle to torrent "mutation" be removed. You could mutate a torrent without affecting anyone else and still be a contributor to the swarm.

@RangerMauve RangerMauve commented Nov 13, 2018

One use case that really excites me about this is the ability to load static websites from a browser.

Sort of like what IPFS is doing, but using existing technology instead of building a new tech stack.

Since webtorrent is being integrated with Brave, it's a perfect avenue for building fully decentralized web apps without needing to invest in any specific cryptocurrency.

With this in place, people could create and publish p2p web sites and have them update in a fully decentralized system without needing any services for hosting their content. (Other than peers).

This would essentially be a competitor to the Beaker Browser and the Dat protocol ecosystem.

@RangerMauve RangerMauve commented Jan 4, 2019

I recently published a library, mutable-webtorrent, which wraps the WebTorrent API, adds support for mutable torrents in magnet links, and provides some helper functions for creating and updating mutable torrents.

I want to revive this effort if possible and get my changes merged into the main webtorrent repo.
