WebRTC DHT #288

Open
piedshag opened this issue Mar 20, 2015 · 94 comments
Comments

@piedshag piedshag commented Mar 20, 2015

Discussion for any developments with the WebRTC DHT.

Member

@feross feross commented Apr 4, 2015

This is something to look at for anyone looking to implement this: https://software.intel.com/en-us/blogs/2015/03/18/meshcentral-experimental-webrtc-mesh

cc @jhiesey

@feross feross changed the title from "DHT" to "WebRTC DHT" Apr 4, 2015

@lmatteis lmatteis commented Apr 23, 2015

@smarts smarts commented Apr 27, 2015

I was looking for a [Google] Chrome Browser extension for handling BitTorrent metainfo files and magnet URNs and I stumbled across this project, which seems really cool! If I understand correctly DHT is required for the handling of magnet links that don't include tracker URLs, correct? If so does that mean webtorrent doesn't work in the browser for magnet URNs w/o webrtc-dht?

Member

@feross feross commented Apr 27, 2015

@smarts When we detect a magnet link without a tracker on https://instant.io, we just add the tracker wss://tracker.webtorrent.io so that there's a place to get peers from. You can do the same in your code. :-)
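A minimal sketch of that fallback, assuming magnet URIs carry their trackers in `tr` query parameters. The helper name `ensureTracker` and the placeholder info hash are hypothetical; only the tracker URL comes from the comment above.

```javascript
// Tracker URL taken from the comment above.
const DEFAULT_TRACKER = 'wss://tracker.webtorrent.io';

// Return the magnet URI unchanged if it already names a tracker,
// otherwise append the default tracker as a `tr` parameter.
function ensureTracker(magnetUri, tracker = DEFAULT_TRACKER) {
  const queryStart = magnetUri.indexOf('?');
  const query = queryStart === -1 ? '' : magnetUri.slice(queryStart + 1);
  const hasTracker = query.split('&').some((pair) => pair.startsWith('tr='));
  if (hasTracker) return magnetUri;

  const separator = queryStart === -1 ? '?' : '&';
  return `${magnetUri}${separator}tr=${encodeURIComponent(tracker)}`;
}

// Placeholder info hash, for illustration only.
console.log(ensureTracker('magnet:?xt=urn:btih:abc'));
```

The result can then be handed to the torrent client in place of the original link.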

@smarts smarts commented May 12, 2015

Thanks @feross!

feross referenced this issue in webtorrent/instant.io May 28, 2015

@fsteff fsteff commented Jun 12, 2015

Any progress on this? I really need this feature (in combination with mutable keys, but one after the other...)
I'll take a look at what @fbodz has already done and try to support him.

Member

@feross feross commented Jun 12, 2015

No progress. If someone wants to work on this, the WebTorrent project would greatly benefit!

@iaaaan iaaaan commented Jun 12, 2015

Same here, I'm very excited about this. I added $35 to the bounty... it's not much, but I hope it might help in some way.

@moshest moshest commented Jun 25, 2015

I've been researching this for a couple of hours, and I think the DHT algorithm isn't a good fit for WebRTC:

  1. Each connection channel is only one way so each DHT request will cost us two channels.
  2. Each channel should remain open for future requests, otherwise we will need to build the n-buckets database all over again.
  3. Each connection requires an offer-answer handshake via a signaling server or a third peer.

It seems that WebRTC is very expensive for DHT requests.
Any suggestions for another algorithm?

Contributor

@jhiesey jhiesey commented Jun 26, 2015

I actually spent some time thinking about this problem a few weeks ago and I think a WebRTC DHT may be feasible. I'll add my thoughts here soon.

@daviddias daviddias commented Jun 29, 2015

@feross what would be the requirements for a DHT to be interesting enough to be used for the WebTorrent project? I'm asking this with the following thoughts in mind:

  • Implementing a Kademlia DHT for browsers would be quite expensive, even unfeasible if we keep open connections to the older peers (browsers would quickly run out of resources)
  • BitTorrent performs DHT searches using an iterative method (we ask peer A, and then peer B). This would mean that, in addition to the WebRTC DataChannels opened for our KBucket, we would also be signalling more DataChannels on the fly to the peers we learn about along the way to the destination. Another way to perform the search is "Store and forward", which is how webrtc-explorer works (as does IP, or DNS in recursive mode).
  • Do we have to use Kademlia at all? After all, if we keep the same ID namespace and deterministic properties of a Structured DHT, we could have a different algorithm on the browser side that is more considerate of browser capabilities/limitations.
  • Is it expected for a BitTorrent client to be able to stream a file being served by a browser? Well, only with the hybrid client in the middle to do the "providing" step, correct?

Happy to work on these things for the WebTorrent project :) Please feel free to push any notes, ideas or any other info you have thought about this.

@moshest moshest commented Jun 29, 2015

After some more thinking, I think that the signaling server is the best place for doing DHT queries. Webtorrent should have a few open signaling servers exactly for that case.

Please take a look at PeerJS. We can change the server implementation and add support for DHT queries between those servers.

Member

@feross feross commented Jul 1, 2015

@diasdavid

I was thinking we could adapt the Kademlia algorithm. Normally the bucket size is K=8 or K=16, but we only need that level of redundancy in each bucket because there are no guarantees that a DHT node added to the routing table will be online when you need to contact it later. With WebRTC, we have to keep the connection active. This means we can reduce the number of nodes in each bucket to something lower, like K=2.

In traditional Kademlia (K=8), for a network of 1M nodes (evenly distributed across the ID space), each node will have 144 entries in its routing table (check my math).

For K=2 and a network of 1M nodes, we only need 40 entries in the routing table, a much more reasonable number of data channels. :)
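The estimate can be checked with a rough model: bucket j of a node covers 1/2^j of the ID space, so with N evenly distributed nodes it sees about N/2^j candidates and stores at most K of them. The function name below is made up for illustration, 2^20 stands in for "1M nodes", and fractional node counts are an approximation of the expected value.

```javascript
// Expected number of routing table entries for one node, assuming node IDs
// are spread evenly over an `idBits`-bit ID space.
function expectedRoutingTableEntries(networkSize, k, idBits = 160) {
  let total = 0;
  for (let j = 1; j <= idBits; j++) {
    // Bucket j covers 1/2^j of the ID space.
    const nodesInRange = networkSize / 2 ** j;
    // A bucket never stores more than k nodes.
    total += Math.min(k, nodesInRange);
  }
  return total;
}

const N = 2 ** 20; // roughly one million nodes
console.log(Math.round(expectedRoutingTableEntries(N, 8))); // 144
console.log(Math.round(expectedRoutingTableEntries(N, 2))); // 40
```

Under this model both figures in the comment hold up: about 144 entries for K=8 and about 40 for K=2.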

Another difference caused by WebRTC is that when other people have you in their routing table, you have to keep an open connection to them so they can contact you. These are "wasted" connections for you, since they aren't necessary for your own routing table. Nonetheless, they need to be kept open for the other node's sake.

If a node is equally likely to be in every other node's routing table, then that would add 40 additional connections. But, over time Kademlia actually prefers long-running nodes since the client never evicts a responsive node from a bucket. This means that over time long-running nodes will end up in lots of clients' routing tables. With the WebRTC model, that means that the long-running nodes will end up needing to keep tons of connections open to support other people's routing tables.

To solve this, I think we could change the part of Kademlia where we keep longer-running nodes in the table, so every node becomes equally likely to be added to the routing table.
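One way to read that change is to drop the "never evict a responsive node" rule and let a newcomer compete for a slot in a full bucket. The sketch below is only an illustration of that idea, not an agreed design: the function name, the array-of-IDs bucket, and the uniform choice among the K existing nodes plus the newcomer are all assumptions.

```javascript
// Insert `candidate` into `bucket` (an array of node IDs, at most `k` long).
// `random` is injectable so the behaviour can be checked deterministically.
function addWithRandomEviction(bucket, candidate, k, random = Math.random) {
  if (bucket.includes(candidate)) return bucket;
  if (bucket.length < k) return [...bucket, candidate];

  // Bucket is full. Classic Kademlia would keep the existing responsive nodes.
  // Here each of the k existing nodes and the newcomer is equally likely to lose.
  const victim = Math.floor(random() * (k + 1));
  if (victim === k) return bucket; // the newcomer loses, nothing changes

  const next = bucket.slice();
  next[victim] = candidate;
  return next;
}

console.log(addWithRandomEviction(['n1'], 'n2', 2)); // [ 'n1', 'n2' ]
```

A real implementation would also have to close the data channel to the evicted node, which this sketch leaves out.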

Of course, we could ditch Kademlia entirely and come up with something better from scratch, but then it will be harder for existing torrent clients to add support. So something that's basically Kademlia with a few changes would be easier for them to support.

For WebTorrent, it doesn't matter. We can add anything that works. If there was a reliable general purpose WebRTC DHT not based on Kademlia, WebTorrent would start using it immediately. Even if desktop torrent clients don't support it, it's still useful for web peers to find each other. And later, if a WebRTC DHT that is closer to Kademlia came along, we could always switch to it later :)

"Store and forward" might be a good change for WebRTC since it eliminates the STUN/ICE connection overhead that would happen for each connection.

It'd be awesome to hear your thoughts on these ideas, @diasdavid!

Member

@feross feross commented Jul 1, 2015

@moshest Putting that logic into a central server would mean that it's no longer distributed, the D in DHT. At that point, it's basically a tracker server, and we already have that part working :) What we really want is a DHT so we can eliminate the central point of failure that is a tracker server.

@daviddias daviddias commented Jul 2, 2015

Thanks @feross , lots of insights! That was exactly what I was looking for, to kick things off :)

I think a K bucket of 2 is a good bet for a WebRTC DHT, and I would even say that a K bucket of 1 should suffice. If I'm not missing anything, K buckets mainly offer redundancy for greater availability: having more than one peer at a specific distance means that we always have more than one shot at contacting someone on that branch/bucket. That is great for non-browser DHTs, since peers are represented as IP:port pairs. In a browser DHT, since we have to keep Data Channels open to avoid the whole STUN/ICE dance each time, we know the precise moment a peer goes away, which gives us the chance to react and quickly do the handshake with a new peer.

Since we need to limit the number of "xor-metric distances" (going with "xmd" for short) used, so that we avoid having too many DataChannels open, we can consider some strategies for finger distribution, e.g.:

  • Inspired by CHORD/PASTRY, giving privilege to closer distances and ensuring that we have at least some fingers to make long jumps.
  • Uniform gap, simpler: e.g. with a cap of 20 xmd and a DHT with 160-bit IDs, we would have entries for 8 bit xmd, 16 bit xmd, 24 bit xmd, 32, 40, 48 ... 160, and one k-bucket for each.
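The uniform-gap option can be sketched as a small helper that lists which xmd values get a bucket. The function name is hypothetical, and the sketch assumes the ID length divides evenly by the cap, as it does for 160 bits and a cap of 20.

```javascript
// List the xor-metric distances (in bits) that get a k-bucket when the
// ID space is covered by `cap` evenly spaced entries.
function uniformFingerDistances(idBits, cap) {
  const gap = Math.floor(idBits / cap); // 160 / 20 = 8
  const distances = [];
  for (let i = 1; i <= cap; i++) {
    distances.push(i * gap);
  }
  return distances;
}

console.log(uniformFingerDistances(160, 20).join(' ')); // 8 16 24 ... 160
```

With K=2 that gives 20 buckets and at most 40 outgoing data channels per peer.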

The first case is interesting for things like PAST (or DynamoDB) because peers store data (and not pointers), which enables data replication to closer peers. But since this is WebTorrent and all we need is to know who is 'providing' what, the second might make more sense. Dunno, ideas :D

One thing to keep in mind, though, is that an xmd cap of 20 and K=2 only sets the minimum number of data channels needed. There might still be cases where a peer is far enough away that it isn't picked for another peer's k-bucket, but since it has to know some peer on that branch, it will open a connection to the peer that didn't pick it, creating a kind of one-direction channel.

For WebTorrent, it doesn't matter. We can add anything that works. If there was a reliable general purpose WebRTC DHT not based on Kademlia, WebTorrent would start using it immediately

Regardless of what the DHT is based on, there will always have to be something to do the signalling, but in WebTorrent's case you already have that. Would it be interesting if, instead of starting two signalling servers, we had a way to mux between signalling for peers that will do file exchange and for peers that will do the handshake for the DHT? Like a protocol muxer on top of a WebSockets connection (or even a WebRTC DataChannel directly with the server)? Please do tell me how I could better expose the webrtc-explorer DHT so that we could run that experiment :)

@maxogden maxogden commented Oct 12, 2015

A couple random thoughts:

  1. QUIC/UDP DataChannels (https://www.youtube.com/watch?v=mIvyOFu1c1Q) will make a WebRTC DHT a lot more efficient, as it has a 0RTT protocol for p2p connections, vs the 4RTT one used by SCTP/DTLS/UDP Data Channels today
  2. Perhaps we can store-and-forward peers' external ip:port from the initial STUN and have some mechanism for attempting to have these peers plus new peers hole punch each other without having to coordinate through a STUN server.

The idea here is that once you've holepunched once (see 3.4 here for background), and you have a cone NAT, your public ip:port should stay the same for all future connections.

Let's say you have peers A, B, C. A and B successfully holepunch to each other through a STUN server. A and B now know each other's external ip:port. Let's say C comes along and connects to B, also through STUN. B and C now know each other's external ip:port.

A <--> B <--> C

To get A and C connected, we could do STUN again, but that's expensive. We can use B to swap SDPs between A and C and have them initiate a holepunching dance without STUN. This would bypass the ICE candidate collection phase (since we are effectively caching it here) and also avoid extra roundtrips to the STUN server.

@maxogden maxogden commented Oct 13, 2015

According to someone on twitter, if you explicitly set ICE/STUN servers to empty arrays you can skip most external ICE checks (no way to skip internal ones though).
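A minimal sketch of that configuration, using the standard `iceServers` field of the `RTCPeerConnection` configuration. With no STUN or TURN servers listed, the browser can only gather host candidates. The variable names are illustrative, and the guard is there because `RTCPeerConnection` is a browser API that is not defined in plain Node.js.

```javascript
// No STUN or TURN servers: candidate gathering is limited to host candidates.
const rtcConfig = { iceServers: [] };

// Only construct the connection where the browser API is available.
const pc = typeof RTCPeerConnection !== 'undefined'
  ? new RTCPeerConnection(rtcConfig)
  : null;

console.log(rtcConfig.iceServers.length); // 0
```

Whether this actually shortens connection setup is the claim being relayed here, not something the sketch demonstrates.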

@maxogden maxogden commented Oct 13, 2015

Update: you can't skip internal ICE, because you need to find out the internal port that was created for your new SCTP session, so you can relay this port to the other peer. Since peerconnections will always have different ports (e.g. they won't do port multiplexing since it's an entirely encrypted protocol), my scheme proposed above where you reuse the SDP from an earlier peer won't work because you won't know their new port without directly asking them, which ends up being the same number of round trips as the normal ICE candidate + signaling flow :(

@allouis allouis commented Oct 19, 2015

@maxogden Although that flow is the same number of round trips, it seems like a better option because we end up with "signaling peers" rather than servers, keeping the overlay network decentralised.
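The "signaling peer" idea can be illustrated with an in-memory simulation in which B, already connected to both A and C, passes A's offer on to C. Everything here is hypothetical: the `Peer` class, its method names, and the placeholder SDP string stand in for real data channels and real session descriptions.

```javascript
// In-memory stand-in for peers; a "link" represents an already-open data channel.
class Peer {
  constructor(id) {
    this.id = id;
    this.links = new Map(); // neighbor id -> Peer
    this.inbox = [];        // signaling messages delivered to this peer
  }

  connect(other) {
    this.links.set(other.id, other);
    other.links.set(this.id, this);
  }

  // Ask `via` (a directly connected peer) to pass a signaling payload to `to`.
  signalThrough(via, to, payload) {
    const relay = this.links.get(via);
    if (!relay) return false;
    return relay.forward(this.id, to, payload);
  }

  // Called on the intermediary: hand the payload over if we are linked to the target.
  forward(from, to, payload) {
    const target = this.links.get(to);
    if (!target) return false;
    target.inbox.push({ from, via: this.id, payload });
    return true;
  }
}

const a = new Peer('A');
const b = new Peer('B');
const c = new Peer('C');
a.connect(b);
b.connect(c);

// A sends its offer to C through B; the SDP string is a placeholder.
const delivered = a.signalThrough('B', 'C', { type: 'offer', sdp: '<placeholder>' });
console.log(delivered, c.inbox[0].from, c.inbox[0].via); // true A B
```

C would send its answer back the same way, after which A and C still run the normal ICE checks between themselves.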

@feross Do you think having a WebRTC-backed dgram module would be a good start to making bittorrent-dht usable in the browser, or should we be starting this from scratch?

I've been working on some small prototypes for this and would be interested to hear thoughts.

Contributor

@substack substack commented Oct 21, 2015

I think a dgram overlay network is the most exciting way to tackle this problem because it means we can use bittorrent-dht as-is and we can also implement all sorts of other distributed protocols that were implemented under an assumption of udp/tcp primitives.

Unfortunately, an overlay network with forwarding means we'll need to implement routing algorithms. From what I gather, mesh routing algorithms don't scale very well (perhaps up to hundreds of nodes or maybe low thousands of nodes), so internet-scale routing systems are split up into interconnected autonomous systems (AS) with bridge nodes that advertise the autonomous system prefixes on their local network. I haven't found any research papers so far that bring together all of these ideas into a comprehensive technique, so we might need to experiment with how to glue these ideas together. Here's a good overview of how these routing systems interconnect: http://www.cc.gatech.edu/~traynor/cs3251/f13/slides/lecture13-routing.pdf

As for routing protocols, babel seems like a good candidate. I've already started to implement babel with some simulations to check the results.

Each AS could maintain a minimum number of connections to other AS networks so that it's very unlikely to get a disconnected graph.

Which AS you join on start can be a product of the node ID and then an AS could split when it gets too big.
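One possible reading of that idea, purely as an assumption: an AS is identified by a bit prefix of the node ID, and an oversized AS splits by extending the prefix by one bit. The helper names and the short hex IDs below are invented for illustration and are not part of any existing implementation.

```javascript
// Return the first `bits` bits of a hex node ID as a binary string.
function asPrefix(nodeIdHex, bits) {
  let binary = '';
  for (const ch of nodeIdHex) {
    binary += parseInt(ch, 16).toString(2).padStart(4, '0');
  }
  return binary.slice(0, bits);
}

// Split the members of an AS with a `bits`-bit prefix into groups that
// share a prefix one bit longer.
function splitAs(memberIds, bits) {
  const groups = new Map();
  for (const id of memberIds) {
    const key = asPrefix(id, bits + 1);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(id);
  }
  return groups;
}

// Three made-up node IDs that all start with bit 1.
const halves = splitAs(['a0', 'c0', '80'], 1);
console.log([...halves.keys()].join(' ')); // 10 11
```

The size threshold that triggers a split, and how bridge nodes re-advertise the new prefixes, are left open here.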

Here's what I have so far for the babel routing protocol implementation: https://github.com/substack/babel-routing-protocol

Contributor

@substack substack commented Oct 22, 2015

It looks like IPFS might have a similar approach separating peer routing and content routing to cope with NAT traversal: ipfs/specs#1

@allouis allouis commented Oct 22, 2015

we can also implement all sorts of other distributed protocols that were implemented under an assumption of udp/tcp primitives.

^ This would be a great place to get to.

mesh routing algorithms don't scale very well (perhaps up to hundreds of nodes or maybe low thousands of nodes)

This is an issue, but maybe not one we need to address immediately. If I understand Kademlia correctly, the number of open connections we'd need to support the DHT would be 160 (or whatever number of bits we use for node IDs). The browser's maximum number of peer connections is 256, so this seems like it could be an option.

What concerns me about a dgram overlay network is how we bootstrap it. Ideally it would function like the other browserify alternatives to core node modules: a drop-in replacement with the same API. Would we have browser-only methods for doing this?

@RangerMauve RangerMauve commented Nov 30, 2018

I was actually thinking about this a while ago.

I think the problem I was seeing was that you can't send a push notification to another user from within a browser due to CORS restrictions. Unless the push notification service has the proper headers, the browser won't allow sending POST requests to it.

Contributor

@jimmywarting jimmywarting commented Nov 30, 2018

due to CORS restrictions

Yeah, I noticed that too. I wonder why they didn't enable CORS. You're forced to use a backend/proxy server.
If I want to build a peer-to-peer chat between two people who have friended each other and exchanged tokens, then I think they should be able to ping each other without having to go through a backend.

Edit: mozilla's endpoint responds with CORS headers

@codebudo codebudo commented Nov 30, 2018

Contributor

@jimmywarting jimmywarting commented Nov 30, 2018

A STUN/TURN server doesn't have to do any CORS preflight stuff, because as a client you never send any Ajax requests to those servers. It's pretty much handled for you in the background.

Contributor

@mikeal mikeal commented Nov 30, 2018

I should have updated this a long time ago.

I had an informal conversation about 4 months ago with some standards/browser folks that work in this area. From their point of view, a bunch of p2p people have been asking for raw socket access and the ability to open ports in order to resolve this, which is just never going to happen in the browser.

Once we talked through it, I landed on language that was really helpful: What we need is a re-usable signal for WebRTC connections.

For a variety of security reasons the browser can't scrap the signal exchange flow but we could potentially create a signal that is reusable and could be added to a DHT and used by other peers for a longer period of time.

Contributor

@jimmywarting jimmywarting commented Nov 30, 2018

re-usable signal for WebRTC connections.

Are you talking about ORTC? I have heard about ORTC but never investigated it.
Is ORTC something different from WebRTC?
How can we have re-usable SDP?
Which browsers support it?
Can the same SDP be used by others?

@RangerMauve RangerMauve commented Nov 30, 2018

Edit: Disregard that, I wasn't being constructive.

Contributor

@mikeal mikeal commented Nov 30, 2018

I haven't spent enough time with ORTC to know if they solved this or not.

In general, it's best to be a little less specific with browser vendors on feature asks like this. Describing the ask in existing WebRTC terms that don't change the security model except in one specific way which you actually need is the best way to work through a variety of possible solutions.

The performance of data channels makes it very costly to have many connections and doesn't scale well

This isn't a given. I have my own thoughts on why the current performance is terrible but performance is a solvable problem. Nothing about WebRTC makes it inherently less performant and my current view is that the implementations are just old and nobody is really touching them, the browser vendors are just binding to a bunch of RTP libraries.

People complain about the performance a lot but, according to Mozilla, nobody has ever logged a usable issue on the subject. What aspect of performance is bad? Where's a reusable test case?

For the most part, people who write browsers are not also writing web applications, so you can't assume they are aware of things we are aware of unless someone has done the work of properly communicating it to them. Yes, they don't make it easy, they all have obtuse bug tracking processes that are used by literally no other project, but that's the situation we're in.

There's no way to accept incoming connections without signalling servers

If you had a re-usable signal that lasted 24 hours you'd be fine.

Member

@mappum mappum commented Nov 30, 2018

What we need is a re-usable signal for WebRTC connections.

Seems like a worthy goal, but to implement re-usable signals wouldn't you need to build a DHT or central routing service anyway? We'd have to assume users are roaming around to different networks and would need to locate them as their address changes.

@nazar-pc nazar-pc commented Nov 30, 2018

A WebRTC-based DHT is definitely possible, and while performance is a concern, I think the potential number of users in this case is several orders of magnitude higher.

I've actually studied DHTs and various approaches to their construction for a while and created another WebRTC-based DHT called Detox DHT (and ES-DHT, which is used as a generic independent framework under the hood) that I use instead of the previously mentioned version.

It is built from scratch using alternative design that takes some major ideas from Kademlia and other papers, but doesn't replicate Mainline DHT with its inherent incompatibilities with WebRTC.

It has somewhat different focus, but still might be of interest to others.

The Detox DHT repository contains source code and tests. By following the link to ES-DHT, which it is based on, you can find the framework's source code, tests, design document, specification and references to papers.

Contributor

@mikeal mikeal commented Nov 30, 2018

Seems like a worthy goal, but to implement re-usable signals wouldn't you need to build a DHT or central routing service anyway? We'd have to assume users are roaming around to different networks and would need to locate them as their address changes.

You have this problem no matter what. DHT's that store IP addresses suffer from the exact same issue.

Member

@mappum mappum commented Nov 30, 2018

@mikeal Right, just trying to point out that getting browsers to allow re-usable SDP just pushes the problem of implementing a DHT onto them (which maybe is what we want).

@RangerMauve RangerMauve commented Nov 30, 2018

Assuming you had reusable signals, how would connections work?

Would it be something like:

Receiver side:

  • Create something similar to a RTCPeerConnection
  • Create SDP offer or something similar
  • Publish somewhere public
  • Listen on incoming connections
  • Create RTCPeerConnection per connection
  • Somehow configure it with your offer?
  • Somehow get their SDP answer?
  • Profit!

Initiator side:

  • Find the published SDP
  • Create an RTCPeerConnection
  • Set the SDP offer
  • Create SDP Answer
  • Somehow get it to the person? This will need centralization in the form of signalling servers
  • Profit!

You could potentially have a DHT that bootstrapped by talking to an initial centralized node, and then using this reusable SDP from then on, but you're still constrained by having to rely on centralized signalling servers.

getting browsers to allow re-usable SDP just pushes the problem of implementing a DHT onto them (which maybe is what we want).

I'm 100% into this. A few months ago I was trying to convince the folks behind the Beaker Browser to provide an interface to the DHT used for Dat in order to enable different types of p2p applications. Since it'd be cheaper to have a single DHT set up per browser rather than per browser tab or origin, having a DHT API provided at the browser level helps performance, and having a standard high level API simplifies p2p application development.

This is currently being replaced by a higher-level PeerSocket API that goes a step further by providing an interface that opens sockets to peers for a given channel and abstracts away the DHT or MDNS systems that are behind the scenes.

@klueq klueq commented Dec 17, 2018

What could make some progress on this is an elaborate 10-page proposal about what needs to be changed in WebRTC to make this P2P scenario possible. The suggested changes need to be simple and they should also bring some business value. The proposal could make a point about the emerging market of P2P apps or maybe eliminating the ICE step from video calls and thus making them connect faster. Then this proposal needs to be presented to the decision makers, and if they are convinced, we'll likely see in a year how the WebRTC team rolls out the new API. I would push for an option where a previously established RTCPeerConnection could be saved and then restored by id: WebRTC would send a UDP message to the saved ipv4:port and if the response is correct, we assume that it's safe to restore the connection and skip all the SDP handshakes.

Another option is to let the desktop nodes of webtorrent act as signalling servers.

@Slender1808 Slender1808 commented Jul 27, 2020

Any news?

@perguth perguth commented Jul 27, 2020

@Slender1808 Slender1808 commented Aug 4, 2020

Does WebRTC use compression in video or audio transmission?
If not, would it be possible to convert data such as text into images and send it using the native image transmission?

@perguth perguth commented Aug 4, 2020

Contributor

@jimmywarting jimmywarting commented Mar 17, 2021

There is a proposal to bring raw-sockets into the web https://github.com/WICG/raw-sockets/blob/main/docs/explainer.md

Contributor

@qgustavor qgustavor commented Mar 17, 2021

@jimmywarting It looks nice until you think about the damage potential: according to Mozilla "The safeguards outlined in the explainer are inadequate and incomplete. Relying on user consent is not a sufficient safeguard if this capability were to be exposed to the web.".

I agree, at least for now: the idea of putting it behind a prompt is too naive. I remember countless times seeing users just pressing "OK" to skip prompts without thinking, and this case would be no different. Also, in the context of WebTorrent, requiring users to enter the IP and port of each peer is a bit weird, as other clients don't do that. I think it's as weird as requiring users to allow local font access just to watch a video because the subtitle renderer needs to access raw font data.

If the proposal removes the need to enter the IP address to improve UX, then the risk of users just clicking OK to skip the prompt rises. On the other hand, if the proposal requires filling in the IP address, attackers can still ask users to paste some IP address into the prompt and many will obey: why should users know the risks of connecting to some IP? I'm sure there are many devices that have low requirements for LAN users as they assume the LAN is safe. I had a router like that and, worse, the password could not be changed. You don't even need to ask the user to copy the IP, just use the old document.execCommand("copy") when the user clicks something.

I hope this proposal improves rather than being halted: even if it starts requiring some security mechanism to allow the connection, it will still be useful for WebTorrent. As long as this mechanism is easy to implement alongside the existing code, I think it has a higher potential of being implemented in other clients than the complex WebRTC. Not only that, if this mechanism works between two browser clients, it would reduce issues with signaling, maybe even removing the need for a signaling server, depending on how this mechanism gets implemented.

Contributor

@jimmywarting jimmywarting commented Mar 18, 2021

maybe even removing the need of a signaling server depending on how this mechanism gets implemented.

I tried very hard to remove the need for a signaling server about a month ago (since I run a static website). I wanted to connect to a nearby peer or friend with either Bluetooth, NFC, or even a QR code, but the best solution I came up with was the idea I wrote about a while ago: using web push
demo: https://jimmy.warting.se/2021/02/16/p2p-signal-with-webpush.html

So now i have a library that can encrypt/send web push payloads all from within the browser

@lgrahl lgrahl commented Mar 18, 2021

You might be interested in this: https://discourse.wicg.io/t/idea-local-devices-api-lan-services/5056/8
I'm certain Michiel De Backker would appreciate feedback. (Unfortunately, I was not able to review it so far.)

@rektide rektide commented Mar 29, 2021

@qgustavor if mozilla is unable to find a permissions regime that satisfies them, they should help out and develop the specification, and not ship an implementation. what they should not do is refuse to let the web grow better capabilities. mozilla should stop obstructing very valuable work. if they want to be neutral, abstain, that's fine, but what they are doing now is hurting the web.

maybe the user has to go in to the site settings & enable permission manually- maybe it never shows up. the permissions api doesn't mandate how permissions requests are shown or mandate that they pop-up. it's up to a browser to figure out how to be a good user agent. mozilla seems fixated on only a very narrow, conventional understanding of permissions, & seems to be roadblocking powerful capabilities that would greatly greatly help the web.

@Slender1808 Slender1808 commented Mar 29, 2021

Would it be interesting to use the broadcast address to find peers?
