Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Hex encoding for infohashes and offers IDs in tracker protocol. #1676

Open
wI2L opened this issue Jul 24, 2019 · 4 comments
Open

Use Hex encoding for infohashes and offers IDs in tracker protocol. #1676

wI2L opened this issue Jul 24, 2019 · 4 comments
Labels

Comments

@wI2L
Copy link

@wI2L wI2L commented Jul 24, 2019

Hello

I have been working on writing a WebTorrent tracker in Go, and I would like to propose a breaking change regarding the encoding of the info_hash and offer_id.
Currently these fields are raw binary strings, which makes it very difficult and inconvenient to deal with when using UTF-8 WebSocket text messages.

This past PR #203 introduces some changes in the WebTorrent codebase to use an Hexadecimal encoding in the code, and I would like to extend this to the WebSocket messages exchanged between a client and a tracker.

I am well aware that this is a breaking change for current clients, but since the numbers of clients available out there is still small, it still look feasible.

I originally proposes this in the draft BEP I wrote in cunjunction with @yciabaud here: #881.

The current WebTorrent tracker protocol is far from perfect, but this is the only thing that is gamebreaker in my opinion.

/cc @feross @yciabaud

@wI2L

This comment has been minimized.

Copy link
Author

@wI2L wI2L commented Jul 24, 2019

Also, I would like to use this PR to relaunch the subject of the WebTorrent BEP that was living here: #881. Sadly the conversation is now locked. The issue mentionned in this PR is one of the many that could be improved about the WebTorrent tracker protocol. I still want to work on this, and actively help getting it adopted (in the form of an alternate tracker written in Go), but it will require the involvement of many people to update the current WebTrrent codebase if a new BEP is written and adopted.

See #881 (comment) for reference.

@wI2L wI2L changed the title Use hex-encoded infohashes and IDs in tracker protocol Use Hex encoding for infohashes and offers IDs in tracker protocol. Jul 24, 2019
@feross

This comment has been minimized.

Copy link
Member

@feross feross commented Aug 3, 2019

I've re-opened and unlocked #184

I would like to propose a breaking change regarding the encoding of the info_hash and offer_id. Currently these fields are raw binary strings, which makes it very difficult and inconvenient to deal with when using UTF-8 WebSocket text messages.

I don't see the advantage of making a breaking change here. Can you elaborate on the advantages? WebTorrent uses WebSockets in binary mode and it's no problem to send binary data as a string. Furthermore, the HTTP tracker protocol uses binary strings so it's nice to just re-use that logic on the torrent client side.

Is it impossible to make this work in Go? Or is it just inconvenient?

@feross feross added the meta label Aug 3, 2019
@wI2L

This comment has been minimized.

Copy link
Author

@wI2L wI2L commented Aug 3, 2019

Thanks for your response @feross

I am glad that #184 is re-opened, I hope that progress towards a clear WebTorrent BEP will restart. I am definitively engaged to help about this.

Regarding the issue itself.
First, I always assumed that WebTorrent was using WebSocket text frames, I don't know how I missed it after readind so much the current tracker's implementation. That would explain why I experienced weird encoding issues in the past, where the binary string of the info_hash and offer_id fields had bad UTF-8 encoding.
Could you give me some insights about this choice ? I am guessing it is directly related to the core of the issue I am raising regarding binary fields.

Regardless, I'd like to adress your question about the advantage of using nnon-binary strings.
First, the JSON format natively doesn't support binary data, therefore, I don't quite understand the initial motivation that lead to this situation. Could you please give me some feedback?

Regarding Go, the states documentation for the encoding/json package states that:

String values encode as JSON strings coerced to valid UTF-8, replacing invalid bytes with the Unicode replacement rune.

source: https://golang.org/src/encoding/json/encode.go?s=6430:6473#L147

Because of that, encode/decode operations fails because of the binary data inside the fields I mentionned before. I experimented with payloads sniffed from webtorrent.io.
As of now, I am using another library to unmarshal the payloads. It allows to directly access a subslice of the original payload without having to unmarshal everything. For marshaling, I am writing my own lib based on a pre-compiled instructions set, so its definitively doable, but not convenient.

Last, but not least, since JSON is the format used for payloads exchanges between a client and a tracker, it would make sense, IMHO, that all the fields are human-friendly. To the best of my knowledge, binary infohashes are mostly used in torrent metainfo files, while the hex/base32 representations are used in all other cases (UI, printing ...)

You mentionned that the HTTP protocol uses the binary format, which is correct. However, the current tracker implementation already converts those fields from binary to hex, and vice-versa at differents places in the codebase. Why would it be a problem to do the same thing for the Webtorrent protocol? In my opinion, we shouldn't try to stick with what already exists for HTTP and UDP client/tracker protocols.

@wI2L

This comment has been minimized.

Copy link
Author

@wI2L wI2L commented Aug 3, 2019

To be clear, I am raising this issue not a request to change the current implementations (clients/trackers), but rather to (re)start the discussions about the BEP proposal, amongst other things. We should probably find a better place than a GitHub issue to talk about this. What do you think ? Is it something (the BEP), that you are still inclined to work on ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.