Suggestion for better interoperability between WebTorrent and its partner archive.org #1471

Open
Clodo76 opened this issue Aug 16, 2018 · 15 comments

Comments

@Clodo76 Clodo76 commented Aug 16, 2018

An example archive.org url:
https://archive.org/details/219425main_iss016e033024_hires_full
Uploading that item's .torrent to instant.io generates the following magnet link:

magnet:?xt=urn:btih:5f091a0901022cc966c09cd5f076c53b848f6e23&dn=219425main_iss016e033024_hires_full&tr=http%3A%2F%2Fbt1.archive.org%3A6969%2Fannounce&tr=http%3A%2F%2Fbt2.archive.org%3A6969%2Fannounce&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.fastcast.nz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&ws=http%3A%2F%2Fia600209.us.archive.org%2F33%2Fitems%2F&ws=http%3A%2F%2Fia800209.us.archive.org%2F33%2Fitems%2F&ws=https%3A%2F%2Farchive.org%2Fdownload%2F

  1. issue:
    Mixed Content: The page at '<URL>' was loaded over HTTPS, but requested an insecure resource '<URL>'. This request has been blocked; the content must be served over HTTPS.
    For example, the web seed link
    http://ia800209.us.archive.org/33/items/219425main_iss016e033024_hires_full/219425main_iss016e033024_hires_full.jpg
    is blocked, and yes, the https:// version works.

Two suggestions about this:

  • WebTorrent should convert any http:// web seed (ws=) URL to https:// when running on an https:// website.
    The web seed might not support TLS, but without the upgrade it will not work in any case.
  • archive.org should write web seed URLs in https:// form in its .torrent files.
  2. issue:
    This is the magnet link above, adapted for https:
    magnet:?xt=urn:btih:5f091a0901022cc966c09cd5f076c53b848f6e23&dn=219425main_iss016e033024_hires_full&tr=http%3A%2F%2Fbt1.archive.org%3A6969%2Fannounce&tr=http%3A%2F%2Fbt2.archive.org%3A6969%2Fannounce&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.fastcast.nz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&ws=https%3A%2F%2Fia600209.us.archive.org%2F33%2Fitems%2F&ws=https%3A%2F%2Fia800209.us.archive.org%2F33%2Fitems%2F&ws=https%3A%2F%2Farchive.org%2Fdownload%2F
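A minimal sketch of the first suggestion: a client upgrades only the ws= (web seed) parameters to https:// when the page itself is served over https://, which turns the first magnet link into the one above. `upgrade_web_seeds` is a hypothetical helper, not part of WebTorrent's API, and it assumes the web seed host actually answers on https:// (as the ia*.us.archive.org hosts do here).

```python
from urllib.parse import quote, unquote


def upgrade_web_seeds(magnet: str, page_is_https: bool = True) -> str:
    """Rewrite http:// web seeds (ws=) in a magnet URI to https://.

    Hypothetical helper: only ws= parameters are touched; trackers (tr=)
    and every other parameter are passed through unchanged.
    """
    if not page_is_https:
        return magnet  # an http:// page has no mixed-content problem
    prefix, sep, query = magnet.partition("?")
    parts = []
    for param in query.split("&"):
        key, eq, value = param.partition("=")
        if key == "ws":
            url = unquote(value)
            if url.startswith("http://"):
                # Re-encode the whole URL, matching the %3A%2F%2F style used above.
                value = quote("https://" + url[len("http://"):], safe="")
        parts.append(key + eq + value)
    return prefix + sep + "&".join(parts)


print(upgrade_web_seeds("magnet:?xt=urn:btih:abc&ws=http%3A%2F%2Fexample.org%2Fitems%2F"))
# magnet:?xt=urn:btih:abc&ws=https%3A%2F%2Fexample.org%2Fitems%2F
```

Only ws= entries are rewritten; the http:// trackers stay as they are, matching the adapted link above.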

It still has CORS/CORB issues:

  • OPTIONS https://ia600209.us.archive.org/33/items/219425main_iss016e033024_hires_full/219425main_iss016e033024_hires_full.jpg 405 (Not Allowed)
  • Failed to load https://ia600209.us.archive.org/33/items/219425main_iss016e033024_hires_full/219425main_iss016e033024_hires_full.jpg: Response to preflight request doesn't pass access control check: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'https://instant.io' is therefore not allowed access. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
  • Cross-Origin Read Blocking (CORB) blocked cross-origin response <URL> with MIME type text/html. See <URL> for more details.

This can only be fixed on the archive.org side.

  3. suggestion:
  • archive.org could publish magnet: links directly.

Maybe someone here can contact archive.org to address some of these issues.

@jimmywarting jimmywarting commented Aug 17, 2018 · Contributor

A simple first fix is to add this to the page:

<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">

It would be great if the trackers and URLs were secure before the torrent/magnet link was even created. When creating a new torrent, we could log a warning suggesting https wherever a URL is insecure.
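The warning suggested above could start as a scan of a magnet link's trackers (tr=) and web seeds (ws=) for schemes that an https:// page will block as mixed content. A minimal sketch; `insecure_urls` is a hypothetical helper, not an existing WebTorrent or create-torrent function:

```python
from urllib.parse import parse_qsl

# Schemes that an https:// page cannot use without mixed-content blocking.
INSECURE_SCHEMES = ("http://", "ws://")


def insecure_urls(magnet: str) -> list[tuple[str, str]]:
    """Return (param, url) pairs for tr= and ws= entries that are not TLS-protected."""
    query = magnet.partition("?")[2]
    return [
        (key, url)
        for key, url in parse_qsl(query)  # parse_qsl also percent-decodes the values
        if key in ("tr", "ws") and url.startswith(INSECURE_SCHEMES)
    ]


EXAMPLE = (
    "magnet:?xt=urn:btih:abc"
    "&tr=http%3A%2F%2Fbt1.archive.org%3A6969%2Fannounce"
    "&tr=wss%3A%2F%2Ftracker.openwebtorrent.com"
    "&ws=http%3A%2F%2Fia600209.us.archive.org%2F33%2Fitems%2F"
)
for key, url in insecure_urls(EXAMPLE):
    print(f"warning: {key}={url} is insecure; consider https:// or wss://")
# warning: tr=http://bt1.archive.org:6969/announce is insecure; consider https:// or wss://
# warning: ws=http://ia600209.us.archive.org/33/items/ is insecure; consider https:// or wss://
```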

@stale stale bot commented Nov 21, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Nov 21, 2018
@stale stale bot closed this Nov 28, 2018
@DiegoRBaquero DiegoRBaquero reopened this Nov 28, 2018
@stale stale bot removed the stale label Nov 28, 2018

@stale stale bot commented Feb 26, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Feb 26, 2019
@stale stale bot closed this Mar 5, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 3, 2019
@feross feross added enhancement and removed stale labels Sep 5, 2019
@webtorrent webtorrent unlocked this conversation Sep 5, 2019
@feross feross reopened this Sep 5, 2019

@feross feross commented Sep 5, 2019 · Member

@mitra42 do you have any thoughts about this issue?

@mitra42 mitra42 commented Sep 5, 2019

Plenty of thoughts, but I have been unable to get the changes made at the Archive :-(

@feross feross commented Sep 5, 2019 · Member

@mitra42 What is the main objection to supporting this at the Archive?

@mitra42 mitra42 commented Sep 5, 2019

No objection, but it is one of those complex things that requires
a) certainty about what the magnet link should be (see below), then
b) a bunch of people agreeing on how to implement the solution (it involves a front-end https gateway, a CORS gateway, and a rewrite process design), and then
c) a massive task to rewrite 50 million torrent files so they have the correct data in them, which requires prioritisation.

I tried a couple of times, then finally gave up; I now rewrite my own magnet links and hack torrent files on the fly, as in dweb (e.g. in https://dweb.archive.org/metadata/commute):

magnetlink: magnet:?xt=urn:btih:P2CJUDZMBZPZ5JVLOKBH45UQIBUQJMAQ&tr=http%3A%2F%2Fbt1.archive.org%3A6969%2Fannounce&tr=http%3A%2F%2Fbt2.archive.org%3A6969%2Fannounce&tr=wss%3A%2F%2Fdweb.archive.org%3A6969&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&tr=wss%3A%2F%2Ftracker.fastcast.nz&ws=https%3A%2F%2Fdweb.me%2Farc%2Farchive.org%2Fdownload%2F&xs=https%3A%2F%2Fdweb.me%2Farc%2Farchive.org%2Ftorrent%2Fcommute
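Note that this link carries the info hash in its 32-character base32 form, while the instant.io link at the top of the issue uses 40 hex characters; both encode the same 20-byte SHA-1, and BEP 9 allows either. A minimal sketch for normalizing to hex, e.g. before comparing links produced by different tools (`btih_to_hex` is a hypothetical helper):

```python
import base64


def btih_to_hex(btih: str) -> str:
    """Normalize a magnet xt=urn:btih: value to 40 lowercase hex characters."""
    if len(btih) == 40:  # hex form, e.g. the instant.io link
        return bytes.fromhex(btih).hex()  # validates and lowercases
    if len(btih) == 32:  # base32 form, e.g. the dweb.archive.org link
        return base64.b32decode(btih.upper()).hex()
    raise ValueError(f"unexpected info-hash length {len(btih)}")


print(btih_to_hex("P2CJUDZMBZPZ5JVLOKBH45UQIBUQJMAQ"))
```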

Three issues came up:
p) how to drop the CORS without creating other security issues; I don't have a solution to that.
q) lack of a management interface on the WSS tracker that John or you helped me get going; I think this one is lower priority.
r) certainty about what the torrents/magnet links should be. It is crucial if there is to be any chance to move forward that we need to know what should be in the magnet link/torrent files because they can't easily be rewritten a second time.

Open questions:

  1. Does the tracker I added at wss://dweb.archive.org:6969 actually work? I don't have any easy way to test this.
  2. Does https://bt1.archive.org:6969/announce work (i.e. is it already answering on https)?
  3. Which of the other trackers in that list are permanently down? I see a lot of errors from them on the rare occasions I try to use WebTorrent.
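For question 2, one way to check whether an HTTP(S) tracker is answering is to send it a single BEP 3 announce and see whether a bencoded dictionary comes back (containing either `interval`/`peers` or a `failure reason`). A minimal sketch that only builds the request URL; the peer ID and port are placeholders, `announce_url` is a hypothetical helper, and actually fetching the URL (e.g. with `urllib.request.urlopen`) is left out. WSS trackers (question 1) speak a different, WebSocket-based protocol and are not covered here.

```python
from urllib.parse import quote, urlencode


def announce_url(announce: str, info_hash_hex: str, peer_id: bytes, port: int = 6881) -> str:
    """Build a one-shot BEP 3 announce URL for probing an HTTP(S) tracker."""
    params = {
        "info_hash": bytes.fromhex(info_hash_hex),  # raw 20 bytes, percent-encoded below
        "peer_id": peer_id,                          # raw 20 bytes
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": 0,       # we only want the tracker's reply, not the content
        "compact": 1,
    }
    # quote (not quote_plus) so every byte outside A-Z a-z 0-9 _.-~ becomes %XX
    query = urlencode(params, quote_via=quote)
    return announce + ("&" if "?" in announce else "?") + query


url = announce_url(
    "https://bt1.archive.org:6969/announce",
    "5f091a0901022cc966c09cd5f076c53b848f6e23",  # info hash from the first magnet link
    peer_id=b"-XX0001-" + b"0" * 12,             # hypothetical 20-byte peer ID
)
print(url)
```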

I'll be back in San Francisco on Oct 19th for the big Archive event on the 23rd, staying through Nov 10; that would be the best time to try to get a solution (to CORS and the torrent rewrite) if we can get answers to these 3 questions about the required torrent file before then. Will you be in SF then?

@feross feross commented Sep 5, 2019 · Member

how to drop the CORS without creating other security issues,

The solution seems like it should be: always allow cross-origin requests for static files, and deny cross-origin requests for everything else. The logic goes like this: by definition, the Internet Archive wants to make the static files available as widely as possible.

Letting JS code from any website download these files directly introduces no new vulnerabilities, since the site owner could always set up a proxy server to fetch the files from the Internet Archive and serve them. All other requests, like logging in, posting comments, etc., should continue to be denied to cross-origin JS code.
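A minimal sketch of that policy as a per-request decision a front end could apply. The path prefixes are assumptions modelled on the URLs in this thread (`/download/` on archive.org and `/<n>/items/` on the ia*.us.archive.org hosts); the Archive's real routing is not known here. The preflight answer allows a `Range` header, assuming WebTorrent's web seed requests carry one, which would explain the OPTIONS request in the original report.

```python
import re

# Hypothetical static-content routes, modelled on URLs quoted in this thread:
#   https://archive.org/download/<item>/...
#   https://ia600209.us.archive.org/33/items/<item>/...
STATIC_PATH = re.compile(r"^/(download|\d+/items)/")


def cors_headers(method: str, path: str) -> dict[str, str]:
    """CORS response headers: open for static files, none for everything else."""
    if not STATIC_PATH.match(path):
        # No CORS headers: browsers will not let cross-origin scripts read the response.
        return {}
    headers = {"Access-Control-Allow-Origin": "*"}
    if method == "OPTIONS":  # preflight, e.g. for a GET carrying a Range header
        headers["Access-Control-Allow-Methods"] = "GET, HEAD, OPTIONS"
        headers["Access-Control-Allow-Headers"] = "Range"
        headers["Access-Control-Max-Age"] = "86400"
    return headers


print(cors_headers("OPTIONS", "/33/items/219425main_iss016e033024_hires_full/x.jpg"))
print(cors_headers("POST", "/account/login"))  # {}
```

The server would also have to answer OPTIONS with a 2xx status instead of the 405 seen in the original report. A wildcard origin is only honored for requests without credentials, which fits the idea that logged-in actions stay same-origin.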

r) certainty about what the torrents/magnet links should be. It is crucial if there is to be any chance to move forward that we need to know what should be in the magnet link/torrent files because they can't easily be rewritten a second time.

It seems like the requirements are:

  • Use https instead of http in the web seed URL.
  • Add the new WSS tracker server that you set up.

Nothing else should be needed.
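Applied to a magnet link, those two requirements amount to a small rewrite. A minimal sketch, assuming the WSS tracker is the `wss://dweb.archive.org:6969` one from the dweb link above (whether it works is still an open question) and that only ws= and tr= need changing; `fix_magnet` is a hypothetical name.

```python
from urllib.parse import parse_qsl, quote

WSS_TRACKER = "wss://dweb.archive.org:6969"  # taken from the dweb magnet link; untested


def fix_magnet(magnet: str, extra_trackers: tuple[str, ...] = (WSS_TRACKER,)) -> str:
    """Upgrade http:// web seeds to https:// and append any missing trackers."""
    params = []
    for key, value in parse_qsl(magnet.partition("?")[2]):
        if key == "ws" and value.startswith("http://"):
            value = "https://" + value[len("http://"):]
        params.append((key, value))
    present = {v for k, v in params if k == "tr"}
    params += [("tr", t) for t in extra_trackers if t not in present]
    # Keep ':' readable in xt=urn:btih:..., percent-encode everything else.
    return "magnet:?" + "&".join(
        f"{k}={quote(v, safe=':' if k == 'xt' else '')}" for k, v in params
    )


fixed = fix_magnet(
    "magnet:?xt=urn:btih:5f091a0901022cc966c09cd5f076c53b848f6e23"
    "&ws=http%3A%2F%2Fia800209.us.archive.org%2F33%2Fitems%2F"
)
print(fixed)
# magnet:?xt=urn:btih:5f091a0901022cc966c09cd5f076c53b848f6e23&ws=https%3A%2F%2Fia800209.us.archive.org%2F33%2Fitems%2F&tr=wss%3A%2F%2Fdweb.archive.org%3A6969
```

Running it twice gives the same link, so it is safe to apply on every request.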

One other thing I'll add is that you might want to just rewrite the .torrent files on the server-side whenever a .torrent file is requested by the user and serve the rewritten torrent to the user. Then, there's no need to update the actual .torrent file stored in the Internet Archive backend. Seems a bit gross, but it has the advantage that you can rewrite the list of trackers at any point in time (in case your WSS server URL changes) and you don't need to reprocess all the torrent files again.
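A sketch of such an on-the-fly rewrite, assuming the web seeds sit in the BEP 19 `url-list` key (presumably where the ws= values in the magnet links above come from) and trackers in the BEP 12 `announce-list` tiers. It relies on a property worth stating: the info hash is the SHA-1 of the bencoded `info` dictionary only, so editing `url-list` and `announce-list` leaves the info hash unchanged, provided the original file was canonically encoded (sorted keys) so that re-encoding `info` is byte-identical. The tiny bencode codec and `rewrite_torrent` are illustrative only; a production service would use a maintained bencode library.

```python
import hashlib


def bdecode(data: bytes, i: int = 0):
    """Decode the bencoded value at data[i]; return (value, index after it)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c in (b"l", b"d"):  # list or dict: items until a closing 'e'
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        if c == b"l":
            return items, i + 1
        return dict(zip(items[::2], items[1::2])), i + 1
    colon = data.index(b":", i)  # byte string: <length>:<bytes>
    n = int(data[i:colon])
    return data[colon + 1:colon + 1 + n], colon + 1 + n


def bencode(v) -> bytes:
    if isinstance(v, int):
        return b"i%de" % v
    if isinstance(v, bytes):
        return b"%d:%s" % (len(v), v)
    if isinstance(v, list):
        return b"l" + b"".join(bencode(x) for x in v) + b"e"
    if isinstance(v, dict):  # canonical form: keys sorted as raw bytes
        return b"d" + b"".join(bencode(k) + bencode(v[k]) for k in sorted(v)) + b"e"
    raise TypeError(f"cannot bencode {type(v).__name__}")


def rewrite_torrent(raw: bytes, extra_trackers: list[bytes]) -> bytes:
    """Upgrade http:// web seeds to https:// and append missing trackers as a new tier."""
    meta, _ = bdecode(raw)
    if b"url-list" in meta:
        seeds = meta[b"url-list"]
        if isinstance(seeds, bytes):  # a single web seed may be a bare string
            seeds = [seeds]
        meta[b"url-list"] = [b"https://" + s[7:] if s.startswith(b"http://") else s for s in seeds]
    tiers = meta.get(b"announce-list") or ([[meta[b"announce"]]] if b"announce" in meta else [])
    known = {url for tier in tiers for url in tier}
    new = [t for t in extra_trackers if t not in known]
    if new:
        tiers.append(new)
    if tiers:
        meta[b"announce-list"] = tiers
    return bencode(meta)


def info_hash(raw: bytes) -> str:
    """SHA-1 of the bencoded info dict (assumes the input was canonically encoded)."""
    return hashlib.sha1(bencode(bdecode(raw)[0][b"info"])).hexdigest()


original = bencode({
    b"announce": b"http://bt1.archive.org:6969/announce",
    b"info": {b"length": 3, b"name": b"x.jpg", b"piece length": 16384, b"pieces": b"\x00" * 20},
    b"url-list": [b"http://ia800209.us.archive.org/33/items/"],
})
fixed = rewrite_torrent(original, [b"wss://dweb.archive.org:6969"])
print(info_hash(fixed) == info_hash(original))  # True
print(bdecode(fixed)[0][b"url-list"])  # [b'https://ia800209.us.archive.org/33/items/']
```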

@mitra42 mitra42 commented Sep 6, 2019

Re CORS - I agree, I've made that same argument unsuccessfully.
Re WSS - I have no way to know if it's working, and am hesitant to push for a fix until someone (maybe you) can confirm that it works; I'm not at all sure how to test it.
Re HTTPS - agreed, but again I need to know whether or not those trackers are already responding correctly on HTTPS, and have no way to check.
Rewriting torrents on demand is what I'm doing in dweb.archive.org, but I've failed to get agreement on rewriting anything on the main (archive.org) site. I think the issue may have to do with the already large load on that front end.

@feross feross commented Sep 6, 2019 · Member

Re HTTPS - agreed, but the again need to know whether or not those trackers are already responding correctly on HTTPS

Regarding HTTPS, I was referring to the web seed URLs (i.e. the URLs that host the content itself). I believe that these already work over HTTPS.

Regarding the other points, that is disappointing.

I'm closing this issue since there's no bug to fix here, but feel free to continue discussion!

@feross feross closed this Sep 6, 2019

@mitra42 mitra42 commented Sep 6, 2019

The seed URLs work over HTTPS but have CORS issues.

As I said, I have no way to test the tracker URLs.
Absent any assistance from you in testing them, I think this is unfixable.

@feross feross commented Sep 6, 2019 · Member

@mitra42 I'm super busy at the moment, but I can try. Perhaps others from the community may be able to help too. Can you open an issue with the tracker URL and a description of the expected behavior? This is the tracker server which creates a peer for content on-demand and always includes that peer's IP address in the response, right?

@mitra42 mitra42 commented Sep 6, 2019

One of these (wss://dweb.archive.org:6969, or probably https://dweb.archive.org:6969 as well) is that tracker. I'm not trying to test whether it's doing the part about including the super-peer, just trying to figure out how to know if that tracker (and the others in the IA torrents) works before I ask people to rewrite torrent files. I simply don't know how to test whether a tracker works.

@feross feross commented Sep 11, 2019 · Member

I don't have time at the moment, but I'll reopen this issue in case someone from the community is interested in testing out these trackers to help the Internet Archive.

@feross feross reopened this Sep 11, 2019
@feross feross added the help wanted label Sep 11, 2019

@mitra42 mitra42 commented Dec 24, 2019

A quick update: I've added two microservices, so that people wanting working torrents or magnet links don't have to rely on the rest of the dweb.archive.org code.
For example:
https://www-dweb-torrent.dev.archive.org/commute returns a working torrent for the "commute" item;
https://www-dweb-metadata.dev.archive.org/metadata/commute is like our standard metadata API but fixes the magnet link (based on the same logic as fixing the torrent).
Please let me know if you find issues.

5 participants