Real HTTP Proxy Mode #35

Closed
tenox7 opened this issue Jun 5, 2019 · 7 comments

Comments

@tenox7
Owner

tenox7 commented Jun 5, 2019

historical facts:

  • the whole internet quacks https since 2016 thanks to eff and google
  • https was first introduced in 1994 by friends of jwz at netscape
  • http connect was introduced in 1999 by http 1.1
  • a growing number of clickable elements on web pages aren't a href links

meaning:

  1. browsers prior to 1994 don't support https, they will probably error when trying to enter https://...
  2. browsers past 1994 will have https but there was no http connect until 1999, so how do they proxy https requests? via http? ftp? gopher? smoke signals?
  3. browsers past 1999 will have http1.1 connect and will demand ssl cert on connection, can you generate a cert that will satisfy ca shipped ages ago? or should you convert everything to http (aka sslstrip)
  4. how do you proxy non href links, eg close a cookie popup window, move a map, fill a form, select a dropdown or play some flash/html5 games?
@tenox7 tenox7 changed the title from "http proxy mode" to "real http proxy mode" Jun 7, 2019
@tenox7 tenox7 changed the title from "real http proxy mode" to "Real HTTP Proxy Mode" Jun 11, 2019
@tenox7 tenox7 pinned this issue Jun 11, 2019
@Justin-CB

It would probably make the most sense to have the links redirect to plain HTTP a la sslstrip. To allow scrolling, you could simply render the site at full height and just use the browser's native controls. Also, a mode for slightly less outdated browsers would be good, as would the ability to use JPEG images (I think very old versions of Netscape only supported non-progressive JPEG as an image format, but I'm not sure). The ability to render video for old Flash content would also be good.

@tenox7
Owner Author

tenox7 commented Oct 14, 2019

Firstly, easy stuff - WRP already supports JPEG via -t flag.

Secondly, rendering a full page to a really long image is a no-no because of memory consumption and the time it takes to decompress/decode the image on an old client. I tried that before: lots of crashes and out-of-memory errors on various systems. You can actually try it yourself, just put some large number into the H field.

When it comes to SSL in real proxy mode, https links are no longer a problem because WRP no longer operates on links. It now works as pure ISMAP with clickable x,y coordinates. So the old browser knows nothing about any links whatsoever, it just sends the x,y of the mouse click.
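
To illustrate the ISMAP point: with a server-side image map, the browser appends the click position to the link target as a bare `x,y` query string. The snippet below is only a sketch of parsing such a request. The `/map` path and the function name are made up for the example and are not taken from WRP's code.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_ismap_click(request_target: str) -> Tuple[int, int]:
    """Extract (x, y) from a server-side image map request such as '/map?120,45'.

    The '/map' path is a hypothetical endpoint used only for this example.
    """
    query = urlsplit(request_target).query
    x_text, sep, y_text = query.partition(",")
    if not sep:
        raise ValueError(f"not an ISMAP click: {request_target!r}")
    # int() raises ValueError on anything that is not a plain integer.
    return int(x_text), int(y_text)


if __name__ == "__main__":
    print(parse_ismap_click("/map?120,45"))
```

The gateway can then replay the click at those coordinates in the real browser it is driving, so the old client never has to understand links, schemes, or scripts.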

Let me ask you a different question. Let's say that you are using Mosaic from 1993, which doesn't support the https:// scheme at all. How would you navigate to a page with such a URL via a proxy? The proxy can work just fine with such addresses, but how would you type https:// into the URL bar of a browser that knows nothing about SSL?

@Justin-CB

There's got to be a better way to do this than an image map of the entire site. It would probably be more work, but, say: load the page @ a specified width & unlimited height, run JavaScript & render CSS for 1 second (or another reasonably short amount of time), then auto-generate a table-based layout with regions of "rich text" (text with simple formatting) with the styles in plain HTML (using the font tag, &c.) & simple images converted to non-progressive JPEG. The images could use a formula such as http://localhost:8080/image/www.example.com/example.png.jpg & http://localhost:8080/image/www.example.com/example.jpg.jpg : the proxy, upon receiving such a request, would get the image from the path (removing the appended ".jpg") & run the image through a converter to convert it to non-progressive JPEG (early versions of Netscape didn't support progressive JPEGs). Animations & video/audio would either show a thumbnail (which could be auto-generated) or be converted/rendered to an old version of Flash.
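
A rough sketch of the image URL formula described above, covering only the path parsing. The function name is invented for the example, and since the formula does not say which scheme the proxy should use upstream, the sketch returns just the host and the original path. Fetching and the JPEG conversion itself are left out.

```python
from typing import Tuple

PREFIX = "/image/"
SUFFIX = ".jpg"  # the extra extension the proxy appends to every image URL


def split_image_request(path: str) -> Tuple[str, str]:
    """Map '/image/<host>/<path>.jpg' back to (host, original path)."""
    if not path.startswith(PREFIX) or not path.endswith(SUFFIX):
        raise ValueError(f"not a proxied image path: {path!r}")
    # Drop the '/image/' prefix and exactly one trailing '.jpg'.
    remainder = path[len(PREFIX):-len(SUFFIX)]
    host, slash, resource = remainder.partition("/")
    if not host or not slash or not resource:
        raise ValueError(f"missing host or image path: {path!r}")
    return host, "/" + resource


if __name__ == "__main__":
    print(split_image_request("/image/www.example.com/example.png.jpg"))
    print(split_image_request("/image/www.example.com/example.jpg.jpg"))
```

Always appending the suffix, even to files that already end in `.jpg`, keeps the mapping reversible: the proxy strips exactly one `.jpg` and never has to guess the original extension.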

Another option would be to use something like the "reader" mode on Amazon Silk(I don't know if other browsers support it: it might also be in mobile and/or desktop Safari or Chrome). That could potentially be modified to simplify pages for older web browsers.

Sorry for the tangent. As far as the current method is concerned, would it make sense to "chunk" the pages to a certain height? As in, render the page @ full height, then chop the image into sections of a defined height. Then either serve them all in the same page, or put a "continue" link @ the bottom of each section.
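
The chunking idea boils down to simple arithmetic. A minimal sketch, assuming the full-height render already exists and only the crop boundaries are needed (the function name and the pixel values are illustrative):

```python
from typing import List, Tuple


def chunk_bounds(total_height: int, chunk_height: int) -> List[Tuple[int, int]]:
    """Return (top, bottom) pixel rows for each slice of a tall screenshot."""
    if total_height < 0 or chunk_height <= 0:
        raise ValueError("heights must be non-negative and chunk_height positive")
    return [
        (top, min(top + chunk_height, total_height))
        for top in range(0, total_height, chunk_height)
    ]


if __name__ == "__main__":
    # A 2500 px tall page cut into 1000 px sections; the last one is shorter.
    print(chunk_bounds(2500, 1000))
```

Each pair could be served as its own image, either stacked in one page or one per page with a "continue" link, so the old client only ever decodes one section at a time.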

As far as https is concerned, I think the way sslstrip does it is it removes the s from all URL's starting with https:// (so "https://example.com" becomes "http://example.com"), then when the client tries to connect to a http:// site, sslstrip 1st tries to connect via https, &, if that doesn't work, it connects over plain http. After the connection is made, sslstrip sends the client's raw request to the server, then sends the server's response to the client, changing all https URL's to http.
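
To make that concrete, here is a small sketch of the two steps as I described them. It is not sslstrip's actual code: the function names are made up, and a real tool would also have to deal with cookies, redirects, and URLs built by scripts.

```python
import re
from typing import List

HTTPS_SCHEME = re.compile(r"https://", re.IGNORECASE)


def strip_https(body: str) -> str:
    """Rewrite every https:// URL in a response body to http://."""
    return HTTPS_SCHEME.sub("http://", body)


def upstream_candidates(client_url: str) -> List[str]:
    """URLs to try upstream for a plain-http client request: https first, then http."""
    if not client_url.lower().startswith("http://"):
        raise ValueError(f"expected an http:// URL: {client_url!r}")
    rest = client_url[len("http://"):]
    return ["https://" + rest, "http://" + rest]


if __name__ == "__main__":
    print(strip_https('<a href="https://example.com/">home</a>'))
    print(upstream_candidates("http://example.com/login"))
```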

As for your example, I've used Mosaic & pre-https versions of netscape a couple times, so I'm not sure if this behaviour is in all versions: if you type a https URL, it will just connect over plain http. Old Netscape would try to connect via http a couple times, & you'd get a redirect page which would, if you clicked the "click here" link, re-load the redirect page. Mosaic would append the https URL to the end of the path, so if you tried to connect to a https-only site with Mosaic, it would freeze for a couple seconds, then return a "path too long"-type error(the URL would be something like http://example.com/https://example.com/https://example.com... but very long).

@tenox7
Owner Author

tenox7 commented Nov 4, 2019

At present I don't really see a need for a "real" HTTP proxy. What's the benefit of being able to set an HTTP proxy in your browser versus navigating to a gateway page? There are also some technical reasons against proxy mode:

  • It's harder for the user to set up: you need to go into browser settings, etc., rather than just type a URL.
  • For https it's just dirty hacks, I don't see a clean way of doing it universally across the board.
  • It's not future-proof. I don't know what the URL scheme will be in the future with the advent of HTTP/2, HTTP/3 and beyond. Who knows, maybe something like gRPC will even replace HTTP entirely.
  • Most importantly, "proxy mode" only works where there are links involved (a hrefs). A proxy will not work for other clickable elements that are not links.

I'm happy to discuss it further, this is a very interesting topic, but perhaps let's do the discussion offline. Please drop me an email at tenox7@gmail.com

@tenox7 tenox7 closed this as completed Nov 4, 2019
@tenox7 tenox7 unpinned this issue Jan 15, 2020
@busybox11

Hi @tenox7, I can see at least one reason to use a real HTTP proxy: it would be VERY convenient on my e-reader's browser, which does support proxies but doesn't support many web standards. Sometimes it opens pages in a popup window in which you can't control the URL, and the page won't load because it uses unsupported JavaScript. I'd love to still be able to use this neat feature of my e-reader, which is unfortunately getting kind of old.

Also, generally, a more immersive experience would be appreciated - though I understand this means lots of work and would not be much more useful to that many people than your current "middleware" interface.

This project is dope though, I love it!

@tenox7
Owner Author

tenox7 commented Aug 27, 2023

Yeah, I agree. I think that perhaps a proxy function can be added to WRP quite easily. My main gripe with proxy mode is that all websites are now on HTTPS (SSL/TLS), which doesn't work through a regular HTTP proxy. It requires the CONNECT method and forwarding sockets, only to discover that you are behind on the SSL version or have an unsupported certificate authority, or whatever other encryption incompatibility you can think of. The only way out is to strip SSL and convert all https links to http on the fly. Can this be done? Yes, but with mixed results, especially since such conversion is considered a man-in-the-middle attack. It's much easier to render a page as is and take its screenshot. I will keep thinking about it. The topic is not forgotten, just very difficult.
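
For reference, a sketch of what the proxy actually sees with CONNECT: a single request line naming a host and port, after which it can only shuttle opaque bytes between the two sockets. The parser below is illustrative only (the function name is made up) and skips header handling and the socket forwarding itself.

```python
from typing import Tuple


def parse_connect_line(line: str) -> Tuple[str, int]:
    """Parse 'CONNECT host:port HTTP/1.1' into (host, port)."""
    parts = line.strip().split()
    if len(parts) != 3 or parts[0] != "CONNECT":
        raise ValueError(f"not a CONNECT request line: {line!r}")
    # rpartition keeps bracketed IPv6 literals such as '[::1]' intact.
    host, sep, port = parts[1].rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {parts[1]!r}")
    return host, int(port)


if __name__ == "__main__":
    print(parse_connect_line("CONNECT example.com:443 HTTP/1.1\r\n"))
```

Everything after that line is the TLS handshake between the old browser and the remote site, which is exactly where outdated protocol versions and certificate authorities break things.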

@busybox11

Also this, yeah - though I'm not as much concerned by this as many "older" browsers since the one bundled in my E-reader is still updated quite regularly, the SSL version is up to date and everything can be accessed via HTTPS no problem.

But you're right, this is a MITM, but it indeed doesn't have the same success rate - many requests are based on the client's javascript (window.location) - matter of fact, literally all of my past employers' frontends relied on it in order to make requests to the right server.

Using a streaming screenshot based proxy would work much better, you're essentially streaming a headless chrome window that will have everything up to date and javascript working fine. "Regular" proxies that return HTML documents are less and less reliable for old browsers.

3 participants