This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

Getting blocked quickly by websites. #215

Open
ksmeeks0001 opened this issue Feb 15, 2021 · 27 comments

Comments

@ksmeeks0001

My proxies seem to be getting blocked by websites after switching to selenium-wire.
I am using Linux. Previously, I would use selenium with pyvirtualdisplay and add proxies through a chrome extension.
I then switched to running selenium-wire headless, changed the User-Agent with a request interceptor, and added a proxy through the selenium-wire options.

I was blocked very quickly. I can confirm my proxy is blocked because I can no longer reach the site with curl through it either. Is there a way for sites to detect selenium-wire because it uses its own SSL certificate? Or could this be a screen size check in JavaScript, since I changed to headless?

@wkeeling
Owner

Thanks for raising this. I'm not aware of a way for sites to detect the self-signed certificate, but I guess it's possible a mechanism may exist. But yes it's possible that some sort of digital fingerprinting is happening which checks the screen size - as you mention.

Just want to double-check a couple of other things: with changing the user-agent using a request interceptor, are you deleting the existing user-agent header first before replacing it? Otherwise you'll get two user-agent headers being sent, which may trigger the site block. Also, if you run with regular Chrome (as in non-headless mode), do you get blocked?
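The point about deleting the header before replacing it can be sketched as follows. This is a minimal illustration, not the reporter's actual code: `replace_user_agent`, `make_fake_request`, and the `NEW_USER_AGENT` value are hypothetical, and `http.client.HTTPMessage` is used as a stand-in for Selenium Wire's header container on the assumption that it behaves the same way (assigning to an existing header appends a second value instead of overwriting it).

```python
from http.client import HTTPMessage
from types import SimpleNamespace

# Hypothetical replacement value, for illustration only.
NEW_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def replace_user_agent(request):
    """Request interceptor that swaps the User-Agent header.

    In Message-like header containers, assignment appends a new value,
    so the existing header is deleted first to avoid sending two
    User-Agent headers.
    """
    del request.headers["User-Agent"]
    request.headers["User-Agent"] = NEW_USER_AGENT


def make_fake_request(user_agent="HeadlessChrome/88.0"):
    """Build a stand-in request object so the interceptor can be tried
    without launching a browser (assumption: real requests expose a
    similar `headers` attribute)."""
    headers = HTTPMessage()
    headers["User-Agent"] = user_agent
    return SimpleNamespace(headers=headers)


if __name__ == "__main__":
    req = make_fake_request()
    replace_user_agent(req)
    # Exactly one User-Agent value should remain.
    print(req.headers.get_all("User-Agent"))
```

With a real Selenium Wire driver, the function would be attached with `driver.request_interceptor = replace_user_agent`.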

@ksmeeks0001
Author

Yes, I did delete the previous user agent. I have run with these proxies for a long time with selenium, and eventually a set of proxies would get blocked. But as I stated, it took no time at all for the site to block these. It's hard to say whether the cause is the switch to headless or the difference in how selenium-wire handles requests compared with plain selenium, since I made both changes at the same time. It is also hard to test, because the site has flagged the proxy IPs completely and I can no longer reach it from them in any way. I would need access to multiple throwaway proxies in order to test different scenarios.

@johngalt13

@ksmeeks0001 websites can see that you are using selenium-wire. I get blocked by Cloudflare protection when using it, while the same proxies work fine through the extension. I've tried with and without headless mode, and neither helped. So the only way is to use the extension. But with the Chrome extension I have no idea how to intercept XHR requests, whereas without proxies selenium-wire can intercept everything. In my case, with proxies, it intercepts only Google reCAPTCHA requests and nothing else. I've tried time.sleep and input(), and neither helped.

@wkeeling
Owner

I'm not 100% sure exactly what in Selenium Wire is causing websites to trigger anti-bot measures, however there's a new bot-detection feature in version 4.1.1 which might be worth a try if you're still having issues. It's experimental at this stage, but I'll look at refining it based on feedback.

@wkeeling
Owner

wkeeling commented Mar 9, 2021

Some further info here. It seems that websites can in some cases detect that you are using Selenium Wire, even if you're using a browser implemented with measures to evade bot detection.

When you use Selenium Wire with capture switched on (the default), what actually happens is that Selenium Wire fools the browser into thinking that it is the target website, and then performs its own SSL handshake with the real website to retrieve the content. It does this so that it can sit in the middle and decrypt HTTPS requests and responses as they pass through. But it seems that some websites are able to see from the handshake that the client is not a browser, which triggers anti-bot measures such as throwing up captchas.

One way around this is to disable request capture in Selenium Wire using the disable_capture option as this will also disable HTTPS decryption - allowing requests to pass straight through. Useful if you only care about non-capture related functions such as proxy connectivity, but no use if you actually want to capture requests.

This is a fairly significant problem that may be touching the realms of SSL fingerprinting. I don't have a proper solution as yet, but I'll update if and when I find one. Additional info in #242
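As a rough sketch of the workaround described above, the options dictionary could be built like this. The helper names `build_seleniumwire_options` and `create_driver` and the proxy URL are hypothetical; `disable_capture` and `proxy` are Selenium Wire option keys, but how `disable_capture` behaves may differ between versions.

```python
def build_seleniumwire_options(proxy_url=None, disable_capture=True):
    """Return a Selenium Wire options dict.

    proxy_url is an assumed upstream proxy such as
    "http://user:pass@proxy.example:8080" (hypothetical value).
    """
    options = {"disable_capture": disable_capture}
    if proxy_url:
        options["proxy"] = {
            "http": proxy_url,
            "https": proxy_url,
            # Keep local traffic off the upstream proxy.
            "no_proxy": "localhost,127.0.0.1",
        }
    return options


def create_driver(options):
    """Start Chrome through Selenium Wire with the given options.

    The import is done lazily so that the options helper above can be
    used and inspected without Selenium Wire installed.
    """
    from seleniumwire import webdriver

    return webdriver.Chrome(seleniumwire_options=options)


if __name__ == "__main__":
    print(build_seleniumwire_options("http://user:pass@proxy.example:8080"))
```

With capture disabled, `driver.requests` will not contain the traffic, so this only suits cases where Selenium Wire is used for proxy connectivity alone.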

@johngalt13

Hi, thanks for your reply. Probably, yes.
I can capture HTTPS traffic via a standard Chrome extension, like this: https://stackoverflow.com/questions/55582136/how-to-set-proxy-with-authentication-in-selenium-chromedriver-python
I'm using authenticated proxies. I've not tested proxies without a login and password.
Also, if you want to investigate the problem, you can try opening http://shop.axs.com/?c=axs&e=49904939&t_locale=en-US, which displays the Cloudflare protection page. I set the proxies through the selenium-wire options.
It works fine via my home IP, where I can capture requests, but it does not work via authenticated proxies.
So maybe the issue is with authenticated proxies?

@voxvici

voxvici commented Apr 1, 2021

Not sure if this is related, but could an option to add a self-signed certificate help? Certain proxies offer one, so the client could connect through the proxy and use its certificate. I've used this before with requests and a Luminati proxy.

@wkeeling
Owner

wkeeling commented Apr 2, 2021

Yes good shout. Selenium Wire disables verification of upstream self-signed certificates by default. I'll have a look at reproducing with an upstream proxy that uses a self-signed certificate, but I'll also add that certificate to the local certificate store and see whether that makes any difference.

@rnyPlanet

@wkeeling hello again) I read the issue about mitmproxy and fingerprint, does this mean that at the moment there is no way to bypass Cloudflare?

@wkeeling
Owner

wkeeling commented May 9, 2021

@rnyPlanet thanks for linking to that issue. Yes that looks to be the cause of this problem. I'll keep an eye on the development of that issue and see how it progresses.

@rnyPlanet

@ultrafunkamsterdam (see ultrafunkamsterdam/undetected-chromedriver#154 (comment)) maybe you know about proxies and could help improve this library alongside yours. Together they would be a killer combination.

@jlplenio

jlplenio commented Aug 21, 2021

@wkeeling Even with {"disable_capture": True}, I am unable to visit "https://nowsecure.nl/" and I still see this:
[screenshot]
I am guessing this should disappear if capture is truly disabled.

@wkeeling
Owner

@jlplenio the behaviour of the disable_capture option has changed since my comment was added. It no longer fully disables HTTPS decryption because otherwise upstream proxy functionality won't work.

With the "Not secure" message, have you installed Selenium Wire's root certificate in your browser? That message will disappear when the certificate is installed.

If you're on Linux, you can install the certificate on the command line with:

mkdir -p $HOME/.pki/nssdb
certutil -d sql:$HOME/.pki/nssdb -A -t TC -n "Selenium Wire" -i /path/to/ca.crt

Change /path/to/ca.crt to the path of the certificate once you've downloaded it.

@jlplenio

Thank you, @wkeeling, for the fast and comprehensive response. The certificate worked.
The proxy functionality is what I use selenium-wire for, so I will skip undetected-chromedriver for now.

@arisolt

arisolt commented Sep 19, 2021

Is there a way to restrict seleniumwire's behaviour so that it doesn't trigger the detection, but still be able to read the response data from a GET/POST request? Similar to how you would be able to read it in the Inspector of the browser.

@wkeeling
Owner

@arisolt right now there is no way to do it unfortunately. Selenium Wire presents a different TLS fingerprint than a browser due to the way it uses HTTPS interception behind the scenes.

@lukehamil55

@wkeeling still no solution for this? Using proxies with Selenium Wire triggers Cloudflare.

@howardjones

howardjones commented Mar 16, 2022

For what it's worth, Squid in bump-in-the-wire mode (with a client that has the appropriate CA cert loaded) lets the proxy see the content, and doesn't trigger CloudFlare (with regular UC). You then need to convince the proxy to keep the data somehow (maybe with ICAP? or just different cache config) in a way that you can retrieve it from your testing script.

@RestOp

RestOp commented Aug 8, 2022

No progress so far? Selenium Wire and undetected chromedriver would be the most powerful combination, so this really needs to work! 🍻

@wanGuk

wanGuk commented Aug 11, 2022

I am trying to capture the browsing traffic of several websites (on 'github.io') using selenium. The returned page is the same whether I use selenium or browse normally. However, the captured traffic (packet lengths) differs considerably between the two. Also, the packet lengths captured with selenium are relatively fixed. Does anyone else see the same thing?

@Osc44r

Osc44r commented Aug 31, 2022

As of 09.2022, seleniumwire.uc is still not working. Headless mode also gets detected by Cloudflare, sadly.

@adirzoari

@wkeeling I have the same issue when I open the GCP and AWS portals.
I tried with undetected chromedriver but still have the same issue.
[screenshot]

@abdulzain6

abdulzain6 commented Dec 5, 2022

Hi, I think this is the tool used to detect the TLS fingerprint: https://github.com/cloudflare/mitmengine. It just compares the TLS fingerprint expected from a browser with a specific user agent to the one it receives. I wonder if this could be solved if we could somehow find out which user agent's TLS fingerprint matches Selenium Wire's most closely.
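To make the comparison idea concrete, here is a small sketch using the JA3 scheme, one widely known way of hashing the fields of a TLS ClientHello. It is only an assumption that a JA3-style hash illustrates the idea well; mitmengine uses its own fingerprint format, and nothing in this thread says which scheme any given site uses. The user agent string, the numeric field values, and the `EXPECTED_BY_USER_AGENT` table are all made up for illustration.

```python
import hashlib


def ja3_string(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string: five comma-separated fields, with the
    values inside each list field joined by dashes, in the order sent."""
    list_fields = (ciphers, extensions, curves, point_formats)
    parts = [str(tls_version)]
    parts += ["-".join(str(value) for value in field) for field in list_fields]
    return ",".join(parts)


def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    text = ja3_string(tls_version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Hypothetical table of fingerprints a server expects per user agent.
EXPECTED_BY_USER_AGENT = {
    "ExampleBrowser/1.0": ja3_hash(
        771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0]
    ),
}


def matches_claimed_browser(user_agent, observed_hash):
    """True if the observed handshake hash is the one expected for the
    browser named in the User-Agent header."""
    expected = EXPECTED_BY_USER_AGENT.get(user_agent)
    return expected is not None and expected == observed_hash


if __name__ == "__main__":
    # A proxy with a different TLS stack sends different ClientHello fields,
    # so its hash no longer matches the browser it claims to be.
    proxy_hash = ja3_hash(771, [4866, 4865, 4867], [0, 23], [29, 23], [0])
    print(matches_claimed_browser("ExampleBrowser/1.0", proxy_hash))
```

Because the hash depends on the order and content of the ClientHello fields, changing only the User-Agent header does not change it.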

@ZhangPeng4242

ZhangPeng4242 commented Feb 3, 2023

Hi there, my Selenium Wire setup is not detected in normal mode, but when I change to headless mode it is detected every time by DataDome. Is there any solution for this? Thanks.

@abdulzain6

Try it with undetected chromedriver. Selenium Wire has built-in support for it.

@Nehal98

Nehal98 commented Sep 2, 2023

Hi there,
is there any update on this issue? I am still getting blocked by Cloudflare when using Selenium Wire with proxies. Is there any solution for this?
Thank you.

@abdulzain6

Hi, this won't work with Selenium Wire because the proxy changes the signature of the device. The best way to capture traffic is to use the Chrome DevTools Protocol with undetected chromedriver. Here are the docs: https://chromedevtools.github.io/devtools-protocol/
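A minimal sketch of that approach, assuming Chrome's performance log is used to collect DevTools `Network` events. The helper names `responses_from_performance_log` and `capture_responses` are hypothetical. The example uses the standard Selenium Chrome driver; substituting undetected chromedriver's `Chrome` class is an untested assumption, as is whether this avoids detection on any particular site.

```python
import json


def responses_from_performance_log(entries):
    """Extract request id, URL, and status from performance log entries.

    Each entry's "message" field is a JSON string that wraps one
    DevTools event under a "message" key.
    """
    found = []
    for entry in entries:
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.responseReceived":
            continue
        params = event["params"]
        found.append(
            {
                "request_id": params["requestId"],
                "url": params["response"]["url"],
                "status": params["response"]["status"],
            }
        )
    return found


def capture_responses(url):
    """Load a page and return its responses, with bodies where available."""
    from selenium import webdriver  # lazy import: needs selenium and Chrome

    options = webdriver.ChromeOptions()
    # Ask ChromeDriver to record DevTools events in the performance log.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        results = responses_from_performance_log(driver.get_log("performance"))
        for item in results:
            try:
                reply = driver.execute_cdp_cmd(
                    "Network.getResponseBody", {"requestId": item["request_id"]}
                )
                item["body"] = reply.get("body")
            except Exception:
                # Some responses have no retrievable body (e.g. redirects).
                item["body"] = None
        return results
    finally:
        driver.quit()
```

Since the browser itself makes every TLS connection here, no intercepting proxy sits between it and the site, which is the reason this route was suggested.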
