Detected by Distill Network #6

Closed
FocuSPower opened this issue May 24, 2020 · 5 comments
Labels
update-documentation Todo: update documentation

Comments

@FocuSPower

I was trying to scrape g2.com. The first time I went to the website it worked fine, but the second time I returned it detected suspicious activity. The message it shows is:

Pardon Our Interruption...
As you were browsing something about your browser made us think you were a bot. There are a few reasons this might happen:

You're a power user moving through this website with super-human speed.
You've disabled JavaScript and/or cookies in your web browser.
A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running. Additional information is available in this support article.
After completing the CAPTCHA below, you will immediately regain access to the site again

Brief:
The fact is that I did not move fast at all; I just tried to access a page. The first time it worked, the second time I got caught.

FocuSPower changed the title from "Detected" to "Detected by Distill Network" May 24, 2020
@ultrafunkamsterdam
Owner

Please provide your OS, Chrome version, and your code.

@FocuSPower
Author

OS: Windows 10
Chrome version: 81.0.4044.138
Sorry, I'm new to web scraping. What kind of code?

@chemeng

chemeng commented May 24, 2020

I think they can identify you from your browsing behavior. Maybe they can track the mouse speed or something similar and identify you AFTER the first page has been loaded and as you move towards a second page.

@ultrafunkamsterdam
Owner

ultrafunkamsterdam commented May 24, 2020

Sorry, I'm new to web scraping. What kind of code?

That reply (on the biggest coding-related platform) makes me think you might have just used your mouse and keyboard to navigate to g2.com. Doing so WILL make you detectable even with a patched driver (I just reproduced it). Once you are detected, subsequent requests from the same IP within a short time will trigger the CAPTCHA again. So just solve the CAPTCHA and stay away for a while (and/or use your regular browser to navigate the site).

What might also help is to create a dedicated Chrome profile from your regular Chrome session: make a note of the path to the profile folder, log into your Google account, then quit Chrome. After that, specify that profile path explicitly in a ChromeOptions object, which you can pass to Chrome(options=options_obj).
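
A minimal sketch of the profile suggestion above, under a few assumptions: the profile folder is passed through Chrome's `--user-data-dir` switch, and the helper names `profile_argument` and `launch_with_profile` are hypothetical (they are not part of undetected_chromedriver). The path in the usage note is a placeholder for the folder you noted down.

```python
from pathlib import Path


def profile_argument(profile_dir):
    """Build the Chrome command-line switch that selects a profile folder."""
    return f"--user-data-dir={Path(profile_dir).expanduser()}"


def launch_with_profile(profile_dir):
    """Start a patched Chrome that reuses the given profile folder.

    Assumption: Chrome is fully closed first, since a profile folder
    cannot be shared with a running Chrome session.
    """
    # Imported here so the helper above can be used without the package installed.
    import undetected_chromedriver as uc

    options_obj = uc.ChromeOptions()
    options_obj.add_argument(profile_argument(profile_dir))
    return uc.Chrome(options=options_obj)
```

Usage would look like `driver = launch_with_profile('~/uc-profile')`, where `~/uc-profile` stands in for your own profile path.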

Anyway, tested:

import undetected_chromedriver as uc
driver = uc.Chrome()
driver.get('https://g2.com')
driver.find_element_by_id('query').send_keys('Coding')   # ;)
driver.find_element_by_css_selector('button[type=submit] div').click()
driver.close()

Not detected (even without trying to act human-like). However, after I opened a new tab using my mouse and navigated to the site, I was detected as described above.
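
For scripts that need to notice when this happens, here is a rough sketch, not a tested recipe. It assumes the block page can be recognised by the "Pardon Our Interruption" heading quoted at the top of this issue, and it only pauses and retries; it does not solve the CAPTCHA. The helper names and the delay values are hypothetical and arbitrary.

```python
import time

# Heading of the block page quoted in this issue (assumed to be stable).
BLOCK_MARKER = "Pardon Our Interruption"


def looks_blocked(page_source):
    """Return True if the HTML appears to be the bot-detection page."""
    return BLOCK_MARKER.lower() in page_source.lower()


def backoff_delays(base_seconds=60, attempts=4):
    """Return waiting times that double on each attempt (arbitrary values)."""
    return [base_seconds * 2 ** i for i in range(attempts)]


def get_when_unblocked(driver, url, sleep=time.sleep):
    """Load url; if the block page shows up, wait and try again.

    Returns True once a normal page loads, False if every attempt was blocked.
    """
    for delay in backoff_delays():
        driver.get(url)
        if not looks_blocked(driver.page_source):
            return True
        sleep(delay)  # stay away for a while before the next attempt
    return False
```

With the driver from the test above, this would be called as `get_when_unblocked(driver, 'https://g2.com')`.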

ultrafunkamsterdam added the update-documentation (Todo: update documentation) label May 24, 2020
@FocuSPower
Author

FocuSPower commented May 25, 2020

Now I realise you were talking about the script xD. Sorry, my bad.

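# `site` is defined elsewhere in the script; its last four characters are dropped here.
driver.get('https://www.g2.com/products/' + site[:-4] + '/reviews')
# find_elements_* (plural) returns a list, which the indexing below needs.
# Two classes on one element are matched with a CSS selector.
number1_g2 = driver.find_elements_by_css_selector('.link.js-log-click')
number2_g2 = number1_g2[1].text
number_g2 = number2_g2[:-8]  # drop the trailing 8 characters
rate1_g2 = driver.find_elements_by_class_name('fw-semibold')
rate_g2 = rate1_g2[2].text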

The fact is my program will open and close tabs on multiple occasions, but just to extract information, not to click.


3 participants