Getting Detected on Remote Linux(Ubuntu) vs Undetected On Mac #11

Closed
jckail opened this issue Jun 6, 2020 · 7 comments

Comments

@jckail

jckail commented Jun 6, 2020

I am working on a white-hat side project: I want to scrape my own data from behind a login screen so that I can plot it :)

The code runs fine in my local environment (macOS, details below): it logs in and reaches the desired page behind the login.

However, when deployed to a remote Linux server (Ubuntu, details below), it fails to log in and is redirected back to the login page.

At first I thought the IP/DNS was blacklisted, but then I ran both behind NordVPN (server: us5793) and still got the same result: it works locally but not on the remote server.

This is the same result for local env and remote env

IP Location Chicago, Illinois (US) 
NordVPN
64.44.80.68, 198.143.57.3
Mac OS X
Chrome 83.0.4103.97
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36
1920px X 1080px
Enabled
Enabled

The expected result is that the function below finds the "My Trips" text in the HTML, which indicates that the login succeeded.

My speculation is one of two things:
1. The chromedriver binary responds differently to the cdc replacement your code performs when it runs in my Linux environment.
2. The way JavaScript is re-injected into the page isn't working correctly on Linux.
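
One way to test the first guess is to inspect the driver binary directly. The following is a minimal sketch, assuming the patch works by rewriting the `cdc_` marker string inside the chromedriver executable; the function name and the `./chromedriver` path are illustrative and not part of the package.

```python
from pathlib import Path


def count_cdc_markers(driver_path, marker=b"cdc_"):
    """Return how many times the marker bytes occur in the driver binary."""
    data = Path(driver_path).read_bytes()
    return data.count(marker)


if __name__ == "__main__":
    path = Path("./chromedriver")  # assumed location; adjust per machine
    if path.exists():
        print(f"{path}: {count_cdc_markers(path)} occurrence(s) of cdc_")
    else:
        print(f"{path} not found")
```

Running this against the driver on both machines would show whether the marker survives on one platform but not the other.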

Other found resources:
How to inject JS and beat detection
Many Tests for bot indication
General Chrome headless test (my code passes this in both environments)

I'm going to keep hacking away at this and would love to help develop a solution for this and other things moving forward :) Ideally, I would love to have the equivalent of the Network tab in the browser inspector to debug these things.
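
Something close to the Network tab can be approximated with Chrome's performance log. This is a sketch only: it assumes performance logging was enabled through the `goog:loggingPrefs` capability when the driver was created, and `summarize_responses` is a hypothetical helper that parses the entries returned by Selenium's `driver.get_log("performance")`.

```python
import json


def summarize_responses(perf_entries):
    """Extract (status, url) pairs from Chrome performance-log entries."""
    results = []
    for entry in perf_entries:
        # Each entry wraps a DevTools event as a JSON string under "message".
        message = json.loads(entry["message"])["message"]
        if message.get("method") == "Network.responseReceived":
            response = message["params"]["response"]
            results.append((response["status"], response["url"]))
    return results


# Usage with a live driver (assumption: performance logging is enabled):
#   entries = driver.get_log("performance")
#   for status, url in summarize_responses(entries):
#       print(status, url)
```

Comparing the status codes of the login request on both machines would show where the two runs diverge.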

Local MacOS (success) -- Login Success
sys.platform: darwin
sysname: Darwin
version: Darwin Kernel Version 19.3.0: Thu Jan  9 20:58:23 PST 2020; root:xnu-6153.81.5~1/RELEASE_X86_64
release: 19.3.0
machine: x86_64
selenium  : 3.141.0

Tried this in python3.6 & 3.8. No luck on either.

Remote Linux (fail) -- Login Fail -- Shouldn't matter with the VPN, but this lives on AWS EC2
sys.platform: linux
sysname: Linux
version: #21~18.04.1-Ubuntu SMP Mon May 11 12:33:03 UTC 2020
release: 5.3.0-1019-aws
machine: x86_64
selenium  : 3.141.0

I ran it behind NordVPN with a shell script:

#!/bin/bash

echo "Executing Nord VPN"
nordvpn connect us5793


echo "Executing Python"
python3.8 /home/ubuntu/test.py

echo "Disconnecting VPN"
nordvpn disconnect

**Created a fake account for you to test on as well**

import os
import sys


print(f""" \n
sys.platform: {sys.platform}
sysname: {os.uname().sysname}
version: {os.uname().version}
release: {os.uname().release}
machine: {os.uname().machine}
\n
""")

import undetected_chromedriver as uc
uc.install() #important this is first
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

from time import sleep



class BotDriver:
    def __init__(self,username, pw, start_url, url_behind_login, headless_input = True):
        self.username = username
        self.pw = pw
        chrome_options = ChromeOptions()
        chrome_options.headless = headless_input
        chrome_options.add_argument("--incognito")
        chrome_options.add_argument('--disable-extensions')
        chrome_options.add_argument("--start-maximized")

        # `chrome_options=` is deprecated in Selenium 3.141; `options=` is the current keyword.
        self.driver = Chrome(options=chrome_options)
        
        self.start_url = start_url
        self.url_behind_login = url_behind_login
        self.driver.get('https://www.iplocation.net/')
        self.driver.get_screenshot_as_file(f"iplocation.png")
        self.driver.get(start_url)
        self.waitdriver =  WebDriverWait( self.driver, 10)

    def get_element(self, findby, argument_to_click):
        # Wait until the element is clickable, then return it.
        element = self.waitdriver.until(EC.element_to_be_clickable((findby, argument_to_click)))
        return element

    def slow_keys(self, input_keys, element, speed=.2):
        # Type one character at a time to mimic human input.
        for character in input_keys:
            sleep(speed)
            element.send_keys(character)
        sleep(1)

    def main(self):
        element0 = self.get_element( By.LINK_TEXT,"Sign In or Join" )
        element0.click()
        element1 = self.get_element( By.XPATH,'//*[@id="user-id"]' )
        element1.click()
        self.slow_keys(self.username,element1)
        element2 = self.get_element( By.XPATH,'//*[@id="password"]' )
        element2.click()
        self.slow_keys(self.pw,element2)
        self.driver.get_screenshot_as_file("before_submit.png")
        element3 = self.get_element( By.XPATH,"//button[@name='submitButton']" )
        element3.click()
        sleep(3)  # give the post-login page time to load before capturing it
        self.driver.get_screenshot_as_file("after_submit.png")
        #test string to find
        soup = BeautifulSoup(self.driver.page_source, 'lxml')
        test = soup.body.findAll(text='My Trips')
        if len(test) > 1:
            print(f'\n\n\n Login Success ({test} len {len(test)})\n\n\n')
        else:
            print(f'\n\n\n Login failed ({test} len {len(test)})\n\n\n')
        self.driver.get(self.url_behind_login)
        self.driver.get_screenshot_as_file(f"last.png")

if __name__ == "__main__":
    username = input('Enter your login email: ')
    pw = input('Enter PW: ')
    start_url = 'https://www.marriott.com/default.mi'
    url_behind_login = 'https://www.marriott.com/loyalty/findReservationList.mi'
    pbd = BotDriver(username, pw, start_url, url_behind_login, headless_input = True)
    pbd.main()

@jckail changed the title from "Getting Detected on Remote Linux(Ubuntu) vs Not On mac" to "Getting Detected on Remote Linux(Ubuntu) vs Undetected On Mac" on Jun 6, 2020
@jckail
Author

jckail commented Jun 8, 2020

Quick update: I am still getting this issue. To eliminate variables (Ubuntu vs. macOS, and local vs. AWS), I'm spinning up an Ubuntu instance on an old Mac mini to see whether the code runs there.

Will update on how it goes!

@ultrafunkamsterdam
Owner

ultrafunkamsterdam commented Jun 8, 2020

Overcomplete issues, I love 'em!

If I were you, I would compare the user agent against a regular browser's. It might be that the operating system part is still hardcoded to Windows.
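
A quick way to run that comparison is sketched below. It assumes the user agent string is read from the live browser with `driver.execute_script("return navigator.userAgent")`; `user_agent_issues` and its platform-to-token mapping are illustrative, not part of the package.

```python
import sys


def user_agent_issues(user_agent, platform=sys.platform):
    """Return a list of ways the UA string disagrees with the host OS."""
    # OS tokens that desktop Chrome normally places in its user agent.
    expected = {"linux": "Linux", "darwin": "Macintosh", "win32": "Windows"}
    issues = []
    token = expected.get(platform)
    if token and token not in user_agent:
        issues.append(f"expected '{token}' in user agent for platform {platform}")
    if "HeadlessChrome" in user_agent:
        issues.append("user agent advertises HeadlessChrome")
    return issues


# Usage with a live driver:
#   ua = driver.execute_script("return navigator.userAgent")
#   print(user_agent_issues(ua))
```

An empty list means the string is at least consistent with the machine it runs on; it does not prove the browser is undetectable.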

@ultrafunkamsterdam
Owner

Just installed (updated) my package on my Linux box and it seems fine. (I always test against Distil, since that is the tool nearly all sites use.)

Type 'copyright', 'credits' or 'license' for more information
IPython 7.11.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import undetected_chromedriver as uc

In [2]: opts = uc.ChromeOptions()
Selenium patched. Safe to import Chrome / ChromeOptions

In [3]: opts.headless = True

In [4]: chrome = uc.Chrome(options=opts)
Selenium patched. Safe to import Chrome / ChromeOptions

In [5]: chrome.get('https://www.distilnetworks.com')

In [6]: chrome.save_screenshot('distil.png')
Out[6]: True

In [7]: exit

[Screenshot: distil.png]

@ultrafunkamsterdam
Owner

Also, don't use these:

chrome_options.add_argument("--incognito")
chrome_options.add_argument('--disable-extensions')
chrome_options.add_argument("--start-maximized")

All the correct settings are already applied when you use the ChromeOptions class from my package.
You can still use headless = True, however.

@jckail
Author

jckail commented Jun 9, 2020

Appreciate your help! Good news: I figured out what was happening. The way the chromedriver was being patched on Linux was different from how it behaves on Mac. I manually went into the binary and changed the "$cdc_" strings, and with the modified chromedriver I was able to get past the login! I will post a solution that automates this on a fork shortly.

Will also attach the driver that solved my issue.

@ultrafunkamsterdam
Owner

Thank you. AFAIK the darwin driver is being patched just fine. What could have happened: you had a driver in the current working directory. The module assumes that if a driver is in the cwd, it has already been patched, and does not patch it again. However, you are always welcome to post your improvements.
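
To rule out a leftover driver, the working directory can be checked before the script starts. A minimal sketch, assuming the behaviour described above; `find_local_drivers` and the file names it looks for are illustrative guesses, not taken from the package.

```python
import time
from pathlib import Path


def find_local_drivers(directory="."):
    """List chromedriver executables sitting in the given directory."""
    names = ("chromedriver", "chromedriver.exe")  # assumed file names
    base = Path(directory)
    return [base / name for name in names if (base / name).is_file()]


if __name__ == "__main__":
    for driver in find_local_drivers():
        modified = time.ctime(driver.stat().st_mtime)
        print(f"Existing driver: {driver} (last modified {modified})")
```

If an old file shows up, moving it out of the way before the next run forces a fresh download and patch, provided the module behaves as described.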

@jckail
Author

jckail commented Jun 19, 2020

Determined that the issue was the version of Chrome I had installed on the Linux machine, which did not match the driver version that was being downloaded. Once I passed the correct version to the install, it worked well. Appreciate your help in debugging :)
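
For anyone hitting the same mismatch, the two versions can be compared before launching. This is a rough sketch that only parses the text printed by each binary's `--version` flag; the helper names are hypothetical, and the `google-chrome` command name in the usage comment is an assumption that varies by distribution.

```python
import re


def major_version(version_output):
    """Pull the major version number out of a `--version` string."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(chrome_output, driver_output):
    """True when browser and driver report the same major version."""
    chrome_major = major_version(chrome_output)
    driver_major = major_version(driver_output)
    return chrome_major is not None and chrome_major == driver_major


# Usage (binary names are assumptions):
#   import subprocess
#   chrome = subprocess.run(["google-chrome", "--version"],
#                           capture_output=True, text=True).stdout
#   driver = subprocess.run(["./chromedriver", "--version"],
#                           capture_output=True, text=True).stdout
#   print(versions_match(chrome, driver))
```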


2 participants