Access Denied in Chrome on Mac #1

Open
mhendri opened this issue Apr 23, 2020 · 8 comments

Comments

@mhendri

mhendri commented Apr 23, 2020

First, thank you for building this! I know it's no small task.

I receive the error message below when I run the freshdirect_slot_chrome.py script on my Mac. The following happens:

  1. Chrome opens with my correct Chrome profile which has my FreshDirect credentials
  2. The error message below is shown
  3. Chrome refreshes to the FreshDirect homepage and then refreshes a few seconds later
  4. The error message below is shown every time the page is refreshed (multiple times per second)
  5. While this is happening console shows the following over and over:

Time slot table not found. Tried to sign out and sign back in at 2020-04-23 11:25:46. Sign in button not found. Trying again.
Time slot table not found. Tried to sign out and sign back in at 2020-04-23 11:25:46. Sign in button not found. Trying again.
Time slot table not found. Tried to sign out and sign back in at 2020-04-23 11:25:46. Sign in button not found. Trying again.

[Screenshot of the Access Denied error page, 2020-04-23 10:37:46 AM]

To recap what I've tried:

  1. I created and updated the freshDirect_slot.ini file
  2. I've played around with the wait times in the .ini file
  3. Played around with different Chrome user profile configurations

Have you seen this issue in your development? It seems that FreshDirect doesn't like the way the request is originating: the Access Denied page shows an http URL, whereas all the requests are https. I appreciate any insight you can offer!

@wchao
Owner

wchao commented Apr 23, 2020

Hi Michael,

Thanks for the praise. I built the code because I get all my food via delivery (no car, and I live in the suburbs), and it was a giant hassle that I was having to stay up late at night to snatch a time slot (mostly unsuccessfully!). I have a toddler, so staying up until 2 AM is not good because the toddler wakes up at 6:30 AM and expects attention....

Yes, I have seen that error before. When I encountered it, some Googling led me to search results implying that the error comes from Cloudflare's DDoS protection. In other words, Cloudflare thinks you are running a bot. Technically, I was running a bot, but not for the purpose of a DDoS attack, so I thought that with some tweaking I might be able to overcome the problem. The short story is that yes, I got it working: it works pretty well for me now, with a few minor issues I'd like to fix, but it is basically very usable.

What worked for me was to use my real user profile, either by pointing the Selenium Chrome webdriver to my user data directory or by making a copy of the entire user profile directory for Chrome. In my case, I copied "C:\Users\wchao\AppData\Local\Google\Chrome\User Data" to "C:\Users\wchao\Documents\Devel\Google Chrome\User Data" and then pointed my program to the copy by setting user_data_dir to "C:\Users\wchao\Documents\Devel\Google Chrome\User Data".
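For reference, here is a minimal sketch of pointing the Selenium Chrome webdriver at an existing user data directory, assuming Selenium 4's `webdriver.ChromeOptions` API. Chrome's `--user-data-dir` switch takes the directory that contains the profiles (such as the "User Data" folder above), and `--profile-directory` picks one profile inside it. The helper names `chrome_profile_args` and `launch_chrome_with_profile` are hypothetical and are not taken from freshdirect_slot_chrome.py.

```python
from typing import List, Optional


def chrome_profile_args(user_data_dir: str, profile_directory: Optional[str] = None) -> List[str]:
    """Build Chrome command-line switches that select an existing profile.

    user_data_dir is the directory that *contains* the profiles (e.g. ".../User Data"),
    not a profile folder such as "Default".
    """
    args = [f"--user-data-dir={user_data_dir}"]
    if profile_directory:
        # Name of a profile folder inside user_data_dir, e.g. "Default" or "Profile 1".
        args.append(f"--profile-directory={profile_directory}")
    return args


def launch_chrome_with_profile(user_data_dir: str, profile_directory: Optional[str] = None):
    """Start Chrome through Selenium with the given profile (needs selenium and Chrome)."""
    # Imported lazily so chrome_profile_args() is usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_profile_args(user_data_dir, profile_directory):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

With the copied directory described above, a call would look like `launch_chrome_with_profile(r"C:\Users\wchao\Documents\Devel\Google Chrome\User Data", "Default")`. Chrome generally will not let two running instances share one user data directory, so any regular Chrome window using that directory should be closed first.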

For you, I recommend the following steps to figure out what the best solution is, since you are on Mac:

  1. First try it just with the default user profile (not a copy). I believe that on a Mac, this is located at /Users/<username>/Library/Application Support/Google/Chrome/Default. Can you confirm on your Mac?

  2. Make sure Google Chrome is not configured to run in the background and that you have shut down or terminated all running instances of Google Chrome. I found that if Chrome was running elsewhere, there were warning messages emitted by the Selenium Chrome webdriver. I wasn't sure if those warnings resulted in improper functioning, but since we are trying to isolate the problem, better to eliminate potential problems at least until we get it working for you. It would be nice to be able to interact cleanly with Chrome even if another instance is running, but I need to do further testing and research to figure out how to achieve that with the Selenium Chrome webdriver.

  3. Run the program and see if it works at this point. If not, stop the program, make sure to kill all Chrome processes and also all chromedriver processes (the chromedriver is started up by the freshdirect_slot_chrome.py program when it instantiates a Selenium Chrome webdriver instance).

  4. If the program is still giving you the error from FreshDirect and still not signing in automatically, then the next thing to try is to sign in manually. To do that, add the following right after the line that reads "if not raw_div_list:" (the if statement is right after raw_div_list = soup.find_all('div', {'id': re.compile('ts_d\d+_ts\d+_time'), 'class': 'tsCont'})):

time.sleep(60)
continue

Thus it should then read:

    raw_div_list = soup.find_all('div', {'id': re.compile('ts_d\d+_ts\d+_time'), 'class': 'tsCont'})
    if not raw_div_list:
      time.sleep(60)
      continue

Because this is Python, make sure the indentation is correct: the time.sleep(60) and continue lines should each begin with 6 spaces. Then run the program. It will proceed to the point where it needs a sign-in, and at that point you should sign in manually. Even if that works, it won't completely solve the problem, because obviously you'd like to be able to step away from your computer rather than be tethered there to sign in whenever the program occasionally needs it. I find that the program occasionally needs to sign in again because FreshDirect reboots their web servers and state is lost, or for some other reason.
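The sleep-and-retry change in step 4 can also be sketched as a self-contained polling loop. To stay dependency-free, this sketch uses the standard library's `html.parser` instead of BeautifulSoup, and it assumes the same markup the script matches: `div` elements with class `tsCont` and an `id` matching `ts_d\d+_ts\d+_time`. The `fetch_html` callable (for example `lambda: driver.page_source`) and all function names are hypothetical.

```python
import re
import time
from html.parser import HTMLParser
from typing import Callable, List, Optional

# Same id pattern the script passes to soup.find_all().
SLOT_ID = re.compile(r"ts_d\d+_ts\d+_time")


class SlotDivFinder(HTMLParser):
    """Collect ids of <div class="tsCont"> elements whose id looks like a time slot."""

    def __init__(self) -> None:
        super().__init__()
        self.slot_ids: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        slot_id = attr_map.get("id") or ""
        if "tsCont" in classes and SLOT_ID.search(slot_id):
            self.slot_ids.append(slot_id)


def find_slot_ids(html: str) -> List[str]:
    finder = SlotDivFinder()
    finder.feed(html)
    finder.close()
    return finder.slot_ids


def wait_for_time_slots(
    fetch_html: Callable[[], str],
    interval_s: float = 60,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until the time-slot table appears, leaving time to sign in by hand.

    Returns the matching div ids, or [] if max_attempts is reached first.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        slot_ids = find_slot_ids(fetch_html())
        if slot_ids:
            return slot_ids
        # Table not found (e.g. signed out): wait before checking again,
        # but skip the wait after the final attempt.
        if max_attempts is None or attempt < max_attempts:
            sleep(interval_s)
    return []
```

Injecting `sleep` keeps the loop testable; in real use the default `time.sleep` gives you the same 60-second window to sign in manually.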

Can you let me know if that works OK for you? If it still does not work, maybe we could schedule a time to do a screen share where I can take a look at what is happening? When the Selenium Chrome webdriver is not pointed at a user data directory, it instantiates an anonymous profile, and I think Cloudflare is able to detect from certain fingerprints that the instance of Chrome is automated. When you use a user data directory that has your real data in it, I think Cloudflare believes the Chrome instance is real and not controlled by a bot.

I would at some point like to build a Chrome extension, which ought to eliminate the issues (in addition to making it way easier to install, configure, and run!).

Let me know how the steps above work, and then I'm happy to help as needed (I could do a screen share with TeamViewer, Zoom, or Skype, for instance). If you are a Python or JavaScript developer, I would also love to get any code contributions you might feel inclined to offer. If you're not a developer, I'm also happy to get feature requests and suggestions. I would love to see this get more use. I think the project will probably evolve. Eventually some of the grocery delivery companies will implement queues on their websites where you can sign up for the first available time slot, rather than the less-than-ideal situation now, where they publish a schedule 7 days out and that's it, and the slots get grabbed minutes after they open up. At that point, the current code will be less useful, but there are probably still other things that people will find useful (e.g., a "super shopper" that scans multiple grocery delivery sites and helps you get exactly the items you ordered).

@wchao
Owner

wchao commented Apr 23, 2020

Also, if you need an SMTP server with authentication, let me know, and I'm happy to create some login credentials. I need to write more code to allow the general public to use my mail server for alerts (to avoid abuse), but on a one off basis for someone who is real (i.e. not a bot or a spammer or someone with malicious intent), happy to provide that if you don't already have an SMTP server.
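For anyone wiring alerts to their own authenticated SMTP server, here is a minimal sketch using Python's standard `smtplib` and `email.message` modules. It assumes a server that accepts STARTTLS on port 587 with a username and password; the host, addresses, subject line, and helper names are placeholders, not the project's configuration. The body follows the script's alert format: a header line with the timestamp, then one slot per line.

```python
import smtplib
import ssl
from email.message import EmailMessage
from typing import Iterable


def build_alert(sender: str, recipient: str, opened_at: str, slots: Iterable[str]) -> EmailMessage:
    """Compose a plain-text alert listing newly opened delivery slots."""
    msg = EmailMessage()
    msg["Subject"] = "FreshDirect delivery time slots available"  # placeholder subject
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"FreshDirect delivery time slots opened up at {opened_at}:"]
    lines.extend(slots)
    msg.set_content("\n".join(lines))
    return msg


def send_alert(msg: EmailMessage, host: str, username: str, password: str, port: int = 587) -> None:
    """Send through an authenticated SMTP server, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        # Verify the server certificate before sending credentials.
        server.starttls(context=ssl.create_default_context())
        server.login(username, password)
        server.send_message(msg)
```

Keeping message construction separate from sending makes it easy to check the alert text without touching a mail server.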

@mhendri
Author

mhendri commented Apr 23, 2020

Thanks for your quick response! Necessity is the mother of invention, so no wonder you spent time building this workflow. 😄

I previously tried the steps you outlined above (1-3) with no success, and tried it again after your comments.

  1. I created a new Chrome profile and re-added my FD credentials
  2. I also tried moving the directory and referencing that new directory in the user_data_dir field.

Sadly, neither of those worked. When I try to sleep the process so I can log in manually, Chrome never autofills my credentials even though it sees my user profile. When I log in manually, I get the same Access Denied message, maybe because the request appears "fishy." I'd be down to do a screen share to continue troubleshooting. My email is: mhendrickson91@gmail.com

I develop in Python and Javascript and would be happy to help contribute to this! Once it's working on my Mac, I can submit a PR with any changes that allow it to work/add error handling for issues like mine. It's great that you want to extend this and make it into a Chrome extension, I think that'd be really neat.

I'll let you know about SMTP credentials once I can get it to run and "see" delivery windows. Thanks very much for the offer!

@wchao
Owner

wchao commented Apr 23, 2020

Just to confirm, on the same Mac, if you open up Chrome, not via freshdirect_slot_chrome.py, but rather just by starting the Chrome application, you are able to see FreshDirect and log in?

The symptoms you describe suggest that the user profile isn't being set properly, because I think if it were, the username and password would autofill. I think the Access Denied message, even when you log in manually, is caused by your user profile not being retrieved or linked up properly in the Selenium Chrome instance.

One more thing to check: for user_data_dir, are you specifying "/Users/<username>/Library/Application Support/Google/Chrome/Default" or "/Users/<username>/Library/Application Support/Google/Chrome"? I'm sorry, I think I gave you the wrong path in my previous response. "/Users/<username>/Library/Application Support/Google/Chrome/Default" is the path to your default user profile, but the program needs its immediate parent directory, i.e. "/Users/<username>/Library/Application Support/Google/Chrome". If you change that, does it make a difference in how the program functions?
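Since the program needs the parent directory rather than the profile folder, a small helper can catch that mix-up. This is a hypothetical sketch: the defaults are Chrome's usual user data locations (`~/Library/Application Support/Google/Chrome` on macOS, `%LOCALAPPDATA%\Google\Chrome\User Data` on Windows, `~/.config/google-chrome` on Linux), and the `Default` / `Profile N` folder naming is Chrome's common convention; neither is read from freshdirect_slot_chrome.py. Paths are parsed with the host platform's path rules.

```python
import os
import re
from pathlib import PurePath
from typing import Optional, Tuple

# Chrome usually names profile folders "Default", "Profile 1", "Profile 2", ...
PROFILE_DIR_NAME = re.compile(r"^(Default|Profile \d+)$")


def default_user_data_dir(platform: str, home: str, local_app_data: str = "") -> str:
    """Return Chrome's usual user data directory for a sys.platform value."""
    if platform == "darwin":
        return os.path.join(home, "Library", "Application Support", "Google", "Chrome")
    if platform.startswith("win"):
        # Caller should pass os.environ["LOCALAPPDATA"] on Windows.
        return os.path.join(local_app_data, "Google", "Chrome", "User Data")
    return os.path.join(home, ".config", "google-chrome")


def split_profile_path(path: str) -> Tuple[str, Optional[str]]:
    """Split ".../Chrome/Default" into (".../Chrome", "Default").

    A path that does not end in a profile folder is returned as-is with None.
    """
    p = PurePath(path)
    if PROFILE_DIR_NAME.match(p.name):
        return str(p.parent), p.name
    return str(p), None
```

The first element of the result is what belongs in user_data_dir; the second, if present, is the profile folder name that was mistakenly included.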

I am on Eastern Time. Would you be able to get on a screen share around 8:30, 9, 9:30 PM Eastern Time?

@mhendri
Author

mhendri commented Apr 23, 2020

That's correct; when I start a "normal" Chrome session I am able to log in to FreshDirect with no trouble.

I'm specifying /Users/<username>/Library/Application Support/Google/Chrome for user_data_dir. I agree, I don't think the Selenium session is seeing my Chrome profile correctly, even though my user avatar appears. When I change user_data_dir to a wrong path, my avatar disappears, so it recognizes the difference on some level, but not enough to autofill the login.
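One way to confirm that Selenium is really pointed at a populated profile, and that no other Chrome is holding it, is to inspect the directory before launching. This sketch relies on Chrome's usual on-disk layout, which is an implementation detail rather than a documented API: a `Local State` file at the top of the user data directory, a `Preferences` file inside each profile folder, and (on macOS and Linux) a `SingletonLock` entry while a Chrome process is using the directory. `inspect_user_data_dir` is a hypothetical helper.

```python
import os
from typing import Dict, List


def inspect_user_data_dir(user_data_dir: str) -> Dict[str, object]:
    """Report whether a directory looks like a Chrome user data directory."""
    has_local_state = os.path.isfile(os.path.join(user_data_dir, "Local State"))
    profiles: List[str] = []
    if os.path.isdir(user_data_dir):
        for name in sorted(os.listdir(user_data_dir)):
            # A profile folder is recognized by the Preferences file inside it.
            if os.path.isfile(os.path.join(user_data_dir, name, "Preferences")):
                profiles.append(name)
    # SingletonLock is a symlink whose target usually does not exist, so use lexists.
    in_use = os.path.lexists(os.path.join(user_data_dir, "SingletonLock"))
    return {
        "looks_valid": has_local_state and bool(profiles),
        "profiles": profiles,
        "in_use": in_use,
    }
```

If `looks_valid` is False, the path probably points at a profile folder or the wrong location; if `in_use` is True, another Chrome process likely still has the directory open.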

I'm also on Eastern Time. Unfortunately I can't chat tonight, but perhaps some time tomorrow? My work day is flexible if you'd like to chat in the morning, afternoon, or early evening. Please feel free to email me at the address above. Thank you again for your time, help, and effort!

@mhendri
Copy link
Author

mhendri commented Apr 27, 2020

Hey @wchao, I had two friends test this out on their Macs, and they had similar issues. There is something odd about the way macOS manages Chrome profiles, perhaps something to do with the keychain. Even using Selenium to send the login credentials results in an Access Denied page.

@wchao
Owner

wchao commented Apr 27, 2020

Interesting. The possibility of some kind of privilege elevation being needed on Mac did occur to me. I am not really a Mac user, and have only used one maybe a few dozen times over the past couple of decades, so I'm not that familiar with how it operates, but what you say makes some sense to me, and sounds similar to AppArmor or SELinux on Linux. Is there a way to run Chrome (well, I suppose in this case, the Python script that invokes Chrome) with elevated privileges to see if that helps with this particular issue? Obviously that is a risk that would not generally be advisable, but in this case the source code is readily available to see what is being done, and it's primarily to help debug the issue. I'm not familiar enough with the keychain to know what operations are permitted and how the Chrome user data profile interacts with the keychain.

One possibility I looked into, because it also impacts Windows and probably Linux as well, was to figure out how Cloudflare and Akamai detect bots. I think it would be much better to run the code without requiring the user's real profile (more secure), but instead to use the default, anonymized profile created by Selenium and the Chrome WebDriver. The following page was somewhat useful:
https://stackoverflow.com/questions/54432980/how-to-access-a-site-via-a-headless-driver-without-being-denied-permission

However, it seems like a lot of work to handle that. Another approach is to use requests instead of Selenium:
https://stackoverflow.com/questions/44865673/access-denied-while-scraping

That has significant advantages, if it can be made to work, in that it will work on any computer because it doesn't even require a browser. On the other hand, it has certain disadvantages, notably that getting it to work may take longer than a Selenium-based approach because you can't really see what is going on. Of course, I have already done the pattern recognition and workflow bits of the project, so that's work that is already done, and the thing that would need to happen is to deal with Javascript issues that might result from not having a Javascript interpreter in the requests library. I think that the best way to make this widely usable is to write a Chrome extension (I think Javascript is needed for that) because then (a) the installation is super easy, and (b) you probably eliminate any issues with the access denied error because it's the full and real user profile of the Chrome user. I'm hoping to start looking at that in the next week or two, but I'm coming at it completely new since I haven't ever written a Chrome extension, though I've done some Javascript programming -- maybe I can get some pointers or help from you on the Chrome extension effort?

For the access denied problem overall, this page is moderately useful:
https://www.reddit.com/r/techsupport/comments/ap9wrk/definitive_research_on_the_access_denied_you_dont/

I did a Google search on "Access Denied You don't have permission to access on this server. reference # selenium chrome webdriver" and some of those links helped.

I'm a little tied up today, so I can't review it with you today, but take a look at the links I've included and see if those help, and we can connect up later this week.

One anecdote as to how crazy it continues to be. My script alerted me as follows:

FreshDirect delivery time slots opened up at 2020-04-27 12:23:42:
May 1 6 am - 9 am
May 1 2 pm - 5 pm
May 1 5 pm - 8 pm
May 1 8 pm - 10 pm

I went online as soon as I received the email (probably 15 seconds after getting it), and two of the four time slots had already been taken. I managed to snag a May 1 5 PM - 8 PM slot, but only because I already had my cart loaded and ready to go from yesterday and the day before, and all I had to do was click check out and pay (about 30 seconds). If I had to shop and fill the cart, no way I would have gotten a slot. Competition is so intense to get a slot.

@mhendri
Author

mhendri commented Apr 30, 2020

I created a Windows virtual machine and was able to get it running there. However, every once in a while I still get the Access Denied page. It's been hard to track down why it happens in the first place, but maybe it's due to network latency, causing the script to access a page that's behind the login. I can get around this, but it requires that I clear the cache in the Selenium instance, quit that instance, load FreshDirect and log in through a normal browser session, and then restart the process.

I played around with the page sequence, login sequence, cache clearing, and so on quite a bit, but I haven't found a method that works 100% of the time. For example, I found that if I got an Access Denied page when going to the home page, I could:

  1. clear the cache
  2. load FreshDirect.com
  3. click on 'Hi'
  4. be brought to the login page and login successfully

However, sometimes that button doesn't work when using Selenium and sometimes it is a pop-up instead of a redirect.
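The recovery routine described above (clear the cache, reload FreshDirect, sign in again) can be sketched as a bounded retry helper. This is a hypothetical sketch, not code from the repository. It assumes the block page can be recognized by "Access Denied" in its title, clears cookies with Selenium's `delete_all_cookies()`, clears the HTTP cache with the Chrome DevTools Protocol command `Network.clearBrowserCache` via `execute_cdp_cmd` (Chrome only), and reloads an assumed start URL. The `sign_in` callback stands in for whatever login sequence works (clicking "Hi", handling the pop-up, and so on), since that is the unreliable step.

```python
import time
from typing import Callable

HOME_URL = "https://www.freshdirect.com/"  # assumed start page


def is_access_denied(driver) -> bool:
    """Heuristic: the block page's title contains 'Access Denied'."""
    return "Access Denied" in (driver.title or "")


def recover_from_access_denied(
    driver,
    sign_in: Callable[[object], None],
    max_attempts: int = 3,
    backoff_s: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Clear cookies and cache, reload, and sign in again until the block page is gone."""
    if not is_access_denied(driver):
        return True
    for attempt in range(1, max_attempts + 1):
        driver.delete_all_cookies()
        # Chrome-only: clear the HTTP cache through the DevTools Protocol.
        driver.execute_cdp_cmd("Network.clearBrowserCache", {})
        driver.get(HOME_URL)
        sign_in(driver)
        if not is_access_denied(driver):
            return True
        # Wait a little longer after each failed attempt (no wait after the last one).
        if attempt < max_attempts:
            sleep(backoff_s * attempt)
    return False
```

Returning False after a few attempts leaves room for the fallback that worked manually: quit the Selenium instance, log in from a normal browser session, and restart the process.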

Headless browsing or a requests method would be really good. Before I came across your repo I spent about an hour trying to get requests to work and was mostly foiled.

The short of it though is that I've been able to get it working and was able to get 2 time slots! One happened to be because I was on the site so much and saw one open up, the other was thanks to your tool. So a very sincere thank you!

P.S. One thing I noticed is that if a time slot shows available but cannot be clicked on the website, it can usually be selected in the app.
