udemy requires CAPTCHAs now #19038

Closed
asdmin opened this issue Jan 28, 2019 · 5 comments

asdmin commented Jan 28, 2019

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2019.01.27. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2019.01.27

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones
  • Checked that provided video/audio/playlist URLs (if any) are alive and playable in a browser

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

Add the -v flag to your command line you run youtube-dl with (youtube-dl -v <your command line>), copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

$ python3 ~/opt/youtube-dl/youtube_dl/__main__.py --verbose --cookies /tmp/cookies.txt https://www.udemy.com/openstack-design-and-implement-cloud-infrastructure
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', '--cookies', '/tmp/cookies.txt', 'https://www.udemy.com/openstack-design-and-implement-cloud-infrastructure']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2019.01.27
[debug] Git HEAD: e71be6ee9
[debug] Python version 3.6.6 (CPython) - Linux-4.16.0-2-amd64-x86_64-with-debian-buster-sid
[debug] exe versions: none
[debug] Proxy map: {'http': 'http://10.8.0.1:8888/', 'https': 'http://10.8.0.1:8888/'}
[udemy:course] openstack-design-and-implement-cloud-infrastructure: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 403: Unauthorized (caused by <HTTPError 403: 'Unauthorized'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/home/asd/opt/youtube-dl/youtube_dl/extractor/common.py", line 605, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/home/asd/opt/youtube-dl/youtube_dl/YoutubeDL.py", line 2215, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)


Description of your issue, suggested solution and other information

After verifying that the course plays in the web browser, upgrading to the latest youtube-dl, and exporting the cookies again, the video course still can't be downloaded.

Checking the response, it's clear that Udemy wants me to fill in a CAPTCHA form. The response contains the following strings:

  • "Please verify you are a human"
  • "Access to this page has been denied because we believe you are using automation tools to browse the website."
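
A check for this kind of response can be sketched in Python. The marker strings are the two quoted above; the helper name `looks_like_bot_challenge` and the choice to consider only HTTP 403 responses are assumptions made for illustration and are not part of youtube-dl.

```python
# The two phrases observed in the blocked response (quoted in this report).
CHALLENGE_MARKERS = (
    "Please verify you are a human",
    "Access to this page has been denied because we believe you are "
    "using automation tools to browse the website.",
)


def looks_like_bot_challenge(status_code, body):
    """Return True if a response looks like the CAPTCHA page described above.

    Hypothetical helper: `status_code` is the HTTP status and `body` is the
    decoded response text.
    """
    if status_code != 403:
        return False
    return any(marker in body for marker in CHALLENGE_MARKERS)
```

With urllib, the status of a failed request is available as `err.code` on the raised `urllib.error.HTTPError`, and the body can be read with `err.read()`.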

As a first response, I tried changing the User-Agent.

Youtube-dl uses the following User-Agent:

$ python3 ~/opt/youtube-dl/youtube_dl/__main__.py --dump-user-agent
Mozilla/5.0 (X11; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0
$ 

Specifying the User-Agent with --user-agent did not help: it seems that this option is ineffective for udemy (the User-Agent actually sent was not changed).

Modifying udemy.py to send Mozilla/5.0 (X11; Linux i686; rv:64.0) Gecko/20100101 Firefox/64.0 changed the User-Agent in the request, but didn't resolve the original issue (I am still required to fill in CAPTCHAs).
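
To see which User-Agent a plain Python request would carry, independent of youtube-dl, the header can be set and inspected with the standard library. This is only a minimal sketch: `build_request` is a hypothetical helper, no request is actually sent, and it says nothing about how udemy.py sets its headers.

```python
import urllib.request

# The User-Agent string tried in this report.
CUSTOM_UA = "Mozilla/5.0 (X11; Linux i686; rv:64.0) Gecko/20100101 Firefox/64.0"


def build_request(url, user_agent=CUSTOM_UA):
    """Build (but do not send) a request carrying an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request(
    "https://www.udemy.com/openstack-design-and-implement-cloud-infrastructure"
)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Inspecting the request object before sending it is a quick way to confirm that an override took effect, which is the step that failed with --user-agent here.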

asdmin (Author) commented Jan 28, 2019

#18126 might be caused by the same problem as the one reported here.

asdmin (Author) commented Jan 28, 2019

I raised this ticket because

  1. I can't comment on #18126 (it has been restricted), and
  2. I have included more information than #18126 contains.

Please update #18126 (at least add a link to this ticket) and open it to the general public.

remitamine (Collaborator) commented Jan 28, 2019

> I can't comment to #18126 (because it's been restricted)

Because users kept bumping the issue within a short period of time.

> I included more information than it was included in #18126.

The cause of the problem (PerimeterX Bot Defender) has already been mentioned in #15839 (comment).

asdmin (Author) commented Jan 28, 2019

It seems they upgraded PerimeterX to catch even the first download. Mine failed on the very first interaction with the server.

remitamine (Collaborator) commented Jan 28, 2019

Probably. In such cases it's difficult to identify which configuration will minimize bot detection (cookies, custom headers, User-Agent, the time between requests...), and this problem also occurs on other websites, such as Pluralsight, Yandex Music, and YouTube.
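
One of the factors listed above, the time between requests, can be illustrated with a small sketch. The function name, the default values, and the idea of adding random jitter are assumptions for illustration only; nothing here is known to satisfy PerimeterX.

```python
import random


def jittered_delays(count, base=5.0, jitter=3.0, seed=None):
    """Return `count` delays in seconds, each in [base, base + jitter]."""
    rng = random.Random(seed)
    return [base + rng.uniform(0.0, jitter) for _ in range(count)]


# A caller would sleep for each delay before issuing the next request.
for delay in jittered_delays(3, seed=0):
    print(f"would wait {delay:.1f}s before the next request")
```

youtube-dl itself offers --sleep-interval and --max-sleep-interval, which add a randomized pause before each download.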
