CNBC does not support HEAD requests #14193

Closed
rredford6 opened this issue Sep 13, 2017 · 12 comments

Comments

@rredford6 rredford6 commented Sep 13, 2017

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2017.09.11. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2017.09.11

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

Possibly related: #13222

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['http://www.cnbc.com/live-tv/the-profit/full-episode/top-10-rules-for-success/1015334979858', '--verbose', '--ap-mso', 'DTV', '--ap-username', 'PRIVATE', '--user-agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36', '--hls-prefer-native']
Type TV provider account password and press [Return]:
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2017.09.11
[debug] Python version 3.4.4 - Windows-10-10.0.14393
[debug] exe versions: ffmpeg N-82324-g872b358, ffprobe N-82324-g872b358
[debug] Proxy map: {}
[generic] 1015334979858: Requesting header
WARNING: Could not send HEAD request to http://www.cnbc.com/live-tv/the-profit/full-episode/top-10-rules-for-success/1015334979858: HTTP Error 404: Not Found
[generic] 1015334979858: Downloading webpage
WARNING: Falling back on generic information extractor.
[generic] 1015334979858: Extracting information
ERROR: Unsupported URL: http://www.cnbc.com/live-tv/the-profit/full-episode/top-10-rules-for-success/1015334979858
Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\YoutubeDL.py", line 776, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\extractor\common.py", line 434, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\rg3\tmp24lzs8ny\build\youtube_dl\extractor\generic.py", line 2964, in _real_extract
youtube_dl.utils.UnsupportedError: Unsupported URL: http://www.cnbc.com/live-tv/the-profit/full-episode/top-10-rules-for-success/1015334979858

Culprit is the HEAD request:

wget --spider http://www.cnbc.com/live-tv/the-profit/full-episode/top-10-rules-for-success/1015334979858 -U "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"

Spider mode enabled. Check if remote file exists.
--2017-09-13 00:41:17--  http://www.cnbc.com/live-tv/the-profit/full-episode/top-10-rules-for-success/1015334979858
Resolving www.cnbc.com (www.cnbc.com)... 104.126.139.198
Connecting to www.cnbc.com (www.cnbc.com)|104.126.139.198|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!

wget --delete-after http://www.cnbc.com/live-tv/the-profit/full-episode/top-10-rules-for-success/1015334979858 -U "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"

--2017-09-13 00:50:15--  http://www.cnbc.com/live-tv/the-profit/full-episode/top-10-rules-for-success/1015334979858
Resolving www.cnbc.com (www.cnbc.com)... 23.222.154.54
Connecting to www.cnbc.com (www.cnbc.com)|23.222.154.54|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: '1015334979858'

1015334979858    [ <=> ] 129.75K  --.-KB/s    in 0.01s

2017-09-13 00:50:15 (9.06 MB/s) - '1015334979858' saved [132869]

Removing 1015334979858.
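
The same HEAD-versus-GET comparison can be sketched in Python. This is a rough illustration, not youtube-dl code: `GetOnlyHandler`, `status_for`, `probe` and `demo` are hypothetical names, and the local server is only a stand-in that answers GET with 200 and HEAD with 404, mimicking what the wget runs above show for the CNBC URL.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class GetOnlyHandler(BaseHTTPRequestHandler):
    """Stand-in server (assumption): 200 for GET, 404 for HEAD."""

    def do_GET(self):
        body = b"<html>ok</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_HEAD(self):
        self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


# Ignore any system proxy so the local demo talks to 127.0.0.1 directly.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def status_for(url, method):
    """Return the HTTP status code the server gives for `method`."""
    req = urllib.request.Request(url, method=method)
    try:
        with OPENER.open(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def probe(url):
    """Compare the status codes for HEAD and GET on the same URL."""
    return {method: status_for(url, method) for method in ("HEAD", "GET")}


def demo():
    """Start the stand-in server on a free port and probe it once."""
    server = HTTPServer(("127.0.0.1", 0), GetOnlyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return probe("http://127.0.0.1:%d/live-tv/example" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `probe()` on a real URL instead of the stand-in would show whether a given site treats the two methods differently.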

@rredford6 changed the title from "CNBC no longer works" to "CNBC does not support HEAD requests" on Sep 13, 2017
@fenollp mentioned this issue Jun 5, 2018
4 of 8 tasks complete

@keithah keithah commented Aug 24, 2018

Bump. Any ideas what could fix this?

@cookieguru cookieguru commented Aug 24, 2018

@keithah This line needs to be changed

@dstftw dstftw closed this in ffa7b2b Oct 29, 2018

@bipple294 bipple294 commented Dec 5, 2018

Any update on this? I am still getting the same error with CNBC videos.

@keithah keithah commented Dec 9, 2018

@cookieguru Sorry for the late response, but what do I change that line to?

@cookieguru cookieguru commented Dec 9, 2018

@keithah If you're looking for a quick and dirty fix then change HEAD to GET here.
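
A less blunt variant of that idea is to keep the HEAD request and retry with GET only when the server rejects it. The sketch below uses just the standard library and is not youtube-dl's code; `fetch_headers` is a hypothetical helper name, and retrying on any HTTP error status is an assumption made for illustration.

```python
import urllib.error
import urllib.request


def fetch_headers(url, opener=None):
    """Try HEAD first; if the server answers with an HTTP error, retry with GET.

    Returns a (method_used, status, headers) tuple.
    """
    opener = opener or urllib.request.build_opener()
    for method in ("HEAD", "GET"):
        req = urllib.request.Request(url, method=method)
        try:
            with opener.open(req, timeout=10) as resp:
                return method, resp.status, resp.headers
        except urllib.error.HTTPError:
            if method == "GET":
                # GET failed as well, so the URL really is unavailable.
                raise
            # HEAD was rejected: fall through and try GET.
```

The cost of the fallback is one extra round trip on servers that mishandle HEAD; well-behaved servers are still probed without downloading a body.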

@thezoggy thezoggy commented Dec 10, 2018

It looks like you could also use this pull request to achieve what you want:
#18086

@keithah keithah commented Dec 10, 2018

Unfortunately, still fails:
https://gist.github.com/keithah/f249d0282f74be1415db2ef3444a9309

So it's not just the HEAD requests.

@cookieguru cookieguru commented Dec 10, 2018

@keithah Look at line 3. It's not using the CNBC extractor because said extractor doesn't have support for that URL pattern. Updating the regex on line 41 should do it.
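
To illustrate what "support for that URL pattern" means, here is a sketch of a regex that would accept the live-tv URL from this issue. The pattern and the names `LIVE_TV_URL_RE` and `match_live_tv` are made up for this example; this is not the regex in the CNBC extractor, and the group names are assumptions.

```python
import re

# Hypothetical pattern for illustration only; not the extractor's actual regex.
LIVE_TV_URL_RE = re.compile(
    r"https?://(?:www\.)?cnbc\.com/live-tv/"
    r"(?P<show>[^/]+)/full-episode/(?P<slug>[^/]+)/(?P<id>\d+)"
)


def match_live_tv(url):
    """Return the named parts of a live-tv URL, or None if it does not match."""
    m = LIVE_TV_URL_RE.match(url)
    return m.groupdict() if m else None
```

When no extractor's URL regex matches, youtube-dl falls back to the generic extractor, which is what the `[generic]` lines in the log show.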

@keithah keithah commented Dec 10, 2018

Some progress!
https://gist.github.com/keithah/3788844981c5586466807b80e72d807a

Now failing at the info extractor.

@cookieguru cookieguru commented Dec 10, 2018

Change line 62 to this:

            r'data-mpx-id=["\'](\d+)', webpage, display_id,

Although to make this worthy of a PR it should actually be a new class
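
For reference, this is how that pattern behaves on its own with plain `re`. The sample markup is invented for the example (the real page structure may differ), and `find_mpx_id` is a hypothetical helper, not part of youtube-dl.

```python
import re

# The pattern suggested above: an attribute value of digits in either quote style.
MPX_ID_RE = re.compile(r'data-mpx-id=["\'](\d+)')


def find_mpx_id(webpage):
    """Return the first data-mpx-id value found in the HTML, or None."""
    m = MPX_ID_RE.search(webpage)
    return m.group(1) if m else None


# Invented sample markup (assumption), reusing the ID from the URL in this issue.
sample = '<div class="player" data-mpx-id="1015334979858"></div>'
```

Here `find_mpx_id(sample)` yields the digits inside the attribute, and a page without the attribute yields `None`.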

@keithah keithah commented Dec 11, 2018

@cookieguru Still no luck. It gets farther, but now I'm getting a 404 on a totally valid link:
https://gist.github.com/keithah/72930668ef9793b1cca6c7baab6037da

Tried a video which shouldn't require auth too:
https://www.irccloud.com/pastebin/ImKzOoKs/

I tried to email you too, in case you wanted to take this off GitHub.

@cookieguru cookieguru commented Dec 11, 2018

I have no idea what SMIL data is or where to find it; that's probably the data that contains the path to the m3u8 file.

Evidently the new live-tv pages differ in more than their URLs: they also use a new page structure and/or video-loading mechanism.
