[FoxNewsIE] Returned subtitle url is just the path, not full url #12797

Closed
AlJohri opened this issue Apr 20, 2017 · 4 comments

Comments

@AlJohri AlJohri commented Apr 20, 2017

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your issue (like that [x])
  • Use Preview tab to see how your issue will actually look like

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2017.04.17. If it's not read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2017.04.17

Before submitting an issue make sure you have:

  • At least skimmed through README and most notably FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your issue


If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

$ youtube-dl --verbose --skip-download --write-sub "http://video.foxnews.com/v/2086309615001/"
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', '--skip-download', '--write-sub', 'http://video.foxnews.com/v/2086309615001/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.04.17
[debug] Git HEAD: ea0c2f219
[debug] Python version 3.6.0 - Darwin-16.5.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 3.3, ffprobe 3.3
[debug] Proxy map: {}
[foxnews] Downloading Akamai AMP feed
[foxnews] 2086309615001: Downloading f4m manifest
[foxnews] 2086309615001: Downloading m3u8 information
{'en-us': [{'url': '011013_factor_beck.asf_dfxp.xml', 'ext': 'xml'}]}
Traceback (most recent call last):
  File "/Users/johria/.pyenv/versions/3.6.0/bin/youtube-dl", line 11, in <module>
    load_entry_point('youtube-dl', 'console_scripts', 'youtube-dl')()
  File "/Users/johria/Development/youtube-dl/youtube_dl/__init__.py", line 464, in main
    _real_main(argv)
  File "/Users/johria/Development/youtube-dl/youtube_dl/__init__.py", line 454, in _real_main
    retcode = ydl.download(all_urls)
  File "/Users/johria/Development/youtube-dl/youtube_dl/YoutubeDL.py", line 1897, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/Users/johria/Development/youtube-dl/youtube_dl/YoutubeDL.py", line 771, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
  File "/Users/johria/Development/youtube-dl/youtube_dl/YoutubeDL.py", line 825, in process_ie_result
    return self.process_video_result(ie_result, download=download)
  File "/Users/johria/Development/youtube-dl/youtube_dl/YoutubeDL.py", line 1540, in process_video_result
    self.process_info(new_info)
  File "/Users/johria/Development/youtube-dl/youtube_dl/YoutubeDL.py", line 1705, in process_info
    sub_info['url'], info_dict['id'], note=False)
  File "/Users/johria/Development/youtube-dl/youtube_dl/extractor/common.py", line 630, in _download_webpage
    res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal, encoding=encoding, data=data, headers=headers, query=query)
  File "/Users/johria/Development/youtube-dl/youtube_dl/extractor/common.py", line 527, in _download_webpage_handle
    urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query)
  File "/Users/johria/Development/youtube-dl/youtube_dl/extractor/common.py", line 498, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/Users/johria/Development/youtube-dl/youtube_dl/YoutubeDL.py", line 2106, in urlopen
    req = sanitized_Request(req)
  File "/Users/johria/Development/youtube-dl/youtube_dl/utils.py", line 540, in sanitized_Request
    return compat_urllib_request.Request(sanitize_url(url), *args, **kwargs)
  File "/Users/johria/.pyenv/versions/3.6.0/lib/python3.6/urllib/request.py", line 329, in __init__
    self.full_url = url
  File "/Users/johria/.pyenv/versions/3.6.0/lib/python3.6/urllib/request.py", line 355, in full_url
    self._parse()
  File "/Users/johria/.pyenv/versions/3.6.0/lib/python3.6/urllib/request.py", line 384, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: '011013_factor_beck.asf_dfxp.xml'

Description of your issue, suggested solution and other information

The extracted subtitle URL comes out as 011013_factor_beck.asf_dfxp.xml, a bare file name, instead of a fully qualified URL with a protocol, host and path.

From looking at other links, the url should have been: http://media2.foxnews.com/011013/011013_factor_beck.asf_dfxp.xml
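
The failure at the bottom of the traceback can be reproduced without youtube-dl. The following is a minimal sketch, assuming only the Python 3 standard library; `has_scheme` is a hypothetical helper written for illustration and is not part of youtube-dl.

```python
import urllib.request
from urllib.parse import urlparse


def has_scheme(url):
    """Return True when the URL has both a scheme and a network location."""
    parts = urlparse(url)
    return bool(parts.scheme and parts.netloc)


# The value youtube-dl extracted, and the URL expected in this issue.
bare = '011013_factor_beck.asf_dfxp.xml'
full = 'http://media2.foxnews.com/011013/011013_factor_beck.asf_dfxp.xml'

# Building a Request from a bare file name fails before any network access,
# which is the ValueError shown at the end of the traceback.
try:
    urllib.request.Request(bare)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
print(has_scheme(bare), has_scheme(full))
```

No request is actually sent here: the error is raised while the `Request` object is being constructed.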

@cqnkxy cqnkxy commented Apr 23, 2017

I checked more recent videos, and the path problem is fixed there (the JSON feed returns the full subtitle URL). It seems Fox has changed the scheme for its subtitle URLs. I can't even turn captions on for the URL you provided on the Fox video website.

Author

@AlJohri AlJohri commented Apr 23, 2017

Cool. Should we close this for now, or add a rule based on whether the URL starts with http, or on whether the video was published before some date X?

Looking through all of O'Reilly's clips, this was the only one with this error, so it's not a big issue.
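
A rule of the first kind might look like the sketch below. It is only an illustration: `guess_subtitle_url` and `ASSUMED_BASE` are hypothetical names, and the host and directory layout are inferred from the single expected URL quoted in this issue, not from any documented Fox News scheme.

```python
import re
from urllib.parse import urlparse

# Assumption: host taken from the one example URL in this issue.
ASSUMED_BASE = 'http://media2.foxnews.com/'


def guess_subtitle_url(url):
    """Return absolute URLs unchanged; otherwise guess a full URL.

    The guess uses the six digits at the start of the file name as the
    directory. Returns None when the file name does not fit that pattern.
    """
    if urlparse(url).scheme in ('http', 'https'):
        return url
    match = re.match(r'(\d{6})_', url)
    if match is None:
        return None
    return ASSUMED_BASE + match.group(1) + '/' + url


print(guess_subtitle_url('011013_factor_beck.asf_dfxp.xml'))
```

Because the pattern rests on one observed clip, a rule like this could easily produce wrong URLs for other videos.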

@cqnkxy cqnkxy commented Apr 23, 2017

I don't know for sure, but I would guess subtitles are not a problem for most videos. I probed the site for a while using GET requests like
http://www.foxnews.com/search-results/search?q=usa&ss=fn&min_date=2012-05-02&max_date=2013-01-01&start=0 to fetch videos within a certain time range. From what I've observed, a lot of videos from before 2013 don't have captions. This one from 2013-01-11 (your URL is from 2013-01-10) also doesn't have captions, while another video from 2013-01-20 has the correct subtitle path. So the date doesn't seem to matter much here.
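
For anyone repeating this probing, the query string above can be assembled as in the following sketch. The parameter names and values are copied from the URL in this comment; `build_search_url` is a hypothetical helper, and the meanings of `ss` and `start` are not documented here (treating `start` as a result offset is an assumption).

```python
from urllib.parse import urlencode

# Endpoint copied from the URL quoted in the comment above.
SEARCH_ENDPOINT = 'http://www.foxnews.com/search-results/search'


def build_search_url(query, min_date, max_date, start=0):
    """Build a search URL restricted to a date range (dates as YYYY-MM-DD)."""
    params = [
        ('q', query),
        ('ss', 'fn'),            # value copied as-is; meaning unknown
        ('min_date', min_date),
        ('max_date', max_date),
        ('start', start),        # assumed to be a result offset
    ]
    return SEARCH_ENDPOINT + '?' + urlencode(params)


print(build_search_url('usa', '2012-05-02', '2013-01-01'))
```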

Collaborator

@yan12125 yan12125 commented Apr 28, 2017

Yep, that seems to be a server-side issue. I didn't see any URL-manipulating functions in the JavaScript on that page. That is, subtitle URLs are passed as-is.

From looking at other links, the url should have been: http://media2.foxnews.com/011013/011013_factor_beck.asf_dfxp.xml

If you know how to fix subtitle URLs, you can do it right away. For youtube-dl, heuristics should be avoided.
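
One option that involves no guessing is to drop subtitle entries whose URL is not absolute, so that the rest of the extraction does not crash. The sketch below is an illustration only and is not how youtube-dl handles this: `drop_relative_subtitles` is a hypothetical helper, and the dictionary shape is taken from the debug line printed in the verbose output of this issue.

```python
from urllib.parse import urlparse


def drop_relative_subtitles(subtitles):
    """Return a copy of the subtitles dict without non-absolute URLs.

    `subtitles` maps a language code to a list of {'url': ..., 'ext': ...}
    dicts. Languages left with no usable entry are removed.
    """
    cleaned = {}
    for lang, entries in subtitles.items():
        usable = [
            entry for entry in entries
            if urlparse(entry.get('url', '')).scheme in ('http', 'https')
        ]
        if usable:
            cleaned[lang] = usable
    return cleaned


# The dict printed in the verbose log of this issue.
broken = {'en-us': [{'url': '011013_factor_beck.asf_dfxp.xml', 'ext': 'xml'}]}
print(drop_relative_subtitles(broken))
```

With the dictionary from this issue the result is empty, meaning no subtitles would be written for this clip.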

@yan12125 yan12125 closed this Apr 28, 2017