[Generic] Add support for playlists if more than one video is found #5587

Open
snipem opened this issue May 3, 2015 · 9 comments
@snipem
Contributor

@snipem snipem commented May 3, 2015

Treat a URL as a playlist if more than one video URL is found on the page. This should apply to every URL that is handled by the generic video extractor.

@jaimeMF
Collaborator

@jaimeMF jaimeMF commented May 3, 2015

@yan12125
Collaborator

@yan12125 yan12125 commented May 7, 2015

Here is one: http://www.mmafighting.com/2014/2/2/5370376/ufc-169-post-fight-show

This page contains both YouTube and Ooyala videos. youtube-dl detects the YouTube video first and returns, so the Ooyala video is not downloaded at all.
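
To make the behaviour concrete, here is a minimal sketch of the difference between stopping at the first detector that matches and running all of them. The two regular expressions and the sample page are simplified assumptions for illustration only; they are not the patterns youtube-dl uses, and `first_match_only` / `all_matches` are hypothetical names.

```python
import re

# Simplified stand-in patterns (assumptions), NOT youtube-dl's real ones.
YOUTUBE_RE = re.compile(r'src="(https?://www\.youtube\.com/embed/[\w-]+)"')
OOYALA_RE = re.compile(r'src="(https?://player\.ooyala\.com/[^"]+)"')


def first_match_only(webpage):
    """Mimic the early-return style: stop at the first detector that hits."""
    youtube_urls = YOUTUBE_RE.findall(webpage)
    if youtube_urls:
        return youtube_urls  # the Ooyala check below is never reached
    ooyala_urls = OOYALA_RE.findall(webpage)
    if ooyala_urls:
        return ooyala_urls
    return []


def all_matches(webpage):
    """Run every detector and keep everything that was found."""
    return YOUTUBE_RE.findall(webpage) + OOYALA_RE.findall(webpage)


# Hypothetical page embedding one video from each service.
page = (
    '<iframe src="https://www.youtube.com/embed/abc123"></iframe>'
    '<script src="http://player.ooyala.com/player.js?embedCode=xyz"></script>'
)

print(first_match_only(page))
print(all_matches(page))
```

With the sample page, the first function reports only the YouTube embed, while the second reports both embeds.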

@jnbdz

@jnbdz jnbdz commented Apr 26, 2017

In youtube-dl/youtube_dl/extractor/generic.py I removed some of the return statements in the _real_extract method. It was then able to find more videos from different video services, but I ran into this error:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://tifrib.com/said-rageah/']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.04.26
[debug] Git HEAD: e8bfe2a
[debug] Python version 2.7.12 - Linux-4.4.0-72-generic-x86_64-with-Ubuntu-16.04-xenial
[debug] exe versions: ffmpeg 2.8.11-0ubuntu0.16.04.1, ffprobe 2.8.11-0ubuntu0.16.04.1
[debug] Proxy map: {}
 --- self._real_extract
 --- Called _real_extract for embeded URLs
 --- https://tifrib.com/said-rageah/
[generic] said-rageah: Requesting header
WARNING: Falling back on generic information extractor.
[generic] said-rageah: Downloading webpage
[generic] said-rageah: Extracting information
 --- Look for embedded YouTube player
 --- Found embedded Youtube video
[u'https://videopress.com/embed/4BajuZCH', u'https://videopress.com/embed/X1is4uyi', u'https://videopress.com/embed/aJlE15aE', u'https://videopress.com/embed/SV3AWSeV']
ERROR: Unsupported URL: https://tifrib.com/said-rageah/
Traceback (most recent call last):
  File "youtube_dl/extractor/generic.py", line 1916, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "youtube_dl/compat.py", line 2526, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
  File "youtube_dl/compat.py", line 2515, in _XML
    parser.feed(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1653, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1517, in _raiseerror
    raise err
ParseError: not well-formed (invalid token): line 42, column 344
Traceback (most recent call last):
  File "youtube_dl/YoutubeDL.py", line 760, in extract_info
    ie_result = ie.extract(url)
  File "youtube_dl/extractor/common.py", line 430, in extract
    ie_result = self._real_extract(url)
  File "youtube_dl/extractor/generic.py", line 2786, in _real_extract
    raise UnsupportedError(url)

I think it's because I removed too many return statements, so youtube-dl falls through to the final fallback in the generic extractor, which does not recognize anything. I don't think it will be hard for me to find a solution to this.

I am posting this here because I would like your feedback on the strategy I have chosen to resolve this issue.
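
The fall-through described above can be sketched roughly as follows: if detected URLs are neither returned nor stored, control reaches the final `raise` even though something was found. This is only an illustration under stated assumptions. `UnsupportedError` here is a local stand-in for youtube-dl's exception of the same name, and `extract`, `detect_a` and `detect_b` are hypothetical helpers, not functions from generic.py.

```python
class UnsupportedError(Exception):
    """Local stand-in (assumption) for youtube-dl's UnsupportedError."""


def detect_a(webpage):
    # Hypothetical detector: reports a URL if the marker "A:" is present.
    return ['https://example.com/a'] if 'A:' in webpage else []


def detect_b(webpage):
    # Hypothetical detector: reports a URL if the marker "B:" is present.
    return ['https://example.com/b'] if 'B:' in webpage else []


def extract(url, webpage, detectors=(detect_a, detect_b)):
    """Keep every detector's results; fail only when nothing was found."""
    found = []
    for detect in detectors:
        found.extend(detect(webpage))  # store instead of returning early
    if not found:
        raise UnsupportedError(url)
    return found
```

The point of the design is that the "unsupported" error becomes a decision made once, after all detectors have run, rather than the default outcome of removing the early returns.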

@jnbdz

@jnbdz jnbdz commented Apr 26, 2017

@yan12125 I tried your URL (http://www.mmafighting.com/2014/2/2/5370376/ufc-169-post-fight-show). I was only able to download one of the videos (the one from Ooyala). I am not sure why yet.

@jnbdz

@jnbdz jnbdz commented Apr 26, 2017

@yan12125 I just looked at the log in my terminal. It seems it found the YouTube video, but it is not downloading it for some reason.

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'http://www.mmafighting.com/2014/2/2/5370376/ufc-169-post-fight-show']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.04.26
[debug] Git HEAD: e8bfe2a
[debug] Python version 2.7.12 - Linux-4.4.0-72-generic-x86_64-with-Ubuntu-16.04-xenial
[debug] exe versions: ffmpeg 2.8.11-0ubuntu0.16.04.1, ffprobe 2.8.11-0ubuntu0.16.04.1
[debug] Proxy map: {}
 --- self._real_extract
 --- Called _real_extract for embeded URLs
 --- http://www.mmafighting.com/2014/2/2/5370376/ufc-169-post-fight-show
[generic] ufc-169-post-fight-show: Requesting header
WARNING: Falling back on generic information extractor.
[generic] ufc-169-post-fight-show: Downloading webpage
[generic] ufc-169-post-fight-show: Extracting information
 --- Look for embedded YouTube player
 --- Found embedded Youtube video
[]
 --- self._real_extract
[Ooyala] 5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j: Downloading JSON metadata
[Ooyala] 5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j: Downloading JSON metadata
[Ooyala] 5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j: Downloading m3u8 information
[debug] Invoking downloader on u'http://player.ooyala.com/player/all/5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j_4000.m3u8'
[download] UFC 169 post-fight show-5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j.mp4 has already been downloaded
[download] 100% of 386.05MiB
[debug] ffmpeg command line: ffprobe -show_streams 'file:UFC 169 post-fight show-5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j.mp4'
[ffmpeg] Fixing malformated aac bitstream in "UFC 169 post-fight show-5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j.mp4"
[debug] ffmpeg command line: ffmpeg -y -i 'file:UFC 169 post-fight show-5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j.mp4' -c copy -f mp4 -bsf:a aac_adtstoasc 'file:UFC 169 post-fight show-5mdXVoazrZPFMEwA751Q-TJ5NH0KAz2j.temp.mp4'
@yan12125 yan12125 added the request label Apr 27, 2017
@yan12125
Collaborator

@yan12125 yan12125 commented Apr 27, 2017

Removing the return statements is not enough. generic.py needs a generic approach for combining the different URLs found by different extractors.

@jnbdz

@jnbdz jnbdz commented Apr 27, 2017

"combine different URLs from different extractors in generic.py" - How? I am willing to do it but I am unsure of what you mean.

@yan12125
Collaborator

@yan12125 yan12125 commented Apr 27, 2017

For example, pages with Brightcove videos yield a playlist:

            return {
                '_type': 'playlist',
                'title': video_title,
                'id': video_id,
                'entries': entries,
            }

And Wistia videos give a transparent URL:

            return {
                '_type': 'url_transparent',
                'url': embed_url,
                'ie_key': 'Wistia',
                'uploader': video_uploader,
            }

The overall result can be a playlist containing both (I'm not sure whether this approach can handle all possible cases):

        return {
            '_type': 'playlist',
            'entries': [{
                '_type': 'playlist',
                'title': video_title,
                'id': video_id,
                'entries': entries,
            }, {
                '_type': 'url_transparent',
                'url': embed_url,
                'ie_key': 'Wistia',
                'uploader': video_uploader,
            }]
        }
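
One way to read the proposal above is as a small helper that wraps whatever the individual checks produced into a single outer playlist. The sketch below works on plain dictionaries only. `combine_results` is a hypothetical name, and returning a lone result unchanged (instead of a one-entry playlist) is an assumption, not something decided in this thread.

```python
def combine_results(results, page_id, page_title=None):
    """Merge per-service results into one info dict.

    `results` is a list of dicts such as the playlist and url_transparent
    results shown above; empty entries are ignored.
    """
    results = [r for r in results if r]
    if not results:
        return None  # caller decides how to report "nothing found"
    if len(results) == 1:
        return results[0]  # assumption: a single result is passed through
    return {
        '_type': 'playlist',
        'id': page_id,
        'title': page_title,
        'entries': results,
    }


# Placeholder values standing in for what the individual checks would return.
brightcove_result = {
    '_type': 'playlist',
    'title': 'Some page',
    'id': 'some-page',
    'entries': [{'_type': 'url', 'url': 'https://example.com/bc1'}],
}
wistia_result = {
    '_type': 'url_transparent',
    'url': 'https://example.com/wistia-embed',
    'ie_key': 'Wistia',
    'uploader': 'someone',
}

combined = combine_results([brightcove_result, wistia_result], 'some-page', 'Some page')
```

Here `combined` is an outer playlist whose two entries are the inner playlist and the transparent URL, matching the nested structure shown above.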
@jnbdz

@jnbdz jnbdz commented Apr 27, 2017

Let me try it out. It might not be perfect but over time we can correct the code.

@Genome36 Genome36 mentioned this issue Jun 18, 2017