[Request] Site Support: Oprah.com #14710

Open
4 of 8 tasks
bishpuppy opened this issue Nov 9, 2017 · 8 comments


@bishpuppy

bishpuppy commented Nov 9, 2017

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your issue (like this: [x])
  • Use the Preview tab to see what your issue will actually look like

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2017.11.06. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2017.11.06

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your issue


If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

Add the -v flag to your command line you run youtube-dl with (youtube-dl -v <your command line>), copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'--write-srt', u'--sub-lang', u'en', u'--ap-mso', u'nor105', u'--ap-username', u'PRIVATE', u'--ap-password', u'PRIVATE', u'http://www.oprah.com/own-queensugar/copper-sun', u'-v']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.11.06
[debug] Python version 2.7.5 - Linux-3.10.0-693.2.2.el7.x86_64-x86_64-with-centos-7.4.1708-Core
[debug] exe versions: avconv 10.1, avprobe 10.1, ffmpeg 3.4, ffprobe 3.4
[debug] Proxy map: {}
[generic] copper-sun: Requesting header
WARNING: Falling back on generic information extractor.
[generic] copper-sun: Downloading webpage
[generic] copper-sun: Extracting information
ERROR: Unsupported URL: http://www.oprah.com/own-queensugar/copper-sun
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/youtube_dl/extractor/generic.py", line 2159, in _real_extract
    doc = compat_etree_fromstring(webpage.encode('utf-8'))
  File "/usr/lib/python2.7/site-packages/youtube_dl/compat.py", line 2539, in compat_etree_fromstring
    doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
  File "/usr/lib/python2.7/site-packages/youtube_dl/compat.py", line 2528, in _XML
    parser.feed(text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: not well-formed (invalid token): line 72, column 72
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/youtube_dl/YoutubeDL.py", line 784, in extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python2.7/site-packages/youtube_dl/extractor/common.py", line 437, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python2.7/site-packages/youtube_dl/extractor/generic.py", line 3059, in _real_extract
    raise UnsupportedError(url)
UnsupportedError: Unsupported URL: http://www.oprah.com/own-queensugar/copper-sun
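The first traceback in the log comes from the generic extractor trying to parse the downloaded page as XML; an ordinary HTML page is usually not well-formed XML, so the parser raises `ParseError` and the extractor eventually reports the URL as unsupported. A minimal sketch of that failure mode, using only the standard library (the helper name `is_well_formed_xml` and the sample markup are made up for illustration, and the sketch targets Python 3 rather than the Python 2.7 shown in the log):

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text.encode('utf-8'))
        return True
    except ET.ParseError:
        return False


# Hypothetical inputs: a tiny XML document and a typical HTML fragment.
xml_sample = '<rss><channel><title>Feed</title></channel></rss>'
html_sample = '<p class=intro>Queen Sugar & more</p>'  # unquoted attribute, bare ampersand

print(is_well_formed_xml(xml_sample))
print(is_well_formed_xml(html_sample))
```

The `ParseError` is therefore a symptom of the missing site-specific extractor rather than a separate bug.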



If the purpose of this issue is a site support request please provide all kinds of example URLs support for which should be included (replace following example URLs by yours):

Note that youtube-dl does not support sites dedicated to copyright infringement. In order for site support request to be accepted all provided example URLs should not violate any copyrights.


@bishpuppy
Author

I wrote an oprah.py script and got it to extract the m3u8 URL, but when youtube-dl goes to download it, it gets a 403 Forbidden. If I open the m3u8 in a browser while logged in, I can get it with no problem. If I pass --ap-mso and --netrc, they seem to be ignored. What am I missing?

@dstftw
Collaborator

dstftw commented Nov 21, 2017

Correct code obviously.

@bishpuppy
Author

Well, that was useful (NOT). And here I thought I would help the community.

@dstftw
Collaborator

dstftw commented Nov 21, 2017

Yours either. There are no telepathists here.

@bishpuppy
Author

I asked for guidance on an issue; I didn't just say 'please help, someone write the code'. Whatever at this point, I'll review the docs again and look at the other .py files for assistance.

@dstftw
Collaborator

dstftw commented Nov 21, 2017

You are not mimicking the browser's behavior properly - that's all the guidance possible without any code. Again, there are no telepathists here; nobody knows what code you have dashed off.
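As a rough illustration of what mimicking the browser can mean for a 403 on a media URL: a logged-in browser typically sends cookies, a Referer, and a browser User-Agent along with the request. Which of these (if any) this site actually checks is not known from this thread, so the sketch below is only an assumption-laden starting point. The function name `build_browser_like_request` and all sample values are hypothetical, and it uses Python 3's standard library rather than youtube-dl's own helpers.

```python
import urllib.request


def build_browser_like_request(media_url, page_url, user_agent, cookie_header=None):
    """Build (but do not send) a request carrying browser-style headers.

    Assumption: the server may reject requests lacking these headers.
    Compare against the real browser request in the developer tools.
    """
    headers = {
        'Referer': page_url,       # page that embedded the video
        'User-Agent': user_agent,  # copy the value your browser sends
    }
    if cookie_header:
        headers['Cookie'] = cookie_header  # session cookies from the logged-in browser
    return urllib.request.Request(media_url, headers=headers)


# Placeholder values only; none of these come from the real site.
request = build_browser_like_request(
    'https://example.com/video/master.m3u8',
    'http://www.oprah.com/own-queensugar/copper-sun',
    'Mozilla/5.0',
    cookie_header='session=PLACEHOLDER',
)
print(request.header_items())
```

Inside an extractor, the equivalent idea is to return the required headers in the `http_headers` field of the info dict so the downloader sends them too.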

@bishpuppy
Author

Fair enough re the code. I'll preface by saying I'm not a Python developer, so this is new to me.

This is the code that I wrote so far:

# coding: utf-8
from __future__ import unicode_literals

import base64
import json

from .common import InfoExtractor
from ..utils import ExtractorError


class OprahIE(InfoExtractor):
    _VALID_URL = r'https?://(?:\w+\.)?oprah\.com/(?:[^/?#]+/)+(?P<id>[^/?#]+)'
    _TEST = {
        'url': 'http://oprah.com/own-queensugar/copper-sun',
        'info_dict': {
            'id': 'copper-sun',
            'ext': 'mp4',
            'title': 'Copper Sun',
        },
    }

    # Must be indented as a method of OprahIE, otherwise it is never called
    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        # Find the line that assigns api_data_hash and decode its
        # base64-encoded JSON payload
        json_data = None
        for html_line in webpage.split('\n'):
            if 'api_data_hash' in html_line and 'JSON' not in html_line:
                api_data = html_line.strip().split('"')[1]
                # b64decode returns bytes; decode before parsing as JSON
                json_data = json.loads(
                    base64.b64decode(api_data).decode('utf-8'))

        # Fail loudly instead of printing and returning None
        if json_data is None:
            raise ExtractorError(
                'Error extracting the information from the page')

        clip_data = json_data['clip_data']
        title = clip_data['title']

        self.to_screen('Video ID: %s' % video_id)
        self.to_screen('Video Title: %s' % title)

        return {
            'id': video_id,
            'title': title,
            'url': clip_data['contentPath'],
            'description': clip_data['description'],
            'ext': 'mp4',
        }
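The parsing step in the code above can be tried in isolation, without youtube-dl or network access. The sketch below is a standalone Python 3 version run against a synthetic page; the function name `extract_api_data`, the sample markup, and the placeholder URL are assumptions for illustration, and the real page layout may differ. Unlike the loop above, it returns the first matching line.

```python
import base64
import json


def extract_api_data(webpage):
    """Return the decoded api_data_hash payload, or None if it is absent."""
    for line in webpage.splitlines():
        if 'api_data_hash' in line and 'JSON' not in line:
            parts = line.strip().split('"')
            if len(parts) < 2:
                continue  # no quoted value on this line
            # b64decode yields bytes, so decode before parsing the JSON
            return json.loads(base64.b64decode(parts[1]).decode('utf-8'))
    return None


# Build a synthetic page in the assumed layout.
payload = {
    'clip_data': {
        'title': 'Copper Sun',
        'contentPath': 'https://example.com/video/master.m3u8',
        'description': 'Sample description',
    }
}
encoded = base64.b64encode(json.dumps(payload).encode('utf-8')).decode('ascii')
sample_page = '<html>\n<script>\nvar api_data_hash = "%s";\n</script>\n</html>' % encoded

data = extract_api_data(sample_page)
print(data['clip_data']['title'])
```

Testing the helper this way separates parsing problems from the download problem: if the decoded `contentPath` looks right, the remaining 403 is about how the media URL is requested, not about the extraction.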

@sjeble

sjeble commented Aug 17, 2019

Any update on this? I'd be interested in it being added as well.
