[Stitcher] - Reorganized call to json data accessing of {config} and … #20811

Closed
31 changes: 27 additions & 4 deletions youtube_dl/extractor/stitcher.py
@@ -14,6 +14,19 @@
class StitcherIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?stitcher\.com/podcast/(?:[^/]+/)+e/(?:(?P<display_id>[^/#?&]+?)-)?(?P<id>\d+)(?:[/#?&]|$)'
_TESTS = [{
+'url': 'http://www.stitcher.com/podcast/the-talking-machines/e/40789481?autoplay=true',
+'md5': '730312cac3a909c9732747b66962eb74',
+'info_dict': {
+'id': '40789481',
+'ext': 'mp3',
+'title': 'Machine Learning Mastery and Cancer Clusters',
+'show_name': 'Talking Machines',
+'description': 'md5:50da9e5ec6d37867069c480edbcf8b94',
+'publication_date': 'Oct 8, 2015',
+'duration': 1604,
+'thumbnail': r're:^https?://.*\.jpg',
+},
+}, {
'url': 'http://www.stitcher.com/podcast/the-talking-machines/e/40789481?autoplay=true',
'md5': '391dd4e021e6edeb7b8e68fbf2e9e940',
'info_dict': {
@@ -54,10 +67,16 @@ def _real_extract(self, url):

webpage = self._download_webpage(url, display_id)

-episode = self._parse_json(
-js_to_json(self._search_regex(
-r'(?s)var\s+stitcher(?:Config)?\s*=\s*({.+?});\n', webpage, 'episode config')),
-display_id)['config']['episode']
+# Safe grab 'config' json data using get()
+config = self._parse_json(
+js_to_json(self._search_regex(r'(?s)var\s+stitcher(?:Config)?\s*=\s*({.+?});\n', webpage, 'episode config')),
+display_id).get('config')
Collaborator

No, this does not make any sense.


+# Safe grab 'episode' json data using get()
+episode = config.get('episode')
Collaborator

No.


+# Safe grab 'episode' json data using get()
+feed = config.get('feed')
Collaborator

No, this does not give you any safety.
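
One way to read this objection: `dict.get()` only guards against a missing key on the dict it is called on. If `'config'` is absent, `config` becomes `None`, and the next `.get()` call raises `AttributeError` where the old code raised `KeyError`. A minimal sketch, using a hypothetical `page_data` dict as a stand-in for the parsed page JSON (the name is not from the PR):

```python
# Hypothetical stand-in for the parsed JSON when the page has no 'config' key.
page_data = {}

# .get() suppresses the KeyError here, but hands back None.
config = page_data.get('config')

# The follow-up lookup still fails, only with a different exception type.
try:
    episode = config.get('episode')
except AttributeError as exc:
    failure = type(exc).__name__

print(failure)
```

Under that reading, the chained `.get()` calls change which exception is raised but not whether extraction fails.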


title = unescapeHTML(episode['title'])
formats = [{
@@ -69,12 +88,16 @@ def _real_extract(self, url):
r'Episode Info:\s*</span>([^<]+)<', webpage, 'description', fatal=False)
duration = int_or_none(episode.get('duration'))
thumbnail = episode.get('episodeImage')
+pub_date = episode.get('pubDate')
+show_name = feed.get('name')
Collaborator

Breaks.
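
A possible reading of this comment: `feed` is `None` whenever `'feed'` is missing from the config, so `feed.get('name')` raises and an optional field aborts the whole extraction. Below is a minimal sketch of a non-fatal nested lookup, assuming a hypothetical `lookup` helper and sample data that are not part of the PR. youtube-dl's own `try_get` utility serves a similar purpose.

```python
def lookup(data, *keys):
    """Walk nested dicts, returning None as soon as a level is missing."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


# Hypothetical parsed config with an episode but no 'feed' entry.
parsed = {'config': {'episode': {'title': 'Example'}}}

# Optional metadata: a missing level yields None instead of an exception.
show_name = lookup(parsed, 'config', 'feed', 'name')

# Present data is returned unchanged.
title = lookup(parsed, 'config', 'episode', 'title')
```

With a lookup of this shape, a missing optional field such as the show name leaves the value as `None` instead of stopping extraction.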


return {
'id': audio_id,
'display_id': display_id,
'title': title,
+'show_name': show_name,
'description': description,
+'publication_date': pub_date,
'duration': duration,
'thumbnail': thumbnail,
'formats': formats,