Nest Add new extractor #31274
base: master
Thanks for your work!
It's nearly there. Have a look at the suggestions and get the test working.
    r'https:\/\/video.nest.com\/clip\/(.+?)(\.|")', webpage, 'video_id', fatal=False)
title = self._html_search_meta(['og:title', 'title'], webpage, 'title')
if title == "":
    title = "\"\""
In this extractor the page may have no explicit title, but yt-dl wants one, so use a specialised standard method to invent one (as above):
- title = "\"\""
+ title = self._generic_title(url)
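To make the idea of an "invented" title concrete, here is a standalone approximation of the kind of value a URL-derived generic title gives: the last path component with its extension dropped and percent-escapes decoded. The name generic_title_sketch is hypothetical and this is not youtube-dl's own code; it only assumes the Python 3 standard library.

```python
import os.path
from urllib.parse import unquote, urlparse


def generic_title_sketch(url):
    """Hypothetical helper approximating a URL-derived fallback title.

    Takes the last path component of the URL, drops its extension and
    decodes percent-escapes.
    """
    path = urlparse(url).path
    basename = path.rstrip('/').rsplit('/', 1)[-1]
    stem, _ext = os.path.splitext(basename)
    return unquote(stem)


print(generic_title_sketch(
    'https://video.nest.com/clip/73ddb6bd57c4485597a76e154a4429ea.mp4'))
# 73ddb6bd57c4485597a76e154a4429ea
```

For the Nest test URL this yields just the clip id, which is why a site-specific prefix makes the title more useful.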
        'description': '#caughtonNestCam',
    }
}
+ def _generic_title(self, url):
+     return 'NestCam video ' + super(NestIE, self)._generic_title(url)
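The override pattern can be exercised without youtube-dl installed. In this sketch StubExtractor is a hypothetical stand-in for InfoExtractor whose _generic_title only approximates the real one; NestSketchIE then prefixes the base result, using the same Python 2-compatible super(Class, self) form as the suggestion.

```python
import os.path
from urllib.parse import unquote, urlparse


class StubExtractor(object):
    """Hypothetical stand-in for InfoExtractor, for illustration only."""

    def _generic_title(self, url):
        # Last path component of the URL, extension dropped, escapes decoded
        basename = urlparse(url).path.rstrip('/').rsplit('/', 1)[-1]
        return unquote(os.path.splitext(basename)[0])


class NestSketchIE(StubExtractor):
    def _generic_title(self, url):
        # Delegate to the base implementation, then add a site-specific prefix
        return 'NestCam video ' + super(NestSketchIE, self)._generic_title(url)


ie = NestSketchIE()
print(ie._generic_title(
    'https://video.nest.com/clip/73ddb6bd57c4485597a76e154a4429ea.mp4'))
# NestCam video 73ddb6bd57c4485597a76e154a4429ea
```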
video_id = self._search_regex(
    r'https:\/\/video.nest.com\/clip\/(.+?)(\.|")', webpage, 'video_id', fatal=False)
title = self._html_search_meta(['og:title', 'title'], webpage, 'title')
if title == "":
- if title == "":
+ if not title:
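The reason "if not title:" is safer: assuming the meta lookup returns None when no matching tag exists, a comparison with "" misses that case, while a truthiness test catches both None and the empty string. The helper choose_title is hypothetical.

```python
def choose_title(title, fallback):
    """Hypothetical helper: use the fallback if title is None or empty.

    `not title` is true for both None (tag absent) and '' (tag present but
    empty); `title == ""` would only catch the second case.
    """
    if not title:
        return fallback
    return title


for found in (None, '', 'Backyard visitor'):
    print(repr(found), '->', repr(choose_title(found, 'NestCam video abc123')))
```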
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
    r'https:\/\/video.nest.com\/clip\/(.+?)(\.|")', webpage, 'video_id', fatal=False)
title = self._html_search_meta(['og:title', 'title'], webpage, 'title')
Prefer tuple for const sequence:
- title = self._html_search_meta(['og:title', 'title'], webpage, 'title')
+ title = self._html_search_meta(('og:title', 'title'), webpage, 'title')
if "/" in ext:
    ext = ext[ext.index("/") + 1:]
Use utils.mimetype2ext():
- if "/" in ext:
-     ext = ext[ext.index("/") + 1:]
+ ext = mimetype2ext(ext) or ext
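Why the helper is preferable to slicing after "/": a MIME type may carry parameters, and some subtypes are not file extensions. The sketch below compares the slicing from the PR with a simplified, hypothetical mimetype2ext_sketch; its lookup tables are a small illustrative subset, not youtube-dl's full mapping. The "or ext" in the suggestion keeps the original value if the helper returns nothing.

```python
def naive_ext(mime):
    # The approach under review: keep everything after the first '/'
    if '/' in mime:
        mime = mime[mime.index('/') + 1:]
    return mime


# Illustrative subset only (hypothetical tables, not the real mapping)
_SPECIAL_TYPES = {'audio/mpeg': 'mp3', 'audio/mp4': 'm4a'}
_SPECIAL_SUBTYPES = {'quicktime': 'mov', 'x-flv': 'flv'}


def mimetype2ext_sketch(mime):
    """Simplified, hypothetical approximation of utils.mimetype2ext()."""
    if mime is None:
        return None
    special = _SPECIAL_TYPES.get(mime)
    if special is not None:
        return special
    # Drop the major type, then any ';'-separated parameters such as codecs
    subtype = mime.rpartition('/')[2].split(';')[0].strip().lower()
    return _SPECIAL_SUBTYPES.get(subtype, subtype)


for mime in ('video/mp4', 'video/mp4; codecs="avc1.42E01E"', 'audio/mpeg'):
    print(mime, '->', naive_ext(mime), 'vs', mimetype2ext_sketch(mime))
```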
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
Used later:
- from .common import InfoExtractor
+ from .common import InfoExtractor
+ from ..utils import (
+     ExtractorError,
+     mimetype2ext,
+     url_or_none,
+ )
class NestIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?video.nest\.com/clip/(?P<id>)(.mp4)?'
This will never match a useful ID!
- _VALID_URL = r'https?://(?:www\.)?video.nest\.com/clip/(?P<id>)(.mp4)?'
+ _VALID_URL = r'https?://(?:www\.)?video\.nest\.com/clip/(?P<id>\w+)'
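A quick check of both patterns against the test URL, using a prefix match with re.match: the empty named group in the original pattern can only ever capture an empty string, while the suggested \w+ captures the hex clip id and stops before ".mp4".

```python
import re

URL = 'https://video.nest.com/clip/73ddb6bd57c4485597a76e154a4429ea.mp4'

# Pattern from the PR: the named group is empty, so it always captures ''
OLD_VALID_URL = r'https?://(?:www\.)?video.nest\.com/clip/(?P<id>)(.mp4)?'
# Suggested pattern: the id is a run of word characters, ending at '.mp4'
NEW_VALID_URL = r'https?://(?:www\.)?video\.nest\.com/clip/(?P<id>\w+)'

for name, pattern in (('old', OLD_VALID_URL), ('new', NEW_VALID_URL)):
    mobj = re.match(pattern, URL)
    print(name, repr(mobj.group('id')))
```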
video_id = self._search_regex(
    r'https:\/\/video.nest.com\/clip\/(.+?)(\.|")', webpage, 'video_id', fatal=False)
No need to escape '/' here (in a JS /regexp/, yes), but do escape '.', and don't overwrite video_id:
- video_id = self._search_regex(
-     r'https:\/\/video.nest.com\/clip\/(.+?)(\.|")', webpage, 'video_id', fatal=False)
+ video_id = self._search_regex(
+     r'https://video\.nest\.com/clip/(.+?)(?:\.|")', webpage, 'video_id', fatal=False) or video_id
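The escaping advice in plain Python re terms: in a Python string, '\/' and '/' match the same thing, whereas an unescaped '.' matches any character and so accepts look-alike hosts. The fallback half of the suggestion is mirrored by search_or_default, a hypothetical helper standing in for a non-fatal search followed by "or video_id"; the HTML snippets are made up for illustration.

```python
import re

# '\/' and '/' are equivalent in a Python regex
slash_escaped = bool(re.match(r'https:\/\/', 'https://'))
slash_plain = bool(re.match(r'https://', 'https://'))

# An unescaped '.' matches any character, so a look-alike host slips through
loose = bool(re.search(r'video.nest.com', 'https://videoXnest.com/'))
strict = bool(re.search(r'video\.nest\.com', 'https://videoXnest.com/'))
print(slash_escaped, slash_plain, loose, strict)  # True True True False

CLIP_RE = r'https://video\.nest\.com/clip/(.+?)(?:\.|")'


def search_or_default(pattern, text, default):
    """Hypothetical helper: group 1 of the first match, else keep default."""
    mobj = re.search(pattern, text)
    return (mobj.group(1) if mobj else None) or default


video_id = 'from-url'
print(search_or_default(CLIP_RE, '<p>no clip link here</p>', video_id))
print(search_or_default(
    CLIP_RE, '<a href="https://video.nest.com/clip/abc123.mp4">', video_id))
```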
Actually, is this ever different from the value extracted from the page URL? With the correct _VALID_URL, you should have a good value for it. If you do need to do this search, use the _VALID_URL again:
- video_id = self._search_regex(
-     r'https:\/\/video.nest.com\/clip\/(.+?)(\.|")', webpage, 'video_id', fatal=False)
+ video_id = self._search_regex(
+     self._VALID_URL, webpage, 'video_id', group='id', fatal=False) or video_id
Or just remove the search:
- video_id = self._search_regex(
-     r'https:\/\/video.nest.com\/clip\/(.+?)(\.|")', webpage, 'video_id', fatal=False)
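If the page search is kept, reusing _VALID_URL means one pattern defines what a clip id is. This sketch applies it with re.search, so the link can appear anywhere in the page, and falls back to the id taken from the page URL. The helper id_from_page and the HTML snippet are hypothetical.

```python
import re

_VALID_URL = r'https?://(?:www\.)?video\.nest\.com/clip/(?P<id>\w+)'


def id_from_page(webpage, video_id):
    """Hypothetical helper: find a clip URL in the page, else keep video_id.

    Mirrors the shape of a non-fatal search on _VALID_URL with group='id',
    followed by `or video_id`.
    """
    mobj = re.search(_VALID_URL, webpage)
    return (mobj.group('id') if mobj else None) or video_id


page = ('<source src="https://video.nest.com/clip/'
        '73ddb6bd57c4485597a76e154a4429ea.mp4">')
print(id_from_page(page, 'fallback'))
print(id_from_page('<html></html>', 'fallback'))
```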
_TEST = {
    'url': 'https://video.nest.com/clip/73ddb6bd57c4485597a76e154a4429ea.mp4',
    'md5': '7ab4eb6d4c2480be1740cc014a76ee96',
    'info_dict': {
        'id': '73ddb6bd57c4485597a76e154a4429ea',
        'ext': 'mp4',
        'title': "\"\"",
        'description': '#caughtonNestCam',
    }
}
Prefer _TESTS in new extractors:
- _TEST = {
-     'url': 'https://video.nest.com/clip/73ddb6bd57c4485597a76e154a4429ea.mp4',
-     'md5': '7ab4eb6d4c2480be1740cc014a76ee96',
-     'info_dict': {
-         'id': '73ddb6bd57c4485597a76e154a4429ea',
-         'ext': 'mp4',
-         'title': "\"\"",
-         'description': '#caughtonNestCam',
-     }
- }
+ _TESTS = [{
+     'url': 'https://video.nest.com/clip/73ddb6bd57c4485597a76e154a4429ea.mp4',
+     'md5': '7ab4eb6d4c2480be1740cc014a76ee96',
+     'info_dict': {
+         'id': '73ddb6bd57c4485597a76e154a4429ea',
+         'ext': 'mp4',
+         'title': "\"\"",
+         'description': '#caughtonNestCam',
+     }
+ }]
'info_dict': {
    'id': '73ddb6bd57c4485597a76e154a4429ea',
    'ext': 'mp4',
    'title': "\"\"",
To match other changes:
- 'title': "\"\"",
+ 'title': r're:^NestCam video \w+',
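An expected value beginning with "re:" is treated by the test harness as a regular expression rather than a literal. This is a simplified, hypothetical sketch of that check (expect_field is not youtube-dl's own function): the pattern after "re:" must match at the start of the extracted value; anything else must compare equal.

```python
import re


def expect_field(expected, got):
    """Simplified sketch of checking an expected info_dict value.

    Strings starting with 're:' are a regex that must match the start of
    the extracted value; any other expected value must be equal.
    """
    if isinstance(expected, str) and expected.startswith('re:'):
        return re.match(expected[len('re:'):], got) is not None
    return expected == got


title = 'NestCam video 73ddb6bd57c4485597a76e154a4429ea'
print(expect_field(r're:^NestCam video \w+', title))  # True
print(expect_field('""', title))                       # False
```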
Please follow the guide below
Put an x into all the boxes [ ] relevant to your pull request (like that [x]).
Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
What is the purpose of your pull request?
Description of your pull request and other information
Add extractor for NestCam video.