[kan] Add new extractor #27959
base: master
youtube_dl/extractor/kan.py
class KanIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?kan\.org\.il/(?:[iI]tem/\?item[iI]d|program/\?cat[iI]d)=(?P<id>[0-9]+)'
There must be two separate extractors: one for videos and one for playlists.
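A minimal sketch of how the combined `_VALID_URL` could be split, one pattern per extractor. The constant names and the `classify` helper are hypothetical; in youtube-dl each pattern would be the `_VALID_URL` of its own `InfoExtractor` subclass (for example a video extractor and a playlist extractor), and the standalone `re` check here only shows that the two patterns partition the original one.

```python
import re

# Hypothetical split of the original combined pattern: one regex per extractor.
KAN_VIDEO_URL = r'https?://(?:www\.)?kan\.org\.il/[iI]tem/\?item[iI]d=(?P<id>[0-9]+)'
KAN_PLAYLIST_URL = r'https?://(?:www\.)?kan\.org\.il/program/\?cat[iI]d=(?P<id>[0-9]+)'


def classify(url):
    """Return ('video' | 'playlist', id) for a kan.org.il URL, or None."""
    for kind, pattern in (('video', KAN_VIDEO_URL), ('playlist', KAN_PLAYLIST_URL)):
        m = re.match(pattern, url)
        if m:
            return kind, m.group('id')
    return None
```

With two classes, youtube-dl picks the extractor whose `_VALID_URL` matches, so neither class needs to branch on the URL type internally.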
youtube_dl/extractor/kan.py
creator = data.get('author', {}).get('name') or \
    self._og_search_property('site_name', webpage, fatal=False)
thumbnail = get_thumbnail(data)
m3u8_url = data.get('content', {}).get('src')
This is mandatory. Read the coding conventions.
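A standalone sketch of the convention the comment points at: a field the extractor cannot work without is read with plain indexing, so a missing key fails loudly, instead of `.get()`, which quietly yields `None` and defers the failure to format extraction. The function names are hypothetical and `data` is a plain dict shaped like the snippet above.

```python
def get_m3u8_url(data):
    # Mandatory field: index directly so a missing key raises KeyError
    # right here instead of passing None on to format extraction.
    return data['content']['src']


def get_m3u8_url_lenient(data):
    # The pattern under review: a missing key silently becomes None.
    return data.get('content', {}).get('src')
```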
youtube_dl/extractor/kan.py
    video_id)
title = data.get('title') or \
    self._og_search_title(webpage) or \
    self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
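This is never reachable: `_og_search_title` is fatal by default, so it either returns a value or raises before the `<title>` fallback is tried.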
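A standalone model of why the `<title>` fallback is dead code, assuming (as in youtube-dl) that the OpenGraph title search is fatal by default. `search_fatal` and `RegexNotFound` are hypothetical stand-ins for youtube-dl's regex helpers and its `RegexNotFoundError`; the meta-tag pattern is simplified.

```python
import re


class RegexNotFound(Exception):
    pass


def search_fatal(pattern, text):
    # Hypothetical fatal search: return the (non-empty) first group or raise.
    m = re.search(pattern, text)
    if not m:
        raise RegexNotFound(pattern)
    return m.group(1)


def extract_title(data, webpage):
    return (data.get('title')
            or search_fatal(r'<meta property="og:title" content="([^"]+)"', webpage)
            # Unreachable: the call above returns a truthy string or raises.
            or search_fatal(r'<title>([^<]+)</title>', webpage))
```

When the OpenGraph tag is missing, the exception propagates even though a `<title>` tag is present, which is exactly the case the third fallback was meant to handle.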
youtube_dl/extractor/kan.py
    self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
description = data.get('summary') or \
    self._og_search_description(webpage, fatal=False)
creator = data.get('author', {}).get('name') or \
Use `try_get`.
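`try_get` (from `youtube_dl.utils`) applies a getter and swallows lookup errors. That also covers a key that is present but `null` in the JSON: there `data.get('author', {})` returns `None` and the chained `.get('name')` raises `AttributeError`. The function below is a simplified re-implementation for illustration, not the library's own code.

```python
def try_get(src, getter, expected_type=None):
    # Simplified re-implementation of youtube_dl.utils.try_get, for illustration.
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v
    return None


data = {'author': None}  # key present, value null in the page JSON

# data.get('author', {}).get('name') would raise AttributeError here;
# try_get turns the TypeError from None['name'] into None.
creator = try_get(data, lambda x: x['author']['name'])
```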
youtube_dl/extractor/kan.py
m3u8_url = data.get('content', {}).get('src')
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
return {
    '_type': 'video',
`'_type': 'video'` is the default; drop it.
youtube_dl/extractor/kan.py
def _extract_list(self, list_id, webpage):
    video_ids = re.findall(r'onclick="playVideo\(.*,\'([0-9]+)\'\)', webpage)
    title = self._og_search_title(webpage)
Playlist title is optional, so it must not be extracted fatally.
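In youtube-dl terms this means passing `fatal=False`, e.g. `self._og_search_title(webpage, fatal=False)`. The standalone sketch below shows the intended behaviour; `search_optional` is a hypothetical non-fatal search that, roughly like `fatal=False`, returns `None` instead of raising, and the returned dict is shaped like a youtube-dl playlist result.

```python
import re


def search_optional(pattern, text):
    # Hypothetical non-fatal search: None instead of an exception on no match.
    m = re.search(pattern, text)
    return m.group(1) if m else None


def build_playlist(list_id, webpage, entries):
    return {
        '_type': 'playlist',
        'id': list_id,
        # Optional: a missing og:title must not abort the whole playlist.
        'title': search_optional(r'<meta property="og:title" content="([^"]+)"', webpage),
        'entries': entries,
    }
```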
youtube_dl/extractor/kan.py
creator = try_get(data, lambda x: x['author']['name'], str) or \
    self._og_search_property('site_name', webpage, fatal=False)
thumbnail = get_thumbnail(data)
m3u8_url = try_get(data, lambda x: x['content']['src'], str)
Nothing changed.
youtube_dl/extractor/kan.py
if not m3u8_url:
    raise ExtractorError('Unable to extract m3u8 url')
Remove.
youtube_dl/extractor/kan.py
data = self._parse_json(
    self._search_regex(
        r'<script id="kan_app_search_data" type="application/json">([^<]+)</script>',
        webpage,
        'data',
    ),
    video_id,
)
Remove excessive verbosity. Read the coding conventions.
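The same lookup written compactly, with arguments packed onto as few lines as possible and closing parentheses placed after the last argument. In the extractor this would remain a `self._parse_json(self._search_regex(...), video_id)` call; the standalone version below swaps in `re` and `json` so it runs on its own, and the helper name `extract_app_data` is hypothetical. Unlike the fatal youtube-dl search, it assumes the script tag is present.

```python
import json
import re


def extract_app_data(webpage):
    # One expression: find the embedded JSON and parse it; no dangling
    # lines holding only a closing parenthesis.
    return json.loads(re.search(
        r'<script id="kan_app_search_data" type="application/json">([^<]+)</script>',
        webpage).group(1))
```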
youtube_dl/extractor/kan.py
title = data.get('title') or self._og_search_title(webpage)
description = data.get('summary') or \
    self._og_search_description(webpage, fatal=False)
creator = try_get(data, lambda x: x['author']['name'], str) or \
Use `compat_str` instead of `str`.
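youtube-dl still supports Python 2, where `json.loads` returns `unicode` strings rather than `str`, so `try_get(..., str)` would discard valid values there. `compat_str` from `youtube_dl.compat` is the portable text type; the local definition below only illustrates how it resolves and is not the library's source. `is_text` is a hypothetical helper.

```python
# Illustration of what compat_str resolves to: unicode on Python 2,
# str on Python 3. Defined locally here, not imported from youtube_dl.
try:
    compat_str = unicode  # noqa: F821 (name exists only on Python 2)
except NameError:
    compat_str = str


def is_text(value):
    # The check try_get performs with expected_type=compat_str.
    return isinstance(value, compat_str)
```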
youtube_dl/extractor/kan.py
    'id': video_id,
    'title': title,
    'thumbnail': thumbnail,
    'formats': self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4'),
This must be done right after title extraction.
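A sketch of the requested ordering: mandatory fields (title, then formats) first, optional metadata afterwards, presumably so that a failure in a mandatory step surfaces before any optional work is done. `extract_info` is hypothetical, and `extract_formats` is an injected stand-in for `self._extract_m3u8_formats` so the sketch runs without youtube-dl.

```python
def extract_info(data, video_id, extract_formats):
    # Mandatory: title, then formats, right after it.
    title = data['title']
    formats = extract_formats(data['content']['src'], video_id)
    # Optional metadata only after the mandatory fields succeeded.
    return {
        'id': video_id,
        'title': title,
        'formats': formats,
        'description': data.get('summary'),
    }
```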
youtube_dl/extractor/kan.py
    'description': description,
    'creator': creator,
    'release_date': unified_strdate(data.get('published')),
    'duration': parse_duration(data.get('extensions', {}).get('duration')),
Use `try_get`.
youtube_dl/extractor/kan.py
video_ids = re.findall(r'onclick="playVideo\(.*,\'([0-9]+)\'\)', webpage)
entries = []
for video_id in video_ids:
    url = 'https://www.kan.org.il/Item/?itemId=%s' % video_id
Do not shadow the input `url`.
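The loop from the snippet above with the per-item URL given its own name, so the extractor's `url` argument stays intact. `build_entries` is hypothetical, and each entry is a plain dict roughly shaped like what youtube-dl's `url_result` returns (without the `ie_key` field).

```python
import re


def build_entries(webpage):
    video_ids = re.findall(r'onclick="playVideo\(.*,\'([0-9]+)\'\)', webpage)
    entries = []
    for video_id in video_ids:
        # Distinct name: does not overwrite the page URL passed to the extractor.
        item_url = 'https://www.kan.org.il/Item/?itemId=%s' % video_id
        entries.append({'_type': 'url', 'url': item_url, 'id': video_id})
    return entries
```

The pattern's greedy `.*` does not cross newlines, so it yields one ID per line; two `playVideo` calls on the same line would return only the last one.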
Is there anything else I can do in order to get this extractor merged?
For a start, follow the "Trailing parentheses" section of the readme. Review
@aarubui thanks for your review. I've fixed the trailing parens, added
Good luck with the real review.
Description of your pull request and other information
New extractor for kan.org.il.
Fixes #26551.
This is my first extractor. I tried to follow the guide as best I could; please let me know if there are any issues and I will address them.