[internazionale] Add new extractor for www.internazionale.it #14973
'url': 'https://video.internazionale.it/%s/%s.m3u8'
% (video_path, id),
'ext': 'mp4',
'protocol': 'm3u8',
1. `_extract_m3u8_formats`.
2. At least mpd is also available.
video_container = self._html_search_regex(r'<div class="video-container" (.*)>', webpage, 'video_container')

id = self._html_search_regex(r'data-job-id="([^"]+)"', video_container, 'id')
Do not shadow built-in names.
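A minimal sketch of why the shadowing matters, assuming a plain dict stands in for the parsed attributes. The function names `extract_shadowing` and `extract_clean` are hypothetical and are not part of the extractor; the value `265968` is the `id` from the test in this PR.

```python
def extract_shadowing(attrs):
    # Rebinding `id` hides the built-in for the rest of this scope.
    id = attrs['data-job-id']
    try:
        return id(attrs)  # intended: built-in id(); actual: calling a str
    except TypeError as exc:
        return 'TypeError: %s' % exc


def extract_clean(attrs):
    # A distinct name such as `job_id` leaves the built-in id() usable.
    job_id = attrs['data-job-id']
    return job_id, isinstance(id(attrs), int)


sample = {'data-job-id': '265968'}
print(extract_shadowing(sample))
print(extract_clean(sample))
```

The first call reports a `TypeError` because the local string has replaced the built-in; the second keeps both names available.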
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)

video_container = self._html_search_regex(r'<div class="video-container" (.*)>', webpage, 'video_container')
Capturing empty string does not make any sense. What's the point capturing this at all? id and path occur only once in webpage.
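The suggestion above can be sketched with the standard `re` module alone, searching the whole page for each attribute instead of first capturing the container tag. The markup in `SAMPLE_WEBPAGE` and the helper `search_attr` are hypothetical; the real page's attribute order and values may differ.

```python
import re

# Hypothetical markup; only the two attribute names come from this PR.
SAMPLE_WEBPAGE = '''
<div class="video-container" data-job-id="265968"
     data-video-path="some/video/path">
</div>
'''


def search_attr(name, html):
    """Return the value of the first `name="..."` attribute found in html."""
    match = re.search(r'%s="([^"]+)"' % re.escape(name), html)
    if match is None:
        raise ValueError('unable to find %s' % name)
    return match.group(1)


job_id = search_attr('data-job-id', SAMPLE_WEBPAGE)
video_path = search_attr('data-video-path', SAMPLE_WEBPAGE)
print(job_id, video_path)
```

Inside the extractor the same patterns would presumably be passed to `self._search_regex()` with `webpage` as the input, which removes the intermediate `video_container` step entirely.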
'info_dict': {
'id': '265968',
'ext': 'mp4',
'description': 'Il regista statunitense Richard Linklater ci racconta una scena del film Boyhood e la sua passione per l’imprecisione della memoria. Il film è un’avventura durata 12 anni, durante la quale Linklater ha seguito il protagonista dal 2002 al 2014 per raccontare la sua crescita e il rapporto con i genitori divorziati. Leggi',
`md5:`.
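A minimal sketch of how such an `md5:` value could be computed for the test's `info_dict`, assuming the test helper compares against the MD5 of the UTF-8 encoded field (an assumption worth checking against `test/helper.py`). The helper name `md5_marker` is hypothetical; the description is one of the test descriptions in this PR.

```python
import hashlib


def md5_marker(text):
    """Return 'md5:<hex digest>' for text, hashing its UTF-8 encoding."""
    return 'md5:' + hashlib.md5(text.encode('utf-8')).hexdigest()


description = (
    'Tre ragazzi raccontano quanto è difficile essere italiani di fatto '
    'ma non di diritto: una vita fatta di burocrazia, opportunità negate '
    'e grandi contraddizioni. Leggi'
)
print(md5_marker(description))
```

The printed string would replace the long literal in the test, keeping the test data short while still checking the full description.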
'description': 'Tre ragazzi raccontano quanto è difficile essere italiani di fatto ma non di diritto: una vita fatta di burocrazia, opportunità negate e grandi contraddizioni. Leggi',
'title': 'Storie di italiani senza cittadinanza',
'thumbnail': r're:^https?://.*\.jpg$',
}
Remove duplicates.
- Use `md5:...' instead of providing a long description in info_dict, and keep only one test.
- Search directly for the `data-job-id' and `data-video-path' attributes.
- Extract m3u8 and mpd formats via _extract_m3u8_formats() and _extract_mpd_formats().

TODO: Figure out why `python test/test_download.py TestDownload.test_Internazionale`
TODO: fails with a DownloadError and `ERROR: requested format not available'.
TODO: For m3u8 formats, `youtube_dl -F' on an Internazionale URL reports the extension as
TODO: `m3u8' instead of mp4; is this correct?
Hello Sergey,
"Sergey M." writes:
> dstftw requested changes on this pull request.
> 1. `_extract_m3u8_formats`.
> 2. At least mpd is also available.
I tried to address that by using _extract_m3u8_formats() and
_extract_mpd_formats(), respectively.
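Each of the two helpers takes a manifest URL. A minimal sketch of how those URLs might be built from the two page attributes: the m3u8 pattern comes from the original diff, while the `.mpd` location is an assumption (same base, different suffix) that this thread does not confirm. `manifest_urls` is a hypothetical helper and the video path is a placeholder.

```python
def manifest_urls(video_path, job_id):
    """Build the HLS and DASH manifest URLs for one video (hypothetical helper)."""
    base = 'https://video.internazionale.it/%s/%s.' % (video_path, job_id)
    return {
        'm3u8': base + 'm3u8',  # pattern taken from the original diff
        'mpd': base + 'mpd',    # ASSUMPTION: same base, different suffix
    }


urls = manifest_urls('some/video/path', '265968')  # placeholder path
print(urls['m3u8'])
print(urls['mpd'])
```

In the extractor, each URL would then be handed to the matching helper and the two resulting format lists concatenated.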
> Do not shadow built-in names.
Whoops, nice catch!
> + video_container = self._html_search_regex(r'<div class="video-container" (.*)>', webpage, 'video_container')
> Capturing empty string does not make any sense. What's the point capturing this at all? id and path occur only once in webpage.
I assumed that first extracting the relevant part of the webpage and
then extracting only the interesting attributes would be faster. What
you propose is correct and simpler, so I've changed it as you suggested.
> `md5:`.
OK.
> Remove duplicates.
Did you mean removing a test? If so, I've kept only the first one.
However, two points probably still need to be addressed (I have tried
to investigate them further, without much luck):
- Why does TestDownload.test_Internazionale now fail with `ERROR: requested format not available`?
```
% python2.7 test/test_download.py TestDownload.test_Internazionale
[Internazionale] 2015/02/19/richard-linklater-racconta-una-scena-di-boyhood: Downloading webpage
[Internazionale] 2015/02/19/richard-linklater-racconta-una-scena-di-boyhood: Downloading m3u8 information
[Internazionale] 2015/02/19/richard-linklater-racconta-una-scena-di-boyhood: Downloading MPD manifest
ERROR: requested format not available
Traceback (most recent call last):
File "/home/leot/repos/youtube-dl/youtube_dl/YoutubeDL.py", line 795, in extract_info
return self.process_ie_result(ie_result, download, extra_info)
File "/home/leot/repos/youtube-dl/youtube_dl/YoutubeDL.py", line 849, in process_ie_result
return self.process_video_result(ie_result, download=download)
File "/home/leot/repos/youtube-dl/youtube_dl/YoutubeDL.py", line 1612, in process_video_result
expected=True)
ExtractorError: requested format not available
E
======================================================================
ERROR: test_Internazionale (__main__.TestDownload):
----------------------------------------------------------------------
Traceback (most recent call last):
File "test/test_download.py", line 159, in test_template
force_generic_extractor=params.get('force_generic_extractor', False))
File "/home/leot/repos/youtube-dl/youtube_dl/YoutubeDL.py", line 807, in extract_info
self.report_error(compat_str(e), e.format_traceback())
File "/home/leot/repos/youtube-dl/youtube_dl/YoutubeDL.py", line 612, in report_error
self.trouble(error_message, tb)
File "/home/leot/repos/youtube-dl/youtube_dl/YoutubeDL.py", line 582, in trouble
raise DownloadError(message, exc_info)
DownloadError: ERROR: requested format not available
…----------------------------------------------------------------------
Ran 1 test in 2.949s
FAILED (errors=1)
Exit 1
```
- Is it okay that the extension is reported as `m3u8` when invoking `youtube-dl -F`?
```
% python2.7 -m youtube_dl -F 'https://www.internazionale.it/video/2015/02/19/richard-linklater-racconta-una-scena-di-boyhood'
[Internazionale] 2015/02/19/richard-linklater-racconta-una-scena-di-boyhood: Downloading webpage
[Internazionale] 2015/02/19/richard-linklater-racconta-una-scena-di-boyhood: Downloading m3u8 information
[Internazionale] 2015/02/19/richard-linklater-racconta-una-scena-di-boyhood: Downloading MPD manifest
[info] Available formats for 265968:
format code      extension  resolution  note
audio-1-Audio    m3u8       audio only  [en]
128kbps          m4a        audio only  DASH audio  128k , mp4a.40.2 (44100Hz)
360p_800kbps     mp4        640x360     DASH video  800k , avc1.42c00d, 30fps, video only
480p_1200kbps    mp4        854x480     DASH video 1200k , avc1.42c00d, 30fps, video only
1728             m3u8       640x360     1728k , avc1.64000d, video only
720p_2400kbps    mp4        1280x720    DASH video 2400k , avc1.42c00d, 30fps, video only
2528             m3u8       854x480     2528k , avc1.64000d, video only
4928             m3u8       1280x720    4928k , avc1.64000d, video only (best)
```
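One possible explanation for the `m3u8` extension, offered as an assumption and not as a confirmed diagnosis: when no container extension is supplied for the HLS formats, the listing may fall back to the suffix of the playlist URL. The sketch below only illustrates that fallback; `guess_ext` is a hypothetical stand-in, not youtube-dl's own function, and the URL is a placeholder.

```python
import posixpath
from urllib.parse import urlparse


def guess_ext(url, override=None):
    """Hypothetical stand-in: use override if given, else the URL's suffix."""
    if override is not None:
        return override
    suffix = posixpath.splitext(urlparse(url).path)[1]
    return suffix.lstrip('.').lower()


variant_url = 'https://video.internazionale.it/some/video/path/variant.m3u8'
print(guess_ext(variant_url))
print(guess_ext(variant_url, override='mp4'))
```

`_extract_m3u8_formats()` accepts an `ext` argument, so passing `ext='mp4'` there may be enough to make the listing show `mp4` for the HLS entries.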
Thank you for the review and for the attention!
Description of your pull request and other information
Add a new extractor for internazionale.it.
This was implemented by analyzing the web browser's requests via
mitmproxy and manually inspecting part of the JavaScript code
served.