[la7] Improvements to the extractor #1575

nixxo · 2021-11-06T19:52:17Z

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

Bug fix
Improvement

Description of your pull request and other information

no longer using kaltura extractor
added hls, dash formats
fixes #1065 ([Broken] la7 gives HTTP 403), ytdl-org/youtube-dl#23323 (La7 italian television support does not work anymore), ytdl-org/youtube-dl#28963 (Italian la7 broken again)

BEFORE:

λ ydl https://www.la7.it/propagandalive/rivedila7/propaganda-live-puntata-del-29102021-30-10-2021-405578 -F
[la7.it] propaganda-live-puntata-del-29102021-30-10-2021-405578: Downloading webpage
[Kaltura] 0_a5118mri: Downloading video info JSON
[Kaltura] 0_a5118mri: Checking mp4-1344 URL
[Kaltura] 0_a5118mri: mp4-1344 URL is invalid, skipping: HTTP Error 403: Forbidden
[info] Available formats for 0_a5118mri:
ID      EXT RESOLUTION FPS |  FILESIZE    TBR PROTO | VCODEC   VBR ACODEC   ABR MORE INFO
------- --- ---------- --- - ---------- ----- ----- - ------ ----- ------- ---- ---------
mp4-669 mp4 640x360    25  | ~928.00MiB  669k http  | avc1    669k unknown   0k isom

AFTER:

λ ydpm https://www.la7.it/propagandalive/rivedila7/propaganda-live-puntata-del-29102021-30-10-2021-405578 -F
[la7.it] propaganda-live-puntata-del-29102021-30-10-2021-405578: Downloading webpage
[la7.it] propaganda-live-puntata-del-29102021-30-10-2021-405578: Downloading m3u8 information
[la7.it] propaganda-live-puntata-del-29102021-30-10-2021-405578: Downloading MPD manifest
[la7.it] entry/data/0/485/0_a5118mri_0_w5zq3y5r_1: Check filesize
[la7.it] entry/data/0/485/0_a5118mri_0_vd90m39b_1: Check filesize
[info] Available formats for propaganda-live-puntata-del-29102021-30-10-2021-405578:
ID            EXT RESOLUTION FPS |  FILESIZE    TBR PROTO  | VCODEC        VBR ACODEC     ABR  ASR    MORE INFO
---------------------------------------------------------------------------------------------------------------------------
dash-f1-a1-x3 m4a audio only     |              63k dash   |                   mp4a.40.2  63k 44100Hz DASH audio, m4a_dash
dash-f2-a1-x3 m4a audio only     |             128k dash   |                   mp4a.40.2 128k 48000Hz DASH audio, m4a_dash
dash-f1-v1-x3 mp4 640x360    25  |             599k dash   | avc1.42c01e  599k                        DASH video, mp4_dash
hls-663       mp4 640x360    25  |             663k m3u8_n | avc1.42c01e  663k mp4a.40.2   0k
https-663     mp4 640x360    25  | ~928.20MiB  663k https  | avc1.42c01e  663k mp4a.40.2   0k
dash-f2-v1-x3 mp4 1280x720   25  |            1214k dash   | avc1.64001f 1214k                        DASH video, mp4_dash
hls-1342      mp4 1280x720   25  |            1342k m3u8_n | avc1.64001f 1342k mp4a.40.2   0k
https-1342    mp4 1280x720   25  | ~1.82GiB   1342k https  | avc1.64001f 1342k mp4a.40.2   0k
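
As a rough cross-check of the listing above, the approximate file sizes and total bitrates of the two `https-` rows imply a running time of a little over three hours. The following is a minimal sketch of that arithmetic, assuming TBR is expressed in kbit/s (1000 bits per second) and the sizes use binary units; the helper name is hypothetical.

```python
def estimate_duration_seconds(filesize_bytes, tbr_kbps):
    """Estimate running time from a file size and a total bitrate.

    Assumes tbr_kbps is in kilobits per second (1000 bits/s).
    """
    return filesize_bytes * 8 / (tbr_kbps * 1000)


MIB = 1024 ** 2
GIB = 1024 ** 3

# Values taken from the https-663 and https-1342 rows of the AFTER listing.
low = estimate_duration_seconds(928.20 * MIB, 663)
high = estimate_duration_seconds(1.82 * GIB, 1342)

for label, seconds in (('https-663', low), ('https-1342', high)):
    print(f'{label}: about {seconds / 3600:.1f} hours')
```

Both rows should land close to each other, since they describe the same programme at different qualities.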

pukkandan · 2021-11-07T17:00:55Z

yt_dlp/extractor/la7.py

+                urlh = self._request_webpage(
+                    HEADRequest(http_url), quality,
+                    note='Check filesize', fatal=False
+                )
+                if urlh:
+                    http_f = f.copy()
+                    del http_f['manifest_url']
+                    http_f.update({
+                        'format_id': http_f['format_id'].replace('hls-', 'https-'),
+                        'url': http_url,
+                        'protocol': 'https',
+                        'filesize_approx': int_or_none(urlh.headers.get('Content-Length', None)),
+                    })
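
The hunk above turns each HLS format into a progressive HTTPS variant. A minimal standalone sketch of that dictionary transformation follows, assuming a format entry shaped like the one in the diff. The function name and the sample URLs are hypothetical, and how `http_url` is derived is not shown in the excerpt.

```python
def derive_http_format(hls_format, http_url, filesize_approx=None):
    """Return a copy of an HLS format entry rewritten as a direct HTTPS download."""
    http_f = hls_format.copy()
    # The manifest URL only makes sense for the HLS variant.
    http_f.pop('manifest_url', None)
    http_f.update({
        'format_id': http_f['format_id'].replace('hls-', 'https-'),
        'url': http_url,
        'protocol': 'https',
        'filesize_approx': filesize_approx,
    })
    return http_f


hls = {
    'format_id': 'hls-663',
    'url': 'https://example.invalid/video/index.m3u8',  # placeholder URL
    'manifest_url': 'https://example.invalid/video/master.m3u8',  # placeholder URL
    'protocol': 'm3u8_native',
}
http = derive_http_format(hls, 'https://example.invalid/video/663.mp4', 973288243)
print(http['format_id'], http['protocol'])
```

Working on a copy keeps the original HLS entry intact, so both variants can be offered side by side.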


I don't know if it is really useful to make a request just to get the filesize. No other extractor does this.

nixxo · Nov 7, 2021

Yeah, I know, but it's at most 2 requests, and since some TV programmes are quite long (3+ hours) the files can be large (~2 GB), so giving the user this info can be useful. As you can see from the BEFORE output, it was something the Kaltura extractor already provided.
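
The size check discussed here can be sketched outside yt-dlp with the standard library: send a HEAD request and read `Content-Length`. This is only an illustration under assumptions, not the extractor's code. `probe_filesize` and `parse_content_length` are hypothetical names, and a server may omit the header or reject HEAD requests.

```python
import urllib.error
import urllib.request


def parse_content_length(headers):
    """Return Content-Length as an int, or None if missing or malformed."""
    value = headers.get('Content-Length')
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def probe_filesize(url, timeout=10):
    """Issue a HEAD request and return the reported size in bytes, or None."""
    try:
        request = urllib.request.Request(url, method='HEAD')
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_content_length(response.headers)
    except (urllib.error.URLError, OSError, ValueError):
        # Non-fatal, mirroring fatal=False in the diff: a missing size beats a crash.
        return None


print(parse_content_length({'Content-Length': '973288243'}))
print(parse_content_length({}))
```

Each call to `probe_filesize` costs one extra round trip per format, which is the overhead being debated in this thread.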

pukkandan · Nov 8, 2021

I am still against it, but I won't force you to remove it either. I don't use the extractor, and if you think this is worth it, I guess you can keep it. But if more extractors start doing this in the future, I'll have to design some strict guidelines on when this is allowed and when it isn't.

nixxo · Nov 8, 2021

OK, then let's hope nobody uses it ;-D

nixxo · Nov 9, 2021

What if I make it optional?

I'll create an InfoExtractor method _generate_filesize and add an option --generate-filesize, so that extractor developers can use the method and the user has control over whether to allow it.

in the extractor:
'filesize_approx': self._generate_filesize(url)

in the InfoExtractor method:
if self.params.get('generate-filesize'):

You can see my implementation in the last 2 commits
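
The opt-in idea proposed above could look roughly like the following. It is a hypothetical sketch, not the code from the commits mentioned: only the `_generate_filesize` name and the `'generate-filesize'` params key come from the comment, while the class, the injected `probe` callable and the sample values are assumptions made for illustration.

```python
class SketchExtractor:
    """Stand-in for an extractor; this is not yt-dlp's InfoExtractor."""

    def __init__(self, params=None, probe=None):
        self.params = params or {}
        # probe is any callable mapping a URL to a size in bytes (or None).
        self._probe = probe or (lambda url: None)

    def _generate_filesize(self, url):
        """Return an approximate size only when the user opted in."""
        if not self.params.get('generate-filesize'):
            return None  # default: no extra request is made
        return self._probe(url)


calls = []


def fake_probe(url):
    """Record the call and return a fixed size instead of hitting the network."""
    calls.append(url)
    return 973288243


off = SketchExtractor(probe=fake_probe)
on = SketchExtractor({'generate-filesize': True}, probe=fake_probe)

print(off._generate_filesize('https://example.invalid/a.mp4'))
print(on._generate_filesize('https://example.invalid/a.mp4'))
print(len(calls))
```

With the option off, the probe is never invoked, so the default behaviour adds no requests.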

pukkandan · Nov 9, 2021

If such a feature is to be implemented, it will need to be done in general and not specific to extractors. I am reverting your last 2 commits for this PR.

There is also a feature request to fetch more format details using ffprobe. Any implementation of this will also need to at least leave room for that.

* commit '9ebf3c6ab97c29b2d5872122e532bc98b93ad8b3': (23 commits)
  [version] update
  Release 2021.11.10.1
  [version] update
  Release 2021.11.10
  [tvp] Add TVPStreamIE (yt-dlp#1401) Authored by: selfisekai
  [tvp] Fix extractor (yt-dlp#1401) Authored by: selfisekai
  [tvp] Fix embeds (yt-dlp#1401) Authored by: selfisekai
  [wppilot] Add extractors (yt-dlp#1401) Authored by: selfisekai
  [radiokapital] Add extractors (yt-dlp#1401) Authored by: selfisekai
  [polsatgo] Add extractor (yt-dlp#1386) Authored by: selfisekai, sdomi
  [polskieradio] Add extractors (yt-dlp#1386) Authored by: selfisekai
  [extractor] Add `_search_nextjs_data` (yt-dlp#1386) Authored by: selfisekai
  [cleanup] minor fixes
  [docs] Minor documentation improvements Closes yt-dlp#1583, yt-dlp#1599
  [outtmpl] Add alternate forms for `q` and `j`
  [cleanup] Minor improvements to error and debug messages
  fix for e1b7c54
  [Gab] Add extractor (yt-dlp#1505)
  [imdb] Fix thumbnail (yt-dlp#1581)
  [la7] Fix extractor (yt-dlp#1575)
  ...

nixxo added 2 commits November 6, 2021 20:39

[la7] Improvements to the extractor

34a301c

- no longer using kaltura extractor
- added hls, dash formats
- fixes yt-dlp#1065

[la7] Added filesize_approx

e4104db

nixxo force-pushed the la7-fixes branch 3 times, most recently from b4b9f61 to 26859bb Compare November 7, 2021 16:59

pukkandan reviewed Nov 7, 2021

[la7] improved filesize_approx extraction from Content-Length

25310ca

nixxo force-pushed the la7-fixes branch from 26859bb to 25310ca Compare November 7, 2021 17:02

nixxo marked this pull request as ready for review November 7, 2021 17:10

pukkandan reviewed Nov 8, 2021

Update yt_dlp/extractor/la7.py

e7a332b

Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>

nixxo marked this pull request as draft November 8, 2021 19:04

[la7] code simplification

0c2c5bc

nixxo marked this pull request as ready for review November 8, 2021 19:35

nixxo added 2 commits November 9, 2021 14:21

[common, options] added generate_filesize option and method to InfoEx…

7577ec6

…tractor

[la7] using genarate_filesize method

329b800

nixxo force-pushed the la7-fixes branch from 54ef99f to 329b800 Compare November 9, 2021 13:26

pukkandan added 3 commits November 10, 2021 02:20

Revert "[la7] using genarate_filesize method"

87f17b7

This reverts commit 329b800.

Revert "[common, options] added generate_filesize option and method t…

848ab0f

…o InfoExtractor" This reverts commit 7577ec6.

cleanup

53c5e97

pukkandan force-pushed the la7-fixes branch from 295730d to 53c5e97 Compare November 9, 2021 21:05

pukkandan merged commit 9b12e9a into yt-dlp:master Nov 9, 2021

nixxo deleted the la7-fixes branch October 20, 2022 09:49
