Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[la7] Improvements to the extractor #1575

Merged
merged 10 commits into from Nov 9, 2021
Merged

Conversation

nixxo
Copy link
Contributor

@nixxo nixxo commented Nov 6, 2021

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense

What is the purpose of your pull request?

  • Bug fix
  • Improvement

Description of your pull request and other information

BEFORE:

λ ydl https://www.la7.it/propagandalive/rivedila7/propaganda-live-puntata-del-29102021-30-10-2021-405578 -F
[la7.it] propaganda-live-puntata-del-29102021-30-10-2021-405578: Downloading webpage
[Kaltura] 0_a5118mri: Downloading video info JSON
[Kaltura] 0_a5118mri: Checking mp4-1344 URL
[Kaltura] 0_a5118mri: mp4-1344 URL is invalid, skipping: HTTP Error 403: Forbidden
[info] Available formats for 0_a5118mri:
ID      EXT RESOLUTION FPS |  FILESIZE    TBR PROTO | VCODEC   VBR ACODEC   ABR MORE INFO
------- --- ---------- --- - ---------- ----- ----- - ------ ----- ------- ---- ---------
mp4-669 mp4 640x360    25  | ~928.00MiB  669k http  | avc1    669k unknown   0k isom

AFTER:

λ ydpm https://www.la7.it/propagandalive/rivedila7/propaganda-live-puntata-del-29102021-30-10-2021-405578 -F
[la7.it] propaganda-live-puntata-del-29102021-30-10-2021-405578: Downloading webpage
[la7.it] propaganda-live-puntata-del-29102021-30-10-2021-405578: Downloading m3u8 information
[la7.it] propaganda-live-puntata-del-29102021-30-10-2021-405578: Downloading MPD manifest
[la7.it] entry/data/0/485/0_a5118mri_0_w5zq3y5r_1: Check filesize
[la7.it] entry/data/0/485/0_a5118mri_0_vd90m39b_1: Check filesize
[info] Available formats for propaganda-live-puntata-del-29102021-30-10-2021-405578:
ID            EXT RESOLUTION FPS |  FILESIZE    TBR PROTO  | VCODEC        VBR ACODEC     ABR  ASR    MORE INFO
---------------------------------------------------------------------------------------------------------------------------
dash-f1-a1-x3 m4a audio only     |              63k dash   |                   mp4a.40.2  63k 44100Hz DASH audio, m4a_dash
dash-f2-a1-x3 m4a audio only     |             128k dash   |                   mp4a.40.2 128k 48000Hz DASH audio, m4a_dash
dash-f1-v1-x3 mp4 640x360    25  |             599k dash   | avc1.42c01e  599k                        DASH video, mp4_dash
hls-663       mp4 640x360    25  |             663k m3u8_n | avc1.42c01e  663k mp4a.40.2   0k
https-663     mp4 640x360    25  | ~928.20MiB  663k https  | avc1.42c01e  663k mp4a.40.2   0k
dash-f2-v1-x3 mp4 1280x720   25  |            1214k dash   | avc1.64001f 1214k                        DASH video, mp4_dash
hls-1342      mp4 1280x720   25  |            1342k m3u8_n | avc1.64001f 1342k mp4a.40.2   0k
https-1342    mp4 1280x720   25  | ~1.82GiB   1342k https  | avc1.64001f 1342k mp4a.40.2   0k

- no longer using kaltura extactor
- added hls, dash formats
- fixes yt-dlp#1065
@nixxo nixxo force-pushed the la7-fixes branch 3 times, most recently from b4b9f61 to 26859bb Compare November 7, 2021 16:59
Comment on lines 47 to 58
urlh = self._request_webpage(
HEADRequest(http_url), quality,
note='Check filesize', fatal=False
)
if urlh:
http_f = f.copy()
del http_f['manifest_url']
http_f.update({
'format_id': http_f['format_id'].replace('hls-', 'https-'),
'url': http_url,
'protocol': 'https',
'filesize_approx': int_or_none(urlh.headers.get('Content-Length', None)),
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know if it is really usefull to make a request just for getting the filesize. No other extractor does this

Copy link
Contributor Author

@nixxo nixxo Nov 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I know, but it's maximum 2 request and since some tv programme are quite long (3 hours+) the file size can be quite big: ~2GB, so to give this info to the user can be useful and, as you can see from the BEFORE output, it was something that the kaltura extractor was providing to the user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still against it, but I wont force you to remove it either. I don't use the extractor and if u think this is worth it, ig u can keep it. But if more extractors start doing this in future, I'll have to design some strict guidelines on when this is allowed and when it isnt

Copy link
Contributor Author

@nixxo nixxo Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then let's hope nobody uses it ;-D

Copy link
Contributor Author

@nixxo nixxo Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if I make it optional?

I'll create a InfoExtractor method _generate_filesize and add on option --generate-filesize so that the extractors developer can use the method and the user has the control to allow it or not.

in the extractor:
'filesize_approx': self._generate_filesize(url)

in the InfoExtractor method:
if self.params.get('generate-filesize'):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see my implementation in the last 2 commits

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if such a feature is to be implemented, it will need to be done in general and not specific to extractors. I am reverting your last 2 commits for this PR

There is also a feature request to fetch more format details using ffprobe. Any implementation of this will also need to atleast leave room for that

@nixxo nixxo marked this pull request as ready for review November 7, 2021 17:10
yt_dlp/extractor/la7.py Outdated Show resolved Hide resolved
Co-authored-by: pukkandan <pukkandan.ytdlp@gmail.com>
@nixxo nixxo marked this pull request as draft November 8, 2021 19:04
@nixxo nixxo marked this pull request as ready for review November 8, 2021 19:35
@pukkandan pukkandan merged commit 9b12e9a into yt-dlp:master Nov 9, 2021
gaming-hacker added a commit to gaming-hacker/yt-dlp that referenced this pull request Nov 10, 2021
* commit '9ebf3c6ab97c29b2d5872122e532bc98b93ad8b3': (23 commits)
  [version] update
  Release 2021.11.10.1
  [version] update
  Release 2021.11.10
  [tvp] Add TVPStreamIE (yt-dlp#1401) Authored by: selfisekai
  [tvp] Fix extractor (yt-dlp#1401) Authored by: selfisekai
  [tvp] Fix embeds (yt-dlp#1401) Authored by: selfisekai
  [wppilot] Add extractors (yt-dlp#1401) Authored by: selfisekai
  [radiokapital] Add extractors (yt-dlp#1401) Authored by: selfisekai
  [polsatgo] Add extractor (yt-dlp#1386) Authored by: selfisekai, sdomi
  [polskieradio] Add extractors (yt-dlp#1386) Authored by: selfisekai
  [extractor] Add `_search_nextjs_data` (yt-dlp#1386) Authored by: selfisekai
  [cleanup] minor fixes
  [docs] Minor documentation improvements Closes yt-dlp#1583, yt-dlp#1599
  [outtmpl] Add alternate forms for `q` and `j`
  [cleanup] Minor improvements to error and debug messages
  fix for e1b7c54
  [Gab] Add extractor (yt-dlp#1505)
  [imdb] Fix thumbnail (yt-dlp#1581)
  [la7] Fix extractor (yt-dlp#1575)
  ...
@nixxo nixxo deleted the la7-fixes branch October 20, 2022 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Broken] la7 gives HTTP 403
2 participants